Daily Archives

Articles indexed Tuesday, May 18, 2010

Page 11 of 121

  • What is a Manhattan database?

    - by Adnan Anwar
    A friend of mine was interviewing for a data warehouse and Business Objects role, and he was asked about the "Manhattan database". I have Googled "Manhattan database" and even searched for it on Bing and Yahoo, but have found no relevant information. Any help would be greatly appreciated!

    Read the article

  • Continuous Integration with Oracle Products

    - by Lee Gathercole
    Hi, I'm currently working on a data warehouse project using an Oracle database, Oracle Data Integrator, Oracle Warehouse Builder, and some Jython thrown in for good measure, all of which is held within TFS. My background is .NET, and prior to this project I was seeing a lot of promise in CI. I'm not suggesting that the testing element of CI is feasible in this instance, but I would like to implement a stable deployment strategy. What I'm trying to understand is whether or not I can build some NAnt scripts that will allow me to deploy ODI/OWB/Oracle DB code to any given environment at any point. Has anyone tried this before? Are there more appropriate tools out there that lend themselves better to this sort of toolset? Am I just a crazy horse to be even contemplating this? Any view would be greatly appreciated. Thanks, Lee

    Read the article

  • What to name the column in a database table that holds a versioning number

    - by rwmnau
    I'm trying to figure out what to call the column in my database table that holds an INT to specify the "record version". I'm currently using "RecordOrder", but I don't like that, because people think higher=newer, but the way I'm using it, lower=newer (with "1" being the current record, "2" being the second most current, "3" older still, and so on). I've considered "RecordVersion", but I'm afraid that would have the same problem. Any other suggestions? "RecordAge"?

    I'm doing this because when I insert into the table, instead of having to find out what version is next, then run the risk of having that number stolen from me before I write, I just insert with a "RecordOrder" of 0. There's a trigger on the table AFTER INSERT that increments all the "RecordOrder" numbers for that key by 1, so the record I just inserted becomes "1", and all others are increased by 1. That way, you can get a person's current record by selecting RecordOrder=1, instead of getting the MAX(RecordOrder) and then selecting that.

    PS - I'm also open to criticism about why this is a terrible idea and I should be incrementing this index instead. This just seemed to make lookups much easier, but if it's a bad idea, please enlighten me!

    Some details about the data, as an example. I have the following database table:

        CREATE TABLE AmountDue (
            CustomerNumber INT,
            AmountDue      DECIMAL(14,2),
            RecordOrder    SMALLINT,
            RecordCreated  DATETIME
        )

    A subset of my data looks like this:

        CustomerNumber  AmountDue  RecordOrder  RecordCreated
        100             0          1            2009-12-19 05:10:10.123
        100             10.05      2            2009-12-15 06:12:10.123
        100             100.00     3            2009-12-14 14:19:10.123
        101             5.00       1            2009-11-14 05:16:10.123

    In this example, there are three rows for customer 100 - they owed $100, then $10.05, and now they owe nothing. Let me know if I need to clarify it some more.

    UPDATE: The "RecordOrder" and "RecordCreated" columns are not available to the user - they're only there for internal use, and to help figure out which is the current customer record. Also, I could use it to return an appropriately-ordered customer history, though I could just as easily do that with the date. I can accomplish the same thing as an incrementing "Record Version" with just the RecordCreated date, I suppose, but that removes the convenience of knowing that RecordOrder=1 is the current record, and I'm back to doing a sub-query with MAX or MIN on the DateTime to determine the most recent record.
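
    A minimal sketch of the trigger described in the question, assuming SQL Server (the question doesn't name the RDBMS); the trigger name is illustrative:

        -- Rows arrive with RecordOrder = 0; bump every version number for
        -- the affected customer(s) by 1, so the new row becomes the
        -- current "1" and older rows shift down.
        CREATE TRIGGER trg_AmountDue_Version
        ON AmountDue
        AFTER INSERT
        AS
        BEGIN
            SET NOCOUNT ON;
            UPDATE ad
            SET    ad.RecordOrder = ad.RecordOrder + 1
            FROM   AmountDue ad
            JOIN   inserted i
              ON   i.CustomerNumber = ad.CustomerNumber;
        END;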

    Read the article

  • How to index a table with a Type 2 slowly changing dimension for optimal performance

    - by The Lazy DBA
    Suppose you have a table with a Type 2 slowly-changing dimension. Let's express this table as follows, with the following columns:

        [Key]
        [Value1]
        ...
        [ValueN]
        [StartDate]
        [ExpiryDate]

    In this example, let's suppose that [StartDate] is effectively the date on which the values for a given [Key] become known to the system, so our primary key would be composed of both [StartDate] and [Key]. When a new set of values arrives for a given [Key], we assign [ExpiryDate] to some pre-defined high surrogate value such as '12/31/9999'. We then set the existing "most recent" records for that [Key] to have an [ExpiryDate] that is equal to the [StartDate] of the new value - a simple update based on a join.

    So if we always wanted to get the most recent records for a given [Key], we know we could create a clustered index that is:

        [ExpiryDate] ASC
        [Key] ASC

    Although the keyspace may be very wide (say, a million keys), we can minimize the number of pages between reads by initially ordering them by [ExpiryDate]. And since we know the most recent record for a given key will always have an [ExpiryDate] of '12/31/9999', we can use that to our advantage.

    However... what if we want to get a point-in-time snapshot of all [Key]s at a given time? Theoretically, the entirety of the keyspace isn't all being updated at the same time. Therefore, for a given point in time, the window between [StartDate] and [ExpiryDate] is variable, so ordering by either [StartDate] or [ExpiryDate] would never yield a result in which all the records you're looking for are contiguous. Granted, you can immediately throw out all records in which the [StartDate] is greater than your defined point in time.

    In essence: in a typical RDBMS, what indexing strategy affords the best way to minimize the number of reads needed to retrieve the values for all keys for a given point in time? I realize I can at least maximize IO by partitioning the table by [Key]; however, this certainly isn't ideal. Alternatively, is there a different type of slowly-changing dimension that solves this problem in a more performant manner?
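
    A hedged sketch of one common approach to the point-in-time query, assuming SQL Server 2008 syntax; the table name Dim is a placeholder:

        -- Index the validity window so the snapshot predicate can seek
        -- on [StartDate]. Note this still scans every row with
        -- StartDate <= @AsOf, which is exactly the limitation the
        -- question describes; it narrows, but does not eliminate, reads.
        CREATE NONCLUSTERED INDEX IX_Dim_Window
        ON Dim ([StartDate], [ExpiryDate], [Key]);

        -- Snapshot of every [Key] as of a given moment:
        DECLARE @AsOf DATETIME = '2010-01-01';
        SELECT [Key], [Value1] -- , ..., [ValueN]
        FROM   Dim
        WHERE  [StartDate] <= @AsOf
          AND  [ExpiryDate] >  @AsOf;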

    Read the article

  • Many-To-Many dimensional model

    - by Mevdiven
    Folks, I have a dimension table called DIM_FILE which holds information on the files we receive from customers. Each file has detail records, which constitute my FACT table, CUST_DETAIL. In the main process, a file goes through several stages, and each stage tags a status to it. Long story short, I have a many-to-many relationship. Any ideas around star-schema dimensional modeling? A customer record belongs to only a single file, and a file can have multiple statuses.

        FACT
        ----
        CustID
        FileID
        AmountDue

        DIM_FILE
        --------
        FileID
        FileName
        DateReceived

        FILE_STATUS
        -----------
        FileID
        StatusDateTime
        StatusCode
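
    A hedged sketch of one common star-schema pattern for this situation - a status dimension plus a bridge table - offered as an illustration, not as the questioner's intended design (all names beyond those above are invented):

        -- Status becomes its own dimension...
        CREATE TABLE DIM_STATUS (
            StatusKey   INT PRIMARY KEY,
            StatusCode  VARCHAR(10),
            Description VARCHAR(100)
        );

        -- ...and a bridge table resolves the many-to-many between
        -- DIM_FILE and DIM_STATUS, leaving the fact grain unchanged.
        CREATE TABLE BRIDGE_FILE_STATUS (
            FileID         INT NOT NULL,       -- FK to DIM_FILE
            StatusKey      INT NOT NULL,       -- FK to DIM_STATUS
            StatusDateTime DATETIME NOT NULL,
            PRIMARY KEY (FileID, StatusKey, StatusDateTime)
        );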

    Read the article

  • Error handling in Informatica PowerCenter

    - by user223541
    I want to develop a mapping for the following scenario. I have one source, one target, and one error table. The target and error tables have all the fields that are present in the source table, but the data type of every field in the error table is VARCHAR. The error table doesn't have integrity, foreign-key, or other constraints, and it also has two more fields: error number and error message. Now, when the workflow is executed, if there is an error while inserting a record, that record should be moved to the error table, and the database error code and error message should be logged in the error-number and error-message fields. How can I develop such a mapping? Where can I find examples of such mappings?
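
    A hedged sketch of the error table the question describes, assuming a SQL database as the target; all names and widths are illustrative:

        CREATE TABLE SRC_ERRORS (
            -- every source field, typed as VARCHAR so any rejected value fits
            CUST_ID    VARCHAR(50),
            AMOUNT_DUE VARCHAR(50),
            -- ...remaining source fields...
            ERROR_NO   VARCHAR(20),    -- database error code
            ERROR_MSG  VARCHAR(4000)   -- database error message
        );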

    Read the article

  • Informatica mapping examples

    - by user223541
    I want to develop a generic mapping for handling database errors in Informatica. Can anyone give me examples of such mappings? Also, can you suggest some resources for Informatica sample mappings (live websites or links)?

    Read the article

  • Performance problem: data warehouse with lots of indexes

    - by Lieven Cardoen
    Our product tests some 350 candidates at the same time. At the end of the test, the results for each candidate are moved to a data warehouse that is full of indexes. For each test there are some 400 records to be entered into the data warehouse, and 400 x 350 is a lot of records. If there aren't many records in the data warehouse, all goes well; but once it already holds lots of records, a lot of the inserts fail... Is there a way to have indexes that are only rebuilt at the end of the day, or isn't that the real problem? How would you solve this?
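
    A hedged sketch of the "rebuild at the end of the day" idea, assuming SQL Server (the question doesn't name the engine); table and index names are illustrative. Only non-clustered indexes should be disabled this way - disabling the clustered index makes the table unreadable:

        -- Before the day's result loads:
        ALTER INDEX IX_Results_ByCandidate ON CandidateResults DISABLE;

        -- ...the daily inserts now maintain far fewer index structures...

        -- End of day: rebuild, which also re-enables the index:
        ALTER INDEX IX_Results_ByCandidate ON CandidateResults REBUILD;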

    Read the article

  • Database design suggestion needed

    - by JMSA
    I need to design a table for daily sales of pharmaceutical products.

    There are hundreds of types of products available {Name, Code}. Thousands of salespersons are employed to sell those products {Name, Code}. They collect products from different depots {Name, Code}. They work in different areas - zones, markets, outlets, etc. {all have names and codes}. Each product has various types of prices {Production Price, Trade Price, Business Price, Discount Price, etc.}, and salespersons are free to choose from those combinations to estimate the sales price.

    The problem is that daily sales require a huge amount of data entry. Within a couple of years there may be gigabytes of data (if not terabytes), and to show daily, weekly, monthly, quarterly, and yearly sales reports there will be various types of SQL queries I shall need.

    This is my initial design:

        Product {ID, Code, Name, IsActive}
        ProductXYZPriceHistory {ID, ProductID, Date, EffectDate, Price, IsCurrent}
        SalesPerson {ID, Code, Name, JoinDate, and so on..., IsActive}
        SalesPersonSalesAreaHistory {ID, SalesPersonID, SalesAreaID, IsCurrent}
        Depot {ID, Code, Name, IsActive}
        Outlet {ID, Code, Name, AreaID, IsActive}
        AreaHierarchy {ID, Code, Name, ParentID, AreaLevel, IsActive}
        DailySales {ID, ProductID, SalesPersonID, OutletID, Date, PriceID, SalesPrice, Discount, etc...}

    Now, apart from indexing, how can I normalize my DailySales table to have a fine-grained design that I shall not need to change for years to come? Please show me a sample design of only the DailySales data-entry table (from which all types of reports would be queried) on the basis of the above information. I don't need detailed design advice - just advice regarding the DailySales table. Is there any way to break up this particular table to achieve granularity?
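
    A hedged sketch of one possible fine-grained DailySales table, offered as an illustration rather than a definitive answer; it assumes SQL Server types, follows the names in the design above, and adds a hypothetical Quantity measure:

        -- Grain: one row per product, per salesperson, per outlet,
        -- per price type, per day. Descriptive data lives in the
        -- surrounding tables; this table holds only keys and measures.
        CREATE TABLE DailySales (
            SalesDate     DATE          NOT NULL,
            ProductID     INT           NOT NULL,  -- FK to Product
            SalesPersonID INT           NOT NULL,  -- FK to SalesPerson
            OutletID      INT           NOT NULL,  -- FK to Outlet
            PriceID       INT           NOT NULL,  -- FK to ProductXYZPriceHistory
            Quantity      INT           NOT NULL,
            SalesPrice    DECIMAL(14,2) NOT NULL,
            Discount      DECIMAL(14,2) NOT NULL DEFAULT 0
        );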

    Read the article

  • How to improve performance of non-scalar aggregations on denormalized tables

    - by The Lazy DBA
    Suppose we have a denormalized table with about 80 columns that grows at the rate of ~10 million rows (about 5GB) per month. We currently have 3 1/2 years of data (~400M rows, ~200GB).

    We create a clustered index, to best suit retrieving data from the table, on the following columns that serve as our primary key:

        [FileDate] ASC
        [Region] ASC
        [KeyValue1] ASC
        [KeyValue2] ASC

    When we query the table we always have the entire primary key, so these queries always result in clustered index seeks and are therefore very fast, and fragmentation is kept to a minimum.

    However, we do have a situation where we want to get the most recent FileDate for every Region, typically for reports, i.e.

        SELECT [Region]
             , MAX([FileDate]) AS [FileDate]
        FROM   HugeTable
        GROUP BY [Region]

    The "best" solution I can come up with for this is to create a non-clustered index on Region. Although it means an additional insert on the table during loads, the hit is minimal (we load 4 times per day, so it's fewer than 100,000 additional index inserts per load). Since the table is also partitioned by FileDate, the results of our query come back quickly enough (200ms or so), and that result set is cached until the next load.

    However, I'm guessing that someone with more data warehousing experience might have a solution that's more optimal, as this, for some reason, doesn't "feel right".
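
    A hedged sketch of that non-clustered index, assuming SQL Server; adding [FileDate] descending as a second key column lets the MAX per region be answered from the top of each region's index range rather than from the base table:

        CREATE NONCLUSTERED INDEX IX_HugeTable_Region_FileDate
        ON HugeTable ([Region] ASC, [FileDate] DESC);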

    Read the article

  • MDX performance vs. T-SQL

    - by SubPortal
    I have a database containing tables with more than 600 million records, and a set of stored procedures that perform complex search operations on the database. The performance of the stored procedures is very slow, even with suitable indexes on the tables. The design of the database is a normal relational design. I want to change the database design to be multidimensional and use MDX queries instead of the traditional T-SQL queries, but the question is: is an MDX query better than a traditional T-SQL query with regard to performance? And if so, to what extent will it improve the performance of the queries? Thanks for any help.

    Read the article

  • Oracle data warehouse design - fact table acting as a dimension?

    - by Elizabeth
    THANKS: Both answers here are very helpful, but I could only pick one. I really appreciate the advice!

    Our data warehouse will be used more for workflow reports than traditional analytical reports. Our users care about the "current picture" far more than history (though history matters, too). We are a government entity that does not have costs or related calculations - mostly just counts of people within given locations, with related history.

    We are using Oracle, and I have found a distinct advantage in using the star join whenever possible, and would like to rearchitect everything to resemble the star schema as closely as is reasonable for our business uses. Speed in this DW is vital, and a number of tests have already proven the star schema approach to me.

    Our "person" table is key - it contains over 4 million records and will be the most frequently used source in queries. It can be seen at the center of a star with multiple dimensions (like age, gender, affiliation, location, etc.). It is a very LONG table, particularly when I join it to the address and contact information. However, it is more like a dimension table when we start looking at history. For example, there are two different history tables that have a person key pointing to the person table. One has over 20 million records and the other has almost 50 million and grows daily.

    Is this table a fact table or a dimension table? Can one work as both? If so, is that going to be a big performance problem? Is it common to query more off of a dimension than a fact? What happens if a DIFFERENT fact table that uses the person table as a dimension is actually only 60,000 records (much smaller)? I think my problem is that our data, and our use of it, does not fit with the commonly used examples of star schemas.

    CLARIFICATION: Some good thoughts have been added below, but perhaps I left too much out to explain well. Here's some more info:

    We handle a voter database. We don't have any measures except voter counts by various groups: voter counts by party, by age, by location; voter counts by ballot type and election, by ballot status and election, etc. We do have a "voting history" log as well as an activity audit log (change of address, party, etc.). We have information on which voters are election workers and all that related information. I figure I'll get to the peripheral stuff later.

    For now I'm focusing on our two major "business processes": voter registration (which IS a voter) and election turnout. In the first, voter is a fact. In the second, voter is a dimension, along with party, election, and type of ballot. (And in case anyone is worried - no, we don't know HOW people vote. Just that they do. LOL) I hope that clarifies things a bit.

    Read the article

  • Non-relational database modeling tool?

    - by Angel Escobedo
    Hey guys, please recommend some tools you have used successfully for DW, data mart, BI, and non-relational modeling - for example, automatic creation of snowflake schemas, dimension tables, and fact tables. Which tools feel familiar in terms of their diagrams and surrogate keys, and have the option to export to, or connect to, SQL Server 2008? Thanks

    Read the article

  • Best way to statistically detect anomalies in data

    - by reinier
    Hi, our webapp collects a huge amount of data about user actions, network business, database load, etc. All the data is stored in warehouses and we have quite a lot of interesting views on it. If something odd happens, chances are it shows up somewhere in the data. However, to manually detect whether something out of the ordinary is going on, one has to continually look through this data and hunt for oddities.

    My question: what is the best way to detect changes in dynamic data which can be seen as "out of the ordinary"? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go? Any pointers would be great!

    EDIT: To clarify, the data for example shows a daily curve of database load. This curve typically looks similar to the curve from yesterday. Over time this curve might change slowly. It would be nice if a warning could go off when the curve changes from day to day beyond some parameters.

    R
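
    A hedged sketch of one simple statistical baseline, assuming SQL Server syntax and a hypothetical ServerLoad(LogDate, LogHour, LoadValue) table: flag any hour today whose load deviates more than three standard deviations from that hour's trailing 30-day mean.

        WITH Baseline AS (
            SELECT LogHour,
                   AVG(LoadValue)   AS MeanLoad,
                   STDEV(LoadValue) AS StdLoad
            FROM   ServerLoad
            WHERE  LogDate >= DATEADD(DAY, -30, CAST(GETDATE() AS DATE))
              AND  LogDate <  CAST(GETDATE() AS DATE)
            GROUP BY LogHour
        )
        SELECT t.LogHour, t.LoadValue, b.MeanLoad, b.StdLoad
        FROM   ServerLoad t
        JOIN   Baseline b ON b.LogHour = t.LogHour
        WHERE  t.LogDate = CAST(GETDATE() AS DATE)
          AND  ABS(t.LoadValue - b.MeanLoad) > 3 * b.StdLoad;  -- ~3-sigma alert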

    Read the article

  • ASP.NET MVC: Showing the same data using different layouts...

    - by vdh_ant
    Hi guys, I'm wanting to create a page that allows users to select how they would like to view their data - i.e. summary (which supports grouping), grid (which supports grouping), table (which supports grouping), map, timeline, XML, JSON, etc.

    Now, each layout would probably use a different view model, each inheriting from a common base class/view model, the reason being that each layout needs the object structure it deals with to be different (some need a hierarchical structure, others a flatter one). Each layout would call the same repository method, and each layout would support the same functionality, i.e. searching and filtering (hence these controls would be shared between layouts). The main exception to this would be sorting, which only the grid and table views would need to support.

    Now, my question is: given this, what do people think is the best approach? Using DisplayFor to handle the rendering of the different types? Also, how do I work this with the actions? I would imagine that I would use one action and pass in the layout type, but then how does this support the grouping required for the summary, grid, and table views? Do I treat each grouping as just a layout type?

    Also, how would this work from a URL point of view - what do people think is the template to support this layout functionality?

    Cheers, Anthony

    Read the article

  • Linked Measure Groups and Local Dimensions

    - by ekoner
    Mulling over something I've been reading up on. According to Chris Webb:

        A linked measure group can only be used with dimensions from the same database as the source measure group.

    I took this to mean that as long as two cubes share a database, a linked measure group can be used with a dimension. So I created a new cube and added a local measure group, a local dimension, and a linked measure group. However, I can't create a relationship between the linked measure group and the local dimension, even though they are within the same database. I get the message below:

        Regular relationships in the current database between non-linked (local) dimensions and linked measure groups cannot be edited. These relationship can only be created through the wizard. This dialog can be used to delete these relationships.

    I see that I can go to the original cube and add the dimension there, but does the message above mean I have an alternative? I just know it's going to be something simple and trivial! Thanks for reading.

    Read the article

  • Starting to learn C#

    - by cf_PhillipSenn
    After watching the tier 1 video, where do I go to learn how to program in C#? Is there an online video series? Usually people say to "Read the Manual", but what is a manual these days? Do I just say "Help", "View Help" and start grokking?

    Read the article

  • Writing direct to disk with php

    - by Jurander
    I would like to create an upload script that doesn't fall under the PHP upload limit. There might be an occasion where I need to upload a 2GB or larger file, and I don't want to have to raise the limit for the whole server above 32MB. Is there a way to write directly to disk from PHP? What method would you propose someone use to accomplish this? I have read around Stack Overflow but haven't quite found what I am looking to do.

    Read the article

  • ! Extra }, or forgotten \endgroup. (LaTeX)

    - by gzou
    Hey, I've run into this LaTeX formatting problem; can anyone offer some help? The .tex file:

        \begin{table}{}
        \renewcommand{\arraystretch}{1.1}
        \caption{Cambridge Flow feature definition and description}
        \label{cambridge-feature}}
        \centering
        \begin{tabular}{|c|c|}
        \hline\bfseries Abbreviation &\bfseries Description\\
        \hline serv-port & Server port\\
        \hline clnt-port & Client port\\
        \hline push-pkts-serv & count of all packets with\\
        & push bit set in TCP header (server to client)\\
        \hline init-win-bytes-clnt & the total number of bytes \\
        & sent in initial window (client to server)\\
        \hline init-win-bytes-serv & the total number of bytes sent\\
        & in initial window (server to client)\\
        \hline avg-seg-size-clnt & average segment size: \\
        & data bytes devided by number of packets\\
        \hline IP-bytes-med-clnt & median of total bytes in IP packet\\
        \hline act-data-pkt-serv & count of packet with at least one byte \\
        & of TCP data playload (server to client)\\
        \hline data-bytes-var-clnt & variance of total \\
        & bytes in packets (client to server)\\
        \hline min-seg-size-serv & minimum segment size \\
        & observed (server to client)\\
        \hline RTT-samples-serv & total number of RTT samples\\
        & found (server to client),\\
        & {\bf see also \cite{Moore05discriminators}}\\
        \hline push-pkts-clnt & count of all packets with push bit set \\
        & in TCP header (server to client)\\
        \hline
        \end{tabular}
        \end{table}

    and the error message:

        ! Extra }, or forgotten \endgroup.
        \@endfloatbox ...pagefalse \outer@nobreak \egroup \color@endbox
        l.892 \end{table}
        I've deleted a group-closing symbol because it seems to be spurious,
        as in `$x}$'. But perhaps the } is legitimate and you forgot
        something else, as in `\hbox{$x}'. In such cases the way to recover is
        to insert both the forgotten and the deleted material, e.g., by typing `I$}'.

    There is no $ in my table, the { are all matched with }, and even after I comment out the citation the error remains. Can anyone offer help? I really appreciate all the comments!
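
    A hedged observation, assuming the snippet above is quoted verbatim: the line \label{cambridge-feature}} contains one { and two }, so the float's braces are unbalanced, which is a classic cause of exactly this error surfacing at \end{table}. (The empty {} after \begin{table} is also suspect; table takes an optional placement argument in square brackets, not a brace group.) The opening lines would then likely read:

        \begin{table}[htbp]   % placement option instead of the stray {} group
        \renewcommand{\arraystretch}{1.1}
        \caption{Cambridge Flow feature definition and description}
        \label{cambridge-feature}   % extra } removed
        \centering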

    Read the article
