denormalization - Developer IT

Denormalization database

- by Pedro Magalhaes

I was taking a look at SSB (Star Schema Benchmark, http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf) and started wondering whether it is possible to denormalize all the tables in SSB. The database size would increase a lot, but performance could potentially improve. Is that right? Is it possible? Thanks, and sorry for my poor English.
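
Here is a minimal sketch of "denormalizing everything" in a star schema. The fact table is pre-joined with its dimension tables into one wide table, so reports no longer need joins. It uses Python's built-in sqlite3 as a stand-in database. The tables are a tiny hypothetical subset loosely modelled on SSB's lineorder, customer and date tables, not the real benchmark schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical mini star schema: one fact table, two dimension tables.
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_region TEXT);
    CREATE TABLE dwdate   (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
    CREATE TABLE lineorder (
        lo_orderkey  INTEGER,
        lo_custkey   INTEGER REFERENCES customer(c_custkey),
        lo_orderdate INTEGER REFERENCES dwdate(d_datekey),
        lo_revenue   INTEGER
    );
    INSERT INTO customer VALUES (1, 'ASIA'), (2, 'EUROPE');
    INSERT INTO dwdate   VALUES (19970101, 1997), (19980101, 1998);
    INSERT INTO lineorder VALUES
        (1, 1, 19970101, 100), (2, 2, 19970101, 250), (3, 1, 19980101, 75);

    -- Fully denormalized copy: every fact row carries its dimension attributes.
    CREATE TABLE lineorder_flat AS
    SELECT lo.lo_orderkey, lo.lo_revenue, c.c_region, d.d_year
    FROM lineorder lo
    JOIN customer c ON c.c_custkey = lo.lo_custkey
    JOIN dwdate   d ON d.d_datekey = lo.lo_orderdate;
""")

# The report against the star schema needs joins ...
star = conn.execute("""
    SELECT c.c_region, d.d_year, SUM(lo.lo_revenue)
    FROM lineorder lo
    JOIN customer c ON c.c_custkey = lo.lo_custkey
    JOIN dwdate   d ON d.d_datekey = lo.lo_orderdate
    GROUP BY c.c_region, d.d_year ORDER BY 1, 2
""").fetchall()

# ... the same report against the flat table does not.
flat = conn.execute("""
    SELECT c_region, d_year, SUM(lo_revenue)
    FROM lineorder_flat GROUP BY c_region, d_year ORDER BY 1, 2
""").fetchall()

print(star == flat, flat)
```

Both queries return the same totals. The flat table trades the joins for repeated c_region and d_year values on every fact row, which is where the extra size comes from. Any change to a dimension row would also have to be rewritten into every matching fact row.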

Denormalization of large text?

- by tesmar

If I have large articles that need to be stored in a database, each associated with many tables, would a NoSQL option help? Should I copy the 1,000-character articles into multiple "buckets", duplicating them each time they are related to a bucket, or should I use a normalized MySQL DB with lots of Memcache?

Denormalization Strategies

When building a database, we typically want a well-normalized design. However, in complex systems there are cases where denormalization is worth considering. Timothy Claason gives you some thoughts on the subject.

Predicting advantages of database denormalization

- by Janus Troelsen

I was always taught to strive for the highest normal form of database normalization, and we were taught Bernstein's synthesis algorithm to achieve 3NF. This is all very well, and it feels nice to normalize your database, knowing that fields can be modified while retaining consistency. However, performance may suffer. That's why I am wondering whether there is any way to predict the speedup/slowdown when denormalizing. That way, you can build your list of FDs featuring 3NF and then denormalize as little as possible. I imagine that denormalizing too much would waste space and time. For example, giant blobs might be duplicated, or it might become harder to maintain consistency because you have to update multiple fields using a transaction. Summary: Given a 3NF FD set and a set of queries, how do I predict the speedup/slowdown of denormalization? Links to papers are appreciated too.
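
Bernstein's synthesis, and any check of whether a schema is in 3NF, is built on computing the closure of an attribute set under the FDs. Below is a minimal sketch of that closure computation. FDs are represented as hypothetical (left-hand side, right-hand side) pairs of sets. It does not predict speedups; it only shows the FD reasoning the question starts from.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is an iterable of (lhs, rhs) pairs, each a set of attribute names.
    Any FD whose left-hand side is already in the result adds its right-hand
    side; this repeats until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical FDs: order_id -> customer_id, customer_id -> city.
fds = [({"order_id"}, {"customer_id"}), ({"customer_id"}, {"city"})]
print(sorted(closure({"order_id"}, fds)))  # ['city', 'customer_id', 'order_id']
```

Here city depends on order_id only transitively, through customer_id. Storing city on the order table, a typical denormalization, means every order row repeats it. That is exactly the update cost to weigh against the joins it saves.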

How do you deal with denormalization / secondary indexes in database sharding?

- by Continuation

Say I have a "message" table with two secondary indexes: "recipient_id" and "sender_id". I want to shard the "message" table by "recipient_id". That way, to retrieve all messages sent to a certain recipient, I only need to query one shard. At the same time, I want to be able to run a query that asks for all messages sent by a certain sender, and I don't want to send that query to every single shard of the "message" table. One way to do this is to duplicate the data and have a "message_by_sender" table sharded by "sender_id". The problem with that approach is that every time a message is sent, I need to insert it into both the "message" and "message_by_sender" tables. But what if, after inserting into "message", the insertion into "message_by_sender" fails? In that case the message exists in "message" but not in "message_by_sender". How do I make sure that if a message exists in "message" it also exists in "message_by_sender", without resorting to two-phase commit? This must be a very common issue for anyone who shards their databases. How do you deal with it?
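
One commonly used alternative to two-phase commit is a transactional outbox. In the same local transaction that writes the message to its recipient shard, you also write an "outbox" row. A separate relay later copies outbox rows to the sender-sharded table and deletes them, retrying until it succeeds. The copy is idempotent (keyed by message id), so retries are safe. The two tables become eventually consistent rather than atomically consistent. The sketch below assumes two in-memory sqlite3 databases as stand-ins for two shards; `send_message` and `relay_outbox` are hypothetical names.

```python
import sqlite3

# Two separate databases stand in for a recipient shard and a sender shard.
recipient_shard = sqlite3.connect(":memory:")
sender_shard = sqlite3.connect(":memory:")
recipient_shard.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INT, recipient_id INT, body TEXT);
    CREATE TABLE outbox  (message_id INTEGER PRIMARY KEY);  -- pending copies (hypothetical)
""")
sender_shard.execute(
    "CREATE TABLE message_by_sender "
    "(id INTEGER PRIMARY KEY, sender_id INT, recipient_id INT, body TEXT)"
)

def send_message(msg_id, sender_id, recipient_id, body):
    # One local transaction: the message and its outbox entry commit together or not at all.
    with recipient_shard:
        recipient_shard.execute("INSERT INTO message VALUES (?, ?, ?, ?)",
                                (msg_id, sender_id, recipient_id, body))
        recipient_shard.execute("INSERT INTO outbox VALUES (?)", (msg_id,))

def relay_outbox():
    # Copy pending messages to the sender shard; safe to rerun after a crash.
    pending = recipient_shard.execute(
        "SELECT m.* FROM outbox o JOIN message m ON m.id = o.message_id").fetchall()
    for row in pending:
        with sender_shard:
            # INSERT OR IGNORE keeps the copy idempotent if an earlier run got this far.
            sender_shard.execute(
                "INSERT OR IGNORE INTO message_by_sender VALUES (?, ?, ?, ?)", row)
        with recipient_shard:
            recipient_shard.execute("DELETE FROM outbox WHERE message_id = ?", (row[0],))

send_message(1, sender_id=10, recipient_id=20, body="hi")
relay_outbox()
relay_outbox()  # a second run finds nothing left to do
print(sender_shard.execute("SELECT COUNT(*) FROM message_by_sender").fetchone()[0])  # 1
```

If the relay crashes between the two `with` blocks, the outbox row survives. The next run copies the message again, and INSERT OR IGNORE discards the duplicate.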

Views performance in MySQL for denormalization

- by Gianluca Bargelli

I am currently writing my very first PHP application, and I would like to know how to design and implement MySQL views properly. In my particular case, user data is spread across several tables (as a consequence of database normalization), and I was thinking of using a view to group the data into one large table:

    CREATE VIEW `Users_Merged` (name, surname, email, phone, role) AS
    (SELECT name, surname, email, phone, 'Customer' FROM `Customer`)
    UNION
    (SELECT name, surname, email, tel, 'Admin' FROM `Administrator`)
    UNION
    (SELECT name, surname, email, tel, 'Manager' FROM `manager`);

This way I can easily use the view's data from the PHP app, but I don't really know how much this can affect performance. For example:

    SELECT * FROM `Users_Merged` WHERE role = 'Admin';

Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself? (I need a list of users and the ability to filter them by role.) EDIT: Specifically, what I'm trying to achieve is denormalization of three tables into one. Is my solution correct? See Denormalization on Wikipedia.
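
Below is a runnable version of the view, using sqlite3 as a stand-in for MySQL. The sample rows are hypothetical. There are two deliberate changes from the question. First, SQLite does not accept parenthesized SELECTs in a UNION, so the parentheses are dropped. Second, the sketch uses UNION ALL instead of UNION. Because of the constant role column, rows from different branches can never be equal, so UNION's duplicate elimination is wasted work; it would also silently merge identical rows within one table. Whether the engine pushes WHERE role = 'Admin' down into each branch depends on the engine and version, so check EXPLAIN on your MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer      (name TEXT, surname TEXT, email TEXT, phone TEXT);
    CREATE TABLE Administrator (name TEXT, surname TEXT, email TEXT, tel TEXT);
    CREATE TABLE manager       (name TEXT, surname TEXT, email TEXT, tel TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Customer      VALUES ('Ann', 'Lee', 'ann@example.com', '111');
    INSERT INTO Administrator VALUES ('Bob', 'Ray', 'bob@example.com', '222');
    INSERT INTO manager       VALUES ('Cid', 'Kay', 'cid@example.com', '333');

    -- The merged view; the constant in the last column tags each branch with its role.
    CREATE VIEW Users_Merged (name, surname, email, phone, role) AS
        SELECT name, surname, email, phone, 'Customer' FROM Customer
        UNION ALL
        SELECT name, surname, email, tel, 'Admin' FROM Administrator
        UNION ALL
        SELECT name, surname, email, tel, 'Manager' FROM manager;
""")

admins = conn.execute("SELECT name, role FROM Users_Merged WHERE role = 'Admin'").fetchall()
print(admins)  # [('Bob', 'Admin')]
```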

Data denormalization and C# objects DB serialization

- by Robert Koritnik

I'm using a DB table with various different entities. This means that I can't have an arbitrary number of fields in it to save all kinds of different entities. Instead, I want to save just the most important fields (dates, reference IDs that act as a kind of foreign key to various other tables, the most important text fields, etc.) plus an additional text field where I'd like to store more complete object data. The most obvious solution would be to use XML strings and store those. The second most obvious choice would be JSON, which is usually shorter and probably also faster to serialize/deserialize. But is it really? My objects also wouldn't need to be strictly serializable, because JsonSerializer is usually able to serialize anything, even anonymous objects, which may as well be used here. What would be the optimal solution to this problem?

Additional info: My DB is highly normalised and I'm using Entity Framework. However, to get external, super-fast full-text search functionality, I'm accepting a bit of DB denormalisation. For the record, I'm using SphinxSE on top of MySQL. Sphinx would return row IDs that I would use to quickly query my index-optimised conglomerate table. Getting the most important data from it this way is much faster than querying multiple tables all over my DB. My table would have columns like:

- RowID (auto increment)
- EntityID (of the actual entity, but not directly related, because this would have to point to different tables)
- EntityType (so I would be able to get the actual entity if needed)
- DateAdded (record timestamp when it was added to this table)
- Title
- Metadata (serialized data related to the particular entity type)

This table would be indexed with the Sphinx indexer. When I search for data using this indexer, I would provide a series of EntityIDs and a limit date. The indexer would have to return a very limited, paged set of RowIDs ordered by DateAdded (descending). I would then just join these RowIDs to my table and get the relevant results. So this won't actually be full-text search but a filtering search. Getting RowIDs would be very fast this way. Getting results back from the table would then be much faster than comparing EntityIDs and DateAdded values, even though those columns would be properly indexed.
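
Below is a minimal sketch of the conglomerate table with a serialized Metadata column. It is written in Python with sqlite3 rather than C#/Entity Framework/MySQL, so it only illustrates the shape of the idea. JSON is used for Metadata. The `entity_search` table name and the sample rows are hypothetical, and the `row_ids` list stands in for what Sphinx would return.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_search (            -- hypothetical name for the conglomerate table
        RowID      INTEGER PRIMARY KEY AUTOINCREMENT,
        EntityID   INTEGER NOT NULL,        -- id in whichever table EntityType names
        EntityType TEXT    NOT NULL,
        DateAdded  TEXT    NOT NULL,
        Title      TEXT,
        Metadata   TEXT                     -- JSON-serialized, type-specific fields
    )
""")

def add_entity(entity_id, entity_type, date_added, title, metadata):
    conn.execute(
        "INSERT INTO entity_search (EntityID, EntityType, DateAdded, Title, Metadata) "
        "VALUES (?, ?, ?, ?, ?)",
        (entity_id, entity_type, date_added, title, json.dumps(metadata)),
    )

add_entity(7, "Order", "2011-01-02", "Order #7", {"total": 99.5, "items": 3})
add_entity(3, "Customer", "2011-01-03", "Jane", {"city": "Ljubljana"})

# Stand-in for the RowIDs the search engine would return (assumption).
row_ids = [2, 1]
placeholders = ",".join("?" * len(row_ids))
rows = conn.execute(
    f"SELECT RowID, EntityType, Title, Metadata FROM entity_search "
    f"WHERE RowID IN ({placeholders}) ORDER BY DateAdded DESC",
    row_ids,
).fetchall()
results = [(rid, etype, title, json.loads(meta)) for rid, etype, title, meta in rows]
print(results[0])  # (2, 'Customer', 'Jane', {'city': 'Ljubljana'})
```

One trade-off of this design is that Metadata is opaque to SQL. Anything you need to filter or sort on, as with EntityID and DateAdded here, has to stay in real columns.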

Denormalize for Simplicity: Ungood idea?

- by yar

After reading this question, I've learned that denormalization is not a solution for simplicity. What about this case? I have news articles, each of which has a list of sites the article will be published to. The latter can be expressed in normalized fashion with a table and a many-to-many relationship (via a cross-table, I think). But the simple solution is to just add a bunch of booleans, one per site the article can be published to. Assuming the sites:

- are small in number
- will not change over time
- have no fields themselves, except a name

is this still a terrible idea? The many-to-many relationship seems somewhat cumbersome, but I've done it before in cases like this (and it seemed cumbersome). Note: I'm doing this in Rails, where it's not that painful.
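
For comparison, here is a minimal sketch of the normalized many-to-many version: a `site` table plus an `article_site` cross-table, using sqlite3. All table names and rows are hypothetical. In Rails, this shape is what has_and_belongs_to_many or has_many :through maps onto. "Which articles go to site X?" becomes one join, and adding a site is an INSERT rather than a new boolean column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and sample rows.
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE site    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Cross-table: one row per (article, site) publication target.
    CREATE TABLE article_site (
        article_id INTEGER REFERENCES article(id),
        site_id    INTEGER REFERENCES site(id),
        PRIMARY KEY (article_id, site_id)
    );
    INSERT INTO article VALUES (1, 'Launch'), (2, 'Recall');
    INSERT INTO site    VALUES (1, 'main'), (2, 'blog');
    INSERT INTO article_site VALUES (1, 1), (1, 2), (2, 1);
""")

def articles_for(site_name):
    """Titles of all articles published to the named site."""
    return [title for (title,) in conn.execute("""
        SELECT a.title FROM article a
        JOIN article_site x ON x.article_id = a.id
        JOIN site s ON s.id = x.site_id
        WHERE s.name = ? ORDER BY a.title""", (site_name,))]

print(articles_for("main"))  # ['Launch', 'Recall']
print(articles_for("blog"))  # ['Launch']
```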

Map denormalized hibernate

- by Jurgen H

I have a Summary class which contains a list of Qualities. A Quality contains a String name and an int value. This data is stored in a denormalized DB structure: one table only, for both Summary and Quality. Quality table: id, somefields, qualityname1, qualityvalue1, qualityname2, qualityvalue2, qualityname3, qualityvalue3. For each quality name/value pair, a new Quality object must be added to the Summary's list. How do I map this in Hibernate (XML Hibernate mapping)?

Table Design For SystemSettings, Best Model

- by Chris L

Someone suggested restructuring a table full of settings. In that table, each column is a setting name (or type), and the rows are the customers and their respective values for each setting:

    ID | IsAdmin | ImagePath
    ------------------------------
    12 | 1       | \path\to\images
    34 | 0       | \path\to\images

The downside to this is that every time we want a new setting name (or type), we alter the table (via SQL) and add the new setting name/type as a column. Then we update the rows, so that each customer now has a value for that setting.

The new table design proposal is to have one column for the setting name and another column for the setting value:

    ID | SettingName | SettingValue
    ------------------------------
    12 | IsAdmin     | 1
    12 | ImagePath   | \path\to\images
    34 | IsAdmin     | 0
    34 | ImagePath   | \path\to\images

The point they made was that adding a new setting is as easy as a simple INSERT of a new row, with no added column. But something doesn't feel right about the second design. It looks bad, but I can't come up with any arguments against it. Am I wrong?
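
Below is a minimal sketch of the proposed key/value (EAV) design in sqlite3. It shows one cost of that design: to get one row per customer again, you have to pivot with conditional aggregation. Every value also comes back as text, so IsAdmin is no longer typed or constrained. The `settings` table name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Raw string so the backslashes in the paths stay literal.
conn.executescript(r"""
    CREATE TABLE settings (               -- hypothetical name for the EAV table
        ID           INTEGER,
        SettingName  TEXT,
        SettingValue TEXT,                -- every setting shares one untyped column
        PRIMARY KEY (ID, SettingName)
    );
    INSERT INTO settings VALUES
        (12, 'IsAdmin', '1'), (12, 'ImagePath', '\path\to\images'),
        (34, 'IsAdmin', '0'), (34, 'ImagePath', '\path\to\images');
""")

# Pivot back to one row per customer; each new setting needs another MAX(CASE ...).
rows = conn.execute("""
    SELECT ID,
           MAX(CASE WHEN SettingName = 'IsAdmin'   THEN SettingValue END) AS IsAdmin,
           MAX(CASE WHEN SettingName = 'ImagePath' THEN SettingValue END) AS ImagePath
    FROM settings
    GROUP BY ID
    ORDER BY ID
""").fetchall()
print(rows)
```

Adding a setting is indeed just an INSERT. However, every query that wants settings as columns has to be edited anyway, and nothing stops a row like ('IsAdmin', 'maybe').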

Normalize or Denormalize in high traffic websites

- by Inam Jameel

What is the best practice for database design for high-traffic websites like this one, Stack Overflow? Should one use a normalized database for record keeping, a denormalized one, or a combination of both? Is it sensible to design a normalized database as the main database for record keeping, to reduce redundancy, and at the same time maintain a denormalized copy for fast searching? Or should the main database be denormalized, with normalized views built at the application level for fast database operations? Or is there another approach besides the ones mentioned above? What is the best practice for designing high-traffic websites?

SQL to join to the best matching row

- by williamjones

I have a wiki system where there is a central table, Article, that has many Revisions in their own table. Each Revision contains a created_at date/time column. I want to update the Articles to contain a denormalized sort_name field, taken from the most recent Revision's name field. What SQL command can I issue to fill in each Article's sort_name field with its most recent Revision's name field? For what it's worth, I'm on PostgreSQL.
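
One way to write this is an UPDATE with a correlated subquery that picks each article's newest revision. This form is standard enough to run on PostgreSQL as well, which also offers alternatives such as UPDATE ... FROM combined with DISTINCT ON. For self-containment, the sketch below runs it on sqlite3. The table and column names (`article`, `revision`, `article_id`) are assumptions based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and sample data.
    CREATE TABLE article  (id INTEGER PRIMARY KEY, sort_name TEXT);
    CREATE TABLE revision (id INTEGER PRIMARY KEY, article_id INTEGER,
                           name TEXT, created_at TEXT);
    INSERT INTO article (id) VALUES (1), (2);
    INSERT INTO revision (article_id, name, created_at) VALUES
        (1, 'Old title',  '2010-01-01 10:00'),
        (1, 'New title',  '2010-06-01 09:00'),
        (2, 'Only title', '2010-03-15 12:00');

    -- Copy the name of each article's most recent revision into sort_name.
    UPDATE article
    SET sort_name = (
        SELECT r.name FROM revision r
        WHERE r.article_id = article.id
        ORDER BY r.created_at DESC
        LIMIT 1
    );
""")
result = conn.execute("SELECT id, sort_name FROM article ORDER BY id").fetchall()
print(result)  # [(1, 'New title'), (2, 'Only title')]
```

Articles with no revisions end up with a NULL sort_name. Add a WHERE EXISTS (...) clause if they should be left untouched instead.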

De-normalization alternative to specific MySQL problem?

- by Booker

I am facing quite a specific optimization problem. I currently have 4 normalized tables of data. Every second, possibly thousands of users will pull down up-to-date info from these tables using AJAX. The thing is that I can predict relatively easily which subset of data they need... The most recent 100 or so entries in those 4 normalized tables. I have been researching de-normalization... but feel that perhaps there is an easier solution. I was thinking that I could somehow every second run one sql query to condense the needed info, store it in a temp cached table and then have all of the user queries just draw from this. This will allow the complex join of 4 tables to only be run once, and then from there the users just need to do a simple lookup from the cached table. I really don't know if this is feasible. Comments on this or any other suggestions would be much appreciated. Thanks!
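
What the poster describes is essentially a hand-rolled materialized view. A background job rebuilds a small summary table on a schedule, and user requests read only from it. Below is a minimal sketch with sqlite3. The `post`/`author` tables, `refresh_cache`, and the 100-row limit are illustrative assumptions; it uses two tables instead of four to keep it short. The rebuild runs inside one transaction, so other connections never see a half-filled table. How you schedule it (cron, a loop with a one-second sleep, or MySQL's event scheduler) is left open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables plus the cached summary table.
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post   (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    CREATE TABLE recent_cache (post_id INTEGER PRIMARY KEY, author_name TEXT, body TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
""")
conn.executemany("INSERT INTO post (author_id, body) VALUES (?, ?)",
                 [(1 + i % 2, f"post {i}") for i in range(250)])
conn.commit()

def refresh_cache(limit=100):
    """Run the expensive join once and store the result for cheap reads."""
    with conn:  # the DELETE and the INSERT commit together
        conn.execute("DELETE FROM recent_cache")
        conn.execute("""
            INSERT INTO recent_cache (post_id, author_name, body)
            SELECT p.id, a.name, p.body
            FROM post p JOIN author a ON a.id = p.author_id
            ORDER BY p.id DESC
            LIMIT ?""", (limit,))

def latest_for_user():
    # What each AJAX request would run: no joins, just a read of ~100 rows.
    return conn.execute(
        "SELECT post_id, author_name, body FROM recent_cache ORDER BY post_id DESC"
    ).fetchall()

refresh_cache()  # in production this would run on a schedule
rows = latest_for_user()
print(len(rows), rows[0])  # 100 (250, 'bob', 'post 249')
```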

Normalizing Item Names & Synonyms

- by RabidFire

Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store. My current database schema is as follows:

    item_names:    id | name | description | picture | common (BOOL)
    items:         id | item_name_id | picture | price | description
    item_synonyms: id | item_name_id | name | error (BOOL)

Notes:

- error indicates a wrong spelling (e.g. "Ericson").
- description and picture in the item_names table are "globals" that can optionally be overridden by the "local" description and picture fields of the items table (in case the store owner wants to supply a different picture for an item).
- common helps separate unique item names ("Jimmy Joe's Cheese Pizza") from common ones ("Cheese Pizza").

I think the bright side of this schema is:

- Optimized searching and handling of synonyms: I can query the item_names and item_synonyms tables using name LIKE '%QUERY%' and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10".)
- Autocompletion: again, a simple query to the item_names table. I can avoid using DISTINCT, and it minimizes the number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson").

The downsides would be:

- Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; this accounts for possible erroneous submissions). Updating is a combination of both.
- Weird item names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this.

This would perhaps be the prime reason I'm tempted to go for a schema like this:

    items: id | name | picture | price | description

(... with item_names and item_synonyms as utility tables that I could query.)

Is there a better schema you would suggest? Should item names be normalized for autocomplete? Is this probably what Facebook does for "School" and "City" entries? Is the first schema or the second better/optimal for search? Thanks in advance! References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT
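
Below is a minimal sketch of the search path the first schema enables. The query is matched against canonical names and synonyms, the matching item_name_ids are collected, and those ids are joined to items. It uses sqlite3 and trimmed-down versions of the poster's tables, keeping only the columns the lookup needs; the sample rows are made up. A leading wildcard (LIKE '%...%') generally cannot use an ordinary B-tree index, so at scale a full-text index is the more common tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down versions of the poster's tables; sample rows are hypothetical.
    CREATE TABLE item_names    (id INTEGER PRIMARY KEY, name TEXT, common INTEGER);
    CREATE TABLE item_synonyms (id INTEGER PRIMARY KEY, item_name_id INTEGER,
                                name TEXT, error INTEGER);
    CREATE TABLE items         (id INTEGER PRIMARY KEY, item_name_id INTEGER, price REAL);
    INSERT INTO item_names VALUES (1, 'Sony Ericsson Xperia X10', 0), (2, 'Cheese Pizza', 1);
    INSERT INTO item_synonyms (item_name_id, name, error) VALUES
        (1, 'Sony Ericson', 1), (1, 'X 10', 0);
    INSERT INTO items (item_name_id, price) VALUES (1, 299.0), (1, 289.0), (2, 8.5);
""")

def search(query):
    pattern = f"%{query}%"
    # Resolve the query to canonical name ids via names OR synonyms, then join to items.
    return conn.execute("""
        SELECT i.id, n.name, i.price
        FROM items i JOIN item_names n ON n.id = i.item_name_id
        WHERE i.item_name_id IN (
            SELECT id FROM item_names WHERE name LIKE ?
            UNION
            SELECT item_name_id FROM item_synonyms WHERE name LIKE ?
        )
        ORDER BY i.price""", (pattern, pattern)).fetchall()

print(search("Ericson"))  # the misspelling is found through item_synonyms
```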

A good approach to DB planning for a reporting service

- by Itay Moav

The scenario: a big system (~200 tables) with 60,000 users. Complex reports will require me to run multiple queries for each report. Even those will be complex queries with inner queries all over the place, plus some processing in PHP. I have seen an approach that I am not sure about: one centralized, denormalized table that registers any reportable activity in the system. This table will hold mostly foreign keys, so it should be fairly compact and fast. For example (my system is a virtual learning management system), when a user enrolls in a course, the table stores the user ID, date, course ID, organization ID and activity type (enrollment). Of course, I also store this data in a normalized DB, which the actual application uses. Pros I see: easy, maintainable queries and code to process the data, and fast retrieval. Cons: there is a danger of the denormalized table getting out of sync with the real DB. Is this approach worth considering, or (preferably from experience) is it total $#%#%t?
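
Below is a minimal sketch of the central activity table the poster describes, using sqlite3. The columns follow the enrollment example; the `activity` table name, the 'enrollment' type string, and the sample report are hypothetical. Writing the normalized row and the activity row in the same transaction is one way to address the out-of-sync risk listed under Cons. This only works as long as both tables live in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized table used by the application (simplified, hypothetical).
    CREATE TABLE enrollment (user_id INTEGER, course_id INTEGER,
                             PRIMARY KEY (user_id, course_id));
    -- Central, denormalized activity log used only for reporting.
    CREATE TABLE activity (
        id            INTEGER PRIMARY KEY,
        activity_type TEXT NOT NULL,
        user_id       INTEGER,
        course_id     INTEGER,
        org_id        INTEGER,
        occurred_on   TEXT NOT NULL
    );
""")

def enroll(user_id, course_id, org_id, day):
    # The normalized write and the reporting write share one transaction, so they cannot diverge.
    with conn:
        conn.execute("INSERT INTO enrollment VALUES (?, ?)", (user_id, course_id))
        conn.execute(
            "INSERT INTO activity (activity_type, user_id, course_id, org_id, occurred_on) "
            "VALUES ('enrollment', ?, ?, ?, ?)", (user_id, course_id, org_id, day))

enroll(1, 100, 7, "2011-03-01")
enroll(2, 100, 7, "2011-03-01")
enroll(2, 200, 8, "2011-03-02")

# A typical report becomes a single-table GROUP BY instead of a multi-join query.
report = conn.execute("""
    SELECT org_id, COUNT(*) FROM activity
    WHERE activity_type = 'enrollment'
    GROUP BY org_id ORDER BY org_id""").fetchall()
print(report)  # [(7, 2), (8, 1)]
```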

Table clusters in SQLServer

- by Bruno Martinez

In Oracle, a table cluster is a group of tables that share common columns and store related data in the same blocks. When tables are clustered, a single data block can contain rows from multiple tables. For example, a block can store rows from both the employees and departments tables rather than from only a single table: http://download.oracle.com/docs/cd/E11882_01/server.112/e10713/tablecls.htm#i25478 Can this be done in SQLServer?

Does normalization really hurt performance in high traffic sites?

- by Luke101

I am designing a database, and I would like to normalize it. In one query I will be joining about 30-40 tables. Will this hurt the website's performance if it ever becomes extremely popular? This will be the main query, and it will account for about 50% of all calls. In the other queries I will be joining about 2 tables. Right now I can still choose whether or not to normalize. But if normalization becomes a problem in the future, I may have to rewrite 40% of the software, and that may take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?

Should I denormalize a has_many has_many?

- by Cameron

I have this:

    class User < ActiveRecord::Base
      has_many :serials
      has_many :sites, :through => :serials
    end

    class Serial < ActiveRecord::Base
      belongs_to :user
      belongs_to :site
      has_many :episodes
    end

    class Site < ActiveRecord::Base
      has_many :serials
      has_many :users, :through => :serials
    end

    class Episode < ActiveRecord::Base
      belongs_to :serial
    end

I would like to do some operations on User.serials.episodes, but I know this would mean all sorts of clever tricks. In theory, I could just put all the episode data into Serial (denormalize) and then group_by Site when needed. If I have a lot of episodes that I need to query, would this be a bad idea? Thanks

Survey Data Model - How to avoid EAV and excessive denormalization?

- by AlexDPC

Hi everyone, my database skills are mediocre at best, and I have to design a data model for survey data. I have given this some thought, and right now I feel stuck between some kind of EAV model and a design involving hundreds of tables, each with hundreds of columns (and thousands of records). There must be a better way to do this, and I hope the wise folks on this forum can help me. I have already searched various forums but couldn't really find a solution. If one has already been given elsewhere, please excuse me and provide a link so I can read up on it.

Some assumptions about the data I have to deal with:

- Each survey consists of 1 to n questionnaires.
- Each questionnaire consists of 100-2,000 questions (please ignore that 2,000 questions really sounds like a lot to answer...).
- Questions can be of various types: multiple choice, free text, or a number (like age, income, percentages, ...).
- Each survey involves 10-200 countries. (These are not the respondents; the respondents are people in those countries.)
- Depending on the type of questionnaire, each questionnaire is answered by 100-20,000 respondents per country.
- A country can adapt the questionnaires for a survey, i.e. add, remove or edit questions.
- The data for one country is gathered in a separate database in that country. There is no possibility of online integration from the start.
- The data for all countries has to be integrated later. For example, if a country has deleted a question, that data must somehow be derived from what they sent, in order to achieve a uniform design across all countries.
- I will have to write the integration and cleaning software, which will need to work with every country's data.
- In the end, the data needs to be exported to flat files, one rectangular grid per country and questionnaire.

I have already discussed this topic with people from various backgrounds and have not come to a good solution yet. I mainly got two kinds of opinions:

- The domain experts, who are used to working with flat files (spreadsheet-style) for data processing and analysis, vote for a denormalized structure with loads of tables and columns as described above (one table per country and questionnaire). This sounds terrible to me. I learned that wide tables are to be avoided, and it will be annoying to determine which columns are actually in a table when working with it. The database will also become cluttered with hundreds of tables, or I may even need to set up multiple databases, each with a similar yet slightly different design.
- O-O programmers vote for a strongly "normalized" design, which would effectively lead to a central table containing all the answers from all respondents to all questions. This table would need either a column of type sql_variant or multiple answer columns of different types to store the different kinds of answers (multiple choice, free text, ...). The former would essentially be an EAV model. I tend to follow Joe Celko here, who strongly discourages its use (he calls it OTLT or "One True Lookup Table"). The latter would mean that, by design, each row contains null cells for the non-applicable types.

Another alternative I could think of would be to create one table per answer type, i.e. one for multiple-choice questions, one for free-text questions, etc. That's not so generic. I think it would lead to a lot of UNIONs, and I would have to add a table whenever a new answer type is invented. Sorry for boring you with all this text, and thank you for your input! Cheers, Alex PS: I asked the same question here: http://www.eggheadcafe.com/community/aspnet/13/10242616/survey-data-model--how-to-avoid-eav-and-excessive-denormalization.aspx
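
Whatever storage model is chosen, the final step the poster describes (one rectangular grid per country and questionnaire) is a pivot. The data goes from long format, one row per respondent and question, to wide format. Below is a minimal sketch using only the Python standard library. The tuple layout of `answers` and the sample values are assumptions, and it writes to an in-memory buffer instead of a file.

```python
import csv
import io
from collections import defaultdict

# Long format: (country, questionnaire, respondent, question, answer); hypothetical sample.
answers = [
    ("SI", "Q1", "r1", "age", "34"),
    ("SI", "Q1", "r1", "income", "1200"),
    ("SI", "Q1", "r2", "age", "51"),   # r2 has no "income" answer
    ("AT", "Q1", "r9", "age", "28"),
]

def to_grids(rows):
    """Group answers into one {respondent: {question: answer}} grid per (country, questionnaire)."""
    grids = defaultdict(lambda: defaultdict(dict))
    for country, qnr, resp, question, answer in rows:
        grids[(country, qnr)][resp][question] = answer
    return grids

def write_grid(grid, out):
    # Columns are the union of questions seen; missing answers become empty cells.
    questions = sorted({q for resp in grid.values() for q in resp})
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["respondent"] + questions)
    for resp in sorted(grid):
        writer.writerow([resp] + [grid[resp].get(q, "") for q in questions])

grids = to_grids(answers)
buf = io.StringIO()
write_grid(grids[("SI", "Q1")], buf)
print(buf.getvalue())
```

In practice you would take the column list from the harmonized questionnaire definition rather than from the data. That way, a question a country deleted still appears as an empty column in every country's grid.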

How can the number of indexes built on a table impact performance?

- by Davide Mauri

We all know that putting too many indexes on a table (I'm talking about non-clustered indexes only, of course) may cause performance problems, because each index adds overhead to every insert/update/delete on that table. But how much? I mean, we all agree, I think, that generally speaking having many indexes on a table is "bad". But how bad can it be? How much will performance degrade? And on a concurrent system, how much can this situation also hurt SELECT performance? If SQL Server takes more time to update a row because of the number of indexes it also has to update, locks will also be held longer. That slows down the perceived performance of all the queries involved. I was quite curious to measure this. When teaching, it's also far more impressive and effective to show attendees a chart with the measured impact, so that they can really "feel" what it means!

To do the tests, I've created a script that creates a table (with a clustered index on the primary key, which is an identity column). It then loads 1,000 rows into the table and measures the time taken. The rows are loaded with a single INSERT of 1,000 rows rather than 1,000 single-row INSERTs, to minimize the transaction-handling overhead that separate inserts would otherwise add. The process is then repeated 16 times, each time adding a new index to the table, using the table's columns in a round-robin fashion. Tests are done with different row sizes, so that it's possible to check whether performance changes depending on row size.

The results are interesting, although expected. This is the chart showing how much time it takes to insert 1,000 rows into a table that has from 0 to 16 non-clustered indexes. Each test has been run 20 times to get an average value. The values have been cleaned of outliers caused by unpredictable performance fluctuations due to machine activity.

The test shows that in a table with a row size of 80 bytes, 1,000 rows can be inserted in 9.05 ms if no indexes are present on the table. The value grows to 88 (!!!) ms when you have 16 indexes on it. This means an impact on performance of 975%, i.e. the insert takes almost ten times as long. That's *huge*! Now, what happens with a bigger row size? Say we have a table with a row size of 1,520 bytes. Here's the data, from 0 to 16 indexes on that table. In this case we need nearly 22 ms to insert 1,000 rows into a table with no indexes, but more than 500 ms if the table has 16 active indexes! Now we're talking about a 2410% impact on performance! Now we have a tangible idea of the impact of having (too?) many indexes on a table, and also of how row size affects performance. That's why the golden rule of OLTP databases, "few indexes, but good", is so true! (In fact, last week I saw a database whose tables had a row size of 1,700 bytes and 23 (!!!) indexes!) This also means that overly heavy denormalization is really not a good idea, since performance gets worse as row size increases (we're always talking about OLTP systems, keep that in mind). So, be careful out there, and keep in mind that "equilibrium" is the key word for a database professional: equilibrium between read and write performance, between normalization and denormalization, and between too few and too many indexes.

PS: Tests were done on a VMware Workstation 7 VM with 2 CPUs and 4 GB of memory. The host machine is a Dell Precision M6500 with an i7 Extreme X920 quad-core HT 2.0 GHz and 16 GB of RAM. The database is stored on an Intel X-25E SSD, uses the Simple Recovery Model, and runs on SQL Server 2008 R2. If you also want to run the tests on your own, you can download the test script here: TestIndexPerformance.sql
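
The shape of this experiment can be reproduced, very roughly, without SQL Server. Below is a minimal sketch with Python's sqlite3 and time.perf_counter. For each step it inserts 1,000 rows in a single transaction and adds one more secondary index, cycling through the columns round-robin as in the post. This is not the author's TestIndexPerformance.sql script, and SQLite's storage engine differs from SQL Server's, so absolute timings will not match the post. `ROW_PAD` and the column names are hypothetical knobs.

```python
import sqlite3
import statistics
import time

COLUMNS = ["c0", "c1", "c2", "c3"]  # hypothetical column names
ROW_PAD = 20                        # digits per value; raise it for (crudely) wider rows

def time_insert(num_indexes, rows=1000, repeats=5):
    """Median seconds to insert `rows` rows into a table with `num_indexes` secondary indexes."""
    samples = []
    for _ in range(repeats):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, "
                     + ", ".join(f"{c} TEXT" for c in COLUMNS) + ")")
        for i in range(num_indexes):
            col = COLUMNS[i % len(COLUMNS)]  # round-robin over the columns
            conn.execute(f"CREATE INDEX ix_{i} ON t ({col})")
        data = [tuple(f"{r:0{ROW_PAD}d}{c}" for c in range(len(COLUMNS)))
                for r in range(rows)]
        start = time.perf_counter()
        with conn:  # one transaction for all rows
            conn.executemany("INSERT INTO t (c0, c1, c2, c3) VALUES (?, ?, ?, ?)", data)
        samples.append(time.perf_counter() - start)
        conn.close()
    return statistics.median(samples)

timings = [time_insert(n) for n in range(17)]  # 0..16 indexes, as in the post
for n, secs in enumerate(timings):
    print(f"{n:2d} indexes: {secs * 1000:7.2f} ms")
```

The median of several runs is used instead of the mean as a simple guard against the machine-activity outliers the post mentions.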

Is it necessary to create a database with as few tables as possible?

- by Shaheer

Should we create a database structure with a minimum number of tables? Should it be designed so that everything stays in one place, or is it okay to have more tables? Will it affect anything in any way? I am asking because a friend of mine modified some of the database structure in MediaWiki. In the end, he was using only 8 tables instead of 20, and it took him 8 months to do that (it was his college assignment). EDIT: My conclusion from the answers: the size of the tables does NOT matter unless the case is exceptional, in which case denormalization may help. Thanks to everyone for the answers.

I want a trivial example of where MongoDB can scale but a relational database will have trouble

- by Ryan Weir

I'm just learning to use MongoDB, and when discussing with other programmers would like a quick example of why NoSQL can be a good choice compared to a traditional RDBMS - however the scenarios I come up with and can find online seem pretty contrived. E.g. a blog with lots of traffic could be represented relationally, but will require some performance tuning and joins across tables (assuming full denormalization is being used). Whereas MongoDB would allow direct retrieval from one collection to the same effect. But the response I'm getting from other programmers is "why not just keep it relational and then add some trivial caching later?" Does anybody have a less contrived example where MongoDB will really shine and a relational db will fall over much quicker? The smaller the project/system the better, because it leaves less room for disagreement. Something along the lines of the complexity of the blog example would be really useful. Thanks.

Building a Data Warehouse

- by Paul

I've seen tutorials, articles and posts on how to build data warehouses with star and snowflake schemas, denormalization of OLTP databases, fact and dimension tables and so on. I've also seen comments like: "Star schemas are for datamarts, at best. There is absolutely no way a true enterprise data warehouse could be represented in a star schema, or snowflake either." I want to create a database that will serve for reporting services and maybe (if that isn't enough) install analysis services and extract reports and data from cubes. My question is: is it really necessary to redesign my current database and follow the star/snowflake schemas with fact and dimension tables? Thank you

Consolidate loan, purchase & sale tables into one transaction table.

- by Frank Computer

INFORMIX-SE with ISQL 7.3: I have separate tables for Loan, Purchase and Sale transactions. Each table's rows are joined to their respective customer rows by:

    customer.id [serial] = loan.foreign_id [integer]
    customer.id [serial] = purchase.foreign_id [integer]
    customer.id [serial] = sale.foreign_id [integer]

I would like to consolidate the three tables into one table called "transaction", where a column:

    transaction.trx_type char(1) {L=Loan, P=Purchase, S=Sale}

identifies the transaction type. Each transaction will be assigned a unique transaction number [serial]. Is this a good idea, or is it better to keep them in separate tables? Storage space is not a concern. I think it would be easier, both programming-wise and user-wise, to have all types of transactions in one table whenever possible. This implies denormalization.
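
Below is a minimal sketch of the consolidated design in sqlite3. It is not INFORMIX-SE, so the syntax differs (for example, INTEGER PRIMARY KEY stands in for SERIAL). A CHECK constraint enforces the L/P/S codes, and per-type views keep old queries that expect separate loan/purchase/sale tables working. Column names other than trx_type and foreign_id, and the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    -- "transaction" is quoted because TRANSACTION is an SQL keyword.
    CREATE TABLE "transaction" (
        trx_num    INTEGER PRIMARY KEY,                  -- unique transaction number
        foreign_id INTEGER NOT NULL REFERENCES customer(id),
        trx_type   CHAR(1) NOT NULL CHECK (trx_type IN ('L', 'P', 'S')),
        amount     NUMERIC                               -- hypothetical column
    );
    -- Views give back the old per-type "tables" for existing queries.
    CREATE VIEW loan     AS SELECT * FROM "transaction" WHERE trx_type = 'L';
    CREATE VIEW purchase AS SELECT * FROM "transaction" WHERE trx_type = 'P';
    CREATE VIEW sale     AS SELECT * FROM "transaction" WHERE trx_type = 'S';
    INSERT INTO customer VALUES (1, 'Frank');
    INSERT INTO "transaction" (foreign_id, trx_type, amount)
        VALUES (1, 'L', 100), (1, 'S', 40), (1, 'L', 25);
""")

print(conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0])  # 2
try:
    conn.execute("""INSERT INTO "transaction" (foreign_id, trx_type) VALUES (1, 'X')""")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The price of a single table is that columns used by only one type (say, a loan's due date) are NULL for the other types. Whether that outweighs the simpler programming is the judgement call the poster is making.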

MySQL triggers cannot update rows in same table the trigger is assigned to. Suggested workaround?

- by Cory House

MySQL doesn't currently support updating rows in the same table the trigger is assigned to, since the call could become recursive. Does anyone have suggestions for a good workaround/alternative? Right now my plan is to call a stored procedure that performs the logic I really wanted in a trigger, but I'd love to hear how others have gotten around this limitation. Edit: A little more background, as requested. I have a table that stores product attribute assignments. When a new parent product record is inserted, I'd like the trigger to perform a corresponding insert in the same table for each child record. This denormalization is necessary for performance. MySQL doesn't support this and throws: Can't update table 'mytable' in stored function/trigger because it is already used by statement which invoked this stored function/trigger. A long discussion of the issue on the MySQL forums basically led to: use a stored proc, which is what I went with for now. Thanks in advance!
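
The workaround the poster settled on moves the logic out of the trigger. The parent insert and the derived child inserts run in one explicit unit of work. That idea can be sketched in Python with sqlite3. `insert_parent_with_children`, the `product_attribute` columns, and the parent/child lookup are hypothetical; in MySQL, the same body would live in the stored procedure. SQLite itself does allow a trigger to modify its own table, so this illustrates the pattern, not a SQLite limitation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: products with an optional parent, plus attribute assignments.
    CREATE TABLE product (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE product_attribute (
        product_id INTEGER, attribute TEXT, value TEXT,
        inherited  INTEGER NOT NULL DEFAULT 0,  -- 1 = denormalized copy from the parent
        PRIMARY KEY (product_id, attribute)
    );
    INSERT INTO product VALUES (1, NULL), (2, 1), (3, 1);
""")

def insert_parent_with_children(product_id, attribute, value):
    """Insert a parent's attribute plus one copy per child, atomically."""
    with conn:  # commit both steps together, or roll both back
        conn.execute("INSERT INTO product_attribute (product_id, attribute, value) "
                     "VALUES (?, ?, ?)", (product_id, attribute, value))
        conn.execute("""
            INSERT INTO product_attribute (product_id, attribute, value, inherited)
            SELECT id, ?, ?, 1 FROM product WHERE parent_id = ?""",
                     (attribute, value, product_id))

insert_parent_with_children(1, "color", "red")
rows = conn.execute("SELECT product_id, value, inherited FROM product_attribute "
                    "ORDER BY product_id").fetchall()
print(rows)  # [(1, 'red', 0), (2, 'red', 1), (3, 'red', 1)]
```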

Search Results

Search found 30 results on 2 pages for 'denormalization'.
