Database design and performance impact

Posted by Craige on Programmers
Published on 2010-12-27T03:01:29Z Indexed on 2010/12/27 3:58 UTC

I have a database design issue that I'm not quite sure how to approach, nor whether the benefits outweigh the costs. I'm hoping some P.SE members can give some feedback on my suggested design, as well as any similar experiences they may have come across.

As it goes, I am building an application that has large reporting demands. Speed is an important issue, as there will be peak usages throughout the year.

This application/database has a multiple-level, many-to-many relationship.

e.g.

object a
object b
object c
object d

object b has relationship to object a
object c has relationship to object b, a
object d has relationship to object c, b, a

Theoretically, this could go on for unlimited levels, though logic dictates it could only go so far.

My idea here, to speed up reporting, would be to create a syndicate table that acts as a global many-to-many join table. In this table (with the given example), one might see:

+----------+-----------+---------+
| child_id | parent_id | type_id |
+----------+-----------+---------+
|    b     |     a     |    1    |
|    c     |     b     |    2    |
|    c     |     a     |    3    |
|    d     |     c     |    4    |
|    d     |     b     |    5    |
|    d     |     a     |    6    |
+----------+-----------+---------+

Where a, b, c and d would translate to their respective IDs in their respective tables. So, to report every object d that descends from a given object a, one could query

SELECT * FROM `syndicates` ... JOINS TO child and parent tables ... WHERE parent_id=a and type_id=6;

rather than having a query with a join to each level up the chain.
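To make the idea concrete, here is a minimal sketch of the syndicate table using SQLite in Python (the column names come from the table above; the schema details and SQLite itself are my own choices for illustration, not part of the actual application):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical syndicate table: one row per (descendant, ancestor) pair.
cur.execute("""
    CREATE TABLE syndicates (
        child_id  TEXT NOT NULL,
        parent_id TEXT NOT NULL,
        type_id   INTEGER NOT NULL,
        PRIMARY KEY (child_id, parent_id)
    )
""")

# The six relationships from the example table above.
cur.executemany(
    "INSERT INTO syndicates (child_id, parent_id, type_id) VALUES (?, ?, ?)",
    [("b", "a", 1), ("c", "b", 2), ("c", "a", 3),
     ("d", "c", 4), ("d", "b", 5), ("d", "a", 6)],
)

# One flat lookup replaces a join per level: every d descending from a.
rows = cur.execute(
    "SELECT child_id FROM syndicates WHERE parent_id = ? AND type_id = ?",
    ("a", 6),
).fetchall()
print(rows)  # [('d',)]
```

Because every ancestor pair is materialized, the read side never cares how many levels sit between the two objects; the depth of the hierarchy is paid for at write time instead.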

The Problem

This table grows exponentially, and in a given year, could easily grow past 20,000 records for one client. Given multiple clients over multiple years, this table will VERY quickly explode to millions of records and beyond.

Now, the database will, in time, be partitioned across multiple servers, but I would like (as most would) to keep the number of servers as low as possible while still offering flexibility.

Also, writes and updates would take longer (though possibly not noticeably to the end user), as there would be multiple inserts/updates/scans on this table to keep it in sync.
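The write amplification can be sketched as follows: inserting one new object means one direct row plus one row per ancestor of its parent (again SQLite for illustration; `type_id` omitted for brevity, and the helper name is my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE syndicates (child_id TEXT, parent_id TEXT)")
cur.executemany("INSERT INTO syndicates VALUES (?, ?)",
                [("b", "a"), ("c", "b"), ("c", "a"),
                 ("d", "c"), ("d", "b"), ("d", "a")])

def add_child(cur, child, parent):
    """Insert a new node: one direct row, plus a copy of every
    ancestor row of the parent, re-pointed at the new child."""
    cur.execute("INSERT INTO syndicates VALUES (?, ?)", (child, parent))
    cur.execute("""
        INSERT INTO syndicates (child_id, parent_id)
        SELECT ?, parent_id FROM syndicates WHERE child_id = ?
    """, (child, parent))

# Adding e under d writes 4 rows: e->d, e->c, e->b, e->a.
add_child(cur, "e", "d")
count = cur.execute(
    "SELECT COUNT(*) FROM syndicates WHERE child_id = 'e'"
).fetchone()[0]
print(count)  # 4
```

So each insert costs depth + 1 rows, which is exactly why the table balloons as clients and levels accumulate.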

Am I going in the right direction here, or am I way off track? What would you do in a similar situation? This solution seems overly complex, but allows the greatest flexibility and fastest read operations.

Sidenote 1 - This structure allows me to add new levels to the tree easily.

Sidenote 2 - The database querying for this database is done through an ORM framework.

