Should we denormalize the database to improve performance?


We have a requirement to store 500 measurements per second, coming from several devices. Each measurement consists of a timestamp, a quantity type, and several vector values. Right now there are 8 vector values per measurement, and we may consider this number constant for the needs of our prototype project. We are using NHibernate. Tests are done against SQLite (disk file DB, not in-memory), but production will probably be MS SQL Server.

Our Measurement entity class holds a single measurement and looks like this:

public class Measurement
{
    public virtual Guid Id { get; private set; }
    public virtual Device Device { get; private set; }

    // Timestamp is a separate entity, stored in its own table
    public virtual Timestamp Timestamp { get; private set; }

    // The 8 vector values, stored in a separate table with a FK back to this measurement
    public virtual IList<VectorValue> Vectors { get; private set; }
}

Vector values are stored in a separate table, so that each of them references its parent measurement through a foreign key.
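
A minimal sketch of what the VectorValue side can look like (the property names here are illustrative, not our exact class):

public class VectorValue
{
    public virtual Guid Id { get; private set; }

    // Reference to the parent measurement, mapped as a foreign key column
    public virtual Measurement Measurement { get; private set; }

    // Position of this value within the measurement's vector (0..7)
    public virtual int Index { get; private set; }

    public virtual double Value { get; private set; }
}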

We have done a couple of things to ensure that the generated SQL is (reasonably) efficient: we are using Guid.Comb for generating IDs, we are flushing around 500 items in a single transaction, and the ADO.NET batch size is set to 100 (I think SQLite does not support batch updates? But the setting might be useful later).
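
In code, the batching looks roughly like this (a simplified sketch, not our exact setup; sessionFactory and batchOf500 are placeholders, and the cascade from Measurement to its vector values is assumed to be configured in the mappings):

var cfg = new NHibernate.Cfg.Configuration().Configure();      // Id mapped with the guid.comb generator
cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "100");  // adonet.batch_size
var sessionFactory = cfg.BuildSessionFactory();

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    foreach (var measurement in batchOf500)    // ~500 measurements per transaction
        session.Save(measurement);             // vector values are saved via cascade
    tx.Commit();                               // single flush at commit time
}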

The problem

Right now we can insert 150-200 measurements per second (which is not fast enough, although this is SQLite we are talking about). Looking at the generated SQL, we can see that in a single transaction we insert (as expected):

  • 1 timestamp
  • 1 measurement
  • 8 vector values

which means that we are actually doing 10 single-table inserts per measurement, i.e. 1500-2000 row inserts per second.

If we placed everything (all 8 vector values and the timestamp) into the measurement table (adding 9 dedicated columns), it seems that we could increase our insert speed up to 10 times.
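
In other words, the denormalized entity would look something like this (a sketch only; the property names are made up for illustration):

public class FlatMeasurement
{
    public virtual Guid Id { get; private set; }
    public virtual Device Device { get; private set; }

    // Timestamp stored inline instead of in a separate table
    public virtual DateTime Timestamp { get; private set; }

    // The 8 vector values as dedicated columns instead of child rows
    public virtual double V1 { get; private set; }
    public virtual double V2 { get; private set; }
    public virtual double V3 { get; private set; }
    public virtual double V4 { get; private set; }
    public virtual double V5 { get; private set; }
    public virtual double V6 { get; private set; }
    public virtual double V7 { get; private set; }
    public virtual double V8 { get; private set; }
}

This turns each measurement into a single row insert, at the cost of fixing the vector size.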

Switching to SQL Server will improve performance, but we would like to know if there is a way to avoid the unnecessary performance costs related to the way the database is organized right now.

[Edit]

With in-memory SQLite I get around 350 items/sec (3500 single-table inserts), which I believe is about as good as it gets with NHibernate (taking this post for reference: http://ayende.com/Blog/archive/2009/08/22/nhibernate-perf-tricks.aspx).
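
For reference, the only relevant change for the in-memory test is the connection string (assuming the System.Data.SQLite provider; the database lives only as long as a single open connection, so the test reuses one connection):

cfg.SetProperty(NHibernate.Cfg.Environment.ConnectionString,
                "Data Source=:memory:;Version=3;New=True");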

But I might as well switch to SQL Server and stop assuming things, right? I will update my post as soon as I test it.
