Is my understanding of "select distinct" correct?

Posted by paxdiablo on Stack Overflow
Published on 2010-05-03T07:53:36Z Indexed on 2010/05/03 7:58 UTC

Filed under: sql, db2, select, distinct, performance-tuning

We recently discovered a performance problem with one of our systems and I think I have the fix but I'm not certain my understanding is correct.

In simplest form, we have a table blah into which we accumulate various values based on a key field. The basic form is:

recdate   date
rectime   time
system    varchar(20)
count     integer
accum1    integer
accum2    integer

There are a lot more accumulators than that but they're all of the same form. The primary key is made up of recdate, rectime and system.

As values are collected into the table, the count for a given recdate/rectime/system is incremented and the values for that key are added to the accumulators. That means the average for each accumulator can be obtained as accumN / count.
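
The accumulation step itself is not shown here. As a rough sketch only, assuming DB2's MERGE statement is available and that each sample arrives as one value per accumulator (the literal values and the names s, val1 and val2 are made up for illustration), it might look something like this:

```sql
-- Hypothetical sample: one new reading for system 'sysA'.
-- Literal values and the names s, val1, val2 are illustrative only.
merge into blah as t
using (values (date('2010-05-03'), time('07:50:00'), 'sysA', 12, 34))
      as s (recdate, rectime, system, val1, val2)
on  t.recdate = s.recdate
and t.rectime = s.rectime
and t.system  = s.system
when matched then
    -- Key already present: bump the count and add to the running sums.
    update set count  = t.count + 1,
               accum1 = t.accum1 + s.val1,
               accum2 = t.accum2 + s.val2
when not matched then
    -- First sample for this key: start the count at 1.
    insert (recdate, rectime, system, count, accum1, accum2)
    values (s.recdate, s.rectime, s.system, 1, s.val1, s.val2);
```

Either way, there is only ever one row per recdate/rectime/system, which is the property the rest of this question relies on.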

Now we also have a view over that table specified as follows:

create view blah_v (
    recdate, rectime, system, count,
    accum1,
    accum2
) as select distinct
    recdate, rectime, system, count,
    value (case when count > 0 then accum1 / count end, 0),
    value (case when count > 0 then accum2 / count end, 0)
    from blah;

In other words, the view gives us the average value of the accumulators rather than the sums. It also makes sure we don't get a divide-by-zero in those cases where the count is zero (these records do exist and we are not allowed to remove them so don't bother telling me they're rubbish - you're preaching to the choir).

We've noticed that the time taken to run:

select distinct recdate from XX

varies greatly depending on whether XX is the table or the view: about 1 second for the table against 27 seconds for the view (with 100K rows).

We actually tracked it back to the select distinct. What seems to be happening is that the DBMS is actually loading all the rows in and sorting them so as to remove duplicates. That's fair enough, it's what we stupidly told it to do.

But I'm pretty sure that, because the view includes every component of the primary key, it's impossible for it to return duplicates anyway. We've confirmed that the distinct is the cause: if we create another view without it, that view performs at the same speed as the underlying table.
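
For reference, the comparison view is just the original definition with the distinct keyword removed. The name blah_v_nodistinct below is a placeholder for illustration, not the name actually used:

```sql
-- Same as blah_v, minus "distinct". The view name is hypothetical.
create view blah_v_nodistinct (
    recdate, rectime, system, count,
    accum1,
    accum2
) as select
    recdate, rectime, system, count,
    -- value() supplies 0 when count is zero, avoiding divide-by-zero
    value (case when count > 0 then accum1 / count end, 0),
    value (case when count > 0 then accum2 / count end, 0)
    from blah;
```

Running the same select distinct recdate query against this view is what gave timings in line with the base table.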

I just wanted to confirm my understanding that a select cannot return duplicate rows if it includes all the primary key components, which would make the distinct redundant. If that's so, then we can simply remove the distinct from the view.
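
One way to sanity-check this against the real data, sketched under the assumption that the primary key is declared as described (the column aliases dup_rows, table_rows and view_rows are made up here):

```sql
-- Should return no rows if (recdate, rectime, system) is unique in blah.
-- With an enforced primary key constraint this is guaranteed to be empty.
select recdate, rectime, system, count(*) as dup_rows
from blah
group by recdate, rectime, system
having count(*) > 1;

-- If the distinct in blah_v never removes anything, both counts match.
select (select count(*) from blah)   as table_rows,
       (select count(*) from blah_v) as view_rows
from sysibm.sysdummy1;
```

If the first query is empty and the two counts agree, the distinct in the view is doing no useful work on the current data. That is only evidence, of course; the guarantee comes from the key constraint itself.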
