PostgreSQL - fetch the row which has the Max value for a column

Posted by Joshua Berry on Stack Overflow See other posts from Stack Overflow or by Joshua Berry
Published on 2009-02-25T16:37:23Z Indexed on 2010/06/15 6:52 UTC
Read the original article Hit count: 287

Filed under:

I'm dealing with a Postgres table (called "lives") that contains records with columns for time_stamp, usr_id, transaction_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id

There are multiple users (distinct usr_id's)
time_stamp is not a unique identifier: sometimes user events (one by row in the table) will occur with the same time_stamp.
trans_id is unique only for very small time ranges: over time it repeats
remaining_lives (for a given user) can both increase and decrease over time

example:

time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
  07:00  |       1       |   1  |   1    
  09:00  |       4       |   2  |   2    
  10:00  |       2       |   3  |   3    
  10:00  |       1       |   2  |   4    
  11:00  |       4       |   1  |   5    
  11:00  |       3       |   1  |   6    
  13:00  |       3       |   3  |   1

As I will need to access other columns of the row with the latest data for each given usr_id, I need a query that gives a result like this:

time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
  11:00  |       3       |   1  |   6    
  10:00  |       1       |   2  |   4    
  13:00  |       3       |   3  |   1

As mentioned, each usr_id can gain or lose lives, and sometimes these timestamped events occur so close together that they have the same timestamp! Therefore this query won't work:

SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM 
      (SELECT usr_id, max(time_stamp) AS max_timestamp 
       FROM lives GROUP BY usr_id ORDER BY usr_id) a 
JOIN lives b ON a.max_timestamp = b.time_stamp

Instead, I need to use both time_stamp (first) and trans_id (second) to identify the correct row. I also then need to pass that information from the subquery to the main query that will provide the data for the other columns of the appropriate rows. This is the hacked up query that I've gotten to work:

SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM 
      (SELECT usr_id, max(time_stamp || '*' || trans_id) 
       AS max_timestamp_transid
       FROM lives GROUP BY usr_id ORDER BY usr_id) a 
JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id 
ORDER BY b.usr_id

Okay, so this works, but I don't like it. It requires a query within a query, a self join, and it seems to me that it could be much simpler by grabbing the row that MAX found to have the largest timestamp and trans_id. The table "lives" has tens of millions of rows to parse, so I'd like this query to be as fast and efficient as possible. I'm new to RDBM and Postgres in particular, so I know that I need to make effective use of the proper indexes. I'm a bit lost on how to optimize.

I found a similar discussion here. Can I perform some type of Postgres equivalent to an Oracle analytic function?

Any advice on accessing related column information used by an aggregate function (like MAX), creating indexes, and creating better queries would be much appreciated!

P.S. You can use the following to create my example case:

create TABLE lives (time_stamp timestamp, lives_remaining integer, usr_id integer, trans_id integer); insert into lives values ('2000-01-01 07:00', 1, 1, 1); insert into lives values ('2000-01-01 09:00', 4, 2, 2); insert into lives values ('2000-01-01 10:00', 2, 3, 3); insert into lives values ('2000-01-01 10:00', 1, 2, 4); insert into lives values ('2000-01-01 11:00', 4, 1, 5); insert into lives values ('2000-01-01 11:00', 3, 1, 6); insert into lives values ('2000-01-01 13:00', 3, 3, 1);

Developer IT

PostgreSQL - fetch the row which has the Max value for a column - Developer IT

PostgreSQL - fetch the row which has the Max value for a column

sql

postgresql

query-optimization

cbo

cost-based-optimizer

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

SQL SERVER – Concat Function in SQL Server – SQL Concatenation

Error with SQL Server Setup 2012 on Windows 2012

How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

Related posts about postgresql

Postgresql fails to start on Ubuntu 10.04.4 LTS

can't install psycopg2 in my env on mac os x lion

Postgresql has broken apt-get on Ubuntu

Installing PostgreSQL on FreeBSD (with ports)

Strange permission errors in new PostgreSQL installation

Categories cloud