sql: Group by x,y,z; return grouped by x,y with lowest f(z)

Posted by Sai Emrys on Stack Overflow See other posts from Stack Overflow or by Sai Emrys
Published on 2010-03-18T12:07:36Z Indexed on 2010/03/18 12:11 UTC
Read the original article Hit count: 476

Filed under:
|
|
|

This is for http://cssfingerprint.com

I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.)

The related tables look like this:

CREATE TABLE `browser_tests` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bogus` tinyint(1) DEFAULT NULL,
  `result` tinyint(1) DEFAULT NULL,
  `method` varchar(255) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  `os` varchar(255) DEFAULT NULL,
  `browser` varchar(255) DEFAULT NULL,
  `version` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1

CREATE TABLE `method_timings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `method` varchar(255) DEFAULT NULL,
  `batch_size` int(11) DEFAULT NULL,
  `timing` int(11) DEFAULT NULL,
  `os` varchar(255) DEFAULT NULL,
  `browser` varchar(255) DEFAULT NULL,
  `version` varchar(255) DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1

(user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.)

I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple:

select c, avg(bogus) as bog, timing, method, browser, version 
  from browser_tests as b inner join (
    select count(*) as c, round(avg(timing)) as timing, method, 
     browser, version from method_timings 
    group by browser, version, method 
    having c > 10 order by browser, version, timing
  ) as t using (browser, version, method)
  group by browser, version, method 
  having bog < 1 
  order by browser, version, timing;

Which returns something like:

c  bog  tim  method       browser  version
88 0.8333 184 reuse_insert Chrome 4.0.249.89
18 0.0000 238 mass_insert_width Chrome 4.0.249.89
70 0.0400 246 mass_insert Chrome 4.0.249.89
70 0.0400 327 mass_noinsert Chrome 4.0.249.89
88 0.0556 367 reuse_reinsert Chrome 4.0.249.89
88 0.0556 383 jquery Chrome 4.0.249.89
88 0.0556 863 full_reinsert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
187 0.8806 109 reuse_insert Chrome 5.0.307.11
123 0.0000 110 mass_insert_width Chrome 5.0.307.11
176 0.0000 231 mass_noinsert Chrome 5.0.307.11
176 0.0000 237 mass_insert Chrome 5.0.307.11
187 0.0000 314 reuse_reinsert Chrome 5.0.307.11
187 0.0000 372 full_reinsert Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
12 0.2500 102 jquery Chrome 5.0.335.0
[...]

I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like:

88 0.8333 184 reuse_insert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
[...]

How can I do this, while still returning the method that goes with that lowest timing?

I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.

© Stack Overflow or respective owner

Related posts about sql

Related posts about mysql