Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer

Posted by polygenelubricants on Stack Overflow See other posts from Stack Overflow or by polygenelubricants
Published on 2010-06-06T18:42:07Z Indexed on 2010/06/06 19:02 UTC
Read the original article Hit count: 374

Filed under:
|

I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question is applicable to other databases.

I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:

  • Badges
    • Id
    • UserId
    • Name
    • Date

This works well for badges like [Epic] and [Legendary] which have unique names, but the silver and gold tag-specific badges seems to be mixed in together by having the same exact name.

Here's an example query I wrote for [mysql] tag:

SELECT
  UserId as [User Link],
  Date
FROM
  Badges
Where
  Name = 'mysql'
Order By
  Date ASC

The (slightly annotated) output is: as seen on stexdex:

User Link       Date                    
--------------- -------------------     // all for silver except where noted
Bill Karwin     2009-02-20 11:00:25     
Quassnoi        2009-06-01 10:00:16     
Greg            2009-10-22 10:00:25     
Quassnoi        2009-10-31 10:00:24     // for gold
Bill Karwin     2009-11-23 11:00:30     // for gold
cletus          2010-01-01 11:00:23    
OMG Ponies      2010-01-03 11:00:48     
Pascal MARTIN   2010-02-17 11:00:29 
Mark Byers      2010-04-07 10:00:35     
Daniel Vassallo 2010-05-14 10:00:38 

This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql] tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.

So this is the way I understand it:

  • The first time an Id appears (in chronological order) is for the silver badge
  • The second time is for the gold

Now, the above result mixes the silver and gold entries together. My questions are:

  • Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
  • In the current design, how would you query the silver and gold badges separately?
    • GROUP BY Id and picking the min/max or first/second by the Date somehow?
    • How can you write a query that lists all the silver badges first then all the gold badges next?
      • Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
      • How would you write it so that it doesn't have too many repetition between the silver and gold subqueries?
    • Is it perhaps more typical to do two totally separate queries instead?
    • What is this idiom called? A row "partitioning" query to put them into "buckets" or something?

© Stack Overflow or respective owner

Related posts about sql

Related posts about beginner