Table design for storing a large number of rows

Posted by hyperboreean on Stack Overflow
Published on 2010-05-10T10:12:13Z Indexed on 2010/05/10 10:14 UTC

I am trying to store some unique identifiers in a PostgreSQL database, along with the sites they have been seen on. I can't decide which of the following options to choose so that the result is fast and easy to maintain. The table would have to provide the following information:

  • the unique identifier, which unfortunately is text
  • the sites on which that unique identifier has been seen

The amount of data it would have to hold is rather large: there are around 22 million unique identifiers that I know of.

So I thought about the following table designs:

  • id - integer

    identifier - text

    seen_on_site - an integer, foreign key to a sites table

In the worst case, this approach would require around 22 million rows multiplied by the number of sites.
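The first design can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the table name `sightings`, the `UNIQUE` constraint, the sample rows, and the helper `worst_case_rows` are all hypothetical, not part of the question.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sites (
    id   INTEGER PRIMARY KEY,
    site TEXT NOT NULL UNIQUE
);
CREATE TABLE sightings (              -- hypothetical table name
    id           INTEGER PRIMARY KEY,
    identifier   TEXT NOT NULL,       -- the text is repeated once per site
    seen_on_site INTEGER NOT NULL REFERENCES sites(id),
    UNIQUE (identifier, seen_on_site) -- assumption: one row per pair
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)",
                 [(1, "site-a.example"), (2, "site-b.example")])
conn.executemany(
    "INSERT INTO sightings (identifier, seen_on_site) VALUES (?, ?)",
    [("abc", 1), ("abc", 2), ("xyz", 1)],
)

# Which sites has the identifier 'abc' been seen on?
sites_for_abc = [
    row[0]
    for row in conn.execute(
        "SELECT s.site FROM sightings AS g "
        "JOIN sites AS s ON s.id = g.seen_on_site "
        "WHERE g.identifier = ? ORDER BY s.site",
        ("abc",),
    )
]


def worst_case_rows(identifiers: int, sites: int) -> int:
    """Upper bound on rows: every identifier seen on every site."""
    return identifiers * sites
```

With 22 million identifiers and, say, 10 sites, `worst_case_rows(22_000_000, 10)` gives an upper bound of 220 million rows, each carrying its own copy of the identifier text.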

  • id - integer

    identifier - text

    seen_on_site_1 - boolean

    seen_on_site_2 - boolean

    ............

    seen_on_site_n - boolean

Hopefully the number of sites won't go past 10. This would require only one row per unique identifier that I know of, that is around 22 million rows, but it would be hard to work with from an ORM perspective.
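A minimal sketch of this wide-table design, again using Python's `sqlite3` module as a stand-in for PostgreSQL. The table name `identifiers` and the sample row are assumptions. SQLite has no dedicated boolean storage type, so the flags are stored as 0 and 1 here. The sketch mainly shows that a new site means a schema change.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE identifiers (            -- hypothetical table name
    id             INTEGER PRIMARY KEY,
    identifier     TEXT NOT NULL UNIQUE,
    seen_on_site_1 BOOLEAN NOT NULL DEFAULT 0,
    seen_on_site_2 BOOLEAN NOT NULL DEFAULT 0
)
""")

# Made-up sample row: 'abc' was seen on site 1 only.
conn.execute(
    "INSERT INTO identifiers (identifier, seen_on_site_1) VALUES (?, 1)",
    ("abc",),
)

# A new site cannot be added as data; it needs a new column.
conn.execute(
    "ALTER TABLE identifiers "
    "ADD COLUMN seen_on_site_3 BOOLEAN NOT NULL DEFAULT 0"
)

# List the per-site columns that now exist (column name is field 1).
site_columns = [
    info[1]
    for info in conn.execute("PRAGMA table_info(identifiers)")
    if info[1].startswith("seen_on_site_")
]

row = conn.execute(
    "SELECT seen_on_site_1, seen_on_site_2, seen_on_site_3 "
    "FROM identifiers WHERE identifier = ?",
    ("abc",),
).fetchone()
```

The row count stays at one per identifier, but every query that asks "which sites?" has to name each column explicitly, which is likely part of what makes this awkward for an ORM.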

  • one table that would store only the unique identifiers:

    id - integer

    unique_identifier - text

    one table that would store only the sites:

    id - integer

    site - text

    and one many-to-many relation table:

    id - integer

    unique_id - integer (fk to the table storing identifiers)

    site_id - integer (fk to the sites table)
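The three-table layout could look roughly like this. As a sketch only, it uses Python's `sqlite3` module in place of PostgreSQL. The table names `identifiers` and `identifier_sites`, the `UNIQUE` constraints, the helper `record_sighting`, and the sample data are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE identifiers (
    id                INTEGER PRIMARY KEY,
    unique_identifier TEXT NOT NULL UNIQUE   -- text stored exactly once
);
CREATE TABLE sites (
    id   INTEGER PRIMARY KEY,
    site TEXT NOT NULL UNIQUE
);
CREATE TABLE identifier_sites (              -- hypothetical table name
    id        INTEGER PRIMARY KEY,
    unique_id INTEGER NOT NULL REFERENCES identifiers(id),
    site_id   INTEGER NOT NULL REFERENCES sites(id),
    UNIQUE (unique_id, site_id)              -- assumption: no duplicate links
);
""")


def record_sighting(conn, identifier, site):
    """Record that `identifier` was seen on `site`; repeats are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO identifiers (unique_identifier) VALUES (?)",
        (identifier,),
    )
    conn.execute("INSERT OR IGNORE INTO sites (site) VALUES (?)", (site,))
    conn.execute(
        "INSERT OR IGNORE INTO identifier_sites (unique_id, site_id) "
        "SELECT i.id, s.id FROM identifiers AS i, sites AS s "
        "WHERE i.unique_identifier = ? AND s.site = ?",
        (identifier, site),
    )


# Made-up sample data, including one repeated sighting.
record_sighting(conn, "abc", "site-a.example")
record_sighting(conn, "abc", "site-a.example")
record_sighting(conn, "abc", "site-b.example")
record_sighting(conn, "xyz", "site-a.example")

link_count = conn.execute("SELECT COUNT(*) FROM identifier_sites").fetchone()[0]

sites_for_abc = [
    row[0]
    for row in conn.execute(
        "SELECT s.site FROM identifier_sites AS l "
        "JOIN identifiers AS i ON i.id = l.unique_id "
        "JOIN sites AS s ON s.id = l.site_id "
        "WHERE i.unique_identifier = ? ORDER BY s.site",
        ("abc",),
    )
]
```

`INSERT OR IGNORE` is SQLite syntax; on PostgreSQL 9.5 or later the comparable form is `INSERT ... ON CONFLICT DO NOTHING`. The link table can still grow to identifiers times sites rows, but each row holds only integers, and adding a site is a plain insert rather than a schema change.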

  • another approach would be to have a separate table for each site, each storing the unique identifiers seen on that site
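Reading this last option as one table per site, a rough sketch looks like the following. It uses Python's `sqlite3` module in place of PostgreSQL, and the table names, the `SITE_TABLES` mapping, the helper `sites_for`, and the sample data are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical mapping from site to its dedicated table.
SITE_TABLES = {
    "site-a.example": "identifiers_site_a",
    "site-b.example": "identifiers_site_b",
}

# Table names come from the fixed mapping above, never from user input.
for table in SITE_TABLES.values():
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, identifier TEXT NOT NULL UNIQUE)"
    )

# Made-up sample data.
conn.execute("INSERT INTO identifiers_site_a (identifier) VALUES ('abc')")
conn.execute("INSERT INTO identifiers_site_a (identifier) VALUES ('xyz')")
conn.execute("INSERT INTO identifiers_site_b (identifier) VALUES ('abc')")


def sites_for(identifier):
    """Return the sites whose table contains `identifier`."""
    found = []
    for site, table in SITE_TABLES.items():
        hit = conn.execute(
            f"SELECT 1 FROM {table} WHERE identifier = ?", (identifier,)
        ).fetchone()
        if hit:
            found.append(site)
    return found
```

Here `sites_for("abc")` has to probe every per-site table, and an identifier seen on several sites has its text stored once per table, so this layout trades simple per-site lookups for more work on cross-site questions.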

So, which one seems like the better approach to take in the long run?
