Scalable Database Tagging Schema

Posted by Longpoke on Stack Overflow
Published on 2009-03-19T22:18:42Z Indexed on 2010/05/09 16:38 UTC

EDIT: To people building tagging systems: don't read this. It is not what you are looking for. I asked this before I knew that every RDBMS has its own optimization methods; just use a simple many-to-many schema.

I have a posting system that has millions of posts. Each post can have an unlimited number of tags associated with it.

Users can create tags which have notes, date created, owner, etc. A tag is almost like a post itself, because people can post notes about the tag.

Each tag association has an owner and date, so we can see who added the tag and when.

My question is: how can I implement this? Searching posts by tag, and tags by post, both have to be fast. Also, users add tags to posts by typing the name into a field that, like the Google search bar, has to fill in the rest of the tag name for them.

I have three solutions at the moment, but I'm not sure which is best, or whether there is a better way.

Note that I'm not showing the layout of notes since it will be trivial once I get a proper solution for tags.

Method 1. Linked list

tagId in post points to a linked list in tag_assoc; the application must traverse the list until flink=0.

post:           id, content, ownerId, date, tagId, notesId
tag_assoc:      id, tagId, ownerId, flink
tag:            id, name, notesId
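
To make the cost of Method 1 concrete, here is a rough sketch of the traversal in Python with the standard sqlite3 module. It assumes that post.tagId holds the id of the first tag_assoc row and that 0 means "no tags"; the function name tags_for_post and the sample rows are made up for illustration. Note that it issues one query per association.

```python
import sqlite3

def tags_for_post(conn, post_id):
    """Walk the tag_assoc chain for one post, following flink until it is 0."""
    row = conn.execute("SELECT tagId FROM post WHERE id = ?", (post_id,)).fetchone()
    # Assumption: post.tagId is the id of the first tag_assoc row (0 = no tags).
    assoc_id = row[0] if row else 0
    result = []
    seen = set()  # guards against a corrupted chain that loops back on itself
    while assoc_id != 0 and assoc_id not in seen:
        seen.add(assoc_id)
        assoc = conn.execute(
            "SELECT tagId, ownerId, flink FROM tag_assoc WHERE id = ?",
            (assoc_id,),
        ).fetchone()
        if assoc is None:  # dangling pointer: stop rather than fail
            break
        tag_id, owner_id, flink = assoc
        result.append((tag_id, owner_id))
        assoc_id = flink
    return result

# Hypothetical sample data: post 100 has two tags chained 10 -> 11 -> end.
linked_db = sqlite3.connect(":memory:")
linked_db.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, content TEXT, ownerId INTEGER,
                   date TEXT, tagId INTEGER, notesId INTEGER);
CREATE TABLE tag_assoc (id INTEGER PRIMARY KEY, tagId INTEGER,
                        ownerId INTEGER, flink INTEGER);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, notesId INTEGER);
INSERT INTO tag VALUES (1, 'sql', 0);
INSERT INTO tag VALUES (2, 'tags', 0);
INSERT INTO tag_assoc VALUES (10, 1, 7, 11);
INSERT INTO tag_assoc VALUES (11, 2, 8, 0);
INSERT INTO post VALUES (100, 'hello', 7, '2009-03-19', 10, 0);
""")
```

Calling tags_for_post(linked_db, 100) walks both rows and returns the (tagId, ownerId) pairs in chain order.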

Method 2. Denormalization

tags is simply a VARCHAR or TEXT field containing a tab-delimited array of tagId:ownerId pairs. It cannot be a fixed size.

post:           id, content, ownerId, date, tags, notesId
tag:            id, name, notesId
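
A minimal sketch of how the application side of Method 2 might pack and unpack that field, assuming the exact layout described above (tagId:ownerId pairs separated by tabs, both ids integers). The helper names parse_tags and format_tags are hypothetical.

```python
def parse_tags(field):
    """Split a tab-delimited 'tagId:ownerId' string into (tagId, ownerId) tuples."""
    if not field:
        return []
    pairs = []
    for item in field.split("\t"):
        tag_id, owner_id = item.split(":")
        pairs.append((int(tag_id), int(owner_id)))
    return pairs

def format_tags(pairs):
    """Inverse of parse_tags: join (tagId, ownerId) tuples back into one string."""
    return "\t".join(f"{tag_id}:{owner_id}" for tag_id, owner_id in pairs)
```

Every read and write of a post's tags goes through this parsing step, and the database itself cannot see inside the field.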

Method 3. Toxi

(from: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html, also same thing here: http://stackoverflow.com/questions/20856/how-do-you-recommend-implementing-tags-or-tagging)

post:          id, content, ownerId, date, notesId
tag_assoc:     ownerId, tagId, postId
tag:           id, name, notesId
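
For comparison, a sketch of Method 3 as an ordinary many-to-many schema in SQLite, with both lookups written as joins. The table and column names come from the layout above; the primary key, the UNIQUE constraint on tag.name, the index name tag_assoc_by_tag, the helper functions and the sample rows are my own assumptions.

```python
import sqlite3

toxi_db = sqlite3.connect(":memory:")
toxi_db.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, content TEXT, ownerId INTEGER,
                   date TEXT, notesId INTEGER);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE, notesId INTEGER);
CREATE TABLE tag_assoc (
    ownerId INTEGER NOT NULL,
    tagId   INTEGER NOT NULL REFERENCES tag(id),
    postId  INTEGER NOT NULL REFERENCES post(id),
    PRIMARY KEY (postId, tagId)          -- serves "tags by post"
);
-- Assumed extra index so "posts by tag" does not scan the whole table.
CREATE INDEX tag_assoc_by_tag ON tag_assoc (tagId, postId);

INSERT INTO post VALUES (1, 'first', 7, '2009-03-19', 0);
INSERT INTO post VALUES (2, 'second', 8, '2009-03-19', 0);
INSERT INTO tag VALUES (1, 'sql', 0);
INSERT INTO tag VALUES (2, 'tags', 0);
INSERT INTO tag_assoc VALUES (7, 1, 1);
INSERT INTO tag_assoc VALUES (7, 2, 1);
INSERT INTO tag_assoc VALUES (8, 1, 2);
""")

def tags_by_post(conn, post_id):
    """Names of all tags attached to one post."""
    rows = conn.execute(
        "SELECT t.name FROM tag_assoc a JOIN tag t ON t.id = a.tagId "
        "WHERE a.postId = ? ORDER BY t.name",
        (post_id,),
    )
    return [r[0] for r in rows]

def posts_by_tag(conn, name):
    """Ids of all posts carrying the tag with the given name."""
    rows = conn.execute(
        "SELECT a.postId FROM tag t JOIN tag_assoc a ON a.tagId = t.id "
        "WHERE t.name = ? ORDER BY a.postId",
        (name,),
    )
    return [r[0] for r in rows]
```

Both directions are a single query here, with no application-side traversal or string parsing.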

Method 3 raises a question: how fast will it be to iterate through every single row in tag_assoc?
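
One way to check whether a lookup really has to touch every row is to ask the database for its query plan. The sketch below does this in SQLite with EXPLAIN QUERY PLAN; the two index names (idx_assoc_tag, idx_assoc_post) and the helper plan_details are assumptions for illustration, and other RDBMSs expose the same idea through their own EXPLAIN commands.

```python
import sqlite3

plan_db = sqlite3.connect(":memory:")
plan_db.executescript("""
CREATE TABLE tag_assoc (ownerId INTEGER, tagId INTEGER, postId INTEGER);
-- One index per lookup direction (names are hypothetical).
CREATE INDEX idx_assoc_tag  ON tag_assoc (tagId, postId);
CREATE INDEX idx_assoc_post ON tag_assoc (postId, tagId);
""")

def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

by_tag_plan = plan_details(
    plan_db, "SELECT postId FROM tag_assoc WHERE tagId = ?", (1,)
)
by_post_plan = plan_details(
    plan_db, "SELECT tagId FROM tag_assoc WHERE postId = ?", (1,)
)

for line in by_tag_plan + by_post_plan:
    print(line)
```

If the plan names an index for each query, the database is seeking into that index rather than iterating over the whole of tag_assoc.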

Methods 1 and 2 should be fast for returning tags by post, but for posts by tag, a separate lookup table would have to be built and kept in sync.
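
As an illustration of that extra lookup table, here is a sketch that builds a tagId-to-posts mapping from Method 2's denormalized field. It assumes the tab-delimited tagId:ownerId layout described under Method 2; the function name build_posts_by_tag is hypothetical, and a real system would have to update the mapping on every tag change.

```python
from collections import defaultdict

def build_posts_by_tag(posts):
    """Build {tagId: [postId, ...]} from (postId, tags_field) pairs.

    tags_field is the tab-delimited 'tagId:ownerId' string from Method 2.
    """
    index = defaultdict(list)
    for post_id, field in posts:
        if not field:
            continue  # post without tags
        for item in field.split("\t"):
            tag_id = int(item.split(":")[0])
            index[tag_id].append(post_id)
    return dict(index)
```

This mapping duplicates what tag_assoc in Method 3 already stores.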

The last thing I have to worry about is optimizing tag searches by name; I have not worked that out yet.
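
One possible approach to the name completion, sketched in SQLite: keep an index on tag.name and turn the typed prefix into a range query, so the database can seek into the index instead of scanning. This assumes case-sensitive matching under SQLite's default binary collation and ordinary tag names; the function suggest_tags, the UNIQUE constraint and the sample tags are made up for illustration.

```python
import sqlite3

def suggest_tags(conn, prefix, limit=10):
    """Return up to `limit` tag names starting with `prefix`, in sorted order."""
    if not prefix:
        return []
    # Smallest string greater than every string that starts with prefix:
    # bump the last character by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = conn.execute(
        "SELECT name FROM tag WHERE name >= ? AND name < ? "
        "ORDER BY name LIMIT ?",
        (prefix, upper, limit),
    )
    return [r[0] for r in rows]

# Hypothetical sample data; UNIQUE makes SQLite create an index on name.
suggest_db = sqlite3.connect(":memory:")
suggest_db.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE, notesId INTEGER);
INSERT INTO tag VALUES (1, 'sql', 0);
INSERT INTO tag VALUES (2, 'sqlite', 0);
INSERT INTO tag VALUES (3, 'sql-server', 0);
INSERT INTO tag VALUES (4, 'tags', 0);
INSERT INTO tag VALUES (5, 'python', 0);
""")
```

With this data, suggest_tags(suggest_db, "sql") returns the three tags that begin with "sql", ordered by byte value.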

I made an ASCII diagram here: http://pastebin.com/f1c4e0e53

© Stack Overflow or respective owner
