Tag Cloud Data Backend

Posted by Waldron on Stack Overflow
Published on 2010-04-08T21:10:06Z Indexed on 2010/04/08 21:13 UTC

I want to be able to generate tag clouds from free text that comes from any number of different sources. To be clear, I'm not talking about how to display a tag cloud once the critical tags/phrases are already discovered; I'm hoping to discover the meaningful phrases themselves, preferably on a PHP/MySQL stack.

If I had to do this myself, I'd start by establishing some kind of index that gives a "normal" frequency for any word/phrase, e.g. "Constantinople" occurs once in every 1,000,000 words on average (normal frequency 0.000001). Then, as I analyze a body of text, I'd find the individual words/phrases (another challenge!), compute the frequency of each within the input, and compare it against the expected frequency. Words with the highest ratio of observed to expected frequency get boosted priority in the cloud.
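To make the idea concrete, here is a rough sketch of that observed-vs-expected ratio. It's in Python only for brevity; the same logic should port to PHP with the baseline stored in a MySQL table. Everything named here (`BASELINE_FREQ`, `DEFAULT_FREQ`, `tokenize`, `score_terms`, `top_terms`) is made up for illustration, and the baseline numbers are invented apart from the "Constantinople" figure above. It only handles single words, not phrases.

```python
import re
from collections import Counter

# Hypothetical baseline: expected relative frequency of each word in "normal" text.
# Real values would come from a large reference corpus; these are illustrative only.
BASELINE_FREQ = {
    "the": 0.05,
    "of": 0.03,
    "city": 0.0002,
    "constantinople": 0.000001,
}
DEFAULT_FREQ = 0.00001  # assumed fallback for words missing from the baseline


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_terms(text, baseline=BASELINE_FREQ, default=DEFAULT_FREQ):
    """Return {word: observed_frequency / expected_frequency} for the text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {
        word: (count / total) / baseline.get(word, default)
        for word, count in counts.items()
    }


def top_terms(text, n=10):
    """Return the n words with the highest observed/expected ratio."""
    scores = score_terms(text)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Calling `top_terms("The city of Constantinople the city", 2)` ranks "constantinople" ahead of "city", even though "city" and "the" occur more often in the input, because its expected frequency is so much lower. The fallback value for unknown words is a judgment call: set it too low and every typo or rare name floods the cloud.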

I'd like to believe someone else has already done this, WAY better than I could hope to, but I'll be damned if I can find it.

Any recommendations?

© Stack Overflow or respective owner
