Storing n-grams in database in < n number of tables.
        Posted  
        
            by kurige
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by kurige
        
        
        
        Published on 2010-05-17T01:17:02Z
        Indexed on 
            2010/05/17
            1:20 UTC
        
        
        Read the original article
        Hit count: 348
        
If I was writing a piece of software that attempted to predict what word a user was going to type next using the two previous words the user had typed, I would create two tables.
Like so:
== 1-gram table ==
Token | NextWord | Frequency
------+----------+-----------
"I"   | "like"   | 15
"I"   | "hate"   | 20
== 2-gram table ==
Token    | NextWord   | Frequency
---------+------------+-----------
"I like" | "apples"   | 8
"I like" | "tomatoes" | 12
"I hate" | "tomatoes" | 20
"I hate" | "apples"   | 2
Following this example implimentation the user types "I" and the software, using the above database, predicts that the next word the user is going to type is "hate". If the user does type "hate" then the software will then predict that the next word the user is going to type is "tomatoes".
However, this implimentation would require a table for each additional n-gram that I choose to take into account. If I decided that I wanted to take the 5 or 6 preceding words into account when predicting the next word, then I would need 5-6 tables, and an exponentially increase in space per n-gram.
What would be the best way to represent this in only one or two tables, that has no upper-limit on the number of n-grams I can support?
© Stack Overflow or respective owner