Are there any working implementations of the rolling hash function used in the Rabin-Karp string sea
- by c14ppy
I'm looking to use a rolling hash function so I can take hashes of n-grams of a very large string.
For example:
"stackoverflow", broken up into 5 grams would be:
  "stack", "tacko", "ackov", "ckove",
  "kover", "overf", "verfl", "erflo", "rflow"
This is ideal for a rolling hash function because after I calculate the first n-gram hash, the following ones are relatively cheap to calculate because I simply have to drop the first letter of the first hash and add the new last letter of the second hash.
I know that in general this hash function is generated as:
H = c1ak - 1 + c2ak - 2 + c3ak - 3 + ... + cka0 where a is a constant and c1,...,ck are the input characters.
If you follow this link on the Rabin-Karp string search algorithm , it states that "a" is usually some large prime.
I want my hashes to be stored in 32 bit integers, so how large of a prime should "a" be, such that I don't overflow my integer?
Does there exist an existing implementation of this hash function somewhere that I could already use?
Here is an implementation I created:
public class hash2
{
    public int prime = 101;
    public int hash(String text)
    {
        int hash = 0;
        for(int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            hash += c * (int) (Math.pow(prime, text.length() - 1 - i));
        }
        return hash;
    }
    public int rollHash(int previousHash, String previousText, String currentText)
    {
        char firstChar = previousText.charAt(0);
        char lastChar = currentText.charAt(currentText.length() - 1);
        int firstCharHash = firstChar * (int) (Math.pow(prime, previousText.length() - 1));
        int hash = (previousHash - firstCharHash) * prime + lastChar;
        return hash;
    }
    public static void main(String[] args)
    {
        hash2 hashify = new hash2();
        int firstHash = hashify.hash("mydog");
        System.out.println(firstHash);
        System.out.println(hashify.hash("ydogr"));
        System.out.println(hashify.rollHash(firstHash, "mydog", "ydogr"));
    }
}
I'm using 101 as my prime.  Does it matter if my hashes will overflow?  I think this is desirable but I'm not sure.
Does this seem like the right way to go about this?