on Stack Overflow
See other posts from Stack Overflow
or by philcolbourn
Published on 2010-04-10T00:37:45Z Indexed on 2010/04/10 0:43 UTC
Read the original article Hit count: 305
I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash
n values into a hash table with
n slots (or bins):
- Probability that a bin is empty, for large
- Expected number of empty bins is
- Probability that a bin has
- Probability that a bin has at least k collisions is
These look easy to check. But the
max-load test (the maximum number of collisions with high probability) is usually stated vaguely.
Most texts state that the maximum number of collisions in any bin is
O( ln(n) / ln(ln(n)) ).
Some say it is
3*ln(n) / ln(ln(n)). Other papers mix
log - usually without defining them, or state that
log is log base e and then use
ln the log to base
2 and is this
max-load formula right and how big should
n be to run a test?
This lecture seems to cover it best, but I am no mathematician.
with high probability seems to mean
1 - 1/n.
© Stack Overflow or respective owner