# How can I test that my hash function is good in terms of max-load?

Posted
by philcolbourn
on Stack Overflow
See other posts from Stack Overflow
or by philcolbourn

Published on 2010-04-10T00:37:45Z
Indexed on
2010/04/10
0:43 UTC

Read the original article
Hit count: 305

I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash `n`

values into a hash table with `n`

slots (or bins):

- Probability that a bin is empty, for large
`n`

is`1/e`

. - Expected number of empty bins is
`n/e`

. - Probability that a bin has
`k`

collisions is`<= 1/k!`

. - Probability that a bin has at least k collisions is
`<= (e/k)**k`

.

These look easy to check. But the `max-load`

test (the maximum number of collisions with high probability) is usually stated vaguely.

Most texts state that the maximum number of collisions in any bin is `O( ln(n) / ln(ln(n)) )`

.
Some say it is `3*ln(n) / ln(ln(n))`

. Other papers mix `ln`

and `log`

- usually without defining them, or state that `log`

is log base e and then use `ln`

elsewhere.

Is `ln`

the log to base `e`

or `2`

and is this `max-load`

formula right and how big should `n`

be to run a test?

This lecture seems to cover it best, but I am no mathematician.

http://pages.cs.wisc.edu/~shuchi/courses/787-F07/scribe-notes/lecture07.pdf

BTW, `with high probability`

seems to mean `1 - 1/n`

.

© Stack Overflow or respective owner