# reservoir sampling problem: correctness of proof

Posted
by eSKay
on Stack Overflow
See other posts from Stack Overflow
or by eSKay

Published on 2010-04-11T10:50:50Z
Indexed on
2010/04/11
10:53 UTC

Read the original article
Hit count: 606

**This MSDN article** proves the correctness of **Reservoir Sampling algorithm** as follows:

Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k.

The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1.

So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1).

So, when k+1 = n, any element is present with probability s/n.

about step 3:

What are the

`k+1 rounds`

mentioned?What is

`chosen in k steps, and not removed in k steps`

?Why are we only calculating this probability for elements that were already in

`R`

after the first`s`

steps?

© Stack Overflow or respective owner