reservoir sampling problem: correctness of proof

Posted by eSKay on Stack Overflow See other posts from Stack Overflow or by eSKay
Published on 2010-04-11T10:50:50Z Indexed on 2010/04/11 10:53 UTC
Read the original article Hit count: 606

This MSDN article proves the correctness of Reservoir Sampling algorithm as follows:


  1. Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k.

  2. The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1.

  3. So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1).

  4. So, when k+1 = n, any element is present with probability s/n.


about step 3:

  • What are the k+1 rounds mentioned?

  • What is chosen in k steps, and not removed in k steps?

  • Why are we only calculating this probability for elements that were already in R after the first s steps?

© Stack Overflow or respective owner

Related posts about algorithm

Related posts about reservoir-sampling