- by eSKay
This MSDN article proves the correctness of Reservoir Sampling algorithm as follows: Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k. The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1. So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1). So, when k+1 = n, any element is present with probability s/n. about step 3: What are the k+1 rounds mentioned? What is chosen in k steps, and not removed in k steps? Why are we only calculating this probability for elements that were already in R after the first s steps?