reservoir sampling problem: correctness of proof

Posted by eSKay on Stack Overflow
Published on 2010-04-11T10:50:50Z Indexed on 2010/04/11 10:53 UTC

Filed under: algorithm | reservoir-sampling | probability | random-numbers | random

This MSDN article proves the correctness of the Reservoir Sampling algorithm as follows:

The base case is trivial. For the (k+1)st case, the probability that a given element i with position <= k is in R is s/k.
The probability that i is replaced is the probability that the (k+1)st element is chosen multiplied by the probability that i is the one chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1). The probability that i is not replaced is therefore k/(k+1).
So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1).
So, when k+1 = n, any element is present with probability s/n.
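For reference, here is a minimal Python sketch of the algorithm the proof describes, plus an empirical check that every element ends up in R with frequency close to s/n. The names `reservoir_sample` and `inclusion_frequencies` are my own and do not come from the MSDN article, and the code assumes the standard variant in which the incoming element is accepted with probability s/(number of items seen so far) and evicts a uniformly chosen slot.

```python
import random
from collections import Counter


def reservoir_sample(stream, s, rng):
    """Return up to s items drawn uniformly from an iterable of unknown length."""
    reservoir = []
    for seen, item in enumerate(stream, start=1):  # seen = items read so far
        if seen <= s:
            # The first s items fill the reservoir unconditionally.
            reservoir.append(item)
        else:
            # j is uniform on [0, seen); j < s happens with probability s/seen,
            # and in that case j is itself a uniform slot index (1/s each).
            j = rng.randrange(seen)
            if j < s:
                reservoir[j] = item
    return reservoir


def inclusion_frequencies(n, s, trials, seed=0):
    """Estimate, for each of n items, how often it lands in the reservoir."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n), s, rng))
    return [counts[i] / trials for i in range(n)]


if __name__ == "__main__":
    # Each printed frequency should be close to s/n = 3/10.
    print(inclusion_frequencies(n=10, s=3, trials=20000))
```

With n = 10 and s = 3, each estimated frequency should come out near 0.3, for early and late positions alike. This is only a numerical sanity check of the s/n claim, not a substitute for the proof.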

My questions are about step 3:

What are the k+1 rounds mentioned?
What does "chosen in k steps, and not removed in k steps" mean?
Why are we only calculating this probability for elements that were already in R after the first s steps?

© Stack Overflow or respective owner
