Reservoir Sampling | Random Thoughts
Get a random sample of n records from a stream of N records, where N is unknown beforehand. The probability of selecting a record should be uniform over all the records in the stream.
Solution:
A number of solutions are explained here. The algorithm treats each item from the stream with equal probability even if the same item is already present in the reservoir. However, here I present only one algorithm, the last algorithm in the paper, which has been simplified over the years.
Algorithm:
- sample the first n items
- choose to sample the item with probability , where (N > i > n)
- if chosen, randomly replace with a previously sampled item
Read full article from Reservoir Sampling | Random Thoughts
No comments:
Post a Comment