All About Programming: Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving

Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving

Given an unbalanced binary tree, write code to select k sample node at random

straightforward solution would be to list all the nodes in any traversal order (BFS, DFS, etc) and find random k index from the n nodes in the list. But how this approach would behave if n is very large and k is very small? or n is unknown? When n is very large the probability of choosing an element from n , k/n is very very small number when n>>k. Usual method for generating (e.g. random()*(i+1) or rand()%(i+1)). is very skewed. The distribution of selection probability trends to have long negative tail. So, obviously you can't guarantee an uniform sampling.

Choosing uniformly distributed random sample from an unknown (or very large) polulatiom is an interesting problem. There is a very famous algorithm to solve this problem in O(n) using randomized algorithms. This is called Reservoir sampling.

The idea is to keep a reservoir of k elements and select an element from the population of size n with equal probability of k/n. How to do that?

Select first k (index i = 0 to k-1) elements and add to the reservoir
For rest of the elements, iterate for each element at index i = k to n, find a uniform random index 0<=j<=i
If j is less than k then replace the reservoir element at index j by current population element at index i.

Question is how the uniform selection probability is guaranteed? Let's do some math. An element can be in the final reservoir if either it was selected from index 0 to k-1 and was not replaced in subsequent iterations or it was selected from index k to n-1 and took place in reservoir by replacing an element at index 0 to k-1.

Case 1 (elements from index 0 to k-1) : probability that an element at index i=0 to k-1 will be in the reservoir = probability for choosing each of the random index selected for replacement during iterations k to n-1 without using index i element = (k/k+1)*(k+1/k+2)*…*((n-1)/n) = k/n
Case 2 (elements from index k to n-1) : probability that an element at index i=k to n-1 will be in the reservoir = (probability of choosing an element at index i=k to n-1) * (probability of choosing an element at index j =0 to k-1 to be replaced by element at i). Let's start from last element. probability of choosing last element at index i=n-1 = probability of choosing an element at index j=0 to k-1 = k/n. Let's consider second last term. Probability of choosing element at index i=n-2 = (probability that element at index j = 0 to k-1 was selected for replacement in previous iteration, i=n-2) * (probability that index j=0 to k-1 selected for replacement, in current iteration is not equal to j') = (k/(n-1))*((n-1)/n) = k/n

Now, we understand that we can use reservoir sampling to get uniformly random sample of size k from a population of size n. Let's get back to original problem. We need to find the samples for a binary tree. Without knowing the total number of elements we can find k random sample by keeping an index starting at index=0 and increment by one whenever we come across a tree node. Using such an index we can apply reservoir sampling while traversing the tree in any traversal order as follows in O(n) time –

Read full article from Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving

Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving

No comments:

Post a Comment

Labels

Popular Posts