Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving



Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving

Given an unbalanced binary tree, write code to select k sample node at random

straightforward solution would be to list all the nodes in any traversal order (BFS, DFS, etc) and find random k index from the n nodes in the list. But how this approach would behave if n is very large and k is very small? or n is unknown? When n is very large the probability of choosing an element from n , k/n is very very small number when n>>k. Usual method for generating (e.g. random()*(i+1) or rand()%(i+1)). is very skewed. The distribution of selection probability trends to have long negative tail. So, obviously you can't guarantee an uniform sampling.

Choosing uniformly distributed random sample from an unknown (or very large) polulatiom is an interesting problem. There is a very famous algorithm to solve this problem in O(n) using randomized algorithms. This is called Reservoir sampling.

The idea is to keep a reservoir of k elements and select an element from the population of size n with equal probability of k/n. How to do that?

  • Select first k (index i = 0 to k-1) elements and add to the reservoir
  • For rest of the elements, iterate for each element at index i = k to n, find a uniform random index 0<=j<=i
  • If j is less than k then replace the reservoir element at index j by current population element at index i.

Question is how the uniform selection probability is guaranteed? Let's do some math. An element can be in the final reservoir if either it was selected from index 0 to k-1 and was not replaced in subsequent iterations or it was selected from index k to n-1 and took place in reservoir by replacing an element at index 0 to k-1.

  • Case 1 (elements from index 0 to k-1) : probability that an element at index i=0 to k-1 will be in the reservoir = probability for choosing each of the random index selected for replacement during iterations k to n-1 without using index i element = (k/k+1)*(k+1/k+2)*…*((n-1)/n) = k/n
  • Case 2 (elements from index k to n-1) : probability that an element at index i=k to n-1 will be in the reservoir = (probability of choosing an element at index i=k to n-1) * (probability of choosing an element at index j =0 to k-1 to be replaced by element at i). Let's start from last element. probability of choosing last element at index i=n-1 = probability of choosing an element at index j=0 to k-1 = k/n. Let's consider second last term. Probability of choosing element at index i=n-2 = (probability that element at index j = 0 to k-1 was selected for replacement in previous iteration, i=n-2) * (probability that index j=0 to k-1 selected for replacement, in current iteration is not equal to j') = (k/(n-1))*((n-1)/n) = k/n

Now, we understand that we can use reservoir sampling to get uniformly random sample of size k from a population of size n. Let's get back to original problem. We need to find the samples for a binary tree. Without knowing the total number of elements we can find k random sample by keeping an index starting at index=0 and increment by one whenever we come across a tree node. Using such an index we can apply reservoir sampling while traversing the tree in any traversal order as follows in O(n) time –


Read full article from Uniform Random sampling of Unbalanced Binary Tree - Algorithms and Problem SolvingAlgorithms and Problem Solving


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts