1. Given a multi-set of numbers (integers), determine if there is a majority element. E.g. S = {4, 1, 3, 4, 4, 3, 3, 4, 4, 1, 4} has 4 in majority S = {4, 4, 4, 3, 1, 1, 1, 3, 2, 2, 2} has no majority In a sequence x1, x2, ..., xn, define the multiplicity of x to be its frequency. A number is in majority if its multiplicity is greater than n/2. (For example, imagine each xi represents a vote; ints are ids of candidates.) One simple solution is to sort; all equal number are consecutive, so can count. Takes O(n log n) time. A more complicated solution uses median---if there is a majority, it must be the median. It can be found in O(n) time, but complex. A much better, simpler linear time solution exists. Observation: Pick any two elements xi and xj. If xi \neq xj, then eliminate BOTH xi and xj from the list. Then, the majority in the original list is still a majority in the reduced list. ALGORITHM: Use two variables C (candidate) and M (multiplicity). Scan elements in their given order. When considering xi, inductively, C is the only candidate for majority among {x1, x2, ..., x_i-1} and M is its count excluding the times C was eliminated. When xi is considered, if C is zero (before comparison), put xi = C and M = 1. else if xi = C, increment M; if xi \neq C, decrement M; At the end, C is the only possible candidate for majority. We make a second pass to "confirm" that. O(n) time. 2. Extension to K elements, each appearing at least n/(k+1) times. Maintain k candidates and counters. When considering xi, if xi equals one of the cands, increment it; if xi different and some counter available, set xi as that cand, and M = 1; if xi different from all, and all cands full; decrement them all. 3. A hot application problem in Data Stream context---heavy hitters in routers, top k popular items, etc... One important attribute of this scheme is that it uses O(1) memory---independent of the stream size. ------------------------------------------------------------------ 4. Celebrity Problem. Among n people, a celebrity is defined as someone who is known to everyone, but who knows no one. Graph model: Person = node. If A knows B, draw a directed edge from A to B. Celebrity is a node with out-degree = 0; in-deg = n-1 (sink). A graph can have at most 1 sink/celebrity. The problem is to identify the celebrity, if one exists, by asking only questions of the following form: "Excuse me, do you know person x?" There are n(n-1)/2 pairs of persons, so potentially we need to ask n(n-1) questions. Turns out one can solve the problem without even looking at all the input! 5. Induction: First attempt. With 2 people, the game is simple. Two questions. For the general case, take out person n, and inductively solve the problem for the remaining (n-1). 3 possible outcomes: the celebrity is among first n-1; the celebrity is the nth guy; no celebrity at all. Case 1 is easiest---just need to check that the nth person knows the celebrity, and the celebrity does not know him. The other two cases are tricky. To determine if nth person is celebrity, we need to ask 2(n-1) questions. But if we ask so many questions in the nth step, the total complexity of the algorithm will be n(n-1) questions. 5. The correct solution is based on the following observation: For any two players A and B, if A knows B, then A cannot be a celebrity; if A does not know B, then B cannot be a celebrity. Thus, with 1 question, we can eliminate one of two candidates. Algorithm: Pick any two players, A and B. Based on the outcome of the question, eliminate the non-celeb, and inductively solve the remaining n-1 people. If the remainder has no celebrity, then done. Otherwise, ask 2 questions to make sure that the celebrity is known to the person eliminated in first round; and that celebrity does not know that person. Thus, the total number of questions is at most 3(n-1). Stack Implementation: Take top two people; with one question, eliminate one of them; repeat, until one person remains. Now go back, and ask 2(n-1) questions, comparing this celeb with (n-1) losers. ------------------------------------------------------------------ 6. Missing Numbers You are processing a stream of n numbers (presented in arbitrary order); each number is presented exactly once. You have limited constant amount of memory---say, a few words. Suppose you are told that EXACTLY one of the numbers from the set {1, 2, ..., n} is missing. Can you identify it? Solution: Keep track of the running sum. Subtract is from n(n+1)/2. How about if 2 numbers are missing? You could try storing s = n(n+1)/2 - \sum xi, and p = n! - \pi xi, giving two equations in two unknowns. But one can solve the same problem with far fewer bits: s = n(n+1)/2 - \sum xi and ss = n(n+1)(2n+1)/6 - \sum xi^2 In general, we could store kth power sums.

Read full article from


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts