1. Given a multi-set of numbers (integers), determine if there is a majority element. E.g. S = {4, 1, 3, 4, 4, 3, 3, 4, 4, 1, 4} has 4 in majority S = {4, 4, 4, 3, 1, 1, 1, 3, 2, 2, 2} has no majority In a sequence x1, x2, ..., xn, define the multiplicity of x to be its frequency. A number is in majority if its multiplicity is greater than n/2. (For example, imagine each xi represents a vote; ints are ids of candidates.) One simple solution is to sort; all equal number are consecutive, so can count. Takes O(n log n) time. A more complicated solution uses median---if there is a majority, it must be the median. It can be found in O(n) time, but complex. A much better, simpler linear time solution exists. Observation: Pick any two elements xi and xj. If xi \neq xj, then eliminate BOTH xi and xj from the list. Then, the majority in the original list is still a majority in the reduced list. ALGORITHM: Use two variables C (candidate) and M (multiplicity). Scan elements in their given order. When considering xi, inductively, C is the only candidate for majority among {x1, x2, ..., x_i-1} and M is its count excluding the times C was eliminated. When xi is considered, if C is zero (before comparison), put xi = C and M = 1. else if xi = C, increment M; if xi \neq C, decrement M; At the end, C is the only possible candidate for majority. We make a second pass to "confirm" that. O(n) time. 2. Extension to K elements, each appearing at least n/(k+1) times. Maintain k candidates and counters. When considering xi, if xi equals one of the cands, increment it; if xi different and some counter available, set xi as that cand, and M = 1; if xi different from all, and all cands full; decrement them all. 3. A hot application problem in Data Stream context---heavy hitters in routers, top k popular items, etc... One important attribute of this scheme is that it uses O(1) memory---independent of the stream size. ------------------------------------------------------------------ 4. Celebrity Problem. Among n people, a celebrity is defined as someone who is known to everyone, but who knows no one. Graph model: Person = node. If A knows B, draw a directed edge from A to B. Celebrity is a node with out-degree = 0; in-deg = n-1 (sink). A graph can have at most 1 sink/celebrity. The problem is to identify the celebrity, if one exists, by asking only questions of the following form: "Excuse me, do you know person x?" There are n(n-1)/2 pairs of persons, so potentially we need to ask n(n-1) questions. Turns out one can solve the problem without even looking at all the input! 5. Induction: First attempt. With 2 people, the game is simple. Two questions. For the general case, take out person n, and inductively solve the problem for the remaining (n-1). 3 possible outcomes: the celebrity is among first n-1; the celebrity is the nth guy; no celebrity at all. Case 1 is easiest---just need to check that the nth person knows the celebrity, and the celebrity does not know him. The other two cases are tricky. To determine if nth person is celebrity, we need to ask 2(n-1) questions. But if we ask so many questions in the nth step, the total complexity of the algorithm will be n(n-1) questions. 5. The correct solution is based on the following observation: For any two players A and B, if A knows B, then A cannot be a celebrity; if A does not know B, then B cannot be a celebrity. Thus, with 1 question, we can eliminate one of two candidates. Algorithm: Pick any two players, A and B. Based on the outcome of the question, eliminate the non-celeb, and inductively solve the remaining n-1 people. If the remainder has no celebrity, then done. Otherwise, ask 2 questions to make sure that the celebrity is known to the person eliminated in first round; and that celebrity does not know that person. Thus, the total number of questions is at most 3(n-1). Stack Implementation: Take top two people; with one question, eliminate one of them; repeat, until one person remains. Now go back, and ask 2(n-1) questions, comparing this celeb with (n-1) losers. ------------------------------------------------------------------ 6. Missing Numbers You are processing a stream of n numbers (presented in arbitrary order); each number is presented exactly once. You have limited constant amount of memory---say, a few words. Suppose you are told that EXACTLY one of the numbers from the set {1, 2, ..., n} is missing. Can you identify it? Solution: Keep track of the running sum. Subtract is from n(n+1)/2. How about if 2 numbers are missing? You could try storing s = n(n+1)/2 - \sum xi, and p = n! - \pi xi, giving two equations in two unknowns. But one can solve the same problem with far fewer bits: s = n(n+1)/2 - \sum xi and ss = n(n+1)(2n+1)/6 - \sum xi^2 In general, we could store kth power sums.
Subscribe to:
Post Comments (Atom)
Labels
Algorithm
(219)
Lucene
(130)
LeetCode
(97)
Database
(36)
Data Structure
(33)
text mining
(28)
Solr
(27)
java
(27)
Mathematical Algorithm
(26)
Difficult Algorithm
(25)
Logic Thinking
(23)
Puzzles
(23)
Bit Algorithms
(22)
Math
(21)
List
(20)
Dynamic Programming
(19)
Linux
(19)
Tree
(18)
Machine Learning
(15)
EPI
(11)
Queue
(11)
Smart Algorithm
(11)
Operating System
(9)
Java Basic
(8)
Recursive Algorithm
(8)
Stack
(8)
Eclipse
(7)
Scala
(7)
Tika
(7)
J2EE
(6)
Monitoring
(6)
Trie
(6)
Concurrency
(5)
Geometry Algorithm
(5)
Greedy Algorithm
(5)
Mahout
(5)
MySQL
(5)
xpost
(5)
C
(4)
Interview
(4)
Vi
(4)
regular expression
(4)
to-do
(4)
C++
(3)
Chrome
(3)
Divide and Conquer
(3)
Graph Algorithm
(3)
Permutation
(3)
Powershell
(3)
Random
(3)
Segment Tree
(3)
UIMA
(3)
Union-Find
(3)
Video
(3)
Virtualization
(3)
Windows
(3)
XML
(3)
Advanced Data Structure
(2)
Android
(2)
Bash
(2)
Classic Algorithm
(2)
Debugging
(2)
Design Pattern
(2)
Google
(2)
Hadoop
(2)
Java Collections
(2)
Markov Chains
(2)
Probabilities
(2)
Shell
(2)
Site
(2)
Web Development
(2)
Workplace
(2)
angularjs
(2)
.Net
(1)
Amazon Interview
(1)
Android Studio
(1)
Array
(1)
Boilerpipe
(1)
Book Notes
(1)
ChromeOS
(1)
Chromebook
(1)
Codility
(1)
Desgin
(1)
Design
(1)
Divide and Conqure
(1)
GAE
(1)
Google Interview
(1)
Great Stuff
(1)
Hash
(1)
High Tech Companies
(1)
Improving
(1)
LifeTips
(1)
Maven
(1)
Network
(1)
Performance
(1)
Programming
(1)
Resources
(1)
Sampling
(1)
Sed
(1)
Smart Thinking
(1)
Sort
(1)
Spark
(1)
Stanford NLP
(1)
System Design
(1)
Trove
(1)
VIP
(1)
tools
(1)
Popular Posts
-
New Posts
-
【新提醒】sign on翻3倍,教你negotiate offer|一亩三分地求职版 - 下一张牌可以打offer牌,(如果有的话)。不是让你给数字!这张牌拆两次打,前期可以做些铺垫,自己和多家公司面试并且进展顺利,有update会告诉你之类的。现在可以告诉hr自己已经到了off...
-
【新提醒】我与Google的虐恋timeline //这是一篇流水账【一亩三分地论坛找工求职版】 - Powered by Discuz! 楼主15年12月毕业,抱着不知道从哪里来的谜一般的自信,对 找工 作毫不着急。结果大家也知道的,去年年底到现在的找工形势都不太好。 毕业后...
-
"1 in 10 cant even pass a simple test like ‘which design pattern is used in the streams library that makes BufferedFileReader intercha...
-
Amazon OA2 Work Simulation | Keep Walking 目前两大原则: requirement排在第一, deadline第二 有manager出现的选项无脑选manager,manager就是一个组的代言人+保护伞,大哥自己人。 注意事项:...
-
【新提醒】找人内推以前先看这个帖子!内推人建议你如何写简历!【一亩三分地论坛内推版】 - Powered by Discuz! 1. 简历要靠谱。 的确很多人的简历typo一堆,或者言辞过于浮夸,或者跟职位完全不match,或者你18个月后才能上班但是你非要现在申请。 2....
-
What are the best, most useful, must-have Google Chrome extensions? : AskReddit The great suspender. It will automatically freeze tabs you h...
-
12 Favorite Atom Tips and Shortcuts to Improve Your Workflow — SitePoint One annoyance I have is when I copy code from somewhere and the ind...
-
【新提醒】Data scientist @ Google,Apple【一亩三分地论坛面经版】 - Powered by Discuz! Apple则是同学内推。面了两轮电话面试,发现工作内容实在跟预想差太多(貌似就是query, generate summary statisti...
-
【新提醒】不换题的Pure Storage OA【一亩三分地论坛面经版】 - Powered by Discuz! 楼主第一次做OA...觉得值得提醒的点有: 0.开始之前要填写一些信息,比如你的recruiter name...注意查看邮箱的发件人。 1.前两道都是编程题,...
No comments:
Post a Comment