1. Given a multi-set of numbers (integers), determine if there is a majority element. E.g. S = {4, 1, 3, 4, 4, 3, 3, 4, 4, 1, 4} has 4 in majority S = {4, 4, 4, 3, 1, 1, 1, 3, 2, 2, 2} has no majority In a sequence x1, x2, ..., xn, define the multiplicity of x to be its frequency. A number is in majority if its multiplicity is greater than n/2. (For example, imagine each xi represents a vote; ints are ids of candidates.) One simple solution is to sort; all equal number are consecutive, so can count. Takes O(n log n) time. A more complicated solution uses median---if there is a majority, it must be the median. It can be found in O(n) time, but complex. A much better, simpler linear time solution exists. Observation: Pick any two elements xi and xj. If xi \neq xj, then eliminate BOTH xi and xj from the list. Then, the majority in the original list is still a majority in the reduced list. ALGORITHM: Use two variables C (candidate) and M (multiplicity). Scan elements in their given order. When considering xi, inductively, C is the only candidate for majority among {x1, x2, ..., x_i-1} and M is its count excluding the times C was eliminated. When xi is considered, if C is zero (before comparison), put xi = C and M = 1. else if xi = C, increment M; if xi \neq C, decrement M; At the end, C is the only possible candidate for majority. We make a second pass to "confirm" that. O(n) time. 2. Extension to K elements, each appearing at least n/(k+1) times. Maintain k candidates and counters. When considering xi, if xi equals one of the cands, increment it; if xi different and some counter available, set xi as that cand, and M = 1; if xi different from all, and all cands full; decrement them all. 3. A hot application problem in Data Stream context---heavy hitters in routers, top k popular items, etc... One important attribute of this scheme is that it uses O(1) memory---independent of the stream size. ------------------------------------------------------------------ 4. Celebrity Problem. Among n people, a celebrity is defined as someone who is known to everyone, but who knows no one. Graph model: Person = node. If A knows B, draw a directed edge from A to B. Celebrity is a node with out-degree = 0; in-deg = n-1 (sink). A graph can have at most 1 sink/celebrity. The problem is to identify the celebrity, if one exists, by asking only questions of the following form: "Excuse me, do you know person x?" There are n(n-1)/2 pairs of persons, so potentially we need to ask n(n-1) questions. Turns out one can solve the problem without even looking at all the input! 5. Induction: First attempt. With 2 people, the game is simple. Two questions. For the general case, take out person n, and inductively solve the problem for the remaining (n-1). 3 possible outcomes: the celebrity is among first n-1; the celebrity is the nth guy; no celebrity at all. Case 1 is easiest---just need to check that the nth person knows the celebrity, and the celebrity does not know him. The other two cases are tricky. To determine if nth person is celebrity, we need to ask 2(n-1) questions. But if we ask so many questions in the nth step, the total complexity of the algorithm will be n(n-1) questions. 5. The correct solution is based on the following observation: For any two players A and B, if A knows B, then A cannot be a celebrity; if A does not know B, then B cannot be a celebrity. Thus, with 1 question, we can eliminate one of two candidates. Algorithm: Pick any two players, A and B. Based on the outcome of the question, eliminate the non-celeb, and inductively solve the remaining n-1 people. If the remainder has no celebrity, then done. Otherwise, ask 2 questions to make sure that the celebrity is known to the person eliminated in first round; and that celebrity does not know that person. Thus, the total number of questions is at most 3(n-1). Stack Implementation: Take top two people; with one question, eliminate one of them; repeat, until one person remains. Now go back, and ask 2(n-1) questions, comparing this celeb with (n-1) losers. ------------------------------------------------------------------ 6. Missing Numbers You are processing a stream of n numbers (presented in arbitrary order); each number is presented exactly once. You have limited constant amount of memory---say, a few words. Suppose you are told that EXACTLY one of the numbers from the set {1, 2, ..., n} is missing. Can you identify it? Solution: Keep track of the running sum. Subtract is from n(n+1)/2. How about if 2 numbers are missing? You could try storing s = n(n+1)/2 - \sum xi, and p = n! - \pi xi, giving two equations in two unknowns. But one can solve the same problem with far fewer bits: s = n(n+1)/2 - \sum xi and ss = n(n+1)(2n+1)/6 - \sum xi^2 In general, we could store kth power sums.
Subscribe to:
Post Comments (Atom)
Labels
Algorithm
(219)
Lucene
(130)
LeetCode
(97)
Database
(36)
Data Structure
(33)
text mining
(28)
Solr
(27)
java
(27)
Mathematical Algorithm
(26)
Difficult Algorithm
(25)
Logic Thinking
(23)
Puzzles
(23)
Bit Algorithms
(22)
Math
(21)
List
(20)
Dynamic Programming
(19)
Linux
(19)
Tree
(18)
Machine Learning
(15)
EPI
(11)
Queue
(11)
Smart Algorithm
(11)
Operating System
(9)
Java Basic
(8)
Recursive Algorithm
(8)
Stack
(8)
Eclipse
(7)
Scala
(7)
Tika
(7)
J2EE
(6)
Monitoring
(6)
Trie
(6)
Concurrency
(5)
Geometry Algorithm
(5)
Greedy Algorithm
(5)
Mahout
(5)
MySQL
(5)
xpost
(5)
C
(4)
Interview
(4)
Vi
(4)
regular expression
(4)
to-do
(4)
C++
(3)
Chrome
(3)
Divide and Conquer
(3)
Graph Algorithm
(3)
Permutation
(3)
Powershell
(3)
Random
(3)
Segment Tree
(3)
UIMA
(3)
Union-Find
(3)
Video
(3)
Virtualization
(3)
Windows
(3)
XML
(3)
Advanced Data Structure
(2)
Android
(2)
Bash
(2)
Classic Algorithm
(2)
Debugging
(2)
Design Pattern
(2)
Google
(2)
Hadoop
(2)
Java Collections
(2)
Markov Chains
(2)
Probabilities
(2)
Shell
(2)
Site
(2)
Web Development
(2)
Workplace
(2)
angularjs
(2)
.Net
(1)
Amazon Interview
(1)
Android Studio
(1)
Array
(1)
Boilerpipe
(1)
Book Notes
(1)
ChromeOS
(1)
Chromebook
(1)
Codility
(1)
Desgin
(1)
Design
(1)
Divide and Conqure
(1)
GAE
(1)
Google Interview
(1)
Great Stuff
(1)
Hash
(1)
High Tech Companies
(1)
Improving
(1)
LifeTips
(1)
Maven
(1)
Network
(1)
Performance
(1)
Programming
(1)
Resources
(1)
Sampling
(1)
Sed
(1)
Smart Thinking
(1)
Sort
(1)
Spark
(1)
Stanford NLP
(1)
System Design
(1)
Trove
(1)
VIP
(1)
tools
(1)
Popular Posts
-
Amazon OA2 Work Simulation | Keep Walking 目前两大原则: requirement排在第一, deadline第二 有manager出现的选项无脑选manager,manager就是一个组的代言人+保护伞,大哥自己人。 注意事项:...
-
I think the Lenovo IdeaPad S300/S400/S400u/S405 User Guide V2.0 (pdf) has what you want on page 12 in Chapter 2. Learning the basics where...
-
Given an array of non-negative integers, you are initially positioned at the first index of the array. Each element in the array represents...
-
【新提醒】找人内推以前先看这个帖子!内推人建议你如何写简历!【一亩三分地论坛内推版】 - Powered by Discuz! 1. 简历要靠谱。 的确很多人的简历typo一堆,或者言辞过于浮夸,或者跟职位完全不match,或者你18个月后才能上班但是你非要现在申请。 2....
-
【新提醒】我与Google的虐恋timeline //这是一篇流水账【一亩三分地论坛找工求职版】 - Powered by Discuz! 楼主15年12月毕业,抱着不知道从哪里来的谜一般的自信,对 找工 作毫不着急。结果大家也知道的,去年年底到现在的找工形势都不太好。 毕业后...
-
最神秘的大数据公司Palantir(一):教父级的创始人 - 数据冰山 - 知乎专栏 一、紧密围绕在以Peter Thiel主席为核心的各项事业周围 Palantir的创始人为Peter Thiel, Alex Karp, Joe Lonsdale, Stephen Cohen和...
-
【新提醒】对candidate的code interview的一点建议|一亩三分地求职版 - 1. 拒绝沉默,积极沟通 面试的老中烙印一半一半吧,必须承认烙印真的非常能说,不管他们思路是否受阻,都可以滔滔不绝的继续说下去,而且也会在必要的时候积极寻求hint,确认directi...
-
JacksonFeatureBiDirReferences - FasterXML Wiki Annotations are: @JsonManagedReference is the "forward" part of reference: one tha...
-
Minimum Spanning Tree (MST) problem: Given connected graph G with positive edge weights, find a min weight set of edges that connects all of...
-
[Java] Generate hasXXX() methods for fields inside a oneof · Issue #2984 · protocolbuffers/protobuf · GitHub Currently in Java, one must com...
No comments:
Post a Comment