Find the kth most frequent element in a data stream | Hello World



Find the kth most frequent element in a data stream | Hello World

In the word of internet, where everything is in a very large scale. For example you want to find out that among all the ads, which ones are clicked most frequently in a day?

This problem comes down to find the kth most frequent element in a data stream. (or find the element whose frequency is greater than a user defined value).

In order to find the kth most frequent element in a data stream, we need a space of the length of the data stream N. which is not desirable. So we have to use approximation

Two kind of method are proposed to solve the problem
1. counter-based: only monitor a subset of the incoming stream, each element being monitored was assigned a counter, if the incoming element is monitored, the counter is updated, otherwise, do something depending on algorithm.
2. Estimate frequency for all elements using bit-maps of counters
Each element is hashed into the counters' space using a family of hash functions. Hashed-to counters are queried for the frequencies


Read full article from Find the kth most frequent element in a data stream | Hello World


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts