algorithms - Word Frequency with Ordering in O(n) Complexity - Computer Science Stack Exchange



algorithms - Word Frequency with Ordering in O(n) Complexity - Computer Science Stack Exchange

  1. Read the text and insert all the word encountered into a trie, maintaining in each node a count, how often the word represented by this node has occured. Additionally keep track of the highest word count. -- O(n)
  2. Initialize an array of size maxWordCount. Entry type are lists of strings. -- O(n), since the count can't be higher.
  3. Traverse the trie and for each node add the corresponding string to the array entry indicated by the count. -- O(n), since the total length of strings is bounded by n.
  4. Traverse the array in descending order and output the desired number of strings. -- O(n), since that is a bound on both the size of and the amount of data in the array.

You can probably replace the trie by other data structures in the first phase.


Read full article from algorithms - Word Frequency with Ordering in O(n) Complexity - Computer Science Stack Exchange


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts