SparkNotes: Hash Tables: Another use of hashing: Rabin-Karp string searching



The Rabin-Karp algorithm uses a technique called fingerprinting.

1. Given the pattern of length M, hash it.
2. Now hash the first M characters of the text string.
3. Compare the hash values. Are they the same? If not, then it is impossible for the two strings to be the same. If they are, then we need to do a normal string comparison to check whether they are actually the same string or just hashed to the same value (remember that two different strings can hash to the same value). If they match, we're done. If not, we continue.
4. Now shift over one character in the text string and get the hash value of the new substring. Continue as above until the string is either found or we reach the end of the text string.
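
The four steps above can be sketched in Python as follows. This is only an illustration: `simple_hash` (a plain sum of character codes) and `find_by_fingerprint` are hypothetical names chosen here, not the article's own hash function, and the sketch deliberately recomputes the hash from scratch at every shift.

```python
def simple_hash(s):
    # Hypothetical toy hash (an assumption for this sketch): sum of character codes.
    return sum(ord(c) for c in s)


def find_by_fingerprint(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    target = simple_hash(pattern)              # step 1: hash the pattern
    for i in range(len(text) - m + 1):         # step 4: shift one character at a time
        window = text[i:i + m]
        if simple_hash(window) == target:      # steps 2-3: hash the window and compare
            if window == pattern:              # equal hashes may be a collision, so confirm
                return i
    return -1


print(find_by_fingerprint("cab", "abcab"))
```

With this toy hash, "abc", "bca", and "cab" all hash to the same value, so the explicit string comparison in step 3 is what rules out the first two windows. Since each window is rehashed in full, this version still does work proportional to the pattern length at every position.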

Now you may be wondering to yourself, "I don't get it. How can this be anything less than O(MN)? To create the hash for each position in the text string, don't we have to look at every character of the substring that starts there?" (Here M is the length of the pattern and N is the length of the text.) The answer is no, and this is the trick that Rabin and Karp discovered.

The initial hashes are called fingerprints. Rabin and Karp discovered a way to update these fingerprints in constant time. In other words, to go from the hash of a substring in the text string to the next hash value only requires constant time. Let's take a simple hash function and look at an example to see why and how this works.
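
As one possible illustration of the constant-time update (a sketch under assumptions, not necessarily the hash function the original article goes on to use), treat each window as a number written in base 256 and reduce it modulo a prime. `BASE`, `MOD`, and `rabin_karp` are names and values chosen for this sketch.

```python
BASE = 256        # assumed radix: each character acts as one "digit"
MOD = 1_000_003   # assumed prime modulus, keeps fingerprints small


def rabin_karp(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(BASE, m - 1, MOD)   # weight of the leftmost character in a window

    # Initial fingerprints of the pattern and of the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        w_hash = (w_hash * BASE + ord(text[i])) % MOD

    for i in range(n - m + 1):
        # Compare fingerprints first; confirm with a real comparison on a hit.
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Constant-time update: drop text[i], shift left, append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return -1


print(rabin_karp("karp", "rabin-karp"))
```

The update line touches only the character leaving the window and the character entering it, so moving from one fingerprint to the next costs the same no matter how long the pattern is. Only positions where the fingerprints agree pay for a full string comparison.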


Read full article from SparkNotes: Hash Tables: Another use of hashing: Rabin-Karp string searching

