The Rabin-Karp algorithm uses a technique called fingerprinting.
1. Given the pattern of length n, hash it.
2. Now hash the first n characters of the text string.
3. Compare the hash values. Are they the same? If not, then it is impossible for the two strings to be the same. If they are, then we need to do a normal string comparison to check if they are actually the same string or if they just hashed to the same value (remember that two different strings can hash to the same value). If they match, we're done. If not, we continue.
4. Now shift over a character in the text string. Get the hash value. Continue as above until the string is either found or we reach the end of the text string.
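The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `simple_hash` is a hypothetical toy hash (a sum of character codes) chosen because it collides easily, and `fingerprint_search` is a made-up name. Note that this version rehashes every window from scratch, so it does not yet include the constant-time update.

```python
def simple_hash(s: str) -> int:
    # Hypothetical toy hash: the sum of the character codes.
    # Anagrams such as "abc" and "cba" collide, which exercises step 3.
    return sum(ord(c) for c in s)


def fingerprint_search(pattern: str, text: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n = len(pattern)
    target = simple_hash(pattern)            # step 1: hash the pattern

    for i in range(len(text) - n + 1):       # step 4: shift one character at a time
        window = text[i:i + n]
        if simple_hash(window) == target:    # steps 2-3: compare hash values
            # Equal hashes do not prove equal strings, so verify directly.
            if window == pattern:
                return i
    return -1
```

For example, searching for "abc" in "xxcbaabc" hits a hash collision on "cba" at index 2, rejects it after the direct comparison, and then finds the real match at index 5.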
Now you may be wondering, "I don't get it. How can this be anything less than O(MN), the product of the pattern and text lengths? To create the hash for each position in the text string, don't we have to look at every character of the substring starting there?" The answer is no, and this is the trick that Rabin and Karp discovered.
The initial hashes are called fingerprints. Rabin and Karp discovered a way to update these fingerprints in constant time. In other words, to go from the hash of a substring in the text string to the next hash value only requires constant time. Let's take a simple hash function and look at an example to see why and how this works.
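As one possible illustration of the constant-time update, the sketch below assumes a polynomial hash: the characters of a window are treated as digits of a number in base `BASE`, reduced modulo `MOD`. The function name `rabin_karp` and both constants are assumptions made for this example, not values taken from the article. Sliding the window removes the contribution of the leftmost character, multiplies by the base, and adds the new rightmost character, all without rereading the characters in between.

```python
BASE = 256         # assumed base: one "digit" per character code
MOD = 1_000_003    # assumed modulus, chosen arbitrarily to keep hashes small


def rabin_karp(pattern: str, text: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(pattern), len(text)
    if n == 0:
        return 0
    if n > m:
        return -1

    # Weight of the leftmost character in a window: BASE^(n-1) mod MOD.
    high = pow(BASE, n - 1, MOD)

    # Initial fingerprints of the pattern and of the first window.
    p_hash = t_hash = 0
    for i in range(n):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        t_hash = (t_hash * BASE + ord(text[i])) % MOD

    for i in range(m - n + 1):
        # Equal fingerprints only suggest a match, so confirm it directly.
        if p_hash == t_hash and text[i:i + n] == pattern:
            return i
        if i < m - n:
            # Constant-time update: drop text[i], shift, append text[i + n].
            t_hash = ((t_hash - ord(text[i]) * high) * BASE
                      + ord(text[i + n])) % MOD
    return -1
```

Each update costs a fixed number of arithmetic operations regardless of the pattern length, which is why the full character-by-character comparison is only needed when the fingerprints agree.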
Read full article from SparkNotes: Hash Tables: Another use of hashing: Rabin-Karp string searching