All About Programming: Rolling hash, Rabin Karp, palindromes, rsync and others

Rolling hash is a neat idea found in the Rabin-Karp string matching algorithm which is easy to grasp and useful in quite a few different contexts.

As before, the most interesting part of the article is the problem section. Discuss them in the comment section, but first let's go through a few applications.

The Rabin-Karp algorithm

Given a string P of length n and a string S of length m find out all the occurrences of P within S

The algorithm works by building a fingerprint for each substring of S of length n. It checks if any of them match the fingerprint of P. Implementing this directly leads to a O(n*m) solution which isn't faster than the naive matching algorithm (in fact it may be slower).

The insight is that you can efficiently compute the hash value of a substring using the hash value of previous substring if you use the right hash function. A polynomial with the string characters as coefficients works well. Let $H(c)=c[0]a^{n-1} + c[1]a^{n-2}+...+c[n-1]$ be the hash function for the string c.
All the operations are done modulo a prime number so that we don’t have to deal with large integers.

Read full article from Rolling hash, Rabin Karp, palindromes, rsync and others

Rolling hash, Rabin Karp, palindromes, rsync and others

The Rabin-Karp algorithm

No comments:

Post a Comment

Labels

Popular Posts