The "naive" approach is easy to understand and implement but it can be too slow in some cases. If the length of the text is n and the length of the pattern m, in the worst case it may take as much as (n * m) iterations to complete the task.
It should be noted though, that for most practical purposes, which deal with texts based on human languages, this approach is much faster since the inner loop usually quickly finds a mismatch and breaks. A problem arises when we are faced with different kinds of "texts," such as the genetic code.
Rabin-Karp Algorithm (RK)
This is actually the "naive" approach augmented with a powerful programming technique - the hash function.
Every string s[] of length m can be seen as a number H written in a positional numeral system in base B (B >= size of the alphabet used in the string):
H = s[0] * B(m - 1) + s[1] * B(m - 2) + … + s[m - 2] * B1 + s[m - 1] * B0
If we calculate the number H (the hash value) for the pattern and the same number for every substring of length m of the text than the inner loop of the "naive" method will disappear - instead of comparing two strings character by character we will have just to compare two integers.
A problem arises when m and B are big enough and the number H becomes too large to fit into the standard integer types. To overcome this, instead of the number H itself we use its remainder when divided by some other number M. To get the remainder we do not have to calculate H. Applying the basic rules of modular arithmetic to the above expression:
Read full article from Algorithm Tutorials
No comments:
Post a Comment