In my view, the Knuth-Morris-Pratt string searching algorithm is one of those "magic" algorithms that do something wonderful but are difficult to explain or understand in detail without a significant time investment. The pedagogical question that I would like to address is not just: "how does it work?", or "why does it work?", or "does it work?". More importantly, it is: "how did they come up with it?". Of course, I wasn't there, and I am no historian, so let me instead address the somewhat related question: "how do I reconstruct it?".
Of course one can prove that the algorithm works; this can be done with pen-and-paper or with a machine-checked proof. This is very useful, but does not answer my question.
I know of several ways of explaining and reconstructing the algorithm. In one of them, one views the pattern as a non-deterministic finite-state automaton; by eliminating the є-transitions, one obtains a deterministic automaton, whose transitions form the "failure function". In the second approach, which is the one I would like to describe here, one begins with a naïve implementation of string searching, and by a series of program transformations, one derives an efficient algorithm, namely the Knuth-Morris-Pratt algorithm.
I found this idea in Ager, Danvy, and Rohde's paper, where it is somewhat buried in Appendix A. I rephrase it here in a slightly different way.
Read full article from Gagallium : Reconstructing the Knuth-Morris-Pratt algorithm
No comments:
Post a Comment