1000x Faster Spelling Correction algorithm | FAROO Blog
2. Peter Norvig
Generate all possible terms with an edit distance <=2 (deletes + transposes + replaces + inserts) from the query term and search them in the dictionary.
For a word of length n, an alphabet size a, an edit distance d=1, there will be n deletions, n-1 transpositions, a*n alterations, and a*(n+1) insertions, for a total of 2n+2an+a-1 terms at search time.
Source: Peter Norvig: How to Write a Spelling Corrector.
This is much better than the naive approach, but still expensive at search time (114,324 terms for n=9, a=36, d=2) and language dependent (because the alphabet is used to generate the terms, which is different in many languages and huge in Chinese: a=70,000 Unicode Han characters)
3. Symmetric Delete Spelling Correction (FAROO)
Generate terms with an edit distance <=2 (deletes only) from each dictionary term and add them together with the original term to the dictionary. This has to be done only once during a pre-calculation step.
Generate terms with an edit distance <=2 (deletes only) from the input term and search them in the dictionary.
For a word of length n, an alphabet size of a, an edit distance of 1, there will be just n deletions, for a total of n terms at search time.
Read full article from 1000x Faster Spelling Correction algorithm | FAROO Blog
No comments:
Post a Comment