All About Programming: Salmon Run: Three Autocomplete implementations compared

Many web sites now offer forms that suggest completions as you type. For example, in a field that holds the name of a US state, typing "C" pops up a list box containing ["California", "Colorado", "Connecticut"]. Typing an "a" next narrows the list to only ["California"], and hitting the ENTER key populates the field with "California". One of the most popular (although probably not the first) implementations is Google Suggest.

This feature is certainly very helpful from the user's perspective, since it saves keystrokes and gets the job done faster. A side effect is that the list of completions aids discovery: the user may reach pages that would otherwise have gone unvisited. For the site owner, it makes the site more "sticky", which translates into more page views and, for sites that depend on advertising, more advertising dollars.

I have been curious about how auto-complete works, although the curiosity did not translate into actual code until recently. Obviously, AJAX is part of the equation: each keystroke in the form has to be captured and sent to the server, and the possible completions returned and displayed, all within a single request. I was more interested, however, in how the server-side component can be built to return those results efficiently.

Over the last week, I came up with three possible implementations of auto-completion over the file names in my ~/tmp directory. There are about 280 files in there, so this is nothing compared to what production-quality auto-completion components have to serve on real websites, but it could be a starting point for better ideas. I enumerate them here, with code and some relative performance numbers.

In-Memory Trie

A trie is a specialized tree data structure in which a word is stored as a sequence of characters, one per node. Reading a word means traversing down a branch of the tree. At any node, the possible completions of the partial word can be found by traversing every path from that node down to the leaf level. That made it seem ideal for modeling auto-completions, which is why I chose it. A Trie is modeled as a collection of TrieNode objects. A TrieNode is basically the current character and a Map of completions. The code itself is in the full article.
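The original listing is not reproduced in this excerpt. As a rough illustration only, and not the author's code, here is a minimal Python sketch of the structure described above. The names Trie and TrieNode come from the description; the children dict stands in for the Map of completions, while the is_word flag and the insert and complete method names are assumptions made for this sketch.

```python
class TrieNode:
    """One character of a stored word, plus a map of possible next characters."""

    def __init__(self, char=""):
        self.char = char
        self.children = {}     # next character -> TrieNode
        self.is_word = False   # assumption: marks the end of a stored word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a word one character per node, creating nodes as needed."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(ch)
            node = node.children[ch]
        node.is_word = True

    def complete(self, prefix):
        """Return every stored word that starts with prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []      # no stored word has this prefix
        results = []
        self._collect(node, prefix, results)
        return sorted(results)

    def _collect(self, node, word, results):
        """Walk every path below node, recording each complete word."""
        if node.is_word:
            results.append(word)
        for ch, child in node.children.items():
            self._collect(child, word + ch, results)
```

To use it, insert each file name once and call complete with whatever the user has typed so far. The prefix walk costs one dictionary lookup per typed character; most of the work is in collecting the subtree below the prefix, which shrinks as the prefix grows.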

Read full article from Salmon Run: Three Autocomplete implementations compared
