Serving autocomplete suggestions fast!



Serving autocomplete suggestions fast!

Autocomplete (also known as live suggestions or search suggestions) is very popular with Search applications. It is generally used to return either query suggestion (à la Google Autocomplete) or to propose existing search results (à la Facebook).

Open source search platforms like Solr and Elasticsearch support this feature. Both allow for implementing autocomplete using edge n-grams. N-grams are subsets of words. Edge n-grams are subsets from one edge of a word (generally the beginning). For example, the edge n-grams of the word search are s, se, sea, sear, searc and search. The general principle when using them to support autocomplete is to index all those n-grams in the search index with the original word stored as is. When the user starts typing a word, we send what is typed so far as a query to the index containing the n-grams. For example, if we send the query sea, the index should return the word search as sea is an n-gram of this stored word.

This technique is very flexible as it allows for easy implementation of interesting functionalities, such as fuzzy search and infix matching. You can already find detailed instructions on how to implement the edge n-grams technique around the Web. For example, you can read this popular tutorial by Jay Hill to implement it in Solr, and this one by Jon Tai for Elasticsearch.


Read full article from Serving autocomplete suggestions fast!


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts