Our Solution to Solr Multiterm Synonyms: The Match Query Parser




You have probably heard us talk about Solr multiterm synonyms a lot! It's a big problem that keeps many organizations from getting reasonable search relevance out of Solr. It has been described as the "sea biscuit" problem: if you have a synonyms.txt file like:

sea biscuit => seabiscuit  

… you unfortunately won't get what you expect at query time. This is because most Solr query parsers break up query strings on spaces before running query-time analysis. If you search for "sea biscuit", Solr first sees this as [sea] OR [biscuit]. The required analysis step then happens on each individual clause: first on just "sea", then on just "biscuit". Because analysis never sees "sea" right before "biscuit", it doesn't recognize the synonym listed above. Bummer.
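To make the failure concrete, here is a minimal toy sketch in Python. It is not Solr's code: the `SYNONYMS` table and the function names `analyze`, `split_then_analyze`, and `analyze_whole` are made up for illustration. It only models the order of operations described above, splitting on whitespace before analysis versus analyzing the whole query string.

```python
# Toy model of query-time synonym analysis. NOT Solr's actual implementation;
# every name here is hypothetical and exists only for illustration.

# Mirrors the synonyms.txt rule: sea biscuit => seabiscuit
SYNONYMS = {("sea", "biscuit"): ["seabiscuit"]}


def analyze(tokens):
    """Replace any run of tokens that matches a rule's left-hand side."""
    out, i = [], 0
    while i < len(tokens):
        for lhs, rhs in SYNONYMS.items():
            if tuple(tokens[i:i + len(lhs)]) == lhs:
                out.extend(rhs)
                i += len(lhs)
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def split_then_analyze(query):
    """Split on whitespace first, then analyze each clause on its own."""
    clauses = []
    for word in query.lower().split():
        clauses.extend(analyze([word]))  # analysis only ever sees one word
    return clauses


def analyze_whole(query):
    """Analyze the full token stream, so adjacent words stay visible."""
    return analyze(query.lower().split())


if __name__ == "__main__":
    print(split_then_analyze("sea biscuit"))
    print(analyze_whole("sea biscuit"))
```

In this sketch, `split_then_analyze` never hands the synonym step more than one word at a time, so the two-word rule cannot fire, while `analyze_whole` sees "sea" and "biscuit" side by side and applies it.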


Read full article from Our Solution to Solr Multiterm Synonyms: The Match Query Parser

