Exploring Lucene and Solr's TrieRange Capabilities



Since Lucene treats most everything as Strings, encoding numbers and dates and then using them in ranges has always required a little extra work to perform well. Previously, for ranges that covered a lot of distinct values, one had to either use less precision or accept slower-running queries. This is because Lucene has to enumerate a large number of terms to evaluate such a range.
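To make the problem concrete, here is a minimal Python sketch of the classic approach. It is a toy model, not Lucene code: the `encode` and `terms_in_range` helpers and the sorted list standing in for the term dictionary are my own assumptions for illustration. It shows why numbers need padding to sort correctly as strings, and how a plain range has to touch every distinct term it covers.

```python
from bisect import bisect_left, bisect_right


def encode(n, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(n).zfill(width)


# Unpadded strings sort lexicographically, which is wrong for numbers.
assert sorted(["9", "10", "100"]) == ["10", "100", "9"]
# Padded strings sort in numeric order.
assert sorted(encode(n) for n in (100, 9, 10)) == [encode(9), encode(10), encode(100)]


def terms_in_range(sorted_terms, lo, hi):
    """Return every indexed term in [lo, hi]; a naive range visits all of them."""
    start = bisect_left(sorted_terms, encode(lo))
    end = bisect_right(sorted_terms, encode(hi))
    return sorted_terms[start:end]


# Toy term dictionary (hypothetical data): one distinct term per indexed value.
terms = [encode(n) for n in range(0, 1_000_000, 7)]
visited = terms_in_range(terms, 1_000, 900_000)
print(len(visited), "terms enumerated for a single range")
```

The more distinct values the field holds, the longer `visited` gets, which is the cost the paragraph above describes.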
Of course, this is only scratching the surface. The takeaway, though, is that the new Trie functionality in Lucene and Solr holds a lot of promise for speeding up range-based numeric queries and further blurs the line between search engines and databases (I’d argue it makes search all the more compelling, but…). More importantly, its performance depends not on the index size, but on the precision chosen. Essentially, it formalizes what many people have done in practice through the years with various field values.