Why Not AND, OR, And NOT?



BooleanQuery: Great Class, Bad Name
The BooleanQuery class is probably one of the most misleading class names in the entire Lucene code base because it doesn’t model simple boolean logic query operations at all. The basic function of a BooleanQuery is:
  1. A BooleanQuery consists of one or more BooleanClauses, each of which contains two pieces of information:
    • A nested Query
    • An Occur flag, which has one of three values
      • MUST – indicating that documents must match this nested Query in order for the document to match the BooleanQuery, and the score from this subquery should contribute to the score for the BooleanQuery
      • MUST_NOT – indicating that documents which match this nested Query are prohibited from matching the BooleanQuery
      • SHOULD – indicating that documents which match this nested Query should have their score from the nested query contribute to the score from the BooleanQuery, but documents can be a match for the BooleanQuery even if they do not match the nested query
  2. If a BooleanQuery contains no MUST BooleanClauses, then a document is only considered a match against the BooleanQuery if one or more of the SHOULD BooleanClauses is a match.
  3. The final score of a document which matches a BooleanQuery is based on the sum of the scores from all the matching MUST and SHOULD BooleanClauses, multiplied by a “coord factor” based on the ratio of the number of matching BooleanClauses to the total number of BooleanClauses in the BooleanQuery.
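The three rules above can be sketched as a small plain-Java model. This is only an illustration, not Lucene's implementation: `BooleanQueryModel`, `ClauseResult`, and `score` are hypothetical names, and the coord denominator here counts only MUST and SHOULD clauses, which is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

/** Toy model of the rules above; this is not Lucene's implementation. */
final class BooleanQueryModel {

    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Outcome of one nested query for one document (hypothetical helper type). */
    static final class ClauseResult {
        final Occur occur;
        final boolean matched;
        final double score; // nested query score; ignored when matched is false

        ClauseResult(Occur occur, boolean matched, double score) {
            this.occur = occur;
            this.matched = matched;
            this.score = score;
        }
    }

    /** Returns the combined score, or empty if the document does not match. */
    static OptionalDouble score(List<ClauseResult> clauses) {
        boolean hasMust = false;
        int scoringClauses = 0; // MUST + SHOULD clauses (assumed coord denominator)
        int matchedClauses = 0; // matching MUST + SHOULD clauses
        double sum = 0.0;

        for (ClauseResult c : clauses) {
            if (c.occur == Occur.MUST_NOT) {
                if (c.matched) return OptionalDouble.empty(); // prohibited clause matched
                continue;
            }
            scoringClauses++;
            if (c.occur == Occur.MUST) {
                hasMust = true;
                if (!c.matched) return OptionalDouble.empty(); // required clause missed
            }
            if (c.matched) {
                matchedClauses++;
                sum += c.score;
            }
        }
        // Rule 2: without MUST clauses, at least one SHOULD clause has to match.
        if (!hasMust && matchedClauses == 0) return OptionalDouble.empty();

        // Rule 3: sum of matching clause scores, scaled by the coord factor.
        double coord = (double) matchedClauses / scoringClauses;
        return OptionalDouble.of(sum * coord);
    }

    public static void main(String[] args) {
        List<ClauseResult> doc = Arrays.asList(
            new ClauseResult(Occur.MUST, true, 2.0),
            new ClauseResult(Occur.SHOULD, false, 0.0));
        // sum 2.0 scaled by coord 1/2 gives a score of 1.0
        System.out.println(score(doc));
    }
}
```

Note that a query made up only of MUST_NOT clauses matches nothing in this model, which follows directly from rule 2.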

Query Parser: Prefix Operators

In the Lucene QueryParser (and all of the other parsers that are based on it, like DisMax and EDisMax), the “prefix” operators “+” and “-” map directly to the Occur.MUST and Occur.MUST_NOT flags, while the absence of a prefix maps to the Occur.SHOULD flag by default.
Please note how important it is to use parentheses to combine multiple operators in order to generate queries that correctly model boolean logic.
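The prefix mapping described above can be illustrated with a toy tokenizer. This is a minimal sketch, not the Lucene QueryParser: `PrefixOperatorSketch`, `Clause`, and `parse` are hypothetical names, and the sketch handles only flat, whitespace-separated terms (no parentheses, fields, or quoting).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the "+" / "-" prefix mapping; not the real QueryParser. */
final class PrefixOperatorSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    /** A term paired with the Occur flag derived from its prefix (hypothetical type). */
    static final class Clause {
        final Occur occur;
        final String term;

        Clause(Occur occur, String term) {
            this.occur = occur;
            this.term = term;
        }

        @Override
        public String toString() {
            return occur + ":" + term;
        }
    }

    /** Splits on whitespace and maps each term's prefix to an Occur flag. */
    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank input produces no clauses
            if (token.startsWith("+") && token.length() > 1) {
                clauses.add(new Clause(Occur.MUST, token.substring(1)));
            } else if (token.startsWith("-") && token.length() > 1) {
                clauses.add(new Clause(Occur.MUST_NOT, token.substring(1)));
            } else {
                clauses.add(new Clause(Occur.SHOULD, token)); // no prefix: SHOULD by default
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // prints [MUST:java, MUST_NOT:python, SHOULD:lucene]
        System.out.println(parse("+java -python lucene"));
    }
}
```

Under the BooleanQuery rules, a query such as `(+a +b) c` consists of two SHOULD clauses, one of which is a nested query requiring both `a` and `b`, so it behaves like "(a AND b) OR c". The parentheses are what make that grouping explicit.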

Please read the full article: Why Not AND, OR, And NOT?
