Hacking Lucene for Custom Search Results



A Query is responsible for specifying search behavior. It does both:
• Matching – which documents to include in the results
• Scoring – how relevant a result is to the query, expressed as an assigned score
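The split between matching and scoring can be illustrated with a toy model. This is only a sketch in plain Java, not Lucene code: MatchThenScore, Doc, search and sampleDocs are hypothetical names, and a plain list stands in for the index. It shows that changing the scoring function reorders the hits but never changes which documents are returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Toy model only: a plain list stands in for the index. None of these names are Lucene APIs.
class MatchThenScore {

    // Hypothetical document: an id plus the values of its "tag" field.
    static final class Doc {
        final int id;
        final List<String> tags;

        Doc(int id, String... tags) {
            this.id = id;
            this.tags = Arrays.asList(tags);
        }
    }

    // Matching decides which docs are returned; scoring only decides their order.
    static List<Integer> search(List<Doc> index, Predicate<Doc> matches,
                                ToDoubleFunction<Doc> score) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index) {
            if (matches.test(d)) {
                hits.add(d);
            }
        }
        // Highest score first; the sort is stable, so ties keep index order.
        hits.sort(Comparator.comparingDouble(score).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Doc d : hits) {
            ids.add(d.id);
        }
        return ids;
    }

    // Made-up sample data for illustration.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc(1, "star-trek"),
                new Doc(2, "star-wars", "sci-fi"),
                new Doc(3, "star-trek", "sci-fi", "tv"));
    }
}
```

With the sample data, matching on the tag "star-trek" returns docs 1 and 3 under any scoring function. A constant score leaves them in index order, while scoring by the number of tags puts doc 3 first.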

How to use it
Use a normal Lucene query for matching
Term t = new Term("tag", "star-trek");
TermQuery tq = new TermQuery(t);

Create and use a CustomScoreQuery subclass (here, CountingQuery) for scoring;
it wraps the Lucene query
CountingQuery ct = new CountingQuery(tq);
To implement it, extend CustomScoreQuery and provide a CustomScoreProvider
OpenSource Connections
@Override
protected CustomScoreProvider getCustomScoreProvider(
        AtomicReaderContext context) throws IOException {
    return new CountingQueryScoreProvider("tag", context);
}
The CustomScoreProvider rescores each document, using the IndexReader and the docId
// Give all docs a score of 1.0
@Override
public float customScore(int doc, float subQueryScore,
        float valSrcScores[]) throws IOException {
    return 1.0f; // New score
}

Example: Sort by number of terms in a field
// Rescores by counting the number of distinct terms in the field.
// Requires term vectors to be stored for _field at index time.
@Override
public float customScore(int doc, float subQueryScore,
        float valSrcScores[]) throws IOException {
    IndexReader r = context.reader();
    Terms tv = r.getTermVector(doc, _field);
    if (tv == null) {
        // No term vector for this doc/field; scoring it 0 is an assumption
        return 0.0f;
    }
    TermsEnum termsEnum = tv.iterator(null);
    int numTerms = 0;
    while (termsEnum.next() != null) {
        numTerms++;
    }
    return (float) numTerms; // New score
}
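One detail of the example is worth spelling out: a term vector lists each distinct term of the field once, so the loop counts unique terms rather than total tokens. The sketch below shows that difference in plain Java with no Lucene dependency. TermCountSketch and its methods are hypothetical names, and the whitespace tokenizer is only an assumed stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

// Toy model only: a sorted set of strings stands in for one document's term vector.
class TermCountSketch {

    // Lowercases and splits on whitespace; a hypothetical stand-in for an analyzer.
    static String[] tokenize(String fieldValue) {
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Number of distinct terms: what stepping through the term vector once per term counts.
    static int distinctTerms(String fieldValue) {
        return new TreeSet<>(Arrays.asList(tokenize(fieldValue))).size();
    }

    // Number of tokens, repeats included; this is NOT what the rescoring example counts.
    static int totalTokens(String fieldValue) {
        return tokenize(fieldValue).length;
    }
}
```

For the field value "star trek star wars" the two counts differ: there are four tokens but only three distinct terms, because "star" appears twice.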

CustomScoreQuery, Takeaway
• Simple: relatively few gotchas or bells & whistles (we will see lots of gotchas later)
• Limited: no tight control over what matches
• If this satisfies your requirements, you should get off the train here

Which approach to use
• I care about overriding scoring: use CustomScoreQuery
• I need to control custom scoring and matching: write custom Lucene queries

Please read full article from Hacking Lucene for Custom Search Results
