Hacking Lucene for Custom Search Results
A Query is responsible for specifying search behavior, which covers both:
• Matching – which documents to include in the results
• Scoring – how relevant each result is to the query, expressed as a score
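To make the matching/scoring split concrete, here is a toy sketch in plain Java. It is not Lucene code. ToyQuery, ToyTermQuery and ToySearch are hypothetical names, each document is assumed to be just the token list of a single field, and the score is raw term frequency rather than Lucene's real relevance formula.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not Lucene's API: each doc is the list of
// tokens in one field, and a query answers two separate questions.
interface ToyQuery {
    boolean matches(List<String> doc); // Matching: include this doc at all?
    float score(List<String> doc);     // Scoring: how relevant is it?
}

// Loose analogue of a TermQuery on a single field.
class ToyTermQuery implements ToyQuery {
    private final String term;

    ToyTermQuery(String term) {
        this.term = term;
    }

    public boolean matches(List<String> doc) {
        return doc.contains(term);
    }

    // Toy relevance: raw term frequency (real Lucene scoring is more involved).
    public float score(List<String> doc) {
        return Collections.frequency(doc, term);
    }
}

class ToySearch {
    // Only matching docs are scored; everything else is excluded up front.
    static List<Float> search(ToyQuery q, List<List<String>> docs) {
        List<Float> scores = new ArrayList<>();
        for (List<String> doc : docs) {
            if (q.matches(doc)) {
                scores.add(q.score(doc));
            }
        }
        return scores;
    }
}
```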
Extend CustomScoreQuery and provide a CustomScoreProvider:
OpenSource Connections
// Inside CountingQuery (extends CustomScoreQuery): return a provider per index segment
@Override
protected CustomScoreProvider getCustomScoreProvider(
    AtomicReaderContext context) throws IOException {
  return new CountingQueryScoreProvider("tag", context);
}
The CustomScoreProvider rescores each matching doc, using the segment's IndexReader and the doc's (segment-relative) docId
// Give all docs a score of 1.0
@Override
public float customScore(int doc, float subQueryScore,
    float[] valSrcScores) throws IOException {
  return 1.0f; // New score
}
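The division of labor in CustomScoreQuery can be modeled without Lucene: the wrapped sub-query alone decides which docs match, and a callback shaped like customScore receives each matching doc plus its sub-query score and returns the new score. ToyCustomScoreQuery and its constructor parameters are hypothetical, for illustration only; the real provider also receives valSrcScores and works per segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Hypothetical plain-Java model of the CustomScoreQuery idea (not Lucene's API):
// the wrapped sub-query decides matching, a provider callback decides the score.
class ToyCustomScoreQuery {
    private final Predicate<List<String>> subQueryMatches;
    private final ToDoubleFunction<List<String>> subQueryScore;
    // Mirrors customScore(doc, subQueryScore, ...): gets the doc and the
    // sub-query's score, returns the new score.
    private final BiFunction<List<String>, Float, Float> customScore;

    ToyCustomScoreQuery(Predicate<List<String>> matches,
                        ToDoubleFunction<List<String>> score,
                        BiFunction<List<String>, Float, Float> customScore) {
        this.subQueryMatches = matches;
        this.subQueryScore = score;
        this.customScore = customScore;
    }

    List<Float> search(List<List<String>> docs) {
        List<Float> out = new ArrayList<>();
        for (List<String> doc : docs) {
            if (subQueryMatches.test(doc)) { // matching is left to the sub-query
                float original = (float) subQueryScore.applyAsDouble(doc);
                out.add(customScore.apply(doc, original)); // only the score changes
            }
        }
        return out;
    }
}
```

Passing `(doc, s) -> 1.0f` as the callback reproduces the "score of 1.0" provider above: every hit ties, so the ranking carries no information.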
Example: Sort by number of terms in a field
// Rescores by counting the distinct terms in the doc's _field term vector
// (the field must be indexed with term vectors enabled)
@Override
public float customScore(int doc, float subQueryScore,
    float[] valSrcScores) throws IOException {
  IndexReader r = context.reader();
  Terms tv = r.getTermVector(doc, _field);
  if (tv == null) {
    return 0.0f; // No term vector stored for this doc/field
  }
  TermsEnum termsEnum = tv.iterator(null);
  int numTerms = 0;
  while (termsEnum.next() != null) {
    numTerms++;
  }
  return (float) numTerms; // New score
}
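One detail worth spelling out: a term vector holds one entry per distinct term in the field, so the loop above counts distinct terms, not total tokens. Here is a plain-Java sketch of the same ranking, with a HashSet standing in for the term vector (TermCountRanking is a hypothetical name, not part of Lucene).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch, not Lucene's API.
class TermCountRanking {
    // Counting entries in a term vector counts each distinct term once,
    // which is what the HashSet does here.
    static int distinctTerms(List<String> fieldTokens) {
        return new HashSet<>(fieldTokens).size();
    }

    // Doc indexes ordered by distinct-term count, highest first;
    // List.sort is stable, so ties keep their original order.
    static List<Integer> rank(List<List<String>> docs) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            ids.add(i);
        }
        Comparator<Integer> byCount = Comparator.comparingInt(id -> distinctTerms(docs.get(id)));
        ids.sort(byCount.reversed());
        return ids;
    }
}
```

A field with tokens `star-trek scifi star-trek` scores 2 under this scheme, not 3.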
How to use it
Use a normal Lucene query for matching
Term t = new Term("tag", "star-trek");
TermQuery tq = new TermQuery(t);
Create and use a CustomScoreQuery subclass for scoring that wraps the Lucene query
CountingQuery ct = new CountingQuery(tq);
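The usage above can be traced with a toy model in plain Java (not Lucene; WrapDemo and its methods are hypothetical). A stand-in for the TermQuery on tag:star-trek decides the hits, and the `wrapped` flag swaps the score from term frequency to a distinct-term count, loosely mirroring CountingQuery. Both scoring stand-ins are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;

// Hypothetical toy walk-through; each doc is the token list of its "tag" field.
class WrapDemo {
    // Matching stand-in, like TermQuery(new Term("tag", "star-trek")).
    static boolean matches(List<String> tags) {
        return tags.contains("star-trek");
    }

    // Scoring stand-ins: term frequency for the plain query,
    // distinct-term count for the wrapping "counting" query.
    static double score(List<String> tags, boolean wrapped) {
        return wrapped ? new HashSet<>(tags).size()
                       : Collections.frequency(tags, "star-trek");
    }

    // Ids of matching docs, best score first. Wrapping only changes the
    // order: the set of hits comes from matches() either way.
    static List<Integer> search(List<List<String>> docs, boolean wrapped) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (matches(docs.get(i))) {
                ids.add(i);
            }
        }
        Comparator<Integer> byScore =
            Comparator.comparingDouble(id -> score(docs.get(id), wrapped));
        ids.sort(byScore.reversed());
        return ids;
    }
}
```

With docs `[star-trek, star-trek]`, `[star-wars, scifi]` and `[star-trek, scifi, tv]`, `search(docs, false)` returns `[0, 2]` and `search(docs, true)` returns `[2, 0]`: same hits, new order.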
CustomScoreQuery: Takeaway
• Simple! Relatively few gotchas or bells and whistles (the fully custom queries coming up have lots of gotchas)
• Limited: no tight control over what matches
• If this satisfies your requirements, you can get off the train here
• I care about overriding scoring: CustomScoreQuery
• I need to control both custom scoring and matching: Custom Lucene Queries!