With Lucene 4.0 this is history! David Nemeskey and Robert Muir added an extensible API, as well as index-based per-field statistics such as the sum of total term frequencies and the sum of document frequencies, so that Lucene can offer multiple scoring models. Lucene 4.0 comes with:
- TF/IDF Vector-Space Model
- Divergence from Randomness
- Language Models
- Information Based Models
- Okapi BM25
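To show how the new index-based statistics feed a model like BM25, here is a minimal plain-Java sketch. It is not Lucene's BM25Similarity API; the class `Bm25Sketch` and its method signatures are hypothetical. It derives the average field length as sumTotalTermFreq / docCount, uses the common defaults k1 = 1.2 and b = 0.75, and uses a non-negative IDF variant.

```java
/**
 * Hypothetical, self-contained BM25 sketch (not Lucene's API).
 * Average field length comes from index statistics: sumTotalTermFreq / docCount.
 */
final class Bm25Sketch {
    private final double k1; // controls term-frequency saturation
    private final double b;  // controls strength of length normalization

    Bm25Sketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)). */
    double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Average length of this field across all documents that have it. */
    static double avgFieldLength(long sumTotalTermFreq, long docCount) {
        return (double) sumTotalTermFreq / docCount;
    }

    /** Contribution of one term to one document's score for this field. */
    double score(double freq, long docLength, long docFreq, long docCount, long sumTotalTermFreq) {
        double avgdl = avgFieldLength(sumTotalTermFreq, docCount);
        // Length normalization: documents longer than average are penalized.
        double norm = k1 * (1.0 - b + b * docLength / avgdl);
        return idf(docFreq, docCount) * freq * (k1 + 1.0) / (freq + norm);
    }
}
```

For example, with k1 = 1.2 and b = 0.75, a term that occurs once in a document of exactly average length contributes exactly its IDF, because freq · (k1 + 1) / (freq + k1) = 2.2 / 2.2 = 1. The same term in a longer document scores lower.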
Lucene's central scoring class, Similarity, has been extended to return dedicated scorers, such as ExactDocScorer and SloppyDocScorer, that calculate the actual score. This refactoring moved the score calculation out of the QueryScorer and into the Similarity, so that alternative scoring can be implemented within a single method. Lucene 4.0 also comes with a new SimilarityProvider, which lets you define a Similarity per field. Each field can use a slightly different Similarity or incorporate additional scoring factors such as IndexDocValues.
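The per-field idea can be sketched in plain Java: a provider picks a similarity by field name, each similarity hands back a per-document scorer, and one field wraps its base scorer with a per-document value, the way a field could fold in an IndexDocValues column. All names here (`DocScorer`, `FieldSimilarity`, `SqrtTfSimilarity`, `BoostedSimilarity`, `PerFieldProvider`) and the boost array are hypothetical stand-ins, not Lucene 4.0's actual SimilarityProvider API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document scorer, standing in for the scorers a Similarity returns. */
interface DocScorer {
    double score(int doc, double freq);
}

/** Hypothetical stand-in for a Similarity: builds a DocScorer for one field. */
interface FieldSimilarity {
    DocScorer scorer(String field);
}

/** Simple model: score is the square root of the term frequency (no IDF). */
final class SqrtTfSimilarity implements FieldSimilarity {
    public DocScorer scorer(String field) {
        return (doc, freq) -> Math.sqrt(freq);
    }
}

/** Multiplies an inner score by a per-document value, like reading a doc-values column. */
final class BoostedSimilarity implements FieldSimilarity {
    private final FieldSimilarity inner;
    private final double[] perDocBoost; // hypothetical column, indexed by document id

    BoostedSimilarity(FieldSimilarity inner, double[] perDocBoost) {
        this.inner = inner;
        this.perDocBoost = perDocBoost;
    }

    public DocScorer scorer(String field) {
        DocScorer base = inner.scorer(field);
        return (doc, freq) -> base.score(doc, freq) * perDocBoost[doc];
    }
}

/** Chooses a similarity per field name, falling back to a default. */
final class PerFieldProvider {
    private final Map<String, FieldSimilarity> byField = new HashMap<>();
    private final FieldSimilarity defaultSim;

    PerFieldProvider(FieldSimilarity defaultSim) {
        this.defaultSim = defaultSim;
    }

    /** Registers a similarity for one field; returns this for chaining. */
    PerFieldProvider with(String field, FieldSimilarity sim) {
        byField.put(field, sim);
        return this;
    }

    FieldSimilarity get(String field) {
        return byField.getOrDefault(field, defaultSim);
    }
}
```

With `body` on the default model and `title` boosted by the array `{1.0, 2.0}`, a term occurring four times in document 1 scores 2.0 in `body` and 4.0 in `title`. Keeping the boost as a wrapper means any base model can pick up the extra factor without being rewritten.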
Read the full article, "Apache Lucene FlexibleScoring with IndexDocValues", on the Trifork Blog.