All About Programming: Apache Lucene FlexibleScoring with IndexDocValues « Trifork Blog / Trifork: Enterprise Java, Open Source, software solutions

With Lucene 4.0 this is history! David Nemeskey and Robert Muir added an extensible scoring API, along with index-based statistics such as the sum of total term frequencies and the sum of document frequencies per field, to support multiple scoring models. Lucene 4.0 comes with:

TF/IDF Vector-Space Model
Divergence from Randomness
Language Models
Information Based Models
Okapi BM25
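To make the role of those index-level statistics concrete, here is a minimal sketch of a BM25-style score in plain Java. It is not Lucene's implementation: the class and method names are hypothetical, and the constants are just the commonly used defaults. The point it illustrates is that the average field length needed for length normalization can be derived from the sum of total term frequencies divided by the number of documents.

```java
// Minimal BM25 sketch in plain Java; hypothetical names, not Lucene's API.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (common default)
    static final double B = 0.75;  // length-normalization strength (common default)

    /** Average field length derived from an index-level statistic. */
    static double averageFieldLength(long sumTotalTermFreq, long docCount) {
        return (double) sumTotalTermFreq / docCount;
    }

    /** Inverse document frequency, in the non-negative form ln(1 + ...). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score of one term in one document field. */
    static double score(double tf, double fieldLength, double avgFieldLength,
                        long docCount, long docFreq) {
        // Longer-than-average fields increase the normalization term.
        double norm = K1 * (1.0 - B + B * fieldLength / avgFieldLength);
        return idf(docCount, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // Assumed toy numbers: 10 documents holding 1,000 term occurrences in total.
        double avg = averageFieldLength(1_000, 10);
        System.out.println("short field: " + score(2, 50, avg, 10, 3));
        System.out.println("long field:  " + score(2, 200, avg, 10, 3));
    }
}
```

With the same term frequency, the shorter field scores higher than the longer one, which is the length normalization that the per-field statistics make possible.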

Lucene's central scoring class, Similarity, has been extended to return dedicated scorers such as ExactDocScorer and SloppyDocScorer, which calculate the actual score. This refactoring moved the score calculation out of the query's Scorer and into the Similarity, so an alternative scoring model can be implemented within a single method. Lucene 4.0 also comes with a new SimilarityProvider that lets you define a Similarity per field. Each field can use a slightly different similarity or incorporate additional scoring factors such as IndexDocValues.
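The per-field idea can be sketched in a few lines of plain Java. Everything here is hypothetical and deliberately simplified: FieldScorer and PerFieldScoringSketch are illustrative names, not Lucene's Similarity or SimilarityProvider, and the docValue argument only stands in for a per-document value of the kind IndexDocValues would supply.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Lucene's Similarity API.
interface FieldScorer {
    /** Scores one document from its term frequency and a per-document value. */
    double score(double termFreq, double docValue);
}

final class PerFieldScoringSketch {
    private final Map<String, FieldScorer> byField = new HashMap<>();
    private final FieldScorer fallback;

    PerFieldScoringSketch(FieldScorer fallback) {
        this.fallback = fallback;
    }

    /** Registers a dedicated scorer for one field. */
    PerFieldScoringSketch register(String field, FieldScorer scorer) {
        byField.put(field, scorer);
        return this;
    }

    /** Returns the scorer for a field, or the fallback if none was registered. */
    FieldScorer get(String field) {
        return byField.getOrDefault(field, fallback);
    }

    public static void main(String[] args) {
        PerFieldScoringSketch provider =
            new PerFieldScoringSketch((tf, dv) -> Math.sqrt(tf))          // default: ignore docValue
                .register("title", (tf, dv) -> Math.sqrt(tf) * dv);       // title: multiply by docValue

        System.out.println(provider.get("title").score(4, 3));
        System.out.println(provider.get("body").score(4, 3));
    }
}
```

Keeping the lookup separate from the scoring function mirrors the split described above: one component decides which scoring applies to a field, and the scoring itself lives in a single method.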

Read full article from Apache Lucene FlexibleScoring with IndexDocValues « Trifork Blog / Trifork: Enterprise Java, Open Source, software solutions
