Scoring Variables
Lucene’s default scoring works well in most cases. It combines seven factors to determine the final ranking of each document (list from lucenetutorial.com):
- tf = term frequency in document = measure of how often a term appears in the document
- idf = inverse document frequency = measure of how rare the term is across the index (terms that appear in fewer documents score higher)
- coord = fraction of the terms in the query that were found in the document
- lengthNorm = measure of the importance of a term according to the total number of terms in the field (a match in a shorter field counts for more)
- queryNorm = normalization factor so that queries can be compared
- boost (index) = boost of the field at index-time
- boost (query) = boost of the field at query-time
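As a rough illustration of how these seven factors can combine, here is a minimal sketch of the classic TF-IDF formulas for a single-term query. The class and method names are hypothetical, and the formulas are an assumption based on the classic TF-IDF similarity that older Lucene versions document; the exact formulas differ between Lucene versions, so treat this as a reading aid rather than Lucene's actual code.

```java
// Hypothetical sketch of classic TF-IDF scoring factors; not Lucene's own classes.
final class ScoringFactors {

    // tf: grows with the number of times the term occurs in the field, but sub-linearly.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf: larger for terms that occur in fewer documents.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // coord: fraction of the query terms that matched this document.
    static double coord(int matchedTerms, int queryTerms) {
        return (double) matchedTerms / queryTerms;
    }

    // lengthNorm: a match in a shorter field counts for more.
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // queryNorm: scales scores so different queries are comparable; does not change ranking.
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Single-term score: every factor is multiplied together (idf is applied twice,
    // once for the query weight and once for the document weight).
    static double score(int freq, int docFreq, int numDocs,
                        int matchedTerms, int queryTerms, int numTermsInField,
                        double sumOfSquaredWeights,
                        double indexBoost, double queryBoost) {
        double idf = idf(docFreq, numDocs);
        return coord(matchedTerms, queryTerms)
                * queryNorm(sumOfSquaredWeights)
                * tf(freq) * idf * idf
                * queryBoost
                * indexBoost * lengthNorm(numTermsInField);
    }
}
```

Because every factor enters as a multiplier, doubling either boost doubles that term's contribution to the score.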
These factors are fed into Lucene’s Similarity algorithm; the details can be found in Lucene’s Javadoc and tutorial pages. For the moment I will focus on the simplest method for adjusting scoring: “Boost”.
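To make the effect of a boost concrete, the following sketch treats a boost as a plain per-field multiplier and shows how it can change which document ranks first. The class name, field names, and base scores are all made up for illustration; this is not Lucene's API, only an assumption-laden model of what a query-time boost does to the final score.

```java
import java.util.Map;

// Hypothetical model of query-time boosting; not part of Lucene.
final class BoostSketch {

    // Sums the per-field base scores, multiplying each by that field's boost.
    // Fields without an explicit boost use the neutral value 1.0.
    static double boostedScore(Map<String, Double> fieldScores,
                               Map<String, Double> boosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : fieldScores.entrySet()) {
            double boost = boosts.getOrDefault(entry.getKey(), 1.0);
            total += entry.getValue() * boost;
        }
        return total;
    }
}
```

For example, a document that matches only in its title with a base score of 0.4 ranks below one that matches only in its body with 0.5, but a title boost of 2.0 lifts the first document to 0.8 and reverses the order. In Lucene's classic query parser syntax, a query-time boost of this kind is written with a caret, as in `title:lucene^2`.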
Read full article from Custom Lucene Scoring | Architexa – Working with Large Codebases » Blog Archive