All About Programming: Custom Lucene Scoring | Architexa - Working with Large Codebases >> Blog Archive

Scoring Variables

Lucene’s default scoring system works well for most cases. It combines seven factors to determine the final ranking of each document (definitions from lucenetutorial.com):

tf = term frequency in document = measure of how often a term appears in the document

idf = inverse document frequency = measure of how rare the term is across the index (a term found in fewer documents gets a higher value)

coord = number of terms in the query that were found in the document

lengthNorm = measure of the importance of a term according to the total number of terms in the field

queryNorm = normalization factor so that queries can be compared

boost (index) = boost of the field at index-time

boost (query) = boost of the field at query-time

These factors are fed into Lucene’s Similarity class, which is documented in Lucene’s Javadoc and tutorial pages. For now I will focus on the simplest way to adjust scoring: “Boost”.
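
To make the role of each factor, and of a boost in particular, a little more concrete, here is a rough plain-Java sketch. It does not call Lucene at all. The class name ClassicScoreSketch and its method names are made up for illustration, and the formulas are the ones I recall from Lucene's classic TF-IDF similarity (DefaultSimilarity in older releases), so treat them as an assumption and check the Javadoc for the version you use. queryNorm is left out because it is the same for every document matched by a given query and so does not change the ranking.

```java
// Illustrative sketch only: plain Java, no Lucene classes involved.
// Formulas follow the classic TF-IDF similarity as an assumption;
// the exact definitions differ between Lucene versions.
final class ClassicScoreSketch {

    // tf: square root of the raw number of occurrences in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf: grows as the term appears in fewer documents of the index.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // lengthNorm: a match in a short field counts for more than in a long one.
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // coord: fraction of the query terms that were found in the document.
    static double coord(int matchedTerms, int totalQueryTerms) {
        return (double) matchedTerms / totalQueryTerms;
    }

    // Contribution of one matching term. Both boosts are plain multipliers,
    // which is why boosting is the simplest lever for adjusting a score.
    static double termScore(int freq, int docFreq, int numDocs, int fieldLength,
                            double indexBoost, double queryBoost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * queryBoost * indexBoost * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Same term and document, scored without and with a query-time boost of 2.
        double plain = termScore(4, 9, 10, 4, 1.0, 1.0);
        double boosted = termScore(4, 9, 10, 4, 1.0, 2.0);
        System.out.println("plain=" + plain + " boosted=" + boosted);
    }
}
```

Under these assumptions, a boost only scales the contribution of the term or field it is attached to, so doubling the boost doubles that contribution and leaves the other factors untouched.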

Read full article from Custom Lucene Scoring | Architexa – Working with Large Codebases » Blog Archive
