Understanding and Applying DocValues (daysmileface, Sina Blog)



The Highlights for lucene-4.0-alpha include this passage:

Added support for per-document values (DocValues). DocValues can be used for custom scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.

Its main use is custom scoring factors.


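My first thought on seeing this highlight was how it differs from the existing document-level boost. The document-level boost is not a custom scoring factor, but it can still influence the overall score (when results are sorted by score).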

 

Characteristics of the document boost:

The default value is 1.0.

This value will be multiplied into the score of all hits on this document.

Values are multiplied into the value of Fieldable#getBoost() of each field in this document. Thus, this method in effect sets a default boost for the fields of this document. In other words, the value carries over into each field boost: if a field boost is 1.0, the effect is the same as giving every field the document's boost.

It is stored as part of the norm. There is no separate place to store this boost; it has to share space with the other norm factors.

Given these points, the role this boost can play is fixed.
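
The way the document boost gets absorbed can be sketched in plain Java. This is a simplified model rather than Lucene code: the class and method names are hypothetical, and the 1/sqrt(numTerms) length factor is an assumption modelled on the classic default similarity. The byte encoding of the norm is left out.

```java
// Simplified model of how a document boost is absorbed into a field norm.
// Not Lucene code: every name in this block is hypothetical.
final class BoostNormModel {

    // The document boost is multiplied into each field's own boost.
    static float effectiveFieldBoost(float docBoost, float fieldBoost) {
        return docBoost * fieldBoost;
    }

    // Assumed length normalisation: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // A single number per field and document, with the document boost folded in.
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return effectiveFieldBoost(docBoost, fieldBoost) * lengthNorm(numTerms);
    }
}
```

Under this model, norm(2.0f, 1.0f, 4) and norm(1.0f, 2.0f, 4) yield the same number, so once the norm is written the document boost can no longer be read back on its own.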


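Now compare this with DocValues in 4.0. DocValues arrived in 4.0 together with Similarity being turned into a pluggable component. Before 4.0, the architecture as a whole was tied to a hard-wired classic vector space scoring model.

In 4.0, by wrapping a Similarity, we can decide for ourselves how the values in DocValues influence the score when it is computed.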


@Override
public ExactSimScorer exactSimScorer(SimWeight stats, AtomicReaderContext context) throws IOException {
    // Scorer of the wrapped similarity (sim), which computes the base score.
    final ExactSimScorer sub = sim.exactSimScorer(stats, context);
    // Per-document values stored under boostField.
    final Source values = context.reader().docValues(boostField).getSource();

    return new ExactSimScorer() {
        @Override
        public float score(int doc, int freq) {
            // Scale the base score by the value stored for this document.
            return (float) values.getFloat(doc) * sub.score(doc, freq);
        }

        @Override
        public Explanation explain(int doc, Explanation freq) {
            Explanation boostExplanation = new Explanation((float) values.getFloat(doc), "indexDocValue(" + boostField + ")");
            Explanation simExplanation = sub.explain(doc, freq);
            Explanation expl = new Explanation(boostExplanation.getValue() * simExplanation.getValue(), "product of:");
            expl.addDetail(boostExplanation);
            expl.addDetail(simExplanation);
            return expl;
        }
    };
}


Here the value from the DocValues is simply multiplied by the original score. Of course, we could also add the two, or apply some other operation.
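
The choice of operation can be made pluggable. The sketch below is plain Java with no Lucene dependency: BaseScorer, DocValueCombiner and CombinedScorer are hypothetical names, and a float array stands in for the per-document values.

```java
// Hypothetical stand-ins for illustration only; none of these are Lucene types.
interface BaseScorer {
    float score(int doc, int freq);
}

// Decides how a per-document value and the base score are merged.
interface DocValueCombiner {
    float combine(float docValue, float baseScore);

    DocValueCombiner MULTIPLY = (v, s) -> v * s;
    DocValueCombiner ADD = (v, s) -> v + s;
}

// Wraps a base scorer and applies the chosen combiner to every score.
final class CombinedScorer implements BaseScorer {
    private final BaseScorer sub;
    private final float[] values; // one value per document id
    private final DocValueCombiner combiner;

    CombinedScorer(BaseScorer sub, float[] values, DocValueCombiner combiner) {
        this.sub = sub;
        this.values = values;
        this.combiner = combiner;
    }

    @Override
    public float score(int doc, int freq) {
        return combiner.combine(values[doc], sub.score(doc, freq));
    }
}
```

Passing DocValueCombiner.MULTIPLY gives the same behaviour as the multiplication in the snippet above, DocValueCombiner.ADD gives the additive variant, and any other lambda with the same shape could be supplied instead.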

