Solr and Lucene Sorting



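Both Lucene and Solr support sorting, and Solr's sorting is built on top of Lucene's. Solr sorts when the request URL carries a sort parameter (for example, sort=price desc); it wraps the parameter's value into SortField objects and then hands the actual sorting to Lucene. The rest of this post walks through how Lucene implements sorting.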

Lucene's sorting is built on a min-heap, PriorityQueue. The heap's ordering rule is supplied by its subclasses through the lessThan method.
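To make the role of lessThan concrete, here is a minimal sketch of a bounded min-heap that keeps the top N elements. It is not Lucene's PriorityQueue: the names BoundedMinHeap, offer, and drainBestFirst are hypothetical, and the heap is backed by java.util.PriorityQueue purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    /** Hypothetical stand-in for a lessThan-driven heap; not Lucene's class. */
    abstract static class BoundedMinHeap<T> {
        private final int maxSize;
        private final PriorityQueue<T> heap;

        BoundedMinHeap(int maxSize) {
            this.maxSize = maxSize;
            // The head of the heap is always the "least" element per lessThan.
            Comparator<T> order = (a, b) -> lessThan(a, b) ? -1 : (lessThan(b, a) ? 1 : 0);
            this.heap = new PriorityQueue<>(order);
        }

        /** Subclasses decide the ordering, as described in the text. */
        protected abstract boolean lessThan(T a, T b);

        /** Once full, a candidate only enters by displacing the current least element. */
        void offer(T candidate) {
            if (heap.size() < maxSize) {
                heap.add(candidate);
            } else if (lessThan(heap.peek(), candidate)) {
                heap.poll();
                heap.add(candidate);
            }
        }

        /** Empties the heap and returns the kept elements, best first. */
        List<T> drainBestFirst() {
            List<T> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                out.add(heap.poll());
            }
            Collections.reverse(out);
            return out;
        }
    }

    /** Keeps the n largest values using a heap whose lessThan is plain numeric order. */
    static List<Integer> topN(int n, int[] values) {
        BoundedMinHeap<Integer> heap = new BoundedMinHeap<Integer>(n) {
            @Override
            protected boolean lessThan(Integer a, Integer b) {
                return a < b;
            }
        };
        for (int v : values) {
            heap.offer(v);
        }
        return heap.drainBestFirst();
    }

    public static void main(String[] args) {
        System.out.println(topN(3, new int[] {5, 1, 9, 3, 7})); // prints [9, 7, 5]
    }
}
```

Keeping the least competitive element at the head is what makes this cheap: each new candidate needs only one comparison against the head to decide whether it belongs in the top N at all.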
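Lucene's FieldValueHitQueue extends PriorityQueue. A sort may be on a single field or on several fields.
Accordingly, FieldValueHitQueue has two subclasses, OneComparatorFieldValueHitQueue and MultiComparatorsFieldValueHitQueue, because sorting by one field and sorting by several fields use different comparison methods. The two methods are: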
 // Comparison method when sorting by a single field.
@Override
protected boolean lessThan(final Entry hitA, final Entry hitB) {
      assert hitA != hitB;
      assert hitA.slot != hitB.slot;
      final int c = oneReverseMul * comparator.compare(hitA.slot, hitB.slot);
      if (c != 0) {
           return c > 0;
      }
       // avoid random sort order that could lead to duplicates (bug #31241):
      return hitA.doc > hitB.doc;
 }

  // Comparison method when sorting by multiple fields.
@Override
 protected boolean lessThan(final Entry hitA, final Entry hitB) {
      assert hitA != hitB;
      assert hitA.slot != hitB.slot;
      int numComparators = comparators.length;
      for (int i = 0; i < numComparators; ++i) {
        final int c = reverseMul[i] * comparators[i].compare(hitA.slot, hitB.slot);
        if (c != 0) {
          // Short circuit
          return c > 0;
        }
      }
      // avoid random sort order that could lead to duplicates (bug #31241):
      return hitA.doc > hitB.doc;
}
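First, a note on comparators and reverseMul:
comparators: an array of comparators. For a single field it holds one entry, comparators[0] = field.getComparator(size, 0); that is, the comparator is created according to the field's data type. For multiple fields, one comparator is created per field.
reverseMul: decides whether the order is ascending or descending; it is a multiplier (1 or -1) applied to each comparator's result.
As the two methods show, when sorting by several fields, a non-zero comparison on the first field decides the result and the later fields are never compared. If the first field's values are equal, the comparison moves on to the following fields. If all fields are equal, the docID breaks the tie.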
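The comparison rule described above can be sketched without any Lucene dependency. In this sketch the Hit class, its integer field values, and the sortedDocs helper are all hypothetical; only the shape of lessThan (multiply by reverseMul, stop at the first non-zero result, fall back to the doc ID) follows the code shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiFieldSortSketch {
    /** Hypothetical hit: a doc ID plus the values of the fields being sorted on. */
    static final class Hit {
        final int doc;
        final int[] fieldValues;

        Hit(int doc, int... fieldValues) {
            this.doc = doc;
            this.fieldValues = fieldValues;
        }
    }

    /**
     * Returns true when hit a ranks below hit b.
     * reverseMul[i] is 1 for ascending and -1 for descending on field i.
     */
    static boolean lessThan(Hit a, Hit b, int[] reverseMul) {
        for (int i = 0; i < reverseMul.length; i++) {
            int c = reverseMul[i] * Integer.compare(a.fieldValues[i], b.fieldValues[i]);
            if (c != 0) {
                return c > 0; // first differing field decides; later fields are skipped
            }
        }
        return a.doc > b.doc; // all fields equal: the larger doc ID ranks lower
    }

    /** Returns doc IDs in final result order, best hit first. */
    static int[] sortedDocs(List<Hit> hits, int[] reverseMul) {
        List<Hit> copy = new ArrayList<>(hits);
        // a goes before b exactly when b is "less than" a.
        copy.sort((a, b) -> lessThan(b, a, reverseMul) ? -1 : (lessThan(a, b, reverseMul) ? 1 : 0));
        int[] docs = new int[copy.size()];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = copy.get(i).doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        // Field 0 is a price (ascending), field 1 is a year (descending).
        List<Hit> hits = Arrays.asList(
                new Hit(0, 10, 2001),
                new Hit(1, 10, 2005),
                new Hit(2, 5, 1999),
                new Hit(3, 10, 2005));
        System.out.println(Arrays.toString(sortedDocs(hits, new int[] {1, -1}))); // prints [2, 1, 3, 0]
    }
}
```

Docs 1 and 3 have identical field values, so the doc ID tie-break is what places doc 1 ahead of doc 3 and keeps the order deterministic.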

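What the comparators compare is, of course, the values of the sort fields. So when are those values fetched? This is the job of FieldComparator's setNextReader method, which is called back from the search method of Lucene's IndexSearcher. The code is as follows: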

// Note: when a sort is specified, collector is a TopFieldCollector.
for (int i = 0; i < subReaders.length; i++) { // search each subreader
        collector.setNextReader(subReaders[i], docStarts[i]);
        Scorer scorer = weight.scorer(subReaders[i], !collector.acceptsDocsOutOfOrder(), true);
        if (scorer != null) {
          scorer.score(collector);
        }
 }
The multi-field sorting subclass nested inside TopFieldCollector implements setNextReader as:
@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
      this.docBase = docBase;
      for (int i = 0; i < comparators.length; i++) {
        comparators[i].setNextReader(reader, docBase);
      }
 }
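As shown above, the call ends up in FieldComparator's setNextReader method. That method fetches the field's values through FieldCacheImpl, the implementation class of FieldCache. If the values are not cached yet, they are read from the index through the reader and then stored in the FieldCache.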
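The load-on-first-use behavior described above might look roughly like the following sketch. It does not use Lucene's classes: SegmentReader, FieldCache, IntFieldComparator, and every method on them are simplified, hypothetical stand-ins, and the cache here is a plain HashMap chosen for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class FieldCacheSketch {
    /** Hypothetical stand-in for a per-segment reader. */
    static final class SegmentReader {
        final Map<String, int[]> storedValues = new HashMap<>();
        int loads = 0; // how many times values were read "from the index"

        int[] readFromIndex(String field) {
            loads++;
            return storedValues.get(field).clone();
        }
    }

    /** Hypothetical cache keyed by reader, then by field name. */
    static final class FieldCache {
        private final Map<SegmentReader, Map<String, int[]>> cache = new HashMap<>();

        int[] getInts(SegmentReader reader, String field) {
            // Read from the index only on a cache miss, then keep the array.
            return cache.computeIfAbsent(reader, r -> new HashMap<>())
                        .computeIfAbsent(field, f -> reader.readFromIndex(f));
        }
    }

    /** Hypothetical comparator that switches to the current segment's values. */
    static final class IntFieldComparator {
        private final FieldCache fieldCache;
        private final String field;
        private int[] currentValues;

        IntFieldComparator(FieldCache fieldCache, String field) {
            this.fieldCache = fieldCache;
            this.field = field;
        }

        /** Called once per segment before that segment's documents are collected. */
        void setNextReader(SegmentReader reader) {
            currentValues = fieldCache.getInts(reader, field);
        }

        /** Looks up the sort value of a document by its segment-local ID. */
        int valueOf(int segmentDoc) {
            return currentValues[segmentDoc];
        }
    }

    public static void main(String[] args) {
        SegmentReader reader = new SegmentReader();
        reader.storedValues.put("price", new int[] {30, 10, 20});
        IntFieldComparator comparator = new IntFieldComparator(new FieldCache(), "price");
        comparator.setNextReader(reader);
        comparator.setNextReader(reader); // second call is served from the cache
        System.out.println(comparator.valueOf(1) + " " + reader.loads); // prints 10 1
    }
}
```

The point of the design is that the cost of reading a field's values from the index is paid once per segment and field; every later search that sorts on the same field reuses the cached array.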