Lucene 4.7: Sorting (Part 4)



Sort constant      SortField constant      Meaning
Sort.INDEXORDER    SortField.FIELD_DOC     Sort by index order (document number)
Sort.RELEVANCE     SortField.FIELD_SCORE   Sort by relevance score
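=========SortField class=========
// field is the field to sort on; type is the sort type
public SortField(String field, Type type);
// field is the field to sort on; type is the sort type; reverse selects the direction
// reverse = true sorts descending, reverse = false sorts ascending
public SortField(String field, Type type, boolean reverse);

=========Sort class=========
public Sort(); // default constructor: sorts by relevance score
public Sort(SortField field); // sort on a single SortField
public Sort(SortField... fields); // sort on several SortFields; an array may also be passed

=========IndexSearcher class=========
// query is the Query to run; filter restricts the results; n is the number of hits to return; sort is the sort order
search(Query query, Filter filter, int n, Sort sort)
// doDocScores = true: the score of every hit is computed and returned
// doMaxScore = true: the maximum score over all collected hits is computed
search(Query query, Filter filter, int n, Sort sort, boolean doDocScores, boolean doMaxScore)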
With the default relevance sort, the core code and its output look like this:
Sort sort = new Sort(); // sorts by relevance score by default
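The garbled characters in the score column of the output above appear because Lucene does not compute scores for hits by default when searching with a Sort object, since scoring reduces performance. The score field of each hit is therefore NaN.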
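When printing hits from a sorted search, it may help to handle the uncomputed score explicitly instead of printing NaN. The following is a minimal plain-Java sketch, assuming the score arrives as a float such as the one held in ScoreDoc.score; formatScore is a hypothetical helper written for this illustration and is not part of Lucene.

```java
import java.util.Locale;

public class ScoreDisplaySketch {
    // Hypothetical helper: returns a printable form of a hit's score.
    // A NaN score means the score was not computed for that hit.
    static String formatScore(float score) {
        return Float.isNaN(score) ? "n/a" : String.format(Locale.ROOT, "%.3f", score);
    }

    public static void main(String[] args) {
        System.out.println(formatScore(Float.NaN)); // score not computed
        System.out.println(formatScore(1.5f));      // score computed
    }
}
```

If real scores are needed with a custom sort, the six-argument search overload listed earlier with doDocScores set to true is the place to enable them.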

Sort sort = new Sort(new SortField("date", Type.INT, true)); // true = descending order
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);

Sort sort = new Sort(new SortField("price", Type.DOUBLE, false)); // false = ascending order
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);

// Sort sort = new Sort(new SortField("date", Type.INT, true), new SortField("ename", Type.STRING, false));
// The varargs form above and the array form below are equivalent
Sort sort = new Sort(new SortField[]{new SortField("date", Type.INT, true), new SortField("ename", Type.STRING, false)});

TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);
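To make the resulting order concrete, here is a plain-Java sketch (no Lucene involved) of what the two-field sort above means: compare by date descending first, and fall back to ename ascending only when dates tie. The Book class and the sample values are hypothetical, made up for illustration; Lucene performs the equivalent comparison internally on the indexed field values, and its string ordering may differ from String.compareTo for non-ASCII text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MultiFieldSortSketch {
    // Hypothetical stand-in for an indexed document with two sortable fields.
    static final class Book {
        final int date;
        final String ename;
        Book(int date, String ename) { this.date = date; this.ename = ename; }
    }

    // Primary key: date descending (reverse = true).
    // Tie-breaker: ename ascending (reverse = false).
    static final Comparator<Book> BY_DATE_DESC_THEN_ENAME =
        Comparator.comparingInt((Book b) -> b.date).reversed()
                  .thenComparing((Book b) -> b.ename);

    // Returns the ename values in sorted order, leaving the input untouched.
    static List<String> sortedNames(List<Book> books) {
        List<Book> copy = new ArrayList<>(books);
        copy.sort(BY_DATE_DESC_THEN_ENAME);
        List<String> names = new ArrayList<>();
        for (Book b : copy) names.add(b.ename);
        return names;
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
            new Book(20140101, "lucene"),
            new Book(20140305, "solr"),
            new Book(20140305, "hadoop"));
        System.out.println(sortedNames(books)); // prints [hadoop, solr, lucene]
    }
}
```

The second field only matters for documents that compare equal on the first, which is why "hadoop" and "solr" (same date) are ordered by name while "lucene" (older date) comes last.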
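6. Sorting with scores. The last two boolean arguments control whether scores are computed. When scores are not required, leave them disabled, because scoring has a significant performance cost on large result sets. Searching for "编程" returns results sorted by score in descending order by default.
Sort sort = Sort.RELEVANCE;
TopDocs topDocs = searcher.search(new TermQuery(new Term("bookname", "编程")), null, 100, sort, true, true);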

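7. A few points to note
(1) If a document has no value for the sort field, sorting by a string type places it first.
(2) If a document has no value for the sort field, sorting by a numeric type treats the value as 0 by default.
(3) For numeric types, the value used for documents with no value can be set in code. Setting it to the maximum value places those documents last in an ascending sort, as in the following code: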

SortField sortField = new SortField("value", SortField.Type.INT);
sortField.setMissingValue(Integer.MAX_VALUE);
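The effect of the missing value can be sketched in plain Java without Lucene: a document with no value is compared as if it held the substitute value. In this illustration, null stands in for a document that has no value in the sort field, and sortAscending is a hypothetical helper, not a Lucene method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingValueSketch {
    // Sorts ascending, comparing null entries as if they held missingValue.
    // null models a document with no value in the sort field.
    static List<Integer> sortAscending(List<Integer> values, int missingValue) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((Integer v) -> v == null ? missingValue : v));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(7, null, 3);
        // Missing treated as 0 (the default behaviour described in point 2)
        System.out.println(sortAscending(values, 0));                 // prints [null, 3, 7]
        // Missing treated as Integer.MAX_VALUE (as in the snippet above)
        System.out.println(sortAscending(values, Integer.MAX_VALUE)); // prints [3, 7, null]
    }
}
```

With a descending sort the positions flip, so the substitute value should be chosen with the sort direction in mind.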
