Lucene 4.7 Filters (Part 6)



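This post covers filtering, which Lucene implements through Filter. Anyone familiar with Solr will know its fq parameter, which is Lucene filtering under the hood: it filters or restricts the result set produced by the q parameter so that only the content we need is returned. It can be understood as a strategy for narrowing the search space.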
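To make the "narrowing the search space" idea concrete, here is a minimal sketch that uses no Lucene classes at all. It assumes document ids can be modelled as bit positions in a java.util.BitSet; the class name FilterSketch and the method applyFilter are made up for illustration and are not part of Lucene or Solr.

```java
import java.util.BitSet;

// Toy model only: doc ids are bit positions, no Lucene classes are involved.
class FilterSketch {

    // Keep only the query hits that the filter also allows.
    static BitSet applyFilter(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone(); // leave the caller's set untouched
        result.and(filterBits);                     // intersection = filtered result
        return result;
    }

    public static void main(String[] args) {
        BitSet queryHits = new BitSet();   // documents matched by the query
        queryHits.set(0);
        queryHits.set(2);
        queryHits.set(3);
        queryHits.set(7);

        BitSet filterBits = new BitSet();  // documents allowed by the filter
        filterBits.set(2);
        filterBits.set(7);
        filterBits.set(9);

        System.out.println(applyFilter(queryHits, filterBits));
    }
}
```

The filter never adds documents to the result; it can only remove query hits that fall outside its own set.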

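// Using a term range filter.
// The second-to-last argument (includeLower) is true to include the lower bound, false to exclude it.
// The last argument (includeUpper) is true to include the upper bound, false to exclude it.
TermRangeFilter filter = new TermRangeFilter("ename", new BytesRef("h"), new BytesRef("n"), true, true);
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // no Sort given, default ordering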
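The two boolean flags of the term range filter are easy to misread, so here is a small Lucene-free sketch of the bounds check. It assumes plain String.compareTo is an acceptable stand-in for Lucene's term ordering, which holds for ASCII terms such as those used here; TermRangeSketch and inRange are hypothetical names introduced only for this illustration.

```java
// Toy model: String comparison stands in for Lucene's term ordering (assumption: ASCII terms).
class TermRangeSketch {

    // Returns true when term lies between lower and upper, honouring both inclusion flags.
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(inRange("jack", "h", "n", true, true));
        System.out.println(inRange("n", "h", "n", true, true));
        System.out.println(inRange("nancy", "h", "n", true, true));
        System.out.println(inRange("h", "h", "n", false, true));
    }
}
```

Note that a term such as "nancy" sorts after "n", so an upper bound of "n" excludes it even when the bound is inclusive.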

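// Numeric range filter: 10 <= price < 40 (lower bound included, upper bound excluded)
NumericRangeFilter<Double> filter = NumericRangeFilter.newDoubleRange("price", 10D, 40D, true, false);
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // no Sort given, default ordering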
 
// Range filter backed by the FieldCache: 20 <= price <= 50
Filter filter = FieldCacheRangeFilter.newDoubleRange("price", 20D, 50D, true, true);
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // no Sort given, default ordering

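// FieldCache-backed filter that keeps only specific categories
// ("技术" = technology, "社会" = society; the literals must match the indexed values)
Filter filter = new FieldCacheTermsFilter("type", new String[]{"技术", "社会"});
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // no Sort given, default ordering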

// Wrapping a Query in a QueryWrapperFilter
QueryWrapperFilter filter = new QueryWrapperFilter(new TermQuery(new Term("type", "技术")));
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // no Sort given, default ordering
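
Finally, let's look at how to extend the Filter base class to build our own filter. A custom Filter can be powerful and flexible, but the implementation shown here has two limitations we need to be aware of:
1. The field must hold unique values, such as a primary key. If a value repeats, only the first matching document is returned.
2. The field must not be tokenized. On an analyzed field the filter may return incorrect results.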

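import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

public class MyCustomFilter extends Filter {

    // Values of the "id" field whose documents should be kept
    private final String[] terms;

    public MyCustomFilter(String... terms) {
        this.terms = terms;
    }

    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
            throws IOException {
        // One bit per document in this segment; maxDoc() also counts deleted documents
        FixedBitSet bits = new FixedBitSet(context.reader().maxDoc());
        // Doc ids set here are segment-relative, so context.docBase is not added
        for (String s : terms) {
            // "id" must be a unique, non-tokenized field
            DocsEnum docs = context.reader().termDocsEnum(new Term("id", s));
            if (docs == null) {
                continue; // the field or term does not exist in this segment
            }
            // Only the first posting is read, so a repeated id yields just its first document
            int docId = docs.nextDoc();
            if (docId != DocIdSetIterator.NO_MORE_DOCS
                    && (acceptDocs == null || acceptDocs.get(docId))) {
                bits.set(docId); // mark the matching document
            }
        }
        return bits;
    }
}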
MyCustomFilter filter = new MyCustomFilter("3", "5", "2"); // pass one or more id values to keep
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000);
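
The uniqueness limitation comes from reading only the first document of each term. The sketch below illustrates the difference without any Lucene classes: it assumes a postings list can be modelled as a Map from term to an array of doc ids, and the names PostingsSketch and collect are hypothetical. With firstOnly set to true it mimics a single nextDoc() call per term; with false it walks every posting.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model of a postings list: term -> doc ids containing that term.
class PostingsSketch {

    // firstOnly = true marks just the first doc per term; false marks all of them.
    static BitSet collect(Map<String, int[]> postings, boolean firstOnly, String... terms) {
        BitSet bits = new BitSet();
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // unknown term: nothing to mark
            }
            for (int doc : docs) {
                bits.set(doc);
                if (firstOnly) {
                    break; // stop after the first posting
                }
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("3", new int[]{1, 4}); // id "3" is duplicated across two docs
        postings.put("5", new int[]{2});

        System.out.println(collect(postings, true, "3", "5"));
        System.out.println(collect(postings, false, "3", "5"));
    }
}
```

In this toy data the duplicated id "3" loses doc 4 when only the first posting is read, which is the behaviour the first limitation describes.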
