Grouping in Lucene 4.x



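import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public static List<HashMap<String, String>> testGroup(String indexPath, String groupField, String sumField) {
    List<HashMap<String, String>> map = new ArrayList<HashMap<String, String>>();
    Directory d1 = null;
    IndexReader read1 = null;
    try {
        d1 = FSDirectory.open(new File(indexPath)); // on-disk index
        read1 = DirectoryReader.open(d1); // open the reader
        // MultiReader can read several indexes at once,
        // provided they all share the same field structure
        IndexSearcher sear = new IndexSearcher(new MultiReader(read1));
        GroupingSearch gSearch = new GroupingSearch(groupField); // group by groupField (e.g. place)
        Query q = new WildcardQuery(new Term(groupField, "*")); // match every document that has the field
        TopGroups<BytesRef> t = gSearch.search(sear, q, 0, Integer.MAX_VALUE); // return all groups
        GroupDocs<BytesRef>[] g = t.groups; // one entry per distinct group value
        System.out.println("Total documents: " + t.totalHitCount);
        System.out.println("Count after de-duplication: " + g.length);
        for (int i = 0; i < g.length; i++) {
            ScoreDoc[] sd = g[i].scoreDocs;
            String str = sear.doc(sd[0].doc).get(groupField); // group value, read from the group's top document
            // sumcount is a helper that is not shown in this article
            int total = sumcount(str, groupField, sumField, sear);
            System.out.println("place:" + str + "===>" + "count:" + g[i].totalHits);
            HashMap<String, String> m = new HashMap<String, String>();
            m.put("word", str);
            m.put("wx_count", total + "");
            m.put("wx_total", "10000");
            map.add(m);
        }
        read1.close(); // release resources
        d1.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return map;
}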
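At this point, a simple group, de-duplicate, and count feature is working. For more complex requirements, such as report queries or specific sum aggregations, you will probably need to write the logic yourself.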
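To make the "write it yourself" part concrete, here is a minimal sketch of a per-group count and sum. It is not the article's sumcount helper, which is not shown, and it does not use Lucene at all: plain in-memory maps stand in for documents fetched from the index. The class name GroupSumSketch, the methods doc and groupAndSum, and the sample field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory maps stand in for Lucene documents.
class GroupSumSketch {

    /** Builds one in-memory "document" with a group field and a numeric field. */
    static Map<String, String> doc(String groupField, String groupValue,
                                   String sumField, String sumValue) {
        Map<String, String> d = new HashMap<String, String>();
        d.put(groupField, groupValue);
        d.put(sumField, sumValue);
        return d;
    }

    /**
     * Groups docs by groupField; for each group, counts the documents and
     * sums sumField. Returns groupValue -> {count, sum} in first-seen order.
     */
    static Map<String, long[]> groupAndSum(List<Map<String, String>> docs,
                                           String groupField, String sumField) {
        Map<String, long[]> result = new LinkedHashMap<String, long[]>();
        for (Map<String, String> d : docs) {
            String key = d.get(groupField);
            if (key == null) {
                continue; // documents without the group field are skipped
            }
            long[] acc = result.get(key);
            if (acc == null) {
                acc = new long[2]; // acc[0] = count, acc[1] = sum
                result.put(key, acc);
            }
            acc[0]++;
            String raw = d.get(sumField);
            if (raw != null) {
                acc[1] += Long.parseLong(raw.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
        docs.add(doc("place", "beijing", "count", "3"));
        docs.add(doc("place", "shanghai", "count", "5"));
        docs.add(doc("place", "beijing", "count", "4"));
        for (Map.Entry<String, long[]> e : groupAndSum(docs, "place", "count").entrySet()) {
            System.out.println("place:" + e.getKey() + "===>count:" + e.getValue()[0]
                    + ", sum:" + e.getValue()[1]);
        }
    }
}
```

In a real index the same accumulation would run over the stored field values of the documents in each group; a single pass like this avoids issuing one extra query per group.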
Please read the full article: lucene4.x的分组实现 (Grouping in Lucene 4.x)
