Lucene – using the HitCollector



I run a webshop whose products are assigned to categories. Soon a requirement came up: count the search hits per category. I solved it with a custom HitCollector:

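import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldDocCollector;

public class CountCollector extends TopFieldDocCollector {
    private final Searcher searcher;
    // Category value -> number of hits carrying that category.
    private final Map<String, Long> countMap = new HashMap<String, Long>();

    public CountCollector(Searcher searcher, IndexReader reader, Sort sorter, int maxSearchResults) throws IOException {
        super(reader, sorter, maxSearchResults);
        this.searcher = searcher;
    }

    // Called once for every matching document.
    public void collect(int doc, float score) {
        // Let the parent class keep collecting the top-sorting documents.
        super.collect(doc, score);
        try {
            Document document = searcher.doc(doc);
            if (document == null) {
                return;
            }
            Field[] categoriesDoc = document.getFields("categories");
            if (categoriesDoc == null) {
                return;
            }
            for (int i = 0; i < categoriesDoc.length; i++) {
                String category = categoriesDoc[i].stringValue();
                Long current = countMap.get(category);
                countMap.put(category, Long.valueOf(current == null ? 1L : current.longValue() + 1L));
            }
        } catch (IOException e) {
            // CorruptIndexException is a subclass of IOException, so one handler covers both.
            System.err.println("ERROR: " + e.getMessage());
        }
    }

    // Accessor added so the caller can read the counts after the search.
    public Map<String, Long> getCountMap() {
        return countMap;
    }
}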
First, I extended TopFieldDocCollector, which collects the top-sorting documents and returns them as TopFieldDocs. I overrode the collect method so that it also counts the hits per category. Note that collect is called once for every hit, so the work done there runs for every matching document, not only for the top results. The next question is how to run the search with the custom HitCollector:
CountCollector collector = new CountCollector(searcher, indexReader, sorter, maxSearchResults);
searcher.search(finalQuery, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
This search method does not return a Hits object; the matching documents are held in the HitCollector instead. Next, you can step through the ScoreDoc array and call
Document doc = searcher.doc(hits[i].doc);
for each hit, then put the document into a custom result object.
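The per-category tally that collect builds up can be tried out without an index. The following is only a minimal sketch of that counting step, assuming the category values of each hit are already available as lists of strings; the class name CategoryTally, the count method, and the sample data are hypothetical and not part of the original code. It uses Map.merge, which needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mirrors the counting done in collect, without Lucene.
class CategoryTally {

    // Counts how often each category value occurs across all hits.
    // A hit assigned to several categories is counted once in each of them.
    static Map<String, Long> count(List<List<String>> categoriesPerHit) {
        Map<String, Long> counts = new HashMap<String, Long>();
        for (List<String> categories : categoriesPerHit) {
            for (String category : categories) {
                // Insert 1 for a new category, otherwise add 1 to the stored value.
                counts.merge(category, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample: three hits and the categories stored on each.
        List<List<String>> hits = Arrays.asList(
                Arrays.asList("books", "sale"),
                Arrays.asList("books"),
                Arrays.asList("music"));
        Map<String, Long> counts = count(hits);
        System.out.println("books: " + counts.get("books"));
        System.out.println("sale: " + counts.get("sale"));
        System.out.println("music: " + counts.get("music"));
    }
}
```

In the collector itself, the inner loop would run over the values of the "categories" field of each hit instead of a prepared list.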
