Highlighting Keywords in Lucene and Setting the Highlight Fragment Length



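Lucene's Highlighter module can highlight the query keywords in search results, which leaves you free to customize how a search engine presents its results.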

How to create a Highlighter

To create a Highlighter, you need a Formatter and a Scorer, as shown here:
Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color='red'>", "</font>"), new QueryScorer(query));
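
To make the Formatter's role concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It wraps each occurrence of a term in a pre-tag and a post-tag, which is roughly what the two strings passed to SimpleHTMLFormatter contribute to the output. The class name TagWrapSketch and the method wrap are hypothetical, and the literal substring matching is an assumption made for brevity; Lucene itself matches analyzed tokens against the query.

```java
public class TagWrapSketch {
    // Hypothetical helper, not part of Lucene: wraps every literal
    // occurrence of term in preTag and postTag.
    static String wrap(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        // String.replace treats both arguments as literal text (no regex).
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        System.out.println(wrap("I like Lucene", "like", "<font color='red'>", "</font>"));
    }
}
```

Changing the two tag strings is all it takes to switch from a red font to, say, bold text or a CSS class.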

How to highlight

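There are several methods to choose from; only the simplest is shown here. It takes an Analyzer, a fieldName, and a fieldContent, and returns null when the content contains no match:
highlighter.getBestFragment(analyzer, fieldName, fieldContent);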

How to control the length of the result

Set a Fragmenter with the desired fragment size (for SimpleFragmenter, a number of characters):
Fragmenter fragmenter = new SimpleFragmenter(fragmentSize);
highlighter.setTextFragmenter(fragmenter);
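
As a rough illustration of what the fragment size controls, the plain-Java sketch below (again without Lucene) cuts a text into consecutive chunks of at most fragmentSize characters and returns the first chunk that contains the term. The class FragmentSketch and the method firstFragmentWith are hypothetical names. The sketch is a simplification: Lucene's SimpleFragmenter starts new fragments on token boundaries and the Highlighter returns the highest-scoring fragment, whereas this code cuts at fixed character positions and takes the first hit.

```java
public class FragmentSketch {
    // Hypothetical helper, not part of Lucene: splits text into consecutive
    // chunks of at most fragmentSize characters and returns the first chunk
    // that contains term, or null when no chunk contains it.
    static String firstFragmentWith(String text, String term, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        for (int start = 0; start < text.length(); start += fragmentSize) {
            int end = Math.min(start + fragmentSize, text.length());
            String chunk = text.substring(start, end);
            if (chunk.contains(term)) {
                return chunk;
            }
        }
        return null; // a term that straddles a chunk boundary is also missed here
    }

    public static void main(String[] args) {
        String text = "aaaaabbbbbccccc";
        System.out.println(firstFragmentWith(text, "bbb", 5));
        System.out.println(firstFragmentWith(text, "bbb", 15));
    }
}
```

A smaller fragment size gives a shorter snippet around the match; a size at least as long as the text returns the whole text.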

The following snippet puts these pieces together in a search loop. It assumes that ver, fieldName, analyzer, and isearcher have already been initialized:
            String keyword = "喜欢"; // Chinese for "like"
            // Build the Query object with QueryParser
            QueryParser qp = new QueryParser(ver, fieldName, analyzer);
            Query query = qp.parse(keyword);
            System.out.println("Query = " + query);

            // Search for the 5 highest-scoring documents
            TopDocs topDocs = isearcher.search(query, 5);
            System.out.println("Hits: " + topDocs.totalHits);
            // Print the results
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;

            for (int i = 0; i < Math.min(5, scoreDocs.length); ++i)
            {
                Document targetDoc = isearcher.doc(scoreDocs[i].doc);
                System.out.print(targetDoc.getField("title").stringValue());
                System.out.println(" , " + scoreDocs[i].score);

                // Highlight the stored field text, limited to 200-character fragments
                String text = targetDoc.get(fieldName);
                System.out.println(displayHtmlHighlight(query, analyzer, fieldName, text, 200));
            }

    // Returns the best-matching fragment with the query terms wrapped in red font tags,
    // or null if fieldContent contains no match.
    static String displayHtmlHighlight(Query query, Analyzer analyzer, String fieldName, String fieldContent, int fragmentSize) throws IOException, InvalidTokenOffsetsException
    {
        // Create a highlighter
        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color='red'>", "</font>"), new QueryScorer(query));
        // Limit the length of each fragment
        Fragmenter fragmenter = new SimpleFragmenter(fragmentSize);
        highlighter.setTextFragmenter(fragmenter);
        return highlighter.getBestFragment(analyzer, fieldName, fieldContent);
    }