Lucene 4.3 Development, Step 8: Early Tribulation Stage (8)



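Highlighting has to be planned from index time. A field that will be highlighted needs accurate position and offset information for each token. If that information is wrong, the results can be misaligned: on a web page, characters that should not be marked get marked, and the content that should be marked does not. So at index time we use term vectors to record the position information of each token: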
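    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.document.TextField;

    FieldType type = new FieldType(TextField.TYPE_STORED); // indexed, tokenized, stored
    type.setStoreTermVectors(true);          // store term vectors
    type.setStoreTermVectorPositions(true);  // record token positions
    type.setStoreTermVectorOffsets(true);    // record start/end character offsets
    type.freeze();                           // prevent further changes
    Field field = new Field("fieldName", "value", type); // example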
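To see why the offsets matter, here is a minimal plain-JDK sketch of what a highlighter does with them. It is not Lucene's implementation: the `OffsetHighlightSketch` class, the `Span` holder, and the sample text are all hypothetical and exist only for illustration. Each span is a pair of character offsets, start inclusive and end exclusive, and the method wraps that range in tags.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlightSketch {

    /** Hypothetical holder for one token's character offsets: [start, end). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Wraps each span in pre/post tags. Spans must be sorted and non-overlapping. */
    static String highlight(String text, List<Span> spans, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            out.append(text, cursor, s.start);                          // text before the match
            out.append(pre).append(text, s.start, s.end).append(post);  // the marked range
            cursor = s.end;
        }
        out.append(text, cursor, text.length());                        // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene makes search easy";

        List<Span> correct = new ArrayList<Span>();
        correct.add(new Span(13, 19)); // exact offsets of "search"
        System.out.println(highlight(text, correct, "<b>", "</b>"));
        // prints: Lucene makes <b>search</b> easy

        List<Span> shifted = new ArrayList<Span>();
        shifted.add(new Span(11, 17)); // the same span, off by two characters
        System.out.println(highlight(text, shifted, "<b>", "</b>"));
        // prints: Lucene make<b>s sear</b>ch easy
    }
}
```

The second call shows the failure described above: with offsets that are off by only two characters, the tags land on the wrong characters and part of the real match is left unmarked.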
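Content that will be highlighted must be stored. Large texts take up a lot of index space and can hurt search performance. The text can instead be kept in external storage, such as a relational database or a NoSQL store, but highlighting then needs some extra handling.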

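Glossary
SimpleHTMLFormatter: a commonly used HTML formatter. Its constructor takes the opening and closing tags to wrap around each highlighted term; by default it uses <B> and </B> (bold).
TokenSources: provides static methods for obtaining a TokenStream from a data source (for example, from stored term vectors or by re-analyzing the text) for token processing.
Highlighter: finds and returns the best-matching highlighted fragments.
QueryScorer: scores text fragments according to the query terms they contain.
Fragmenter: splits the original text into separate fragments.
NullFragmenter: does no splitting and treats the whole field as one fragment, which suits short fields that should be highlighted in full.
FastVectorHighlighter: a faster highlighter based on term vectors; it needs term vectors with positions and offsets stored at index time.
Encoder: an interface with implementations that operate on HTML text, for example escaping special characters such as < and >, as well as some other non-ASCII special characters.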
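The classes in the glossary can be wired together roughly as follows. This is a minimal sketch, assuming Lucene 4.3 with the lucene-core, lucene-analyzers-common, and lucene-highlighter jars (plus the highlighter's own dependencies, lucene-memory and lucene-queries) on the classpath. The class name `HighlighterSketch`, the field name `content`, the `<em>` tags, the fragment size of 100, and the sample text are illustrative choices, not taken from the original article.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.util.Version;

public class HighlighterSketch {

    /** Returns the best fragment of text with matches wrapped in <em> tags, or null if nothing matches. */
    static String highlight(String fieldName, String queryTerm, String text) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        Query query = new TermQuery(new Term(fieldName, queryTerm));

        // QueryScorer scores fragments by the query terms they contain.
        QueryScorer scorer = new QueryScorer(query, fieldName);

        // SimpleHTMLFormatter wraps each matching term in the given tags.
        Highlighter highlighter =
                new Highlighter(new SimpleHTMLFormatter("<em>", "</em>"), scorer);

        // A Fragmenter decides how the text is cut; 100 is an arbitrary target size.
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 100));

        // The text is re-analyzed here, so it may come from the index or any other store.
        return highlighter.getBestFragment(analyzer, fieldName, text);
    }

    public static void main(String[] args) throws Exception {
        String text = "Apache Lucene is a search library";
        System.out.println(highlight("content", "lucene", text));
    }
}
```

Because `getBestFragment(Analyzer, String, String)` takes the raw text as an argument and analyzes it again, the same call also works when the text is fetched from an external database rather than from a stored field. The cost is re-analysis at query time, which is what the term-vector-based FastVectorHighlighter is designed to avoid.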
