Advanced Lucene 4.3 Development: 潇湘夜雨 (Part 17)



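Once all of the text has been analyzed into tokens, Lucene passes this information along a chain of consumers. Working behind the scenes, the chain adds it layer by layer to Lucene's various index files. The default indexing chain is initialized in DocumentsWriterPerThread: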
  static final IndexingChain defaultIndexingChain = new IndexingChain() {

    @Override
    DocConsumer getChain(DocumentsWriterPerThread documentsWriterPerThread) {
      /*
      This is the current indexing chain:

      DocConsumer / DocConsumerPerThread
        --> code: DocFieldProcessor
          --> DocFieldConsumer / DocFieldConsumerPerField
            --> code: DocFieldConsumers / DocFieldConsumersPerField
              --> code: DocInverter / DocInverterPerField
                --> InvertedDocConsumer / InvertedDocConsumerPerField
                  --> code: TermsHash / TermsHashPerField
                    --> TermsHashConsumer / TermsHashConsumerPerField
                      --> code: FreqProxTermsWriter / FreqProxTermsWriterPerField
                      --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerField
                --> InvertedDocEndConsumer / InvertedDocConsumerPerField
                  --> code: NormsConsumer / NormsConsumerPerField
          --> StoredFieldsConsumer
            --> TwoStoredFieldConsumers
              -> code: StoredFieldsProcessor
              -> code: DocValuesProcessor
      */

      // Build up indexing chain:

      // Consumers of the inverted terms: term vectors and postings (freq/prox)
      final TermsHashConsumer termVectorsWriter = new TermVectorsConsumer(documentsWriterPerThread);
      final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();

      // Primary TermsHash feeds the postings writer and forwards to a secondary
      // TermsHash for term vectors; the trailing null ends the TermsHash chain
      final InvertedDocConsumer termsHash = new TermsHash(documentsWriterPerThread, freqProxWriter, true,
                                                          new TermsHash(documentsWriterPerThread, termVectorsWriter, false, null));
      // DocInverter passes terms to termsHash and per-field norms to normsWriter
      final NormsConsumer normsWriter = new NormsConsumer();
      final DocInverter docInverter = new DocInverter(documentsWriterPerThread.docState, termsHash, normsWriter);
      // Stored fields and doc values are handled by two consumers side by side
      final StoredFieldsConsumer storedFields = new TwoStoredFieldsConsumers(
                                                      new StoredFieldsProcessor(documentsWriterPerThread),
                                                      new DocValuesProcessor(documentsWriterPerThread.bytesUsed));
      // DocFieldProcessor is the head of the chain
      return new DocFieldProcessor(documentsWriterPerThread, docInverter, storedFields);
    }
  };
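The following minimal sketch makes the shape of the chain concrete. It is not Lucene code. It models each stage as a consumer that does its own work and then hands the document to the stages below it.

All of the following are hypothetical and exist only for illustration:

- the interface ChainStage
- the helpers leaf and fanOut
- the stage labels

The real classes, such as DocFieldProcessor and TermsHash, are internal to org.apache.lucene.index and do far more work per field and per term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of a chain of consumers, shaped like Lucene's default indexing chain.
// ChainStage, leaf, fanOut and the stage labels are hypothetical, not Lucene APIs.
public class IndexingChainSketch {

    /** One stage in the chain: handles a document, then hands it downstream. */
    interface ChainStage {
        void processDocument(Map<String, String> doc);
    }

    /** Records which stage touched which field, so the call order can be inspected. */
    static final List<String> trace = new ArrayList<>();

    /** A terminal stage that records every field it sees (e.g. the postings writer's role). */
    static ChainStage leaf(String name) {
        return doc -> doc.keySet().forEach(field -> trace.add(name + ":" + field));
    }

    /** A stage that records itself, then forwards the document to each child in order. */
    static ChainStage fanOut(String name, ChainStage... next) {
        return doc -> {
            trace.add(name);
            for (ChainStage stage : next) {
                stage.processDocument(doc);
            }
        };
    }

    /** Wires the stages in the same tree shape as the default chain quoted above. */
    static ChainStage buildChain() {
        ChainStage termsHash = fanOut("TermsHash", leaf("FreqProx"),
                fanOut("TermsHash(secondary)", leaf("TermVectors")));
        ChainStage inverter = fanOut("DocInverter", termsHash, leaf("Norms"));
        ChainStage storedFields = fanOut("TwoStoredFields",
                leaf("StoredFields"), leaf("DocValues"));
        return fanOut("DocFieldProcessor", inverter, storedFields);
    }

    public static void main(String[] args) {
        buildChain().processDocument(Map.of("title", "lucene"));
        trace.forEach(System.out::println);
    }
}
```

In the sketch, stages run in the order they were wired, so the inverter branch is traced before the stored-fields branch. That ordering is a property of the sketch, not a claim about the order inside Lucene. The point is the structure: each stage knows only its direct successors, so a new consumer can be added by wiring it in without touching the others.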
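After this chain has run, the index data is not written to disk right away. Instead, it is first buffered in memory and flushed to disk once a threshold is reached. Two thresholds apply:

- the RAM used by buffered documents reaches IndexWriterConfig's RAM buffer size, which is 16 MB unless changed with setRAMBufferSizeMB;
- the document count set with setMaxBufferedDocs is reached, if one is set.

This is an optimization that avoids the I/O overhead of frequent disk access. We can also call IndexWriter's commit method explicitly. It flushes any buffered documents and syncs the index files, so the changes become durable and visible to newly opened readers.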
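The buffer-then-flush behavior, plus an explicit commit, can be sketched as follows. BufferedIndexSketch and its methods are hypothetical. Sizing a document by its character count is a deliberate simplification, since Lucene tracks actual RAM usage. The corresponding real knobs are IndexWriterConfig.setRAMBufferSizeMB and IndexWriter.commit().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "buffer in memory, flush when full, commit on demand".
// All names here are hypothetical; this is not Lucene's IndexWriter.
public class BufferedIndexSketch {
    private final long ramBufferBytes;                    // flush threshold
    private final List<String> buffer = new ArrayList<>(); // docs held in memory
    private long bufferedBytes = 0;
    final List<List<String>> flushedSegments = new ArrayList<>(); // written out by flush()
    int committedSegments = 0;                            // segments made durable by commit()

    BufferedIndexSketch(long ramBufferBytes) {
        this.ramBufferBytes = ramBufferBytes;
    }

    void addDocument(String doc) {
        buffer.add(doc);
        bufferedBytes += doc.length(); // crude size estimate: one byte per char
        if (bufferedBytes >= ramBufferBytes) {
            flush(); // automatic flush once the buffer is "full"
        }
    }

    /** Writes the in-memory buffer out as one new segment, then empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedSegments.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    /** Flushes anything still buffered, then marks every flushed segment as committed. */
    void commit() {
        flush();
        committedSegments = flushedSegments.size();
    }

    int bufferedDocCount() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BufferedIndexSketch w = new BufferedIndexSketch(10);
        w.addDocument("hello");  // 5 bytes buffered
        w.addDocument("world!"); // 11 bytes >= 10, so auto-flush
        w.addDocument("tail");   // 4 bytes, stays in memory
        System.out.println(w.flushedSegments.size() + " " + w.bufferedDocCount()); // 1 1
        w.commit();
        System.out.println(w.flushedSegments.size() + " " + w.committedSegments);  // 2 2
    }
}
```

Batching many documents per write is what saves the I/O. Separating flush from commit mirrors the distinction described above: flushed data has left memory, but only a commit makes it durable.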
Please read the full article at Lucene4.3进阶开发之潇湘夜雨(十七).
