Lucene Study Notes: Inverted Index (Inverted index in Lucene)



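Lucene does not call an inverted list an "Inverted List"; it calls it a "Posting List". So when you look through the source, search for files related to Posting List. The most relevant class turns out to be Lucene41PostingsWriter. This file is fairly large, and a few points in it are worth noting: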
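  1. It does not use dynamic data structures such as Vector or ArrayList. It uses plain arrays instead, for efficiency. The maximum length of an array is …
  • How are the elements inside an inverted list ordered?
  • How is the inverted index stored inside a Segment?
  • As more documents are added, many inverted lists grow longer too, which makes storage tricky. A database can give each attribute a maximum length and raise an error when it is exceeded, but Lucene obviously cannot do that. How does Lucene keep I/O performance acceptable here?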
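A minimal Java sketch of the ideas in the points above, assuming a simplified design rather than the actual Lucene41PostingsWriter code. The class `PostingsBufferSketch` and all of its methods are hypothetical. The sketch buffers doc IDs in a fixed-size `int[]` instead of an `ArrayList<Integer>`. It assumes doc IDs arrive in strictly increasing order, as they do in Lucene postings, and stores only the gaps between consecutive IDs. Each full buffer is written out as variable-length ints (VInt: 7 payload bits per byte, high bit meaning "more bytes follow"). The block size is an arbitrary constructor parameter, not the value Lucene uses. Lucene's real format is more elaborate; for example, it packs full blocks with a fixed bit width. This sketch leaves that out.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch, not Lucene's code: a posting list that keeps doc IDs
// sorted ascending, buffers their gaps in a fixed-size int[] and flushes the
// buffer as VInts whenever it fills up.
public class PostingsBufferSketch {
    private final int[] docDeltaBuffer;   // fixed-size array, reused for every block (no boxing)
    private int bufferUpto = 0;           // number of gaps currently buffered
    private int lastDocId = 0;            // previous doc ID, used to compute gaps
    private int docCount = 0;             // total doc IDs added so far
    private int blocksFlushed = 0;        // number of full buffers written
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public PostingsBufferSketch(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.docDeltaBuffer = new int[blockSize];
    }

    // Doc IDs must be non-negative and strictly increasing, so every gap after the first is positive.
    public void addDoc(int docId) {
        if (docId < 0 || (docCount > 0 && docId <= lastDocId)) {
            throw new IllegalArgumentException("doc IDs must be strictly increasing: " + docId);
        }
        docDeltaBuffer[bufferUpto++] = docId - lastDocId; // the first gap is the doc ID itself
        lastDocId = docId;
        docCount++;
        if (bufferUpto == docDeltaBuffer.length) {        // buffer full: write one block
            flushBuffer();
            blocksFlushed++;
        }
    }

    // Writes the buffered gaps, then resets the buffer for the next block.
    private void flushBuffer() {
        for (int i = 0; i < bufferUpto; i++) {
            writeVInt(docDeltaBuffer[i]);
        }
        bufferUpto = 0;
    }

    // VInt: low 7 bits first; the high bit is set when more bytes follow.
    private void writeVInt(int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Flushes the last, partially filled block and returns all encoded bytes.
    public byte[] finish() {
        flushBuffer();
        return out.toByteArray();
    }

    public int blocksFlushed() {
        return blocksFlushed;
    }

    // Decodes bytes produced by finish() back into absolute doc IDs.
    public static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int doc = 0;
        for (int i = 0; i < count; i++) {
            int value = 0, shift = 0, b;
            do {
                b = bytes[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += value;              // prefix sum turns gaps back into doc IDs
            docIds[i] = doc;
        }
        return docIds;
    }

    public static void main(String[] args) {
        PostingsBufferSketch postings = new PostingsBufferSketch(4); // tiny block size for the demo
        int[] docs = {3, 7, 8, 150, 151, 1000};
        for (int d : docs) postings.addDoc(d);
        byte[] encoded = postings.finish();
        System.out.println(Arrays.toString(decode(encoded, docs.length)));
        System.out.println("full blocks: " + postings.blocksFlushed() + ", bytes: " + encoded.length);
    }
}
```

Running `main` prints `[3, 7, 8, 150, 151, 1000]` and then `full blocks: 1, bytes: 8`. The gaps are 3, 4, 1, 142, 1 and 849, so four of them fit in one byte each and two need two bytes. That is 8 bytes in total, compared with 24 bytes for six raw 4-byte ints. Sorted order is what makes the gaps small. Only the buffer is fixed in size, so a posting list can grow without a preset maximum length: it simply produces more blocks.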
Please read full article from Lucene Study Notes: Inverted Index (Inverted index in Lucene)
