Lucene 4.3 Development, Step 9: Mid Tribulation-Crossing Stage (9)



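The index files written by Lucene 4.x are listed below. The postings, norms, and per-document value extensions shown here are those of the Lucene 4.0 codec; later 4.x codecs use different extensions for some of them (for example, .doc and .pos for postings since 4.1).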

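| File | Extension | Description |
|---|---|---|
| Segments File | segments.gen, segments_N | Stores information about a commit point |
| Lock File | write.lock | Write lock; prevents more than one IndexWriter from writing to the same index at a time |
| Segment Info | .si | Stores the metadata of each segment |
| Compound File | .cfs, .cfe | An optional "virtual" file that bundles the other index files of a segment, for systems that frequently run out of file handles |
| Fields | .fnm | Stores information about the fields |
| Field Index | .fdx | Stores pointers into the field data |
| Field Data | .fdt | Stores the stored fields of every document |
| Term Dictionary | .tim | The term dictionary; stores term information |
| Term Index | .tip | The index into the term dictionary |
| Frequencies | .frq | Lists the documents that contain each term, along with the term's frequency in each |
| Positions | .prx | Stores the exact positions at which each term occurs in the index |
| Norms | .nrm.cfs, .nrm.cfe | Encodes length and boost factors for documents and fields |
| Per-Document Values | .dv.cfs, .dv.cfe | Encodes additional per-document scoring factors beyond the norms |
| Term Vector Index | .tvx | Term vector index; stores offsets into the term vector document data file |
| Term Vector Documents | .tvd | Contains information about each document that has term vectors |
| Term Vector Fields | .tvf | Stores field-level term vector information |
| Deleted Documents | .del | Records which documents have been deleted |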
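
To make the table easier to apply when inspecting an index directory, here is a minimal sketch in plain Java (no Lucene dependency) that maps a file name to the entry it belongs to. The class name `IndexFileKinds`, the method `describe`, and the sample file names such as `_0.fdt` are assumptions made for illustration; they are not part of the Lucene API, and the mapping only covers the extensions listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): classifies an index file name
// according to the table above.
public class IndexFileKinds {
    // Suffix -> table entry. Insertion order matters: the longer suffixes
    // (".nrm.cfs", ".dv.cfs", ...) must be tested before the plain ".cfs"/".cfe".
    private static final Map<String, String> BY_SUFFIX = new LinkedHashMap<>();
    static {
        BY_SUFFIX.put(".nrm.cfs", "Norms");
        BY_SUFFIX.put(".nrm.cfe", "Norms");
        BY_SUFFIX.put(".dv.cfs", "Per-Document Values");
        BY_SUFFIX.put(".dv.cfe", "Per-Document Values");
        BY_SUFFIX.put(".cfs", "Compound File");
        BY_SUFFIX.put(".cfe", "Compound File");
        BY_SUFFIX.put(".si", "Segment Info");
        BY_SUFFIX.put(".fnm", "Fields");
        BY_SUFFIX.put(".fdx", "Field Index");
        BY_SUFFIX.put(".fdt", "Field Data");
        BY_SUFFIX.put(".tim", "Term Dictionary");
        BY_SUFFIX.put(".tip", "Term Index");
        BY_SUFFIX.put(".frq", "Frequencies");
        BY_SUFFIX.put(".prx", "Positions");
        BY_SUFFIX.put(".tvx", "Term Vector Index");
        BY_SUFFIX.put(".tvd", "Term Vector Documents");
        BY_SUFFIX.put(".tvf", "Term Vector Fields");
        BY_SUFFIX.put(".del", "Deleted Documents");
    }

    // Returns the table entry for a file name, or "Unknown" if none matches.
    public static String describe(String fileName) {
        // The segments files and the lock file are matched by name, not by suffix.
        if (fileName.equals("segments.gen") || fileName.startsWith("segments_")) {
            return "Segments File";
        }
        if (fileName.equals("write.lock")) {
            return "Lock File";
        }
        for (Map.Entry<String, String> entry : BY_SUFFIX.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        // Sample names are illustrative only.
        String[] samples = {"segments_2", "write.lock", "_0.fdt", "_0.nrm.cfs", "_0.cfs"};
        for (String name : samples) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

The suffix map is ordered so that the compound norms and per-document value files are recognized before the generic compound-file rule; with an unordered map, `_0.nrm.cfs` could be misreported as a plain compound file.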
