org.apache.lucene.codecs.lucene49 (Lucene 4.9.0 API)





Segments Filesegments.gen, segments_NStores information about a commit point
A collection of segmentInfo objects with methods for operating on those segments in relation to the file system.
The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one (when older segments_N files are present it's because they temporarily cannot be deleted, or, a writer is in the process of committing, or a custom IndexDeletionPolicy is in use). This file lists each segment by name and has details about the codec and generation of deletes.
There is also a file segments.gen. This file contains the current generation (the _N in segments_N) of the index. This is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains an Int32 version header (FORMAT_SEGMENTS_GEN_CURRENT), followed by the generation recorded as Int64, written twice.

Segment Info.siStores metadata about a segment
Deprecated. 
Only for reading old 4.0-4.5 segments, and supporting IndexWriter.addIndexes
Compound File.cfs, .cfeAn optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles.


  • .cfs: An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles.
  • .cfe: The "virtual" compound file's entry table holding all entries in the corresponding .cfs file.

Fields.fnmStores information about the fields
Field names are stored in the field info file, with suffix .fnm.
FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, FieldBits,DocValuesBits,DocValuesGen,Attributes> FieldsCount,Footer

Field Index.fdxContains pointers to field data

Lock File

The write lock, which is stored in the index directory by default, is named "write.lock". If the lock directory is different from the index directory then the write lock will be named "XXXX-write.lock" where XXXX is a unique prefix derived from the full path to the index directory. When this file is present, a writer is currently modifying the index (adding or removing documents). This lock file ensures that only one writer is modifying the index at a time.



  • In version 4.8, checksum footers were added to the end of each index file for improved data integrity. Specifically, the last 8 bytes of every index file contain the zlib-crc32 checksum of the file.

Read full article from org.apache.lucene.codecs.lucene49 (Lucene 4.9.0 API)

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts