Segments File | segments.gen, segments_N | Stores information about a commit point |
The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one. Older segments_N files are present only when they temporarily cannot be deleted, when a writer is in the process of committing, or when a custom IndexDeletionPolicy is in use. This file lists each segment by name and has details about the codec and generation of deletes.
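The "largest generation wins" rule can be sketched in a few lines of plain Java. This is only an illustration, not Lucene's own lookup code: the class and method names are hypothetical, and the sketch assumes the N suffix is written in base 36 (Character.MAX_RADIX), which the text above does not state.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, not part of the Lucene API.
final class SegmentsGeneration {
    private static final String PREFIX = "segments_";

    /** Returns the largest generation among segments_N names, or empty if there is none. */
    static OptionalLong latestGeneration(List<String> fileNames) {
        long best = -1L;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // skips segments.gen, write.lock, and per-segment files
            }
            try {
                // Assumption: the generation suffix is encoded in base 36.
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                best = Math.max(best, gen);
            } catch (NumberFormatException e) {
                // Not a well-formed generation suffix; ignore this file name.
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }
}
```

Given the names of the files in an index directory, the returned value identifies which segments_N file would be treated as the active commit under these assumptions.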
There is also a file segments.gen. This file contains the current generation (the _N in segments_N) of the index. It is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains an Int32 version header (FORMAT_SEGMENTS_GEN_CURRENT), followed by the generation recorded as Int64, written twice.
Segment Info | .si | Stores metadata about a segment |
Only for reading old 4.0-4.5 segments, and supporting IndexWriter.addIndexes
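As a rough illustration of the segments.gen layout described earlier (an Int32 version header followed by the Int64 generation written twice), the sketch below decodes those three fields from a byte array. The class name is hypothetical, the byte order is assumed to be big-endian, the header value is read but not validated, and any bytes after the second generation are ignored.

```java
import java.nio.ByteBuffer;

// Hypothetical reader, not part of the Lucene API.
final class SegmentsGenReader {
    /**
     * Returns the generation recorded in the given segments.gen bytes,
     * or -1 if the two recorded copies disagree.
     */
    static long readGeneration(byte[] fileBytes) {
        ByteBuffer in = ByteBuffer.wrap(fileBytes); // big-endian by default (assumption about the file)
        in.getInt();                // Int32 version header, not validated in this sketch
        long first = in.getLong();  // generation, first copy
        long second = in.getLong(); // generation, second copy
        return first == second ? first : -1L;
    }
}
```

Writing the generation twice lets a reader detect a partially written file: if the two copies differ, the value cannot be trusted and the reader has to rely on the directory listing instead.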
Compound File | .cfs, .cfe | An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles. |
- .cfs: An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles.
- .cfe: The "virtual" compound file's entry table holding all entries in the corresponding .cfs file.
Fields | .fnm | Stores information about the fields |
Field names are stored in the field info file, with suffix .fnm.
FieldInfos (.fnm) --> Header, FieldsCount, <FieldName, FieldNumber, FieldBits, DocValuesBits, DocValuesGen, Attributes>^FieldsCount, Footer
Field Index | .fdx | Contains pointers to field data |
Lock File
The write lock, which is stored in the index directory by default, is named "write.lock". If the lock directory is different from the index directory then the write lock will be named "XXXX-write.lock" where XXXX is a unique prefix derived from the full path to the index directory. When this file is present, a writer is currently modifying the index (adding or removing documents). This lock file ensures that only one writer is modifying the index at a time.
- In version 4.8, checksum footers were added to the end of each index file for improved data integrity. Specifically, the last 8 bytes of every index file contain the zlib-crc32 checksum of the file.
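The checksum footer can be checked with the JDK's own CRC32 class, which implements the same zlib CRC-32. The following is a minimal sketch rather than Lucene's verification code: the class name is hypothetical, and it assumes the checksum covers every byte that precedes the final 8 bytes and is stored as a big-endian Int64.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical checker, not part of the Lucene API.
final class FooterChecksum {
    /** Returns true if the last 8 bytes equal the CRC-32 of all bytes before them. */
    static boolean matches(byte[] fileBytes) {
        if (fileBytes.length < 8) {
            return false; // too short to hold a checksum at all
        }
        int bodyLength = fileBytes.length - 8;
        CRC32 crc = new CRC32();
        crc.update(fileBytes, 0, bodyLength); // checksum of everything before the last 8 bytes
        // Assumption: the stored checksum is a big-endian 64-bit value.
        long stored = ByteBuffer.wrap(fileBytes, bodyLength, 8).getLong();
        return stored == crc.getValue();
    }
}
```

For large index files a streaming variant that updates the CRC32 in chunks would avoid loading the whole file into memory; the byte array form is used here only to keep the example short.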
Read full article from org.apache.lucene.codecs.lucene49 (Lucene 4.9.0 API)