All About Programming: org.apache.lucene.codecs.lucene49 (Lucene 4.9.0 API)

Segments File (segments.gen, segments_N): Stores information about a commit point.

SegmentInfos: a collection of SegmentInfo objects with methods for operating on those segments in relation to the file system.

The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one. When older segments_N files are present, it is because they temporarily cannot be deleted, a writer is in the process of committing, or a custom IndexDeletionPolicy is in use. This file lists each segment by name and records details about its codec and the generation of its deletes.
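The rule "largest generation wins" can be sketched in a few lines. This is a minimal illustration rather than Lucene's own lookup code: the class and method names are hypothetical, and it assumes the suffix after "segments_" is the generation written in base 36 (Character.MAX_RADIX).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: picks the active segments_N file
// from a plain directory listing.
final class SegmentsGeneration {

    private static final String PREFIX = "segments_";

    // Returns the generation encoded in the file name, or -1 if the name is
    // not a segments_N file (for example segments.gen or write.lock).
    // Assumption: the suffix is the generation in base 36.
    public static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            return -1L;
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1L; // suffix is not a valid base-36 number
        }
    }

    // Returns the segments_N name with the largest generation, or null if
    // the listing contains none.
    public static String activeSegmentsFile(List<String> fileNames) {
        String best = null;
        long bestGen = -1L;
        for (String name : fileNames) {
            long gen = generationOf(name);
            if (gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "segments_9", "segments_a", "segments.gen", "_0.cfs", "write.lock");
        System.out.println(activeSegmentsFile(listing));
    }
}
```

Comparing the parsed numbers rather than the raw names matters here: under the base-36 assumption, "segments_a" (generation 10) is newer than "segments_9", and a plain string sort would get longer suffixes wrong.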

There is also a file segments.gen. This file contains the current generation (the _N in segments_N) of the index. This is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains an Int32 version header (FORMAT_SEGMENTS_GEN_CURRENT), followed by the generation recorded as Int64, written twice.
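The layout described above (an Int32 header followed by the same Int64 twice) is small enough to decode by hand. The sketch below is illustrative only: the class and method names are hypothetical, it assumes big-endian byte order, it does not check the header against FORMAT_SEGMENTS_GEN_CURRENT, and it ignores anything that may follow the two generation values. Writing the generation twice presumably lets a reader notice a partially written file, so the sketch treats a mismatch as "unknown".

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for the segments.gen layout described in the text.
final class SegmentsGenSketch {

    // Returns the recorded generation, or -1 if the data is too short or the
    // two copies of the generation disagree.
    public static long readGeneration(byte[] data) {
        if (data.length < 4 + 8 + 8) {
            return -1L;
        }
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default (assumed)
        buf.getInt();                           // version header, skipped without validation
        long first = buf.getLong();
        long second = buf.getLong();
        return first == second ? first : -1L;
    }

    // Builds a byte array in the same layout, for demonstration only.
    // The version value passed in is a placeholder, not a real Lucene constant.
    public static byte[] encode(int version, long generation) {
        return ByteBuffer.allocate(4 + 8 + 8)
                .putInt(version)
                .putLong(generation)
                .putLong(generation)
                .array();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(0, 42L);
        System.out.println(readGeneration(bytes));
    }
}
```

A reader that gets -1 back would fall back to the directory listing, which is the primary mechanism in any case.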

Segment Info (.si): Stores metadata about a segment.

Deprecated (older segment info format): retained only for reading old 4.0-4.5 segments and for supporting IndexWriter.addIndexes.

Compound File (.cfs, .cfe): An optional "virtual" file consisting of all the other index files, for systems that frequently run out of file handles.

.cfs: The data file, holding the contents of the other index files.
.cfe: The entry table of the "virtual" compound file, listing all entries in the corresponding .cfs file.
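The split between a data file and an entry table can be pictured with a small in-memory model. This is only a sketch of the idea: the class, its fields (name, offset, length), and its methods are hypothetical, and the real on-disk encoding of .cfs and .cfe is not modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model: "data" plays the role of the .cfs file and
// "entryTable" plays the role of the .cfe file.
final class CompoundFileSketch {

    // One row of the entry table: where a sub-file starts and how long it is.
    static final class Entry {
        final int offset;
        final int length;

        Entry(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final Map<String, Entry> entryTable = new LinkedHashMap<>();
    private byte[] data = new byte[0];

    // Appends a sub-file to the data area and records its location.
    public void add(String name, byte[] contents) {
        entryTable.put(name, new Entry(data.length, contents.length));
        byte[] grown = Arrays.copyOf(data, data.length + contents.length);
        System.arraycopy(contents, 0, grown, data.length, contents.length);
        data = grown;
    }

    // Looks the sub-file up in the entry table and slices it out of the data.
    public byte[] read(String name) {
        Entry entry = entryTable.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("no such entry: " + name);
        }
        return Arrays.copyOfRange(data, entry.offset, entry.offset + entry.length);
    }

    public int size() {
        return data.length;
    }

    public static void main(String[] args) {
        CompoundFileSketch cfs = new CompoundFileSketch();
        cfs.add("_0.fnm", new byte[] {1, 2, 3});
        cfs.add("_0.fdx", new byte[] {4, 5});
        System.out.println(Arrays.toString(cfs.read("_0.fdx")));
    }
}
```

The point of the design is that many logical files share one physical file, so the process needs only one open file handle for the data regardless of how many sub-files the segment has.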

Fields (.fnm): Stores information about the fields.

Field names are stored in the field info file, with suffix .fnm.

FieldInfos (.fnm) --> Header, FieldsCount, <FieldName, FieldNumber, FieldBits, DocValuesBits, DocValuesGen, Attributes>^FieldsCount, Footer

Field Index (.fdx): Contains pointers to field data.

Lock File

The write lock, which is stored in the index directory by default, is named "write.lock". If the lock directory is different from the index directory then the write lock will be named "XXXX-write.lock" where XXXX is a unique prefix derived from the full path to the index directory. When this file is present, a writer is currently modifying the index (adding or removing documents). This lock file ensures that only one writer is modifying the index at a time.

In version 4.8, checksum footers were added to the end of each index file for improved data integrity. Specifically, the last 8 bytes of every index file contain the zlib-crc32 checksum of the file.
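The footer check described above can be sketched with the JDK's own CRC-32 implementation. The class and method names below are hypothetical, and two details are assumptions rather than statements from the text: that the checksum covers every byte preceding the final 8 bytes, and that it is stored as a big-endian 64-bit value. Real code should rely on Lucene's own codec utilities rather than on this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical verifier for a file whose last 8 bytes hold a CRC-32 checksum.
final class ChecksumFooterSketch {

    // Recomputes the CRC-32 of everything before the last 8 bytes and
    // compares it with the value stored in those 8 bytes.
    public static boolean verify(byte[] file) {
        if (file.length < 8) {
            return false; // too short to contain a checksum at all
        }
        int bodyLength = file.length - 8;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        long stored = ByteBuffer.wrap(file, bodyLength, 8).getLong(); // big-endian (assumed)
        return stored == crc.getValue();
    }

    // Appends a checksum in the same layout, for demonstration only.
    public static byte[] withFooter(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + 8)
                .put(body)
                .putLong(crc.getValue())
                .array();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // intact file
        file[0] ^= 1;                     // flip one bit in the body
        System.out.println(verify(file)); // corrupted file
    }
}
```

Because a CRC-32 fits in 32 bits, the upper half of the stored 64-bit value is zero under these assumptions.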

Read full article from org.apache.lucene.codecs.lucene49 (Lucene 4.9.0 API)
