Elasticsearch.org Lucene's Handling Of Deleted Documents | Blog | Elasticsearch
Michael McCandless, January 30, 2015

When a document is deleted or updated (= delete + add), Apache Lucene simply marks a bit in a per-segment bitset to record that the document is deleted. All subsequent searches simply skip any deleted documents.

It is not until segments are merged that the bytes consumed by deleted documents are reclaimed. Likewise, any terms that occur only in deleted documents (ghost terms) are not removed until merge. This approach is necessary because it would otherwise be far too costly to update Lucene's write-once index data structures and aggregate statistics for every document deletion, but it has some implications:

- Deleted documents tie up disk space in the index.
- In-memory per-document data structures, such as norms or field data, will still consume RAM for deleted documents.
- Search throughput is lower, since each search must check the deleted bitset for every potential hit. More on this below.
- Aggregate term statistics, used for query scoring, ...

Read full article from Elasticsearch.org: Lucene's Handling Of Deleted Documents | Blog | Elasticsearch
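The mark-then-merge behavior described in the excerpt can be illustrated with a small toy model. This is only a sketch: the `Segment` class, its `live` list, and the `merge` function are hypothetical names invented for illustration and are not Lucene's API or on-disk format. It assumes documents are plain strings and terms are whitespace-separated words.

```python
class Segment:
    """Toy write-once segment (hypothetical; not Lucene's real data structure)."""

    def __init__(self, docs):
        self.docs = list(docs)                # stored documents, never rewritten
        self.live = [True] * len(self.docs)   # per-segment bitset: True = not deleted

    def delete(self, doc_id):
        # Deleting only flips a bit; the document's bytes stay in the segment.
        self.live[doc_id] = False

    def search(self, term):
        # Every potential hit is checked against the bitset before it is returned.
        return [d for d, alive in zip(self.docs, self.live)
                if alive and term in d.split()]

    def terms(self):
        # Terms that occur only in deleted documents (ghost terms) are still here.
        return {t for d in self.docs for t in d.split()}


def merge(segments):
    # Merging copies only live documents into a new segment,
    # which is when space and ghost terms are finally reclaimed.
    return Segment(d for s in segments
                   for d, alive in zip(s.docs, s.live) if alive)


seg = Segment(["lucene merges segments",
               "ghost term example",
               "lucene deletes lazily"])
seg.delete(1)          # mark the second document as deleted
merged = merge([seg])  # rewrite the segment without deleted documents
```

In this sketch, searching `seg` for "ghost" after the delete returns nothing, yet `seg.terms()` still contains "ghost" and `seg.docs` still holds all three documents. Only `merged` is free of both, which mirrors the trade-off the post describes: deletes are cheap, and the cost is paid in space and per-hit checks until a merge runs.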