All About Programming: CheckIndex for the rescue

CheckIndex for the rescue | Solr Enterprise Search
What is CheckIndex?

CheckIndex is a tool available in the Lucene library that checks the index files and can write a new segments file that no longer references the problematic segments. This means the tool can repair a broken index at the cost of some data, and thus save us from having to restore the index from a backup (if we have one, of course) or fully reindex all the documents that were stored in Solr.

In addition, it is worth knowing that the tool analyzes the index byte by byte, so for large indexes the analysis and repair can take a long time. It is important not to run the tool with the -fix option while the index is in use by Solr or another application based on the Lucene library. Finally, be aware that running the tool in repair mode may remove some or all of the documents stored in the index.
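The precautions above can be sketched as a small helper that copies the index aside and builds the CheckIndex command line. This is only a minimal sketch: the function names, the jar name and the paths are hypothetical, and the -fix flag is the one named in this article, so it may differ in other Lucene versions. The class name org.apache.lucene.index.CheckIndex is the one that appears in the stack trace further down.

```python
import shutil
from pathlib import Path


def build_checkindex_command(lucene_jar, index_dir, fix=False):
    """Return the argument list for running CheckIndex through the JVM.

    lucene_jar and index_dir are placeholders supplied by the caller.
    Without fix=True the command only inspects the index.
    """
    cmd = [
        "java", "-cp", str(lucene_jar),
        "org.apache.lucene.index.CheckIndex",
        str(index_dir),
    ]
    if fix:
        # Repair mode: may drop documents, so run it only on a stopped index.
        cmd.append("-fix")
    return cmd


def backup_index(index_dir, backup_dir):
    """Copy the whole index directory so a repair can be undone."""
    shutil.copytree(index_dir, backup_dir)
    return Path(backup_dir)
```

A cautious workflow would stop Solr, call backup_index, run the command once without -fix to see what would be lost, and only then pass fix=True. The resulting list can be handed to subprocess.run.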

But what happens in the case of a broken index? There is only one way to find out: let's try. So, I broke one of the index files and ran the CheckIndex tool. In this case, all 19 documents that were in the index were removed. This is an extreme case, but you should be aware that the tool can behave this way. The following appeared on the console after I ran the CheckIndex tool:

test: open reader.........FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: did not read all bytes from file "_0.fnm": read 150 vs size 152
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:370)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:119)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 1 broken segments (containing 19 documents) detected
WARNING: 19 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 19 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_3"
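The two WARNING lines in the output above carry the numbers that matter before deciding on a repair: how many segments are broken and how many documents would be lost. As a rough sketch, they could be extracted from captured output like this. The function name is hypothetical, the patterns are copied from the wording of this particular log and may not match other Lucene versions, and treating missing warnings as zero is an assumption.

```python
import re

# Patterns taken from the wording of the console output shown in this article.
BROKEN_RE = re.compile(
    r"WARNING: (\d+) broken segments \(containing (\d+) documents\) detected"
)
LOST_RE = re.compile(r"WARNING: (\d+) documents will be lost")


def summarize_checkindex_output(text):
    """Pull the broken-segment and lost-document counts out of CheckIndex output.

    Counts default to 0 when a warning line is absent (an assumption).
    """
    broken = BROKEN_RE.search(text)
    lost = LOST_RE.search(text)
    return {
        "broken_segments": int(broken.group(1)) if broken else 0,
        "docs_in_broken_segments": int(broken.group(2)) if broken else 0,
        "docs_lost": int(lost.group(1)) if lost else 0,
    }


sample = (
    "WARNING: 1 broken segments (containing 19 documents) detected\n"
    "WARNING: 19 documents will be lost\n"
)
summary = summarize_checkindex_output(sample)
```

For the run shown here, the summary reports one broken segment and 19 documents lost, which is the whole index.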

Read full article from CheckIndex for the rescue | Solr Enterprise Search
