CheckIndex for the rescue | Solr Enterprise Search



What is CheckIndex?

CheckIndex is a tool in the Lucene library that checks the index files and writes a new segments file that no longer references the problematic segments. This means the tool can repair a broken index, often with only a small loss of data, and thus save us from restoring the index from a backup (if we have one, of course) or from fully reindexing all the documents that were stored in Solr.

In addition, it is worth knowing that the tool analyzes the index byte by byte, so for large indexes the analysis and repair can take a long time. It is important not to run the tool with the -fix option while the index is in use by Solr or another application based on the Lucene library. Finally, be aware that running the tool in repair mode may remove some or all of the documents stored in the index.
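As an illustration of the two modes described above, here is a minimal sketch that assembles the command line for the tool. The class name CheckIndexCommand, the jar name lucene-core.jar and the directory name index are hypothetical placeholders; only the main class org.apache.lucene.index.CheckIndex and the -fix option come from the tool itself. Without -fix the tool only reports problems, so a dry run first seems the safer choice.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: builds the command line for Lucene's CheckIndex tool.
final class CheckIndexCommand {

    // Main class of the tool, as it appears in Lucene stack traces.
    static final String MAIN_CLASS = "org.apache.lucene.index.CheckIndex";

    // luceneJar and indexDir are hypothetical paths supplied by the caller.
    // With fix == false the tool is asked only to check the index, not to repair it.
    static List<String> build(Path luceneJar, Path indexDir, boolean fix) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(luceneJar.toString());
        cmd.add(MAIN_CLASS);
        cmd.add(indexDir.toString());
        if (fix) {
            cmd.add("-fix"); // repair mode: may remove documents from the index
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Dry run first: print the check-only command (placeholder paths).
        List<String> dryRun = build(Paths.get("lucene-core.jar"), Paths.get("index"), false);
        System.out.println(String.join(" ", dryRun));
        // To actually launch it (with Solr stopped), one could use:
        // new ProcessBuilder(dryRun).inheritIO().start().waitFor();
    }
}
```

The command is only printed here, not executed, so nothing touches an index until the ProcessBuilder line is enabled on purpose.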

But what happens in the case of a broken index? There is only one way to find out: let’s try. So, I corrupted one of the index files and ran the CheckIndex tool. The following appeared on the console:

As the output below shows, all 19 documents that were in the index have been removed. This is an extreme case, but you should be aware that the tool can behave this way.

test: open reader.........FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: did not read all bytes from file "_0.fnm": read 150 vs size 152
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:370)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:119)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 1 broken segments (containing 19 documents) detected
WARNING: 19 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 19 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_3"
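
The warning lines in the console output above can also be checked automatically before deciding whether a repair is acceptable. The following is a minimal sketch, assuming the message format is exactly the one shown in this article (other Lucene versions may word it differently); the class name CheckIndexReport and the method documentsAtRisk are hypothetical helpers, not part of Lucene.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: extracts the number of documents in broken segments
// from text captured from a CheckIndex run.
final class CheckIndexReport {

    // Matches the warning line as it appears in the output quoted in this article.
    private static final Pattern BROKEN = Pattern.compile(
            "WARNING: (\\d+) broken segments \\(containing (\\d+) documents\\) detected");

    // Returns the document count from the warning, or empty if no such line is present.
    static OptionalInt documentsAtRisk(String output) {
        Matcher m = BROKEN.matcher(output);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(2)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String sample = "WARNING: 1 broken segments (containing 19 documents) detected\n"
                + "WARNING: 19 documents will be lost\n";
        OptionalInt atRisk = documentsAtRisk(sample);
        if (atRisk.isPresent()) {
            System.out.println("Documents in broken segments: " + atRisk.getAsInt());
        } else {
            System.out.println("No broken segments reported");
        }
    }
}
```

A script could compare this number with the total document count and fall back to a backup or a full reindex when the loss would be too large.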
