All About Programming: UCLA Knowledge Base : Why are Lucene's stored fields so slow to access

Problem

I have a Lucene index that has some large fields (about 50 KB each) and some small fields (about 50 bytes each). I need to iterate over one of the small fields for, say, 1/10 of the documents. This operation is very slow, unreasonably so for such a small field.

Cause

Lucene provides a number of "policies" for how to access the fields of a document (see class org.apache.lucene.document.FieldSelector). They specify when and how fields are loaded from the index. It turns out that the default is to load all fields in the document as soon as a Document is requested by, say, IndexReader (see class org.apache.lucene.index.FieldsReader, in particular how it implements the doc(n, FieldSelector) function). Therefore, when you load a small field, the large fields are loaded as well, which causes a performance problem if you repeat the operation many times.

Solution

To use a policy other than the default, create a FieldSelector object.
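The effect of a field-selection policy can be illustrated with a small self-contained sketch. This is a toy model, not Lucene's API: ToyFieldSelector, ToyStore, and SelectiveLoadDemo are hypothetical names invented here, and the "bytes read" counter only imitates the cost of pulling stored fields from the index. It assumes documents with one 50,000-character field and one 50-character field, matching the sizes in the problem description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a field-selection policy (not Lucene's interface).
interface ToyFieldSelector {
    boolean accept(String fieldName);
}

// Hypothetical stored-field reader that counts how many characters it loads.
class ToyStore {
    private final List<Map<String, String>> docs = new ArrayList<>();
    long bytesRead = 0;

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Default policy: every stored field is loaded.
    Map<String, String> doc(int n) {
        return doc(n, name -> true);
    }

    // Only fields accepted by the selector are loaded (and counted).
    Map<String, String> doc(int n, ToyFieldSelector selector) {
        Map<String, String> loaded = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.get(n).entrySet()) {
            if (selector.accept(e.getKey())) {
                bytesRead += e.getValue().length();
                loaded.put(e.getKey(), e.getValue());
            }
        }
        return loaded;
    }
}

class SelectiveLoadDemo {
    // Builds a string of the given length filled with one character.
    static String filled(int length, char c) {
        return new String(new char[length]).replace('\0', c);
    }

    // Each document gets a large "body" field and a small "title" field.
    static ToyStore buildStore(int numDocs) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < numDocs; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("body", filled(50_000, 'x'));
            doc.put("title", filled(50, 't'));
            store.add(doc);
        }
        return store;
    }

    // Reads the "title" field of every tenth document and
    // returns the number of characters the store had to load.
    static long readTitles(ToyStore store, int numDocs, ToyFieldSelector selector) {
        store.bytesRead = 0;
        for (int n = 0; n < numDocs; n += 10) {
            store.doc(n, selector).get("title");
        }
        return store.bytesRead;
    }

    public static void main(String[] args) {
        ToyStore store = buildStore(100);
        long loadAll = readTitles(store, 100, name -> true);
        long titleOnly = readTitles(store, 100, name -> name.equals("title"));
        System.out.println("load all fields: " + loadAll + " chars");
        System.out.println("title only:      " + titleOnly + " chars");
    }
}
```

In Lucene itself, the selector would be passed to the doc(n, FieldSelector) call mentioned above; the concrete selector classes available depend on the Lucene version in use, so check the Javadoc for your release.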

Read full article from UCLA Knowledge Base : Why are Lucene's stored fields so slow to access
