It is inefficient, but I never saw a way around it since the lists are all being built in parallel (because we are uninverting).
Lucene's indexer (TermsHashPerField) has precisely this same problem: every unique term must point to two (well, one if omitTermFreqAndPositions is set) growable byte arrays. We use "slices" into a single big (paged) byte[], where the first slice is tiny and can hold only about 5 bytes, but then points to the next slice, which is a bit bigger, and so on. We could look at refactoring that for this use too...
Though this is "just" the one-time startup cost.
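For illustration, here is a much-simplified sketch of that slice scheme, assuming only the general idea rather than the actual ByteBlockPool/TermsHashPerField code (which packs a level marker into the slice itself and uses fixed-size pages instead of a growing array): many per-term byte lists share one big byte[]; each list starts with a tiny slice, and when a slice fills up, its last few bytes are overwritten with a forward pointer to the next, larger slice.

    import java.util.Arrays;

    // Simplified slice pool: names, sizes, and layout here are illustrative, not Lucene's.
    class SlicePool {
      static final int[] SLICE_SIZE = {5, 14, 20, 30, 40, 80, 160, 200};

      byte[] pool = new byte[1 << 10];
      int used = 0;

      // Per-list write cursor: next write position, end of the current slice's
      // payload area, and the level of the current slice.
      static class Cursor { int start, upto, end, level; }

      Cursor newList() {
        Cursor c = new Cursor();
        c.level = 0;
        c.start = c.upto = alloc(SLICE_SIZE[0] + 4);   // +4 bytes reserved for a forward pointer
        c.end = c.upto + SLICE_SIZE[0];
        return c;
      }

      void writeByte(Cursor c, byte b) {
        if (c.upto == c.end) {                          // current slice is full: chain a bigger one
          int nextLevel = Math.min(c.level + 1, SLICE_SIZE.length - 1);
          int next = alloc(SLICE_SIZE[nextLevel] + 4);
          writeInt(c.end, next);                        // forward pointer lives in the reserved bytes
          c.level = nextLevel;
          c.upto = next;
          c.end = next + SLICE_SIZE[nextLevel];
        }
        pool[c.upto++] = b;
      }

      // A reader starts at Cursor.start and, whenever it reaches a slice's payload end,
      // follows the 4-byte forward pointer to the next slice.

      private int alloc(int size) {
        if (used + size > pool.length) {
          pool = Arrays.copyOf(pool, Math.max(pool.length * 2, used + size));
        }
        int start = used;
        used += size;
        return start;
      }

      private void writeInt(int offset, int value) {
        for (int i = 0; i < 4; i++) pool[offset + i] = (byte) (value >>> (24 - 8 * i));
      }
    }

The point of the tiny first slice is that most terms are rare, so most lists never need more than a few bytes; only frequent terms pay for the larger follow-on slices.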
Another small & easy optimization I hadn't gotten around to yet was to lower the indexIntervalBits and make it configurable.
I did make it configurable to the Lucene class (you can pass it in to
ctor), but for Solr I left it using every 128th term.
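As a hedged sketch of what the index interval buys (this is not the actual DocTermOrds/UnInvertedField code; it just assumes terms are packed back-to-back as single-byte length-prefixed entries in ord order): only every (1 << indexIntervalBits)-th ord keeps a direct offset, and any other ord is found by seeking to the nearest indexed term and scanning forward, so a smaller interval spends more memory on offsets to make ord -> term lookups cheaper.

    // indexIntervalBits = 7 means every 128th term has an entry in indexedOffsets;
    // the worst-case lookup then scans forward over up to 127 length-prefixed terms.
    static String lookupOrd(byte[] termBytes, int[] indexedOffsets,
                            int indexIntervalBits, int ord) {
      int offset = indexedOffsets[ord >>> indexIntervalBits];   // nearest indexed term at or before ord
      for (int i = ord & ((1 << indexIntervalBits) - 1); i > 0; i--) {
        offset += 1 + (termBytes[offset] & 0xff);               // skip one length-prefixed term
      }
      int len = termBytes[offset] & 0xff;
      return new String(termBytes, offset + 1, len, java.nio.charset.StandardCharsets.UTF_8);
    }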
Another small optimization would be to store an array of offsets to length-prefixed byte arrays, rather than a BytesRef[]. At least the values are already in packed byte arrays via PagedBytes.
Both FieldCache and docvalues (branch) store an array-of-terms like
this (the array of offsets is packed ints).
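Purely as an illustration of that layout (the real code keeps the bytes in PagedBytes and the offsets in packed ints, not a plain int[] and ByteArrayOutputStream), the writer side pairs naturally with the lookup sketched above: instead of one BytesRef object per term, each typically wrapping its own small byte[], all term bytes go into one length-prefixed buffer and a parallel offsets array records where each ord starts.

    import java.io.ByteArrayOutputStream;
    import java.util.List;

    // Pack terms (already in ord order) into a single length-prefixed buffer;
    // offsets[ord] is where that ord's entry starts.  One int per term replaces
    // one object (plus, often, one small byte[]) per term.
    static int[] packTerms(List<byte[]> termsInOrdOrder, ByteArrayOutputStream packed) {
      int[] offsets = new int[termsInOrdOrder.size()];
      for (int ord = 0; ord < termsInOrdOrder.size(); ord++) {
        offsets[ord] = packed.size();
        byte[] term = termsInOrdOrder.get(ord);
        packed.write(term.length);               // single-byte length prefix, as in the lookup sketch
        packed.write(term, 0, term.length);
      }
      return offsets;
    }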
We should also look at using an FST, which'd be the most compact but
the ord -> term lookup cost goes up.
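As a rough, hedged sketch of the FST direction (the exact class names vary across Lucene versions, e.g. Builder vs FSTCompiler, and this follows roughly the Lucene 4.x/5.x-era API, so treat it as an approximation rather than the proposed code): build a term -> ord FST from the sorted terms, which is very compact, but going from ord back to the term means walking the FST rather than a simple array dereference.

    import java.io.IOException;

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.IntsRefBuilder;
    import org.apache.lucene.util.fst.Builder;
    import org.apache.lucene.util.fst.FST;
    import org.apache.lucene.util.fst.PositiveIntOutputs;
    import org.apache.lucene.util.fst.Util;

    // Build a term -> ord FST; inputs must be added in sorted (ord) order.
    static FST<Long> buildTermOrdFst(BytesRef[] termsInOrdOrder) throws IOException {
      PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
      Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
      IntsRefBuilder scratch = new IntsRefBuilder();
      for (int ord = 0; ord < termsInOrdOrder.length; ord++) {
        builder.add(Util.toIntsRef(termsInOrdOrder[ord], scratch), (long) ord);
      }
      return builder.finish();
    }

    // term -> ord stays cheap: Util.get(fst, term).
    // ord -> term has to walk the FST by output (Util.getByOutput for these
    // monotonic long outputs), which is the extra lookup cost mentioned above.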
Anyway I think we can pursue these cool ideas on new [future] issues.
Read full article from [LUCENE-3003] Move UnInvertedField into Lucene core - ASF JIRA