All About Programming: Sorting with Lucene

Sorting with Lucene - Ayende @ Rahien
The default sorting (by relevance) is really simple. All you need is the relevance score of each matching document for a query; you then shove the results through a heap of a specified size, and the heap takes care of maintaining the top results.
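To make the heap idea concrete, here is a minimal sketch in Python of keeping the top k hits by score. It is not Lucene's actual collector code: `top_k_by_score` and the `scores` dict (doc id to relevance score) are hypothetical stand-ins for a query's scored hits. The heap is a min-heap of size k, so its root is always the weakest hit kept so far and is the one evicted when a better hit arrives.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_by_score(scores: Dict[int, float], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    heap: List[Tuple[float, int]] = []  # min-heap: the weakest kept hit sits at heap[0]
    for doc_id, score in scores.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new hit beats the weakest kept one: evict it and insert the new hit.
            heapq.heapreplace(heap, (score, doc_id))
    # Drain the heap into descending score order.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


print(top_k_by_score({0: 0.5, 1: 2.0, 2: 1.2, 3: 0.1, 4: 1.9}, 3))
```

The heap never holds more than k entries, so memory stays bounded no matter how many documents match.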

But the question is, how do you do sorting on a field value? The answer is, not easily.

GetStringIndex() does something very interesting. It returns a string index, which gives us:

A string array containing all the distinct (sorted) values for this field.
An int array with an entry for every document in the index, holding the position of that document's field value in the string array.
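The two arrays described above can be sketched in Python. This is an illustration of the data structure only, not the real GetStringIndex() implementation: `build_string_index`, `lookup` and `order` are hypothetical names, and reserving slot 0 of `lookup` for documents with no value is an assumption of this sketch.

```python
from typing import List, Optional, Tuple


def build_string_index(values: List[Optional[str]]) -> Tuple[List[Optional[str]], List[int]]:
    """Build (lookup, order) from per-document field values.

    values[doc_id] is the document's field value, or None if it has none.
    lookup holds the distinct values in sorted order; order[doc_id] is the
    position of that document's value in lookup.
    """
    distinct = sorted({v for v in values if v is not None})
    # Assumption of this sketch: slot 0 means "no value", so such documents sort first.
    lookup: List[Optional[str]] = [None] + distinct
    position = {v: i for i, v in enumerate(lookup)}
    order = [position[v] for v in values]
    return lookup, order


lookup, order = build_string_index(["pear", "apple", None, "pear", "fig"])
print(lookup)  # [None, 'apple', 'fig', 'pear']
print(order)   # [3, 1, 0, 3, 2]
```

Each distinct string is stored once, while `order` has one small integer per document.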

Now we can compare documents by the position of their field value in the sorted string array, which gives us pretty good sorting. Unfortunately, this also requires us to load all the values into memory. Let us look at another example, which is probably easier to follow:
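Because the string array is sorted, comparing two documents' positions gives the same answer as comparing their strings, but it is just an integer comparison. A minimal Python sketch, assuming an `order` array like the one described above; `sort_docs_by_ord` is a hypothetical helper, not a Lucene API.

```python
from typing import List


def sort_docs_by_ord(doc_ids: List[int], order: List[int]) -> List[int]:
    """Sort hits by field value using only the precomputed positions.

    order[doc_id] is the position of the document's value in the sorted
    array of distinct values, so comparing two ints matches comparing the
    strings themselves.
    """
    # Equal values fall back to doc id so the result is deterministic.
    return sorted(doc_ids, key=lambda d: (order[d], d))


lookup = [None, "apple", "fig", "pear"]
order = [3, 1, 0, 3, 2]
hits = sort_docs_by_ord([0, 1, 3, 4], order)
print(hits)                             # [1, 4, 0, 3]
print([lookup[order[d]] for d in hits])  # ['apple', 'fig', 'pear', 'pear']
```

Note that `order` has an entry for every document in the index, not just the hits, which is where the memory cost mentioned above comes from.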

Sorting by an integer is done like this:

Get an array whose size matches the number of documents, holding each document's integer value. We can then sort easily, because accessing a document's field value only requires using its document id as an index into the array.
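A minimal Python sketch of sorting by an integer field this way, assuming the per-document array has already been loaded; `sort_docs_by_int_field` and `values` are hypothetical names used only for illustration.

```python
from typing import List


def sort_docs_by_int_field(doc_ids: List[int], values: List[int], descending: bool = False) -> List[int]:
    """Sort hits by an integer field held in a per-document array.

    values has one slot per document in the index, so looking up a hit's
    value is a single array access: values[doc_id].
    """
    # Python's sort is stable, so hits with equal values keep their input order.
    return sorted(doc_ids, key=lambda d: values[d], reverse=descending)


values = [30, 10, 20, 10]  # values[doc_id] for a four-document index
print(sort_docs_by_int_field([0, 1, 2, 3], values))                   # [1, 3, 2, 0]
print(sort_docs_by_int_field([0, 1, 2, 3], values, descending=True))  # [0, 2, 1, 3]
```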

The reason Lucene does this is that it uses an inverted index, which maps each field value to the documents that contain it; it has no easy way of going in the other direction, from a document to its field value. So it is easier to read all the values into memory and work with them there. I don’t like it, but offhand, I can’t think of a better way to handle this.
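To see why loading everything is the practical option, here is a minimal Python sketch of "uninverting" a field: walking every term's postings list once to build the document-to-value array that sorting needs. It assumes a single-valued field and uses a plain dict as a hypothetical stand-in for the term dictionary and postings; it is not Lucene's API, only an illustration of the kind of pass a field cache has to make.

```python
from typing import Dict, List, Optional


def uninvert(postings: Dict[str, List[int]], num_docs: int) -> List[Optional[str]]:
    """Turn term -> [doc ids] postings into a doc id -> value array.

    The inverted index answers "which documents contain this value?";
    sorting needs "what value does this document have?", so every term's
    postings list is walked once and the result is kept in memory.
    """
    values: List[Optional[str]] = [None] * num_docs
    for term in sorted(postings):  # visit terms in sorted order, as a term dictionary would
        for doc_id in postings[term]:
            values[doc_id] = term  # assumes a single-valued field; otherwise the last term wins
    return values


postings = {"apple": [1], "fig": [4], "pear": [0, 3]}
print(uninvert(postings, 5))  # ['pear', 'apple', None, 'pear', 'fig']
```

The cost is one full pass over the field's terms plus an array sized to the whole index, which is exactly the memory trade-off described above.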

Read full article from Sorting with Lucene - Ayende @ Rahien
