Using Payloads with Solr (4.x) | Tech Collage
#1 Query parsing
Simply wrapping your query terms in a ‘PayloadTermQuery’ object inside your query parser’s parse() method won’t work. You should also override the SolrQueryParser.getFieldQuery() method, as in the sample below, so that your payloaded terms are identified.
    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted) throws SyntaxError {
        SchemaField sf = this.schema.getFieldOrNull(field);
        // Treat any field whose type is named "payloads" as a payloaded field.
        if (sf != null && sf.getType().getTypeName().equalsIgnoreCase("payloads")) {
            Term t = new Term(field, queryText);
            // Last argument is includeSpanScore: false means the score comes from the payload alone.
            return new PayloadTermQuery(t, new MaxPayloadFunction(), false);
        }
        // All other fields keep the default behaviour.
        return super.getFieldQuery(field, queryText, quoted);
    }
In the above sample, a field of type ‘payloads’ is treated as a payloaded field (you could use a different type name), and its terms are wrapped in a PayloadTermQuery accordingly. Only when this is done will your Similarity implementation’s scorePayload() function be invoked.
This information on overriding ‘getFieldQuery()’ is of course also available on the ‘Payloads’ wiki page.
#2 Scoring using payloads
Talking about scorePayload(), the method’s new signature in Lucene 4.1 is even more confusing than what was available before.
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            // Decode from payload.offset, not from index 0 of the backing array.
            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        }
        // No payload on this term: neutral score.
        return 1.0F;
    }
The payload arrives as a ‘BytesRef’ instance (rather than a byte array, as in previous Lucene versions), and the developer is left to work out which method to invoke on that object to get the payload score! You may be tempted to try the ‘utf8ToString()’ method, but beware: that isn’t the solution. Just note that the member variable ‘bytes’, a byte array, is public, and that it carries the encoded score starting at position ‘offset’. IMHO, the previous ‘byte[]’ argument was much safer and more readable.
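To see why reading the payload as text does not work, here is a minimal stand-alone sketch. It uses no Lucene classes; the class and method names (FloatPayloadSketch, encode, decode, asUtf8) are hypothetical, and it assumes the float payload is stored as four big-endian bytes, which is my understanding of what PayloadHelper.encodeFloat() produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-alone illustration only: no Lucene classes are used here.
final class FloatPayloadSketch {

    // Encode a float as 4 big-endian bytes (assumed to mirror PayloadHelper.encodeFloat).
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Decode 4 big-endian bytes, starting at 'offset', back into a float.
    static float decode(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Roughly what a utf8ToString()-style reading does with the same bytes.
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode(2.5f);
        System.out.println(decode(payload, 0));            // prints 2.5
        System.out.println("2.5".equals(asUtf8(payload))); // prints false
    }
}
```

The four bytes are the binary form of the float, not the characters "2.5", so the value has to be decoded numerically rather than parsed from a string.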
#3 Adding payloaded documents to index
In an earlier version of this article, I had written in this section that if we index payloaded documents as a collection using ‘add()’ or ‘addBeans()’, then only the first document’s payload value is considered, and that same value is used as the score for the other documents in the collection. So, I had suggested adding documents one by one and committing each time (as given below).
    for (D doc : docsIterator) {
        server.addBean(doc);
        server.commit();
    }
Unfortunately, this was a misunderstanding, one shared by a few Lucene developers like me; I have seen the same idea discussed in some forums as well.
So, I have re-edited this section for the better!
There is no problem with adding payloaded documents in bulk, but you have to be careful to pass ‘payload.offset’ when decoding in scorePayload() (as in section #2). Only then is the current document’s payload value read correctly.
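The symptom described above (every document scoring like the first one) is what you would expect if several payloads share one backing array and the decoder always reads from index 0. The sketch below illustrates that under stated assumptions: Slice is a hypothetical stand-in for BytesRef (a byte array plus offset and length), the shared-array layout is assumed for illustration, and nothing here calls Lucene.

```java
import java.nio.ByteBuffer;

// Stand-alone illustration only: 'Slice' is a hypothetical stand-in for BytesRef.
final class SharedBufferSketch {

    // A view into a possibly shared byte array, like BytesRef's bytes/offset/length.
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    // Buggy: always reads the first 4 bytes of the backing array.
    static float decodeIgnoringOffset(Slice s) {
        return ByteBuffer.wrap(s.bytes, 0, 4).getFloat();
    }

    // Correct: reads the 4 bytes that belong to this slice.
    static float decodeWithOffset(Slice s) {
        return ByteBuffer.wrap(s.bytes, s.offset, 4).getFloat();
    }

    // Pack two float payloads into one array and return a slice for each.
    static Slice[] slices(float a, float b) {
        byte[] shared = ByteBuffer.allocate(8).putFloat(a).putFloat(b).array();
        return new Slice[] { new Slice(shared, 0, 4), new Slice(shared, 4, 4) };
    }

    public static void main(String[] args) {
        Slice[] s = slices(1.0f, 3.0f);
        System.out.println(decodeIgnoringOffset(s[1])); // prints 1.0 (the first payload, wrong)
        System.out.println(decodeWithOffset(s[1]));     // prints 3.0 (the second payload, right)
    }
}
```

With a single document per request the offset may happen to be 0, which would explain why adding documents one at a time appeared to fix the problem.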