Using Payloads with Solr (4.x) | Tech Collage



#1 Query parsing
Wrapping your specific query terms in a ‘PayloadTermQuery’ object inside your query parser’s parse() method does not work on its own. You also need to override the SolrQueryParser.getFieldQuery() method, as in the sample below, so that your payloaded terms are identified.
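    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.payloads.MaxPayloadFunction;
    import org.apache.lucene.search.payloads.PayloadTermQuery;
    import org.apache.solr.schema.SchemaField;
    import org.apache.solr.search.SyntaxError;

    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted) throws SyntaxError {
        SchemaField sf = this.schema.getFieldOrNull(field);
        // Treat fields whose type is named "payloads" as payloaded fields.
        if (sf != null && sf.getType().getTypeName().equalsIgnoreCase("payloads")) {
            Term t = new Term(field, queryText);
            // Third argument false: the span score is not included in the result.
            return new PayloadTermQuery(t, new MaxPayloadFunction(), false);
        }
        return super.getFieldQuery(field, queryText, quoted);
    }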
In the above sample, a field of type ‘payloads’ is treated as a payloaded field (you could give the type a different name), and the query for such a field is wrapped accordingly. Your Similarity implementation’s scorePayload() method is invoked only when this is done.
This information on overriding ‘getFieldQuery()’ is also available on the wiki page “Payloads”.
#2 Scoring using payloads
Talking about scorePayload(), the method’s new signature in Lucene 4.1 is more confusing than what was available before.
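    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.util.BytesRef;

    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            // Decode from payload.offset, not from index 0 of the backing array.
            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        }
        return 1.0F;
    }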
The payload is available as a ‘BytesRef’ instance (rather than a byte array, as in previous Lucene versions), and the developer is left to work out which method to invoke on that object to get the payload score. Developers may be tempted to try the ‘utf8ToString()’ method, but beware: that isn’t the solution. Note instead that the member variable ‘bytes’, a byte array, is public, and it carries the encoded score starting at index ‘payload.offset’. IMHO, the previous idea of a ‘byte[]’ argument seemed much safer and more readable.
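To make the difference between decoding the raw bytes and calling a string conversion concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class and method names are hypothetical, and the 4-byte big-endian float layout is an assumption about what PayloadHelper writes, used here only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PayloadDecodeSketch {

    // Encodes a float as 4 big-endian bytes (assumed payload layout, for illustration).
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Decodes a float from the 4 big-endian bytes starting at the given offset.
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Roughly what a utf8ToString()-style call does: reads the same bytes as text.
    static String decodeAsText(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = encodeFloat(7.25f);
        System.out.println(decodeFloat(raw, 0));                    // prints 7.25
        System.out.println("7.25".equals(decodeAsText(raw, 0, 4))); // prints false
    }
}
```

The bytes are a binary encoding of the number, not its textual form, so reading them as UTF-8 does not give back anything that looks like the score.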
#3 Adding payloaded documents to index
In an earlier version of this article, I wrote in this section that when payloaded documents are indexed as a collection using ‘add()’ or ‘addBeans()’, only the first document’s payload value is considered, and that same value is used as the score for the other documents in the collection. I therefore suggested adding the documents one by one and committing each time (as given below).

    for (D doc : docsIterator) {
        server.addBean(doc);
        server.commit();
    }
Unfortunately, this was a misunderstanding, one shared by a few Lucene-using developers like me; I have seen the same idea discussed on some forums as well.
So, I have re-edited this section for the better!
There is no problem with adding payloaded documents in bulk, but you must take care to use ‘payload.offset’ when implementing scorePayload() (as in section #2). Only then is the current document’s payload value decoded correctly.
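The effect of leaving out the offset can be simulated in plain Java. This is a sketch under stated assumptions, not Lucene's actual internals: it assumes the payloads of several documents sit back to back in one shared array, and all names (`BulkPayloadSketch`, `pack`, `scoreWithOffset`, `scoreIgnoringOffset`) are hypothetical.

```java
import java.nio.ByteBuffer;

class BulkPayloadSketch {

    // Correct: start decoding at the slice's own offset.
    static float scoreWithOffset(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Buggy: always decodes from index 0, so it keeps returning the first payload.
    static float scoreIgnoringOffset(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, 0, 4).getFloat();
    }

    // Packs the given payload values back to back into one shared array
    // (an assumed layout, chosen only to illustrate the offset issue).
    static byte[] pack(float... values) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * values.length);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] shared = pack(0.5f, 2.0f, 7.25f);
        for (int doc = 0; doc < 3; doc++) {
            int offset = doc * 4;
            System.out.println("doc " + doc
                + " with offset: " + scoreWithOffset(shared, offset)
                + ", ignoring offset: " + scoreIgnoringOffset(shared, offset));
        }
    }
}
```

In this simulation the version that ignores the offset reports the first document's value for every document, which matches the symptom that originally looked like a bulk-indexing problem.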
