Solr 4.7 – efficient deep paging



Solr 4.7 – efficient deep paging
Long, long time ago, we described a problem called deep paging. To keep things short – the deeper you want to go in the results, the slower the query will be. This is because Solr needs to prepare the data from the beginning for each query. Until Solr 4.7 there wasn’t a good solution for that problem. With the recently released Solr version, we got a possibility of using so called cursor to drastically improve performance of deep paging.

The deep paging problem is quite easy to define. To return search results Solr must prepare an in-memory structure and return part of it. Returning the part of the structure is simple, if that part comes from the beginning of the structure. However, if we want to return page number 10.000 (where we return 20 results per page) Solr needs to prepare a structure containing minimum of 200.000 elements (10.000 * 20). You see that it not only takes time, but also memory.
The good thing is, that with the release of Solr 4.7 the situation had changed – the cursor has been introduced. Cursor is a logic structure, that doesn’t require its state to be stored on the server side. Cursor contains informationabout storing and lest document returned in the results. Because of that, Solr doesn’t need to start search from beginning each time we want to get next page of results. It results in drastic performance improvement when using cursor and going deep into results.

Cursor usage is very simple. To tell Solr to return cursor, in the first query we need to pass an additional parameter – cursorMark=*. In result, apart from documents, we will get a cursor identifier returned in the nextCursorMarkparameter.
curl 'localhost:8983/solr/select?q=*:*&rows=1&sort=score+desc,id+asc&cursorMark=*'

First of off, we either omit the start parameter or we set it to 0. The rows parameter can take values we need, there is no limitation on it. Of course, we passed the cursorMark=*parameter, to tell Solr that we want the cursor to be used. The final thing we did is sorting definition. We need to define sorting for cursor to be working, one that will tell cursor how to behave. That’s why we needed to overwrite default sorting and include sorting not only by score, by also by document identifier.
 <str name="nextCursorMark">AoIIP4AAACgwNTc5QjAwMg==</str>
curl 'localhost:8983/solr/select?q=*:*&rows=1&sort=score+desc,id+asc&cursorMark=AoIIP4AAACgwNTc5QjAwMg=='

Please read full article from Solr 4.7 – efficient deep paging

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts