Sharded architecture / pagination problem · Issue #175 · bloomberg/lucene-solr



Sharded architecture / pagination problem · Issue #175 · bloomberg/lucene-solr

But because the rescoring happens locally in each shard, I believe we have still problems with pagination in a shared environment.

Of course it is a minor issue that will appear only if you rerank few docs per shard or in deep pagination examples.

The problem resides in the fact that the aggregator of the results from the shards, will merge search results from different shards and then rank according to the score.
Each shard will return results in the proper order ( first reranked docs and following the original scored ones).
But score wise, rescored docs can have a smaller score than the original scored.

When the aggregation happens, original scored ones can surpass in the ranking the rescored ones.
I discussed briefly with Diego and his suggestion is to normalise the scores to make the re--ranked ones always with an higher score of the original scored.
I do agree this can be a possible solution to be consistent and maintain the ordinal nature of the score in Solr, that we are currently breaking at the reRankDocs point.


Read full article from Sharded architecture / pagination problem · Issue #175 · bloomberg/lucene-solr


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts