What is the best way to use Apache Solr for relational JOIN? - Quora



What is the best way to use Apache Solr for relational JOIN? - Quora

Why not run a single-use script that does the joins on the SOLR index to replace the authorID with the author name, or add the field if you need both?  Updating 10,000,000+ documents will take a while, but it's certainly doable.  Then you can have new documents indexed with both fields as well, doing the JOIN one at a time.

In general, SOLR works best with completely denormalized data.

EDIT:
Okay, if it's more than just authors' names, you could still denormalize the data, repeating your documents for each of the related author documents.  However, that's an ugly solution, and SOLR 4 does have "JOIN" capability.  You'd first need to at least have a pipeline where author data is indexed into SOLR in real-time.  Then yes, I'd use ElasticSearch to easily handle the complex SOLR queries involving JOIN-like operations.

Read full article from What is the best way to use Apache Solr for relational JOIN? - Quora


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts