Cracking The Code.Did You Know Efficient & Working Code Really Matters: Design The Data Structure & Algorithm to Show That Their Exist a Path (Connection Between Two Person on Facebook)



Cracking The Code.Did You Know Efficient & Working Code Really Matters: Design The Data Structure & Algorithm to Show That Their Exist a Path (Connection Between Two Person on Facebook)

How would you design the data structures for a very large social network (Facebook, LinkedIn, etc)? Describe how you would design an algorithm to show the connection,
or path, between two people (e.g., Me -> Bob -> Susan -> Jason -> You).


Approach:
Forget that we're dealing with millions of users at first. Design this for the simple case.
We can construct a graph by assuming every person is a node and if there is an edge between
two nodes, then the two people are friends with each other.
class Person {
Person[] friends;
// Other info
}
If I want to find the connection between two people, I would start with one person and do a simple breadth first search.
But... oh no! Millions of users!
When we deal with a service the size of Orkut or Facebook, we cannot possibly keep all of our data on one machine. That means that our simple Person data structure from above doesn't quite work—our friends may not live on the same machine as us. Instead, we can replace our list of friends with a list of their IDs, and traverse as follows:
1. For each friend ID: int machine_index = lookupMachineForUserID(id);
2. Go to machine machine_index
3. Person friend = lookupFriend(machine_index);
There are more optimizations and follow up questions here than we could possibly discuss, but here are just a few thoughts.
Optimization: Reduce Machine Jumps
Jumping from one machine to another is expensive. Instead of randomly jumping from machine
to machine with each friend, try to batch these jumps—e.g., if 5 of my friends live on one machine, I should look them up all at once.
Optimization: Smart Division of People and Machines
People are much more likely to be friends with people who live in the same country as them. Rather than randomly dividing people up across machines, try to divvy them up by country, city, state, etc. This will reduce the number of jumps.
Question: Breadth First Search usually requires "marking" a node as visited. How do you do that in

Read full article from Cracking The Code.Did You Know Efficient & Working Code Really Matters: Design The Data Structure & Algorithm to Show That Their Exist a Path (Connection Between Two Person on Facebook)


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts