generate group by email address | Hello World



generate group by email address | Hello World

第二题是有这么一个class Contact,里面有一个String的name,和一个List
装着email address,是这个Contact有的address,用一个list装着是因为一个人有可
能有多个email,现在给你an array of Contact,比如
#1 Johny [john@gmail.com]
#2 Mary [mary@gmail.com]
#3 Johnnathen [john@yahoo.com]
#4 John [john@gmail.com, john@yahoo.com, john@hotmail.com]
#5 Bob [bob@gmail.com]
现在我们知道如果email address相同的话,那么就说明是同一个人,现在要做的是根
据这些email address,把同一个contact给group起来,生成一个List<List>
,比如我们就可以知道#1,#3,#4是同一个人。注意不能根据名字来group,因为有可
能有相同名字的人,或者同一个人有可能有不同的名字来注册之类的。

This is a classic disjoint set problem. Every user has a (unique) number, and a linked list of emails that associated with the number. We want to group all the contacts that shares same email addresses. (1, 3, and 4 is a group for example).

We can first create a map that maps each email address to a set of users. Then we iterate through the original array, and join those sets which shares same email address.
given an example, if we have
1 -> e1, e2.
2 -> e2, e3,
The hash map will be
e1 -> 1,
e2 -> 1, 2
e3 -> 2,

Then we go through the original list and join the sets under e1 and e2, then e2 and e3, so we will have set 1 and 2.

This can also be solved using a graph like algorithm. we look at the all the name and email addresses as nodes, and the relation ship between them as edges. We can simply form a graph and then run a breath first search and all the names in a connected components will be grouped in one group.


Read full article from generate group by email address | Hello World


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts