All About Programming: Some system design questions

Some system design questions | Hello World

1. Design a file system.

2. Data structure for a spreadsheet.

3. An app needs a cache. How do you make it thread-safe?
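A minimal sketch of the simplest answer, assuming an in-process LRU cache: one lock guards the shared map, so concurrent get/put calls cannot corrupt it. The class name and the LRU eviction policy are assumptions; finer-grained options include lock striping (one lock per shard of keys) or a readers-writer lock.

```python
import threading
from collections import OrderedDict

class ThreadSafeLRUCache:
    """LRU cache guarded by a single lock: every read or write of the shared
    OrderedDict happens while holding the lock, so concurrent get/put calls
    cannot corrupt the recency order or exceed the capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key, default=None):
        with self.lock:
            if key not in self.data:
                return default
            self.data.move_to_end(key)          # mark as most recently used
            return self.data[key]

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            self.data.move_to_end(key)
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)   # evict the least recently used entry
```

A single coarse lock is easy to reason about but serializes every access. Under heavy contention, splitting the cache into independently locked shards is the usual next step.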

4. A social network has billions of IDs, and each ID has roughly 100 friends. What is the maximum number of connections (hops) between any two people? Write an algorithm that returns the minimum number of connections between two IDs: int min_connection(id1, id2)

You can call the following functions:
expand(id): returns the friends list of id
expandall(list): returns the union of the friends of all the ids in the list
intersection(list1, list2): returns the intersection of the two lists
removeintersection(list1, list2)
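A minimal sketch of one common approach, bidirectional breadth-first search. It grows the smaller frontier one hop at a time until the two sides meet. The helpers are hypothetical in-memory stand-ins backed by a small dict `FRIENDS` rather than remote calls over billions of IDs. Returning -1 for unconnected IDs is an assumption, because the question does not say.

```python
# Hypothetical stand-ins for the interview's helpers, backed by a toy graph.
FRIENDS = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4], 6: []}

def expand(pid):
    """Friends list of one ID."""
    return FRIENDS.get(pid, [])

def expandall(ids):
    """Union of the friends of every ID in the list."""
    result = set()
    for pid in ids:
        result.update(expand(pid))
    return result

def intersection(a, b):
    return set(a) & set(b)

def min_connection(id1, id2):
    """Bidirectional BFS: returns the number of hops between id1 and id2, or -1."""
    if id1 == id2:
        return 0
    front1, front2 = {id1}, {id2}
    seen1, seen2 = {id1}, {id2}
    hops = 0
    while front1 and front2:
        # Always expand the smaller frontier to keep the work small.
        if len(front1) > len(front2):
            front1, front2 = front2, front1
            seen1, seen2 = seen2, seen1
        front1 = expandall(front1) - seen1
        hops += 1
        if intersection(front1, seen2):   # the two searches have met
            return hops
        seen1 |= front1
    return -1
```

With about 100 friends per ID, a one-sided search touches roughly 100^d IDs after d hops. Meeting in the middle keeps each side near 100^(d/2).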

5. Open google.com and type some words into the search box to search for something; it returns lots of search results. Among these results (that is, website links), you CLICK only the ones that interest you, and the system records each "CLICK" action. In the end, you have the search results (i.e. URLs) and the "CLICK" information at hand.
Question: how do you find the similarity between these searches?
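One simple way to quantify this is to treat each search as the set of URLs users clicked for it. Two searches are then similar when their click sets overlap. The sketch below uses Jaccard similarity (intersection size over union size). That metric is an assumption; click-count-weighted cosine similarity is a common alternative.

```python
def click_similarity(clicks_a, clicks_b):
    """Jaccard similarity between the sets of URLs clicked for two searches:
    |A & B| / |A | B|, from 0.0 (no shared clicks) to 1.0 (identical sets)."""
    a, b = set(clicks_a), set(clicks_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, two searches whose clicks were {a.com, b.com} and {b.com, c.com} share one of three distinct URLs, giving a similarity of 1/3.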

6. How do you find the hottest topics (based on tweets)? What if a topic is always hot and we don't want to count it?
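A minimal sketch of one way to handle the "always hot" case: rank topics by how much their count in the current window exceeds their average count per window in the history, not by raw count. The window sizes, the smoothing constant and the function name are assumptions.

```python
from collections import Counter

def trending(recent_tags, history_tags, num_history_windows, k=3, smoothing=1.0):
    """Rank topics by current-window count divided by their historical
    average per window (plus smoothing, so brand-new topics don't divide by 0).
    A topic that is always popular has a high baseline, so its ratio stays
    near 1 and it does not dominate the ranking."""
    recent = Counter(recent_tags)
    history = Counter(history_tags)
    scores = {}
    for tag, count in recent.items():
        baseline = history[tag] / num_history_windows
        scores[tag] = count / (baseline + smoothing)
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]
```

With 50 recent mentions of a topic that averages 50 per window, the score is about 1. A topic going from 0 to 20 mentions scores 20 and ranks first.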

7. Discuss the design challenges of a distributed web crawler running on commodity PCs. How would you use the network bandwidth of those PCs efficiently?

8. Design a site similar to tinyurl.com
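A common core for a URL shortener is to store each long URL under an auto-increment integer ID and use the base-62 encoding of that ID as the short code. Seven base-62 characters already cover 62^7, about 3.5 trillion IDs. The sketch below assumes that scheme. The list `urls` is a hypothetical in-memory stand-in for the ID-to-URL table, and the class and method names are illustrative.

```python
import string

ALPHABET = string.digits + string.ascii_letters   # 62 symbols: 0-9, a-z, A-Z

def encode(n):
    """Map a non-negative integer ID to its base-62 short code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(code):
    """Inverse of encode: base-62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

class Shortener:
    """In-memory stand-in: the list index plays the role of the DB primary key."""
    def __init__(self):
        self.urls = []

    def shorten(self, long_url):
        self.urls.append(long_url)
        return encode(len(self.urls) - 1)

    def resolve(self, code):
        return self.urls[decode(code)]
```

Sequential IDs make codes guessable. Shuffling the ID space or issuing ID ranges per server are common variations.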

9. A large log file contains customer id, product id and timestamp. Find how many times a given customer viewed pages on a given day: (1) with enough memory, (2) with limited memory.
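A minimal sketch of both cases, assuming each log line looks like `customer_id,product_id,ISO-8601 timestamp` (the format is an assumption). With enough memory, one hash map keyed by (customer, day) suffices. With limited memory, the log is first hash-partitioned by customer into bucket files, each small enough to count in memory. Every line for a given customer lands in the same bucket, so each bucket's counts are final. To answer a single (customer, day) query, you can instead stream the file and count matching lines in constant memory.

```python
import os
import tempfile
from collections import Counter

def parse(line):
    # Assumed log format: "customer_id,product_id,ISO-8601 timestamp"
    customer, _product, ts = line.strip().split(",")
    return customer, ts[:10]          # keep only the YYYY-MM-DD part

def count_in_memory(lines):
    """Enough memory: one hash map keyed by (customer, day)."""
    return Counter(parse(line) for line in lines)

def count_limited_memory(lines, num_buckets=4):
    """Limited memory: partition by hash(customer) into bucket files, then
    count one bucket at a time. Yields ((customer, day), count) pairs."""
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, f"bucket{i}.txt") for i in range(num_buckets)]
    files = [open(p, "w") for p in paths]
    for line in lines:
        customer, _day = parse(line)
        files[hash(customer) % num_buckets].write(line.strip() + "\n")
    for f in files:
        f.close()
    for path in paths:
        with open(path) as f:
            yield from count_in_memory(f).items()
        os.remove(path)
    os.rmdir(tmpdir)
```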

10. Design a database schema for actors and movies that supports getting a movie's actors and getting the movies an actor has appeared in (Google, phone, 2006)
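The standard shape for this many-to-many relationship is two entity tables plus a junction table, with an index in each lookup direction. The sketch runs the schema on Python's built-in `sqlite3` with an in-memory database. Table, column and function names are illustrative assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (movie, actor) pair.
CREATE TABLE movie_actor (
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    actor_id INTEGER NOT NULL REFERENCES actor(id),
    PRIMARY KEY (movie_id, actor_id)
);
-- The primary key serves movie -> actors; this index serves actor -> movies.
CREATE INDEX idx_movie_actor_actor ON movie_actor(actor_id, movie_id);
"""

def actors_of(conn, movie_id):
    rows = conn.execute(
        "SELECT a.name FROM actor a JOIN movie_actor ma ON a.id = ma.actor_id "
        "WHERE ma.movie_id = ? ORDER BY a.name", (movie_id,))
    return [r[0] for r in rows]

def movies_of(conn, actor_id):
    rows = conn.execute(
        "SELECT m.title FROM movie m JOIN movie_actor ma ON m.id = ma.movie_id "
        "WHERE ma.actor_id = ? ORDER BY m.title", (actor_id,))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```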

11. A 50-story building plans to install two elevators. Design the elevator system.

12. How would you design Facebook's news feed?
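One classic building block is "fan-out on read": when a user opens the feed, merge the already time-sorted post lists of everyone they follow and keep the newest few. The sketch assumes each user's posts are stored newest-first as `(timestamp, post_id)` tuples. The dict-based storage and function name are hypothetical. The alternative, "fan-out on write", pushes each new post into every follower's precomputed feed instead.

```python
import heapq
import itertools

def news_feed(user, following, posts_by_user, limit=10):
    """Fan-out on read: lazily k-way merge the followed users' post lists
    (each sorted newest-first) and return the newest `limit` post IDs."""
    streams = [posts_by_user.get(f, []) for f in following.get(user, [])]
    merged = heapq.merge(*streams, key=lambda post: post[0], reverse=True)
    return [post_id for _ts, post_id in itertools.islice(merged, limit)]
```

Because `heapq.merge` is lazy, only about `limit` posts plus one head per stream are examined. That is what makes read-time merging viable for users who follow many accounts.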

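13. A file has n rows and m columns: each row is a record and each column is a feature. From time to time you need to sort and search by different features. You may not use a database, the file fits in memory, and any data structures, algorithms and language are allowed. Give your best approach.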
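A minimal sketch of one reasonable answer, assuming the records are already loaded as tuples: build one sorted index per column once, storing row numbers rather than copies of the rows. "Sort by feature j" then just reads index j, and equality lookups become binary searches. Class and method names are illustrative.

```python
import bisect

class Table:
    """In-memory table with one sorted index (list of row numbers) per column."""
    def __init__(self, rows):
        self.rows = rows
        num_cols = len(rows[0]) if rows else 0
        self.indexes = [sorted(range(len(rows)), key=lambda r: rows[r][c])
                        for c in range(num_cols)]
        # Column values in index order, used as bisect keys.
        self.keys = [[rows[r][c] for r in idx] for c, idx in enumerate(self.indexes)]

    def sorted_by(self, col):
        """All rows ordered by column `col` (ties keep file order)."""
        return [self.rows[r] for r in self.indexes[col]]

    def find(self, col, value):
        """All rows whose column `col` equals `value`, in O(log n + matches)."""
        keys = self.keys[col]
        lo = bisect.bisect_left(keys, value)
        hi = bisect.bisect_right(keys, value)
        return [self.rows[r] for r in self.indexes[col][lo:hi]]
```

Building all m indexes costs O(m n log n) once. After that, each sort is O(n) to read out and each lookup is O(log n).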

14. Design an online game.

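15. Static variables are used to share data across an entire class. Building on that: the various synchronization techniques, the pros and cons of busy waiting, and when to use a busy-waiting spinlock (mainly a discussion about performance). If an application fails to meet its timing constraints at runtime, how would you analyze where the problem lies, and which tools or techniques could you use?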

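16. Design question: there is a cluster made up of many machines, and a large number of company data files (each with several copies). Design an algorithm that uses every machine as evenly as possible, with the constraint that different copies of the same company's file must not be stored on the same machine. The main idea is to assign each company's original data file to a machine in round-robin fashion, combined with a hash table. The interviewer mentioned that my solution was exactly the one he currently uses.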
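A minimal sketch of the round-robin idea, under two assumptions: all files are roughly the same size, and the number of copies is at most the number of machines. The original file of each company goes to the next machine in turn, and copy k goes k machines further along. Copies of one company therefore always land on distinct machines. A dict plays the role of the hash table recording where each copy lives. The function name is hypothetical.

```python
from collections import defaultdict

def place_replicas(companies, num_machines, copies):
    """Round-robin placement: company i's copies go to machines
    i, i+1, ..., i+copies-1 (mod num_machines), so no two copies of one
    company share a machine and load spreads evenly across machines."""
    if copies > num_machines:
        raise ValueError("need at least as many machines as copies")
    location = defaultdict(list)      # company -> machine ids holding its copies
    start = 0
    for company in companies:
        for k in range(copies):
            location[company].append((start + k) % num_machines)
        start = (start + 1) % num_machines
    return dict(location)
```

With 4 companies, 4 machines and 3 copies, every machine ends up holding exactly 3 files.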

17. Design a class providing a lock function that grants the lock only if it sees that no deadlock is possible.
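The question does not say how possible deadlocks are detected. One standard technique, used here as an assumption rather than the interviewer's intended answer, is to refuse any request that would close a cycle in the wait-for graph: an edge runs from a waiting thread to the thread holding the lock it wants. The class below only models the bookkeeping in a single thread. A real version would protect these tables with a mutex and block waiters on a condition variable. The class name and the "granted"/"wait"/"refused" results are hypothetical.

```python
class DeadlockAvoidingLockManager:
    """Grants or queues a lock request only if waiting cannot create a cycle
    in the wait-for graph; otherwise the request is refused."""
    def __init__(self):
        self.owner = {}      # lock id -> thread id holding it
        self.waiting = {}    # thread id -> lock id it is blocked on

    def acquire(self, thread, lock):
        holder = self.owner.get(lock)
        if holder is None or holder == thread:
            self.owner[lock] = thread
            return "granted"
        # Follow the chain: holder waits on a lock owned by t2, and so on.
        t = holder
        while t is not None:
            if t == thread:
                return "refused"          # waiting would close a cycle: deadlock
            blocked_on = self.waiting.get(t)
            t = self.owner.get(blocked_on) if blocked_on is not None else None
        self.waiting[thread] = lock
        return "wait"

    def release(self, thread, lock):
        if self.owner.get(lock) != thread:
            raise RuntimeError("release by a thread that does not own the lock")
        del self.owner[lock]
        for t, wanted in list(self.waiting.items()):
            if wanted == lock:            # hand the lock to one waiter
                del self.waiting[t]
                self.owner[lock] = t
                break
```

Because every refused request would have closed a cycle, the wait-for graph stays acyclic, so the chain walk in `acquire` always terminates.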

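18. Design a distributed file system where, given a path name, you can read and write files. I won't go into the full system design here. One detail is: given a path name, how do you know which node owns the file? I proposed implementing a lookup function, which could be either a hash function or a lookup table. If it is a lookup table, then to keep all clients in sync you could add a separate lookup cluster. The interviewer was unconvinced: if a hash function works, why make it so complicated? So I explained the drawbacks of a hash function.
Suppose there are N nodes at first, and the hash function distributes M files uniformly across them. One day capacity runs out and a node is added. First, every client machine must be notified that the configuration has changed; if you don't want to restart the client processes, that is not a trivial job. Second, the file-to-node mapping also changes: a file that the hash function placed on node 1 may now map to node 2. With a modulo-style hash (node = hash(file) mod N), on average N/(N+1) of the files must move to a different node, which is a large data migration. I then proposed some hash function designs that reduce data migration.
Finally, he asked me to implement a function that computes the size of every directory in the distributed file system, given that the files under one directory may be stored on different nodes. I said you just compute the sizes on each node and send them to one place to be merged. He agreed, but asked what data structure to use to represent them. I said a hash table whose key is the directory name and whose value is the size; because directories themselves form a tree, the keys of this hash table can be organized as a tree. Finally, he asked me to implement a function that serializes this data structure into a byte array, since that byte array is the data sent over the network. I used a depth-first traversal, but only after finishing the code did I realize that a breadth-first traversal would be more convenient and the code much more concise.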
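A minimal sketch of the breadth-first serialization mentioned at the end. The node layout (directory name, aggregated size, children) and the byte format are assumptions. Each node is written as a big-endian 2-byte name length, the UTF-8 name, an 8-byte size and a 4-byte child count. In BFS order, the child counts alone are enough to rebuild the tree.

```python
import struct
from collections import deque

class DirNode:
    def __init__(self, name, size=0, children=None):
        self.name = name
        self.size = size                  # total bytes under this directory
        self.children = children or []

def serialize(root):
    """BFS: for each node emit name length, name, size and child count."""
    out = bytearray()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        name = node.name.encode("utf-8")
        out += struct.pack(">H", len(name)) + name
        out += struct.pack(">QI", node.size, len(node.children))
        queue.extend(node.children)
    return bytes(out)

def deserialize(data):
    """Rebuild the tree in the same BFS order, using the stored child counts."""
    pos = 0

    def read_node():
        nonlocal pos
        (name_len,) = struct.unpack_from(">H", data, pos)
        pos += 2
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        size, num_children = struct.unpack_from(">QI", data, pos)
        pos += 12                          # 8-byte size + 4-byte count, no padding
        return DirNode(name, size), num_children

    root, n = read_node()
    queue = deque([(root, n)])
    while queue:
        parent, n = queue.popleft()
        for _ in range(n):
            child, m = read_node()
            parent.children.append(child)
            queue.append((child, m))
    return root
```

The BFS version needs no recursion and no end-of-children markers. That is plausibly why it comes out more concise than a depth-first encoding.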

19. How to store a very large graph.
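One widely used compact layout is compressed sparse row (CSR). Instead of one list object per node, it uses two flat arrays: `offsets[u]` marks where node u's neighbors start in `targets`. The sketch assumes nodes are numbered 0..n-1 and edges are (source, target) pairs. It uses the standard `array` module with 8-byte integers. At real scale the same two arrays would typically be written to files and memory-mapped or sharded.

```python
from array import array

def to_csr(num_nodes, edges):
    """Build CSR arrays: neighbors of u are targets[offsets[u]:offsets[u + 1]]."""
    offsets = array("q", [0]) * (num_nodes + 1)
    for u, _v in edges:                 # count out-degree of each node
        offsets[u + 1] += 1
    for u in range(num_nodes):          # prefix sums give start positions
        offsets[u + 1] += offsets[u]
    targets = array("q", [0]) * len(edges)
    fill = offsets[:-1]                 # copy: next free slot for each node
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets

def neighbors(offsets, targets, u):
    return targets[offsets[u]:offsets[u + 1]]
```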

20. Given a Lock with two atomic methods, lock() and unlock(), use it to implement a file read/write system with the following requirements:
1: reader blocks writer;
2: writer blocks reader;
3: writer blocks writer;
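The classic answer uses two plain locks and a reader counter. The first reader to arrive takes the resource lock on behalf of all readers, and the last reader to leave releases it. The sketch maps lock()/unlock() onto Python's `threading.Lock` `acquire()`/`release()`. It relies on the fact that a plain `threading.Lock` may be released by a thread other than the one that acquired it. The class and method names are hypothetical.

```python
import threading

class ReadWriteLock:
    """Readers-writer lock built only from plain locks.

    `resource` is held either by one writer or, collectively, by all current
    readers, which gives reader-blocks-writer, writer-blocks-reader and
    writer-blocks-writer. `count_lock` protects the reader counter.
    Caveat: a steady stream of readers can starve writers.
    """
    def __init__(self):
        self.resource = threading.Lock()
        self.count_lock = threading.Lock()
        self.readers = 0

    def acquire_read(self):
        with self.count_lock:
            self.readers += 1
            if self.readers == 1:         # first reader locks out writers
                self.resource.acquire()

    def release_read(self):
        with self.count_lock:
            self.readers -= 1
            if self.readers == 0:         # last reader lets writers in
                self.resource.release()

    def acquire_write(self):
        self.resource.acquire()

    def release_write(self):
        self.resource.release()
```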

21. Design a web cache server, assuming it stores 10 billion web pages. How would you design it?
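With 10 billion pages the cache has to be spread over many servers, and servers will be added over time. Consistent hashing, assumed here as the partitioning scheme, limits how many pages change owner when that happens. Each server owns many virtual points on a hash ring, and a page URL goes to the first point clockwise from its hash. The `vnodes` count and the MD5-based hash are illustrative choices.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys (e.g. page URLs) to servers. Adding a server moves roughly
    1/(N+1) of the keys, all of them to the new server, instead of most keys
    as with hash(key) % N."""
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                    # sorted list of (point, server)
        for s in servers:
            self.add(s)

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def lookup(self, key):
        """First ring point at or after hash(key), wrapping around to the start."""
        h = self._hash(key)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```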

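22. You have access to the website's access records, each containing the user's IP. Write a program that can, at any moment, compute the 1000 IPs with the most visits in the past 5 minutes. The answer seems to depend on the precision of the rolling window, so for now we fix it to update the window once every 5 seconds.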
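A minimal sketch of the 5-second-precision rolling window. It keeps one counter per 5-second bucket (60 buckets for 5 minutes) plus a running total, and subtracts a whole bucket when it falls out of the window. Timestamps are assumed to be whole seconds arriving in non-decreasing order. The class name is hypothetical. `Counter.most_common(k)` performs the top-k selection with a heap internally.

```python
from collections import Counter, deque

class TopIPs:
    """Top-k IPs over a sliding window built from fixed-size time buckets."""
    def __init__(self, window=300, bucket=5):
        self.bucket = bucket
        self.num_buckets = window // bucket
        self.buckets = deque()            # (bucket_start_time, Counter), oldest first
        self.total = Counter()            # sum of all buckets still in the window

    def _expire(self, now):
        current = now // self.bucket
        oldest_allowed = (current - self.num_buckets + 1) * self.bucket
        while self.buckets and self.buckets[0][0] < oldest_allowed:
            _start, counts = self.buckets.popleft()
            self.total.subtract(counts)
            for ip in counts:             # drop zeroed IPs to bound memory
                if self.total[ip] <= 0:
                    del self.total[ip]

    def record(self, ip, now):
        self._expire(now)
        start = now // self.bucket * self.bucket
        if not self.buckets or self.buckets[-1][0] != start:
            self.buckets.append((start, Counter()))
        self.buckets[-1][1][ip] += 1
        self.total[ip] += 1

    def top(self, k, now):
        self._expire(now)
        return [ip for ip, _count in self.total.most_common(k)]
```

For the question as asked, call `top(1000, now)`. Memory is bounded by the distinct IPs seen in the last 5 minutes times at most 60 buckets.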

23. Design free and malloc.
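A minimal sketch of a first-fit allocator with coalescing. It is a simulation that hands out integer offsets into an imaginary heap rather than real pointers. Free blocks are kept as a list of (offset, length) sorted by offset. `free` merges a released block with adjacent free neighbours to limit fragmentation. A real malloc stores each block's size in a header just before the returned pointer. Here the dict `sizes` plays that role, and alignment is ignored.

```python
class Allocator:
    """First-fit malloc/free over a simulated heap of `size` bytes."""
    def __init__(self, size):
        self.free_list = [(0, size)]      # sorted (offset, length) free blocks
        self.sizes = {}                   # offset -> allocated length ("header")

    def malloc(self, n):
        for i, (off, length) in enumerate(self.free_list):
            if length >= n:               # first block that is big enough
                if length == n:
                    del self.free_list[i]
                else:                     # split: keep the tail free
                    self.free_list[i] = (off + n, length - n)
                self.sizes[off] = n
                return off
        return None                       # out of memory

    def free(self, off):
        n = self.sizes.pop(off)
        i = 0
        while i < len(self.free_list) and self.free_list[i][0] < off:
            i += 1
        self.free_list.insert(i, (off, n))
        # Merge with the next free block if they touch.
        if i + 1 < len(self.free_list) and off + n == self.free_list[i + 1][0]:
            n += self.free_list[i + 1][1]
            self.free_list[i] = (off, n)
            del self.free_list[i + 1]
        # Merge with the previous free block if they touch.
        if i > 0:
            prev_off, prev_len = self.free_list[i - 1]
            if prev_off + prev_len == off:
                self.free_list[i - 1] = (prev_off, prev_len + n)
                del self.free_list[i]
```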

24. How would you design the data structures for a Facebook network, and an algorithm to find a connection between two users? How would you optimize it if the data is distributed across multiple computers?

25. Design a deck class with a member function that randomly selects a card from those that have not been selected before (you can assume this function is never called more times than there are cards). For example, with a deck of four cards 1, 2, 3, 4, it may select 3 first; the next time it should randomly select one of 1, 2, 4… Also design a member function to reset the deck.
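A minimal sketch using the swap trick from the Fisher-Yates shuffle. Undrawn cards live in the prefix `cards[0:remaining]`. Each draw picks a random position in that prefix, swaps it to the end of the prefix and shrinks the prefix, so a draw is O(1). Reset only restores the counter, because every card is still in the array. The names are illustrative.

```python
import random

class Deck:
    """Draw cards uniformly at random without repetition; O(1) draw and reset."""
    def __init__(self, cards):
        self.cards = list(cards)
        self.remaining = len(self.cards)

    def draw(self):
        # Assumes draw() is never called more times than there are cards.
        i = random.randrange(self.remaining)
        last = self.remaining - 1
        self.cards[i], self.cards[last] = self.cards[last], self.cards[i]
        self.remaining -= 1
        return self.cards[last]

    def reset(self):
        self.remaining = len(self.cards)
```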

26. Google search design problem: how do you distribute the data, and how do you design the backup system?

27. Design an online chat system.

28. Design the bit.ly URL-shortening web service: algorithm design, backend storage, a caching middle tier, front-end load balancing, and finally web analytics.

29. Design and implement an algorithm that corrects typos: for example, if an extra letter has been added, what would you do?
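A minimal sketch in the style of Peter Norvig's well-known spelling corrector. It generates every string one edit away and keeps those found in a dictionary of word frequencies, here a hypothetical `word_freq` dict. The "extra letter" case is handled by the deletions. Transpositions, replacements and insertions cover the other single-edit typos.

```python
import string

def edits1(word):
    """All strings one edit away: delete (fixes an extra letter), transpose,
    replace and insert."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, word_freq):
    """The word itself if known, else the most frequent known word one edit
    away, else the word unchanged."""
    if word in word_freq:
        return word
    candidates = [w for w in edits1(word) if w in word_freq]
    return max(candidates, key=word_freq.get) if candidates else word
```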

30. Suppose there are two people, A and B, on FB. A should be able to view B's pictures only if A is a friend of B or A and B have at least one common friend. The interviewer discussed this for nearly 30 minutes; the discussion mainly covered the following points:
1. How are you going to store the list of friends for a given user?
2. File system vs DB
3. Given list of friends of 2 users, how are you going to find common friends?
4. If you are going to store the friends in DB then how will the table look like?
5. How many servers do you need?
6. How are you going to allocate work to servers?
7. How many copies of data will you need?
8. What problems will you face if you maintain multiple copies of the data?
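For point 3, if each user's friend IDs are stored sorted (for example one row or file per user), common friends can be found with a linear two-pointer merge. The merge needs no extra hash set. The sketch below assumes that storage layout. The `friends` dict and the function names are hypothetical.

```python
def common_friends(a, b):
    """Two-pointer intersection of two sorted friend-ID lists:
    O(len(a) + len(b)) time, no extra set needed."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

def can_view(viewer, owner, friends):
    """viewer may see owner's pictures if they are friends or share a friend."""
    fa = friends.get(viewer, [])
    fb = friends.get(owner, [])
    return owner in fa or bool(common_friends(fa, fb))
```

With about 100 friends per user this costs a few hundred comparisons per check, so the harder parts are where the lists live and how copies stay consistent (points 5 to 8).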

31. Design a data structure for auto-completion.
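The usual structure is a prefix trie in which each node that ends a query records that query's frequency. To suggest completions, walk down to the node for the typed prefix, collect the queries below it and return the most frequent. The sketch assumes queries arrive with counts. Production systems typically precompute and cache the top-k list at each node rather than searching the subtree on every keystroke. Class and method names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0        # > 0 if a query ends here (its frequency)

class Autocomplete:
    """Prefix trie returning the k most frequent completions of a prefix."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, times=1):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += times

    def suggest(self, prefix, k=3):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:                      # depth-first walk of the subtree
            cur, text = stack.pop()
            if cur.count:
                found.append((cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _count, text in found[:k]]
```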

32. How would you implement search suggestions?

33. Design Facebook's system to support the Like button.

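34. Design a system for stock data (stock #, time, price):
- Design a client that displays stock information, and give as many use cases as possible.
- For those use cases, how would you design the client to improve client-side performance?
- How would you design the server side?
- How is the data transmitted?
- How does the server store the data?
- How would you design the database tables that store the data?
- How can you speed up queries without using an index?
- How does a database execute queries (explain from the perspective of database implementation)?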

Read full article from Some system design questions | Hello World
