[CareerCup] 10.7 Simplified Search Engine - Grandyang - cnblogs


10.7 Imagine a web server for a simplified search engine. This system has 100 machines to respond to search queries, which may then call out using processSearch(string query) to another cluster of machines to actually get the result. The machine which responds to a given query is chosen at random, so you cannot guarantee that the same machine will always respond to the same request. The method processSearch is very expensive. Design a caching mechanism for the most recent queries. Be sure to explain how you would update the cache when data changes.

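The problem describes a web server for a simple search engine. The system has 100 machines that respond to queries, and they can call processSearch(string query) to get results from another cluster of machines. The machine that responds to a query is chosen at random, so there is no guarantee that the same machine will always handle the same request. Because processSearch is very expensive, we need to design a caching mechanism for recent queries. Following the book, we first make some assumptions: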

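1. Apart from calling processSearch as needed, all query processing happens on the machine that first receives the query.

2. The number of queries we need to cache is very large.

3. Calls between machines are fast.

4. The result of a query is an ordered list of URLs, each with a 50-character title and a 200-character summary.

5. The most frequently accessed queries will always be in the cache.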
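To make assumptions 2 and 4 concrete, here is a small Python sketch of one cached result entry plus a back-of-the-envelope size estimate. The number of results per query and the average URL length are hypothetical parameters that the article does not give, and the estimate assumes one byte per character.

```python
from dataclasses import dataclass
from typing import List

# Field sizes taken from assumption 4.
TITLE_CHARS = 50
SUMMARY_CHARS = 200


@dataclass
class SearchResult:
    """One entry in a query's ordered result list."""
    url: str
    title: str    # up to 50 characters
    summary: str  # up to 200 characters


# The cached value for one query: an ordered list of results.
QueryResults = List[SearchResult]


def estimated_bytes_per_query(results_per_query: int, avg_url_chars: int = 100) -> int:
    """Rough size of one cached query, assuming 1 byte per character.

    results_per_query and avg_url_chars are hypothetical parameters,
    not values from the article.
    """
    per_result = avg_url_chars + TITLE_CHARS + SUMMARY_CHARS
    return results_per_query * per_result
```

With 10 results per query and 100-character URLs, each query takes about 3,500 bytes, so caching a million queries would need roughly 3.5 GB under these assumptions.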

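System requirements:

We mainly need to support the following two operations:

1. Efficient lookup given a key

2. Expiration of old data so that new data can take its place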

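We also need to update or clear the cache when search results change. Because some queries are so common that they stay in the cache permanently, we cannot simply wait for their entries to expire naturally.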
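One way to handle changing results, sketched below in Python, is to combine explicit invalidation with a time-to-live (TTL). The reverse index from URL to queries, the `invalidate_url` hook, and the TTL are hypothetical design choices, not details from the article: when the page at a URL changes, every cached query whose results include that URL is evicted, while the TTL bounds how stale any other entry can become, even for popular queries that never fall out of the cache.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


class InvalidatingCache:
    """Query cache with explicit invalidation and an optional TTL.

    Hypothetical design: the URL -> queries reverse index and the TTL
    are assumptions, not part of the original article.
    """

    def __init__(self, ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # query -> (ordered result URLs, time the entry was stored)
        self.entries: Dict[str, Tuple[List[str], float]] = {}
        # url -> cached queries whose results contain that url
        self.queries_by_url: Dict[str, Set[str]] = {}

    def put(self, query: str, urls: List[str]) -> None:
        self.invalidate_query(query)  # drop stale reverse-index links first
        self.entries[query] = (urls, self.clock())
        for url in urls:
            self.queries_by_url.setdefault(url, set()).add(query)

    def get(self, query: str) -> Optional[List[str]]:
        entry = self.entries.get(query)
        if entry is None:
            return None
        urls, stored_at = entry
        if self.ttl is not None and self.clock() - stored_at > self.ttl:
            self.invalidate_query(query)  # too old: treat as a cache miss
            return None
        return urls

    def invalidate_query(self, query: str) -> None:
        entry = self.entries.pop(query, None)
        if entry is None:
            return
        for url in entry[0]:
            queries = self.queries_by_url.get(url)
            if queries is not None:
                queries.discard(query)
                if not queries:
                    del self.queries_by_url[url]

    def invalidate_url(self, url: str) -> None:
        """Call when the page at url changes: evict every query listing it."""
        for query in list(self.queries_by_url.get(url, ())):
            self.invalidate_query(query)
```

Passing a custom `clock` makes the TTL easy to test; by default the cache uses `time.monotonic`, which is not affected by wall-clock adjustments.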

Step 1: Design a cache for a single system

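We can combine a linked list with a hash table. The linked list is kept in order of access: whenever a node is accessed, it is moved to the front, so the tail of the list always holds the oldest data. The hash table maps each query to its node in the list, which lets us both return cached results efficiently and move that node to the front of the list.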
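A minimal Python sketch of this linked-list-plus-hash-table (LRU) cache is shown below. It is not the author's original code: the fixed `capacity`, the `Node` and `LRUQueryCache` names, and the `cached_search` wrapper with its `process_search` parameter (standing in for processSearch) are illustrative assumptions. The list is doubly linked so that an accessed node can be unlinked and moved to the front without scanning.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """Doubly linked list node holding one cached query and its results."""

    def __init__(self, query: str, results: List[str]):
        self.query = query
        self.results = results
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class LRUQueryCache:
    """Most recently used entry at the head, oldest entry at the tail."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map: Dict[str, Node] = {}    # query -> its node in the list
        self.head: Optional[Node] = None  # most recently used
        self.tail: Optional[Node] = None  # least recently used

    def _unlink(self, node: Node) -> None:
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node: Node) -> None:
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def get(self, query: str) -> Optional[List[str]]:
        node = self.map.get(query)
        if node is None:
            return None
        self._unlink(node)  # move the accessed node to the front
        self._push_front(node)
        return node.results

    def put(self, query: str, results: List[str]) -> None:
        node = self.map.get(query)
        if node is not None:  # refresh an existing entry
            node.results = results
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity and self.tail is not None:
            oldest = self.tail  # evict the least recently used query
            self._unlink(oldest)
            del self.map[oldest.query]
        node = Node(query, results)
        self._push_front(node)
        self.map[query] = node


def cached_search(cache: LRUQueryCache, query: str,
                  process_search: Callable[[str], List[str]]) -> List[str]:
    """Return cached results, calling the expensive search only on a miss."""
    results = cache.get(query)
    if results is None:
        results = process_search(query)
        cache.put(query, results)
    return results
```

For example, with a capacity of 2, storing queries "a" and "b", reading "a", and then storing "c" evicts "b", because "b" has become the least recently used entry. Python's `collections.OrderedDict` with `move_to_end` and `popitem(last=False)` offers the same behavior; the explicit list is kept here to mirror the description above.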
