Distributed web crawling - Wikipedia, the free encyclopedia
Distributed web crawling is a distributed computing technique whereby Internet search engines employ many computers to index the Internet via web crawling. Such systems may allow users to voluntarily offer their own computing and bandwidth resources towards crawling web pages. By spreading the load of these tasks across many computers, costs that would otherwise be spent on maintaining large computing clusters are avoided. [1]

With dynamic assignment, a central server assigns new URLs to different crawlers dynamically. This allows the central server to, for instance, dynamically balance the load of each crawler. Systems using dynamic assignment can typically also add or remove downloader processes while a crawl is running. For large crawls, the central server may become the bottleneck, so most of the workload must be transferred to the distributed crawling processes.
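The dynamic-assignment scheme can be sketched in a few lines of Python. The names below (CentralDispatcher, crawler_worker, the example.com seed URLs, the batch size) are hypothetical, and threads stand in for separate crawler machines; the point is only to show the shape of the protocol: a single central component owns the URL frontier and hands out work on request, which is what balances load automatically and also what makes the central server a potential bottleneck for large crawls.

# A minimal sketch of dynamic assignment, not any particular crawler's implementation.
# A hypothetical CentralDispatcher hands out batches of URLs from a shared frontier,
# and each worker asks for more work whenever it finishes a batch, so the load
# balances itself across however many downloader processes are currently running.

import queue
import threading


class CentralDispatcher:
    """Central server: owns the URL frontier and assigns URLs on request."""

    def __init__(self, seed_urls):
        self.frontier = queue.Queue()
        for url in seed_urls:
            self.frontier.put(url)

    def next_batch(self, size=2):
        """Give a worker up to `size` URLs; an empty list means no work is left."""
        batch = []
        while len(batch) < size:
            try:
                batch.append(self.frontier.get_nowait())
            except queue.Empty:
                break
        return batch

    def report(self, discovered_urls):
        """Workers hand newly discovered links back into the central frontier."""
        for url in discovered_urls:
            self.frontier.put(url)


def crawler_worker(name, dispatcher):
    """Downloader process: repeatedly requests URLs until the dispatcher runs dry."""
    while True:
        batch = dispatcher.next_batch()
        if not batch:
            break
        for url in batch:
            # A real worker would fetch the page here, extract its links, and
            # return them via dispatcher.report(); this sketch only logs the URL.
            print(f"{name} crawling {url}")


if __name__ == "__main__":
    dispatcher = CentralDispatcher(
        [f"https://example.com/page{i}" for i in range(10)]
    )
    workers = [
        threading.Thread(target=crawler_worker, args=(f"worker-{i}", dispatcher))
        for i in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Because every worker pulls work from the same dispatcher, adding or removing worker threads (or, in a real system, crawler machines) needs no reconfiguration; the cost is that every request and every discovered link passes through that one central component.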