If you were designing a web crawler, how would you avoid getting into infinite loops? | Hello World
First, this question is ambiguous: what counts as the same page? The same URL, or the same content? The two are not equivalent: identical content can be served from different URLs, so deduplicating by URL alone will not catch every repeat.
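To make the two notions of "same page" concrete, here is a minimal Python sketch. The helper names `url_key` and `content_key` are hypothetical, not from the original post, and the normalization rules (lowercase the host, drop the fragment, treat an empty path as `/`) are just one reasonable assumption about what "same URL" should mean.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def url_key(url: str) -> str:
    """Identity by URL: normalize so trivially different spellings compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"  # assumption: "http://host" and "http://host/" are the same page
    # Lowercase scheme and host, keep the query, drop the #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def content_key(body: str) -> str:
    """Identity by content: hash the page body, ignoring where it was fetched from."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

An exact hash only catches byte-identical pages; near-duplicates (for example, pages that differ only in a timestamp) would need a fuzzier fingerprint, which is outside the scope of this sketch.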
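A web crawler uses BFS or DFS to traverse the web, which can be modeled as a graph of pages connected by links. In an ordinary BFS or DFS we keep a visited set to track the nodes we have already seen, so that cycles in the graph do not cause infinite loops. We do the same thing for our crawler.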
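The visited-set idea can be sketched as a plain BFS in Python. The function `crawl`, the `get_links` callback, and the `toy_web` dictionary are illustrative assumptions: a real crawler would fetch pages over the network and parse their links, while here an in-memory dictionary stands in for the web so the example can run on its own.

```python
from collections import deque


def crawl(start, get_links):
    """Breadth-first traversal that never enqueues a page twice."""
    visited = {start}        # pages already seen (queued or processed)
    queue = deque([start])   # frontier of pages still to process
    order = []               # pages in the order they were processed
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:   # skip pages we have already seen
                visited.add(link)     # mark when enqueuing, not when processing
                queue.append(link)
    return order


# A toy "web" with cycles: a -> b -> a, and c -> a.
toy_web = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
print(crawl("a", lambda url: toy_web.get(url, [])))  # ['a', 'b', 'c']
```

Marking a page as visited at the moment it is enqueued, rather than when it is dequeued, keeps the same page from being added to the queue several times by different parents. The keys stored in `visited` are raw strings here; whether they should be URLs or content fingerprints is exactly the ambiguity raised above.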