[Design] How Google search works - Shuatiblog.com



Launched on Sep 15th, 1997

The web contains about 60 trillion individual pages. The index holds about 100 million GB of data.

About 40,000 searches per second, or roughly 3 billion searches per day.

As of Feb 2015, Google held a 65% market share in the US.

1. Crawl

Google crawls from one page to another using Googlebot. Each crawl starts from URLs found in previous crawls, augmented with Sitemap data.

A sitemap is a list of the pages of a website that are accessible to crawlers or users. It can be either a document in any form used as a planning tool for web design, or a web page that lists the pages on a website, typically organized in hierarchical fashion.
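
The crawl step above can be sketched as a breadth-first traversal. This is only a minimal illustration, not Googlebot's actual logic: the `crawl` function, the `fetch_links` callback, and the `TOY_WEB` link table are all hypothetical names, and a small in-memory dictionary stands in for real HTTP fetching.

```python
from collections import deque


def crawl(seed_urls, sitemap_urls, fetch_links, max_pages=100):
    """Breadth-first crawl seeded by earlier URLs plus sitemap entries.

    fetch_links is a hypothetical callback that returns the outgoing
    links of a URL; a real crawler would download and parse the page.
    """
    frontier = deque()
    seen = set()
    # Seed the frontier with previously crawled URLs, then sitemap URLs.
    for url in list(seed_urls) + list(sitemap_urls):
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Follow links from this page to pages not seen before.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph used in place of the real web (assumption for the demo).
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "a.example/hidden": [],  # no page links here; only the sitemap lists it
}


def toy_fetch_links(url):
    return TOY_WEB.get(url, [])


order = crawl(["a.example/"], ["a.example/hidden"], toy_fetch_links)
print(order)
```

Note that `a.example/hidden` is reached only because the sitemap lists it, which is the case the sitemap is meant to cover.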

2. Indexing

Compile the data from each crawled page (key content tags and attributes, such as title tags and ALT attributes). Google may not be able to process some rich media files or dynamic pages.
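
One common way to compile such data is an inverted index that maps each term to the pages containing it. The sketch below is an assumption about how that could look, not a description of Google's index: it reads only the title text and image ALT attributes, and the names `KeyContentParser`, `tokenize`, and `build_index` are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeyContentParser(HTMLParser):
    """Collects terms from <title> text and <img alt="..."> attributes."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.terms.extend(tokenize(alt))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.terms.extend(tokenize(data))


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, html in pages.items():
        parser = KeyContentParser()
        parser.feed(html)
        parser.close()
        for term in parser.terms:
            index[term].add(url)
    return index


# Toy pages (assumption for the demo).
pages = {
    "p1": "<html><head><title>Red Pandas</title></head>"
          "<body><img src='x.png' alt='A red panda eating'></body></html>",
    "p2": "<html><head><title>Giant pandas</title></head></html>",
}
index = build_index(pages)
print(sorted(index["pandas"]))
```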

3. Algorithm

When a user searches, pull all relevant results from the index.

Rank the results based on 200+ factors, one of which is the PageRank of a given page.
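
The retrieve-then-rank flow can be sketched as follows. The actual ranking factors and their weights are not given in these notes, so the signal names (`pagerank`, `title_match`), the weights, and the `search` function here are invented purely for illustration.

```python
def search(query, index, signals, weights):
    """Pull pages matching every query term, then rank by a weighted score.

    index:   term -> set of URLs (an inverted index)
    signals: URL -> {signal name: value}; hypothetical ranking factors
    weights: signal name -> weight; made up for this sketch
    """
    terms = query.lower().split()
    if not terms:
        return []

    # Retrieval: keep only pages that contain all of the query terms.
    candidates = set(index.get(terms[0], set()))
    for term in terms[1:]:
        candidates &= index.get(term, set())

    def score(url):
        page_signals = signals.get(url, {})
        return sum(w * page_signals.get(name, 0.0) for name, w in weights.items())

    # Ranking: highest score first, URL as a deterministic tie-breaker.
    return sorted(candidates, key=lambda url: (-score(url), url))


# Toy data (assumption for the demo).
toy_index = {"google": {"a", "b", "c"}, "search": {"a", "b"}}
toy_signals = {
    "a": {"pagerank": 0.2, "title_match": 1.0},
    "b": {"pagerank": 0.7, "title_match": 0.0},
}
toy_weights = {"pagerank": 0.6, "title_match": 0.4}

results = search("google search", toy_index, toy_signals, toy_weights)
print(results)
```

Page `c` is dropped because it does not contain "search"; `a` outranks `b` because its title match outweighs its lower PageRank under these made-up weights.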

PageRank is the measure of the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on your site from another site adds to your site's PageRank.

Not all links are equal: Google works hard to identify spam links. The best types of links are those that are given based on the quality of your content.
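
The idea that incoming links add to a page's importance can be made concrete with the classic power-iteration form of PageRank. This is a textbook sketch, not Google's production algorithm: the damping factor of 0.85 is the conventional value rather than something stated in these notes, and the function name and toy link graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (assumption for the demo): a, b and d all link to c; c links to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph `c` ends up with the highest rank because three pages link to it, and `a` ranks above `b` and `d` because its single incoming link comes from the highly ranked `c`.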

