Notes for Harvard CS75 Web Development Lecture 9 Scalability by David Malan | Nine notes




Features:

  • Is the host's IP address blocked in some countries/regions?
  • SFTP vs. FTP: SFTP encrypts all traffic, which matters because plain FTP sends user names and passwords in the clear
  • Some hosting companies offer seemingly unbelievable features, like unlimited storage space, at a very low price. Most likely you are sharing one machine with hundreds of other users and contending for its resources. The host can get away with this because most people don't actually need that many resources.
  • Virtual private server (VPS): you may still share one physical machine with other users, but you get your own copy of the operating system, because the machine runs multiple virtual machines. Only you and the system administrators have access to your files
  • If you want more privacy than that, you probably have to run your own servers

AWS EC2

How to scale

Vertical Scaling

Get more RAM, processors, disks, etc. for one machine. Eventually you exhaust either your financial resources or the state of the art in hardware.

Horizontal Scaling

Use multiple machines instead of one bigger one, and build a topology out of multiple servers.

Load Balancer

Need to distribute inbound HTTP requests across the back-end servers.

DNS returns the public IP address of the load balancer, and the load balancer determines how to actually route each request to a back-end server (which has a private address).

Implementation

  • Dedicated servers for particular kinds of content (GIFs, JPEGs and other images, videos, etc.), chosen based on the request, e.g., its Host HTTP header
  • Round robin. The load balancer can even be a DNS setup that returns the IP address of server 1 the first time someone asks for a URL, server 2 the second time, then server 3, server 4, …, eventually wrapping around. Downside: one server may happen to get a really computationally heavy user.
  • Based on the load on each server: send the request to the least busy one
  • Have a server dedicated to storing sessions. But what if that machine breaks down? It lacks redundancy. RAID (redundant array of independent disks) can add striping and redundancy to its disks.
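A minimal Python sketch of the round-robin and load-based policies above. The backend addresses are hypothetical, and "load" is approximated by the number of requests each server is currently handling; real balancers may use other metrics. Round robin ignores how busy a server is, which is exactly the downside noted; the load-based variant routes new requests away from a server stuck with a heavy user.

```python
from itertools import cycle

# Hypothetical back-end pool; these private addresses are illustrative only.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, wrapping around at the end."""

    def __init__(self, backends):
        self._ring = cycle(backends)

    def pick(self):
        return next(self._ring)


class LeastLoadBalancer:
    """Send each request to the backend with the fewest requests in flight."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # min() returns the first backend with the smallest count, so ties
        # are broken by the order in which the backends were listed.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when the backend finishes a request.
        self.active[backend] -= 1
```

For example, picking four times from `RoundRobinBalancer(BACKENDS)` yields 10.0.0.1, 10.0.0.2, 10.0.0.3 and then 10.0.0.1 again, no matter how long each request takes.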

When we type a URL in the browser and hit Enter, the OS sends a packet to a DNS server, which translates host names to IP addresses and vice versa. When we then click a link on that website, the IP address is already cached, so the OS doesn't have to send the same DNS request again; both the OS and the browser have a cache. Each answer from a DNS server carries a Time to Live (TTL), e.g., 5 minutes, 1 hour, or 1 day, which limits how long it may be cached. Global load balancing…
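A toy Python model of the OS/browser DNS cache, assuming a hypothetical `resolve` function that returns an (IP, TTL) pair. It shows why TTLs matter for DNS-based load balancing: until an answer expires, the client keeps using the same IP and never sees the next server in the rotation.

```python
import time


class DnsCache:
    """Toy resolver cache: reuse an answer until its TTL runs out."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # hypothetical: hostname -> (ip, ttl_seconds)
        self._clock = clock      # injectable so the behavior can be tested
        self._entries = {}       # hostname -> (ip, expires_at)
        self.lookups = 0         # how many times we asked the "DNS server"

    def get(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached and cached[1] > now:
            return cached[0]     # still fresh: no DNS packet is sent
        ip, ttl = self._resolve(hostname)
        self.lookups += 1
        self._entries[hostname] = (ip, now + ttl)
        return ip
```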

Sessions break if the back end is PHP-based: by default PHP stores session data on the server that handled the request, so if you were on Server 1 and round robin happens to send you to Server 2, you might have to log in again. Or think about a shopping cart.

Sticky sessions (when you visit a website multiple times, your session is somehow preserved even though there are multiple back-end servers)

Cookies:
Can store the address of the back-end server, so the next time the user visits the website, they go to the same back-end server. Downside: the private IP of the back-end server may change, and the private IP is now visible to the whole world

Better: store a random number in the cookie and let the load balancer remember which number belongs to which server
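A minimal Python sketch of that random-number approach, assuming a hypothetical cookie name `lb_id` and a plain dict of cookies per request. The cookie holds only an opaque random token from `secrets.token_hex`; the token-to-server table stays inside the load balancer, so a back end's private IP can change without touching any client's cookie.

```python
import secrets
from itertools import cycle


class StickyBalancer:
    """Pin each client to one backend using an opaque random cookie value."""

    COOKIE = "lb_id"  # hypothetical cookie name

    def __init__(self, backends):
        self._next = cycle(backends)  # new clients are assigned round robin
        self._table = {}              # random token -> backend (never sent out)

    def route(self, cookies):
        """Return (backend, cookies_to_set) for one request."""
        token = cookies.get(self.COOKIE)
        if token in self._table:
            return self._table[token], {}
        # Missing or unknown cookie: pick a backend and issue a fresh token.
        token = secrets.token_hex(16)
        backend = next(self._next)
        self._table[token] = backend
        return backend, {self.COOKIE: token}
```

In this sketch the table lives only in the balancer's memory, so if the balancer restarts, returning clients are simply treated as new and may land on a different server.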

  • Software
    • ELB
    • HAProxy
    • LVS
  • Hardware
    • Barracuda
    • Cisco
    • Citrix
    • F5

PHP Acceleration
By default, PHP compiles a script every time it runs and then throws away the compiled result. Accelerators (opcode caches) keep that result for reuse, much like Python's .py vs. .pyc.
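Python can illustrate the idea, since .pyc files are exactly such a saved compilation. The sketch below is an analogy, not how any particular PHP accelerator works: it caches code objects from Python's built-in `compile()`, keyed by a hash of the source, so an unchanged script is compiled once and an edited script is recompiled.

```python
import hashlib


class BytecodeCache:
    """Compile a script once and reuse the compiled code object afterwards."""

    def __init__(self):
        self._cache = {}        # sha256(source) -> compiled code object
        self.compilations = 0   # how often we actually paid for compiling

    def run(self, source, namespace):
        key = hashlib.sha256(source.encode()).hexdigest()
        code = self._cache.get(key)
        if code is None:
            # Cache miss: compile and keep the result instead of discarding it.
            code = compile(source, "<script>", "exec")
            self._cache[key] = code
            self.compilations += 1
        exec(code, namespace)
        return namespace
```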

Caching

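  • Static .html files vs. pages generated from a MySQL database/XML: saving the generated HTML avoids regenerating it on every request, trading more space for more performance. But it requires a lot of work when you want to update or redesign the pages, because every saved file has to be regenerated
  • MySQL query cache: set query_cache_type = 1 in the MySQL configuration (the query cache was removed in MySQL 8.0)
  • memcached: store whatever you want in RAM. When memory fills up, it evicts the least recently used objects; objects can also be given an expiration time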
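A minimal Python sketch of the usual memcached pattern (often called cache-aside): check RAM first, fall back to the database on a miss, then store the result. `LRUCache` is an in-process stand-in for memcached, not its client API, and `query_db` is a hypothetical database call; expiration times are left out to keep it short.

```python
from collections import OrderedDict


class LRUCache:
    """Small in-memory cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used


def get_user(user_id, cache, query_db):
    """Cache-aside: try RAM first, fall back to the database, then remember."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = query_db(user_id)  # hypothetical, e.g. a SELECT against MySQL
        cache.set(key, user)
    return user
```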

Replication:

Master-Slave

Master: the main database that you write/read data to/from.
Slave: any time a query is executed on the master, that same query is copied down to one or more slaves, and they do the exact same thing

Advantages:

  • If the master goes down, promote one of the slaves and do some reconfiguration (redundancy)
  • If there are a lot of queries, you can load balance them across the database servers
  • For read-heavy websites, any SELECT can go to any of the four databases in the example (the master and its slaves), while every INSERT/UPDATE/DELETE has to go to the master
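A minimal Python sketch of the read/write split, assuming hypothetical connection objects that expose an `execute(sql)` method. SELECTs go to a randomly chosen server (master or slave); everything else goes to the master, which is the safe default for statements that are not plain reads.

```python
import random


class ReplicatedDB:
    """Send SELECTs to any server, and every other statement to the master.

    `master` and `slaves` are hypothetical connection objects: anything with
    an execute(sql) method.
    """

    def __init__(self, master, slaves):
        self.master = master
        self.readers = [master, *slaves]  # reads may hit any of the servers

    def execute(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return random.choice(self.readers).execute(sql)
        # INSERT/UPDATE/DELETE (and anything unrecognized) must hit the master,
        # which then replicates the change down to the slaves.
        return self.master.execute(sql)
```

One caveat: MySQL replication is asynchronous by default, so a SELECT sent to a slave right after a write may not see that write yet.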

Master-Master

You can write to either server 1 or server 2: if you happen to write to server 1, that query gets replicated to server 2, and vice versa, which keeps things simple
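A toy Python model of the data flow, with hypothetical `MasterNode` objects standing in for the two MySQL servers (real MySQL replication works through binary logs, not method calls). The key detail is that a replicated write is applied but not forwarded again, so it does not bounce between the servers forever.

```python
class MasterNode:
    """Toy master that replicates every local write to its peer."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None

    def write(self, key, value):
        # A write from a client: apply it here, then replicate to the peer.
        self.data[key] = value
        if self.peer is not None:
            self.peer.apply(key, value)

    def apply(self, key, value):
        # A replicated write: apply locally but do NOT forward it again,
        # otherwise the two servers would bounce it back and forth.
        self.data[key] = value
```

Real master-master setups also have to keep both servers from generating the same auto-increment IDs; MySQL's `auto_increment_increment` and `auto_increment_offset` settings are commonly used for that.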

Load balancing + Replication

Active-active pair of load balancers: both handle traffic.
Active-passive pair of load balancers: the two send packets (heartbeats) to each other, and the passive one promotes itself when it stops receiving packets from the active one.
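A minimal Python sketch of the passive balancer's decision, with time passed in explicitly so it can be tested. The timeout value and the `tick` method are assumptions for illustration; in practice this job is often done by a tool such as keepalived (VRRP), which also moves the shared IP address to the new active balancer.

```python
class PassiveBalancer:
    """Standby load balancer that takes over when heartbeats stop arriving."""

    def __init__(self, timeout, now):
        self.timeout = timeout       # seconds of silence before taking over
        self.last_heartbeat = now
        self.active = False

    def on_heartbeat(self, now):
        # Called whenever a packet from the active balancer arrives.
        self.last_heartbeat = now

    def tick(self, now):
        """Called periodically; promotes itself if the active peer went quiet."""
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True       # e.g. take over the shared public IP
        return self.active
```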

Partitioning
Split the data across clusters, e.g., users whose names start with A-M go to one cluster and N-Z to another
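A minimal Python sketch of that split, assuming users are partitioned by the first letter of their last name and that the cluster names are hypothetical. Names are not evenly distributed across letters, so the two halves may not get equal load.

```python
# Hypothetical cluster names; the split point follows the A-M / N-Z example.
CLUSTERS = {"A-M": "cluster-a-m", "N-Z": "cluster-n-z"}


def cluster_for(last_name):
    """Pick the database cluster that stores this user's rows."""
    initial = last_name.strip()[:1].upper()
    if "A" <= initial <= "M":
        return CLUSTERS["A-M"]
    if "N" <= initial <= "Z":
        return CLUSTERS["N-Z"]
    raise ValueError(f"no partition for {last_name!r}")
```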

High Availability
One load balancer, two masters replicating each other

Summary

Internet ==(TCP 443 80)==> 
two load balancers ==(TCP 80)==>
web servers ==x==>
two load balancers ==x (TCP 3306)==>
two <==X==> master databases


Firewall rules on the switch ports: only let through the traffic shown in the diagram above
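The summary topology can be written down as a default-deny allowlist. This Python sketch reads the flows off the diagram above; the role names are made up, and the port for web servers to database load balancers is an assumption (MySQL's 3306), since the diagram leaves that arrow unlabeled. Only TCP 80 continues past the front load balancers, which suggests HTTPS is terminated there.

```python
# Allowed flows read off the summary diagram. Role names are hypothetical, and
# ("web", "db_lb") on 3306 is an assumption: that arrow has no port label.
ALLOWED = {
    ("internet", "lb"): {80, 443},
    ("lb", "web"): {80},
    ("web", "db_lb"): {3306},
    ("db_lb", "db"): {3306},
}


def is_allowed(src, dst, port):
    """Default deny: a packet passes only if its (src, dst, port) is listed."""
    return port in ALLOWED.get((src, dst), set())
```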
