Sharding & IDs at Instagram – Instagram Engineering
With more than 25 photos and 90 likes every second, we store a lot of data here at Instagram. To make sure all of our important data fits into memory and is available quickly for our users, we've begun to shard our data — in other words, place the data in many smaller buckets, each holding a part of the data.
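To make the idea of buckets concrete, here is a minimal sketch in Python. The shard count, the `shard_for` helper, and the choice of routing by user ID modulo the number of shards are all assumptions made for illustration; the text above does not say how data is assigned to a bucket.

```python
# Hypothetical illustration: route each record to one of a few buckets.
NUM_SHARDS = 4  # assumed value, chosen only for this example


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a bucket for a user by taking the ID modulo the shard count."""
    return user_id % num_shards


# Each bucket holds only its own part of the data.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id, photo in [(31341, "beach.jpg"), (31342, "cat.jpg"), (31345, "dog.jpg")]:
    shards[shard_for(user_id)].append((user_id, photo))
```

Routing by a stable key means all of one user's rows land in the same bucket, so a lookup for that user only has to touch one server.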
Our application servers run Django with PostgreSQL as our back-end database. Our first question after deciding to shard our data was whether PostgreSQL should remain our primary data store or whether we should switch to something else. We evaluated a few different NoSQL solutions, but ultimately decided that sharding our data across a set of PostgreSQL servers best suited our needs.
Before writing data into this set of servers, however, we had to solve the issue of how to assign unique identifiers to each piece of data in the database (for example, each photo posted in our system). The typical solution that works for a single database — just using a database's natural auto-incrementing primary key feature — no longer works when data is being inserted into many databases at the same time. The rest of this blog post addresses how we tackled this issue.
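The collision described above can be reproduced with a toy model. This is only a sketch: the `Shard` class is a hypothetical in-memory stand-in for a database with its own auto-incrementing primary key, not PostgreSQL itself.

```python
from itertools import count


class Shard:
    """Toy stand-in for one database with its own auto-incrementing key."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._next_id = count(1)  # each shard counts from 1 independently
        self.rows = {}

    def insert(self, photo: str) -> int:
        row_id = next(self._next_id)
        self.rows[row_id] = photo
        return row_id


shard_a = Shard("a")
shard_b = Shard("b")

id_a = shard_a.insert("beach.jpg")
id_b = shard_b.insert("cat.jpg")

# Two different photos now share the same identifier.
ids_collide = id_a == id_b
```

Because each counter knows nothing about the others, an ID on its own no longer identifies a single photo across the whole system.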