Data Integrity and Problems of Scope | Peter Bailis



Data Integrity and Problems of Scope | Peter Bailis

Mutable state in distributed systems can cause all sorts of headaches, including data loss, corruption, and unavailability. Fortunately, there are a range of techniques—including exploiting commutativity and immutability—that can help reduce the incidence of these events without requiring much overhead. However, these techniques are only useful when applied correctly. When applied incorrectly, applications are still subject to data loss and corruption. In my experience, (the unfortunately common) incorrect application of these techniques is often due to problems of scope. What do I mean by scope? Let's look at two examples:

1.) Commutativity Gone Wrong

For our purposes, commutativity informally means that the result of executing two operations is independent of the order in which the operations were executed. Consider a counter: if I increment by one and you also increment by one, the end result (counter += 2) is equivalent despite the order in which we execute our operations! We can use this fact to built more scalable counters that exploit this potential for reorderability.

However, we're not off the hook yet. While our operations on the individual counter were commutative, does this mean that our programs that use this counter are "correct"? If we're Twitter and we want to build retweet counters, then commutative counters are probably a fine choice. But what if the counter is storing a bank account balance?1 Or if it's monitoring the amount of available inventory for a given item in a warehouse? While two individual decrement operations are commutative, two account withdrawal operations (or two item order placement operations) are not. By looking at individual data items instead of how the data is used by our programs, we risk data corruption. When analyzing commutativity, myopia is dangerous.


Read full article from Data Integrity and Problems of Scope | Peter Bailis


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts