Debunking BigData Myths: write durability, data integrity and consistency - MyHowTo.org




Several times in the past I have heard a very strange thing about BigData storage systems - specifically Cassandra and Hadoop. People praised their relatively low cost, scalability and open-source nature. Yet the same people said something like "for that price we are ok if some data loss is possible from time to time". Shocking? Or, more importantly, is this really something that BigData adopters have to tolerate in exchange for the other benefits?

Funnily enough, one of the recent conversations involved Oracle as a "more reliable" alternative.

I mean no disrespect - in many cases it was a clear misunderstanding of the difference between data consistency and the durability of writes and, more generally, of the ability of a storage system to preserve data integrity over time.

First, about the software quality. To be fair, it is quite possible that Oracle has spent more time and money on testing the Oracle database server and weeding out bugs. Oracle software is used by millions of customers. Open-source software like Apache Cassandra is also used by many thousands of customers, and many of them are not very tolerant of software bugs either. Not to mention that many open-source products are supported by commercial vendors who perform additional quality control: DataStax does this for Apache Cassandra; Cloudera, Hortonworks and others do it for Hadoop, and so on. It is also important to mention that the source code of open-source products is publicly available and thousands of people contribute to it. Bottom line - I am not really buying the argument that the origin of the software makes (on average) a huge difference for data integrity and durability when comparing commercial and open-source products. Assuming that the latter are mature enough, of course.


Read full article from Debunking BigData Myths write durability, data integrity and consistency - MyHowTo.org

