Setting Up a Sample Application in HBase, Spark, and HDFS - DZone Big Data



Setting Up a Sample Application in HBase, Spark, and HDFS - DZone Big Data

Apache Spark is a framework where the hype is largely justified. It is both innovative as a model for computation and well designed as a product. People may be tempted to compare it with another framework for distributed computing that has become popular recently, Apache Storm for example, with statements like "Spark is for batch processing while Storm is for streaming." But those are entirely different beasts. Storm is a dataflow framework, very similar to the HyperGraph DataFlow framework, and there are others like it. It's based on a model that has existed at least since the 1970s even though its author seems to be claiming credit for it. Spark on the other hand is a novel approach to deal with large quantities of data with complex, arbitrary computations on it. Note the "arbitrary" — unlike Map/Reduce, Spark will let you do anything with the data. I hope to post more about what's fundamentally different between something like Storm and Spark because it is interesting, theoretically. But I highly recommend reading the original paper describing RDD (Resilient Distributed Dataset), which is the abstraction at the foundation of Spark.


Read full article from Setting Up a Sample Application in HBase, Spark, and HDFS - DZone Big Data


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts