Setting Up a Sample Application in HBase, Spark, and HDFS - DZone Big Data
Apache Spark is a framework where the hype is largely justified. It is both innovative as a model for computation and well designed as a product. People may be tempted to compare it with other distributed computing frameworks that have become popular recently, Apache Storm for example, with statements like "Spark is for batch processing while Storm is for streaming." But those are entirely different beasts. Storm is a dataflow framework, very similar to the HyperGraph DataFlow framework, and there are others like it. It is based on a model that has existed at least since the 1970s, even though its author seems to claim credit for it. Spark, on the other hand, is a novel approach to performing complex, arbitrary computations over large quantities of data. Note the "arbitrary": unlike Map/Reduce, Spark will let you do anything with the data. I hope to post more about what is fundamentally different between something like Storm and Spark, because it is theoretically interesting. But I highly recommend reading the original paper describing the RDD (Resilient Distributed Dataset), which is the abstraction at the foundation of Spark.
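To make the "arbitrary computations" point concrete, here is a minimal sketch of an RDD pipeline in Scala, assuming Spark in local mode. The object name, the input data, and the particular chain of transformations are illustrative assumptions, not taken from the article; the point is only that you can chain any number of transformations rather than being confined to one map phase and one reduce phase.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration; a real deployment would point at a cluster.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Build an RDD from a local collection; in practice the data would
    // typically come from HDFS via sc.textFile("hdfs://...").
    val numbers = sc.parallelize(1 to 100)

    // An arbitrary chained computation, not constrained to the rigid
    // map-then-reduce shape of classic Map/Reduce.
    val result = numbers
      .filter(_ % 2 == 0)   // keep even numbers
      .map(n => n * n)      // square them
      .groupBy(_ % 10)      // group by last digit
      .mapValues(_.sum)     // sum within each group
      .collect()

    result.sorted.foreach { case (digit, sum) =>
      println(s"last digit $digit -> sum $sum")
    }

    sc.stop()
  }
}
```

Each step above produces a new RDD lazily; nothing runs until the `collect()` action, at which point Spark can recompute any lost partition from the recorded lineage, which is the fault-tolerance idea at the heart of the RDD paper.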