Big Data Analytics: Spark | Haifeng's Random Walk
In the previous post, we discussed MapReduce. Although it is great for large-scale data processing, it is not friendly to iterative algorithms or interactive analytics because the data have to be reloaded from disk on every iteration, or materialized and replicated on the distributed file system between successive jobs. Apache Spark is designed to solve this problem by means of in-memory computing. The overall framework and parallel computing model of Spark are similar to MapReduce's, but with an important innovation: the resilient distributed dataset (RDD).
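To make the contrast concrete, here is a minimal sketch of that idea, assuming a local Spark installation and a hypothetical input file numbers.txt with one number per line. Calling cache() on an RDD keeps it in executor memory, so repeated passes of an iterative computation reuse the loaded data instead of re-reading it from disk, as a chain of MapReduce jobs would have to:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCacheSketch {
  def main(args: Array[String]): Unit = {
    // Run locally on all cores; a real deployment would point
    // the master at a cluster instead.
    val conf = new SparkConf().setAppName("RddCacheSketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // "numbers.txt" is a hypothetical input file. cache() keeps the
    // parsed RDD in memory so the iterations below reuse it.
    val numbers = sc.textFile("numbers.txt").map(_.trim.toDouble).cache()

    // A toy iterative computation: each pass refines an estimate
    // over the same cached dataset, loaded from disk only once.
    var estimate = 0.0
    for (_ <- 1 to 10) {
      estimate = numbers.map(x => estimate + 0.1 * (x - estimate)).mean()
    }
    println(s"estimate after 10 iterations: $estimate")

    sc.stop()
  }
}
```

The same pattern underlies Spark implementations of iterative algorithms such as k-means or PageRank: the working dataset is cached once, and only the small mutable state (here, estimate) changes between passes.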