La Trobe University
Bundoora, Victoria 3086
Email: z.he@latrobe.edu.au

Authors of examples: Matthias Langer and Zhen He
Email addresses: m.langer@latrobe.edu.au, z.he@latrobe.edu.au

These examples have only been tested with Spark version 1.1. We assume the functionality of Spark is stable, so the examples should remain valid for later releases.

The RDD API By Example

RDD is short for Resilient Distributed Dataset. RDDs are the workhorse of the Spark system. As a user, one can consider an RDD as a handle for a collection of individual data partitions, which are the result of some computation.

However, an RDD is actually more than that. On cluster installations, separate data partitions can reside on separate nodes. Using the RDD as a handle, one can access all partitions and perform computations and transformations using the contained data. Whenever a part of an RDD or an entire RDD is lost, the system is able to reconstruct the data of lost partitions by using lineage information.
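The idea of an RDD as a handle over partitions, with lineage recorded behind it, can be illustrated with a minimal sketch. It is not taken from the original examples: the object name RddHandleSketch, the helper run, the input range 1 to 10, the partition count of 3, and the local[2] master are all assumptions chosen for illustration. It only uses standard calls (parallelize, partitions, glom, map, toDebugString, collect) and assumes the Spark core library is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and values are illustrative only.
object RddHandleSketch {

  // Builds a small RDD, inspects its partitions and lineage, and
  // returns the partition count together with the computed squares.
  def run(sc: SparkContext): (Int, Array[Int]) = {
    // Distribute the numbers 1 to 10 over 3 partitions (assumed values).
    val numbers = sc.parallelize(1 to 10, 3)

    // glom() gathers each partition into an array, exposing the layout.
    numbers.glom().collect().foreach(p => println(p.mkString(", ")))

    // A transformation yields a new RDD that remembers its parent RDD.
    val squares = numbers.map(x => x * x)

    // toDebugString prints the lineage Spark would use to recompute
    // lost partitions.
    println(squares.toDebugString)

    (numbers.partitions.length, squares.collect())
  }

  def main(args: Array[String]): Unit = {
    // Local mode with 2 worker threads; the application name is arbitrary.
    val conf = new SparkConf().setAppName("RddHandleSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (partitionCount, squares) = run(sc)
      println(partitionCount)        // 3
      println(squares.mkString(", ")) // 1, 4, 9, 16, 25, 36, 49, 64, 81, 100
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the three partitions could live on different nodes. The program would still interact only with the numbers and squares handles, and the lineage shown by toDebugString is what allows a lost partition to be recomputed from its parent.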
Read full article from Apache Spark RDD API Examples