Spark the fastest open source engine for sorting a petabyte – Databricks
October 10, 2014 | by Reynold Xin

Update November 5, 2014: Our benchmark entry has been reviewed by the benchmark committee and Spark has won the Daytona GraySort contest for 2014! Please see this new blog post for the update.

Apache Spark has seen phenomenal adoption, being widely slated as the successor to Hadoop MapReduce and deployed in clusters ranging from a handful to thousands of nodes. While it was clear to everybody that Spark is more efficient than MapReduce for data that fits in memory, we heard that some organizations were having trouble pushing it to large-scale datasets that could not fit in memory. Therefore, since the inception of Databricks, we have devoted much effort, together with the Spark community, to improving the stability, scalability, and performance of Spark. Spark works well for gigabytes or terabytes of data, and it should also work well for petabytes. To evaluate these improvements, we decided to participate in the Sort Benchmark.