Apache Spark sample and coalesce | ThisShouldNeverHappenException
It is wasteful to work with a large number of small data partitions, especially when S3 is used for storage. I/O then takes up an unacceptably large share of each task's time, most annoyingly when Spark is merely moving data files from the _temporary location to their final destination after the real work has been completed.
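To make the problem concrete, here is a toy model in plain Python of what happens when a dataset is sampled and then coalesced. It is only a sketch, not Spark's implementation: partitions are modelled as lists, and the helper names `sample_partitions` and `coalesce_partitions` are hypothetical. In Spark itself the corresponding operations are `sample` and `coalesce` on an RDD or DataFrame.

```python
import random


def sample_partitions(partitions, fraction, seed=42):
    """Keep each record with probability `fraction`.

    The number of partitions does not change, so heavy sampling
    leaves behind many tiny (or empty) partitions.
    """
    rng = random.Random(seed)
    return [[rec for rec in part if rng.random() < fraction]
            for part in partitions]


def coalesce_partitions(partitions, num_partitions):
    """Merge neighbouring partitions into at most `num_partitions` groups.

    Whole partitions are concatenated; records are never redistributed
    one by one, which mirrors the idea of a shuffle-free coalesce.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    target = min(num_partitions, len(partitions))
    merged = [[] for _ in range(target)]
    for i, part in enumerate(partitions):
        # Contiguous input partitions map to the same output partition.
        merged[i * target // len(partitions)].extend(part)
    return merged


# 200 input partitions of 100 records each (made-up sizes for illustration).
partitions = [list(range(i * 100, (i + 1) * 100)) for i in range(200)]

sampled = sample_partitions(partitions, fraction=0.01)
compacted = coalesce_partitions(sampled, 4)

print(len(sampled), "partitions after sampling")
print(len(compacted), "partitions after coalescing")
```

If each partition became one output file, the sampled dataset would still be written as 200 very small files, while the coalesced one would be written as 4, with the same records in total. Fewer output files means fewer files to move out of the _temporary location at the end of the job.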