Understanding Spark Caching | Sujee Maniyam
For small data sets (a few hundred megabytes) we can use raw caching. Even though this consumes more memory, the small size won't put much pressure on Java garbage collection. Raw caching is also a good fit for iterative workloads (say, when we run a bunch of iterations over the same data), because access to the cached data is very fast.

For medium to large data sets (tens or hundreds of gigabytes), serialized caching is more helpful, because it consumes much less memory, and garbage-collecting gigabytes of heap can be taxing.
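As a rough sketch of how the two approaches map to Spark's storage levels (assuming an existing SparkContext `sc`; the input paths are hypothetical):

```scala
import org.apache.spark.storage.StorageLevel

// Raw (deserialized) caching: fastest to read back, but uses the most
// memory. Fine for a small data set reused across many iterations.
val small = sc.textFile("hdfs:///data/small-dataset")  // hypothetical path
small.persist(StorageLevel.MEMORY_ONLY)                // equivalent to small.cache()

// Serialized caching: partitions are stored as byte arrays, so reads pay
// a deserialization cost, but the heap footprint is far smaller, which
// eases GC pressure for tens or hundreds of gigabytes.
val large = sc.textFile("hdfs:///data/large-dataset")  // hypothetical path
large.persist(StorageLevel.MEMORY_ONLY_SER)

// Caching is lazy: the data is materialized on the first action.
small.count()
large.count()
```

Note that serialized caching benefits further from an efficient serializer such as Kryo, which shrinks the in-memory footprint beyond default Java serialization.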
Read full article from Understanding Spark Caching | Sujee Maniyam