Understanding Spark Caching | Sujee Maniyam
For small data sets (a few hundred megabytes) we can use raw caching. Even though this consumes more memory, the small size won't put much pressure on Java garbage collection. Raw caching is also a good fit for iterative workloads (say, when we run a bunch of iterations over the same data), because access to the cached data is very fast.

For medium to large data sets (tens or hundreds of gigabytes), serialized caching is more helpful, because it consumes much less memory, and garbage-collecting gigabytes of heap can be taxing.
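As a rough sketch of how the two approaches map to Spark's storage levels (assuming an existing SparkContext `sc`; the input paths are hypothetical):

```scala
import org.apache.spark.storage.StorageLevel

// Raw (deserialized) caching: fastest to read back, but uses the most
// memory. Fine for a small data set reused across many iterations.
val small = sc.textFile("hdfs:///data/small-dataset")  // hypothetical path
small.persist(StorageLevel.MEMORY_ONLY)                // equivalent to small.cache()

// Serialized caching: partitions are stored as byte arrays, so reads pay
// a deserialization cost, but the heap footprint is far smaller, which
// eases GC pressure for tens or hundreds of gigabytes.
val large = sc.textFile("hdfs:///data/large-dataset")  // hypothetical path
large.persist(StorageLevel.MEMORY_ONLY_SER)

// Caching is lazy: the data is materialized on the first action.
small.count()
large.count()
```

Note that serialized caching benefits further from an efficient serializer such as Kryo, which shrinks the in-memory footprint beyond default Java serialization.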
Read full article from Understanding Spark Caching | Sujee Maniyam