My Research Diaries: Running Apache Spark Unit Tests Sequentially with Scala Specs2
Apache Spark has a growing community in the Machine Learning and Analytics world. One of the things that often comes up when developing with Spark is how to write unit tests for functions that take in an RDD and return an RDD. There is the famous Quantified Blog post on Spark testing with FunSuite, which gives a great way to design a trait and then use it in our test classes. But it is a little outdated (written for Spark 0.6). In particular, the System.clearProperty("spark.master.port") call it relies on refers to a property that no longer exists in Spark 1.0.1. Thankfully, the Spark Summit 2014 talk on "Spark Testing: Best Practices" is based on the latest version of Spark and has the right properties to clear, namely spark.driver.port and spark.hostPort. We also used Specs2 (org.specs2) and Mockito for testing, so our trait looks a little different.
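To make the idea concrete, here is a minimal sketch of what such a trait might look like. It is not the trait from the original article: the names SparkTestUtils, withSparkContext and WordLengthSpec are hypothetical, and the sketch assumes Spark 1.x and Specs2 are on the test classpath. It only illustrates the two points made above, clearing spark.driver.port and spark.hostPort after each test and running the examples sequentially.

```scala
import org.apache.spark.SparkContext
import org.specs2.mutable.Specification

// Hypothetical helper trait (not from the original article): lends a local
// SparkContext to a test body and always cleans up afterwards.
trait SparkTestUtils {
  def withSparkContext[T](appName: String)(body: SparkContext => T): T = {
    val sc = new SparkContext("local", appName)
    try {
      body(sc)
    } finally {
      sc.stop()
      // Clear the properties named in the text so that the next test
      // can start a fresh context without port conflicts.
      System.clearProperty("spark.driver.port")
      System.clearProperty("spark.hostPort")
    }
  }
}

// Hypothetical spec showing a test of an RDD-in, RDD-out transformation.
class WordLengthSpec extends Specification with SparkTestUtils {
  // Run the examples one after another so that only one SparkContext
  // is alive at any time.
  sequential

  "an RDD transformation" should {
    "map words to their lengths" in {
      withSparkContext("word-length") { sc =>
        val lengths = sc.parallelize(Seq("spark", "specs2")).map(_.length).collect().toList
        lengths must_== List(5, 6)
      }
    }
  }
}
```

The loan-style helper keeps setup and teardown in one place, so each example gets a fresh context and cannot forget to stop it.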