My Research Diaries: Running Apache Spark Unit Tests Sequentially with Scala Specs2




Apache Spark has a growing community in the machine learning and analytics world. One thing that often comes up when developing with Spark is how to unit test functions that take in an RDD and return an RDD. The well-known Quantified blog post on Spark testing with FunSuite shows a good way to design a base trait and then use it in test classes, but it is a little outdated (written for Spark 0.6): the System.clearProperty("spark.master.port") call it uses refers to a property that no longer exists in Spark 1.0.1. Thankfully, the Spark Summit 2014 talk "Spark Testing: Best Practices" is based on the latest version of Spark and has the right properties to set, namely spark.driver.port and spark.hostPort. We also used org.specs2 (Scala Specs2) and Mockito for testing, so our trait looks a little different.
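To make the idea concrete, here is a minimal sketch of what such a trait could look like with Specs2. It is not the trait from the full article: the names LocalSparkSpec, startSpark, stopSpark and WordLengthSpec are hypothetical, and it assumes Spark 1.x and the Specs2 mutable API (sequential and step) are on the test classpath. Mockito is left out to keep it short.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.specs2.mutable.Specification

// Hypothetical base trait (illustrative name, not taken from the article).
trait LocalSparkSpec extends Specification {
  // Run examples one after another so only one SparkContext is live at a time.
  sequential

  // Created in startSpark(), released in stopSpark().
  protected var sc: SparkContext = _

  protected def startSpark(): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("unit-test")
    sc = new SparkContext(conf)
  }

  protected def stopSpark(): Unit = {
    if (sc != null) {
      sc.stop()
      sc = null
    }
    // Properties named in the post for Spark 1.0.1; clearing them lets the
    // next specification start a fresh context.
    System.clearProperty("spark.driver.port")
    System.clearProperty("spark.hostPort")
  }
}

// Hypothetical specification that uses the trait.
class WordLengthSpec extends LocalSparkSpec {
  step(startSpark())

  "a function from RDD to RDD" should {
    "map each word to its length" in {
      val input = sc.parallelize(Seq("spark", "specs2"))
      val result = input.map(_.length).collect().toList
      result must_== List(5, 6)
    }
  }

  step(stopSpark())
}
```

The step calls wrap the examples so the context is created before the first example and stopped after the last one, while sequential keeps examples within the specification from running concurrently against the shared context.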

Read full article from My Research Diaries: Running Apache Spark Unit Tests Sequentially with Scala Specs2

