Spark 2.0: Important Updates and Improvements - Jianshu

As a data scientist, if you could learn only one framework in your life, make it Spark! In addition, this release includes over 2500 patches from over 300 contributors.

02 Environment Support

The default build is now using Scala 2.11 rather than Scala 2.10. In other words, Spark itself is now compiled with Scala 2.11, so Scala programs written against Spark should preferably be compiled with 2.11 from now on as well.

Java 7 and Python 2.6 are deprecated. Spark's Python 3 support, on the other hand, is now quite good: if you use PySpark, go straight to Python 3 and save yourself some trouble.

Spark 2.0 no longer requires a fat assembly jar for production deployment, so the bloated assembly file is gone (this seems to be a perk mainly for Scala developers).

03 Spark Core

Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.

SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
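To make the two API changes concrete, here is a minimal Scala sketch, assuming a Scala 2.11 project with `spark-sql` 2.0.x on the classpath and running in local mode. It creates a `SparkSession` (instead of a `SQLContext` or `HiveContext`), builds a `DataFrame`, assigns it to a `Dataset[Row]` with no conversion, and then views the same data as a typed `Dataset[Person]`. The `Person` case class, the `UnifiedApiDemo` object and the application name are hypothetical, chosen only for illustration.

```scala
// Sketch only: assumes spark-sql 2.0.x (Scala 2.11 build) on the classpath.
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder.
case class Person(name: String, age: Long)

object UnifiedApiDemo {
  // Returns the names of people aged 18 or older from a tiny in-memory dataset.
  def run(): Array[String] = {
    // SparkSession: the single entry point replacing SQLContext/HiveContext.
    val spark = SparkSession.builder()
      .appName("unified-api-demo") // hypothetical application name
      .master("local[*]")          // local mode, just for the demo
      .getOrCreate()
    import spark.implicits._ // encoders plus toDF() / as[T] helpers

    try {
      val df: DataFrame = Seq(Person("Ann", 30L), Person("Bob", 15L)).toDF()

      // Compiles without conversion: DataFrame is an alias for Dataset[Row].
      val rows: Dataset[Row] = df

      // The same data as a typed Dataset; field access in lambdas is compile-time checked.
      val people: Dataset[Person] = rows.as[Person]
      people.filter(_.age >= 18).map(_.name).collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", ")) // prints "Ann"
}
```

In Spark 2.0 the alias is declared as `type DataFrame = Dataset[Row]` in the `org.apache.spark.sql` package object, which is why the assignment to `rows` needs no conversion. To reach Hive tables, the builder would additionally call `enableHiveSupport()`, which requires the Hive classes on the classpath.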
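Following the Scala 2.11 point from the Environment Support section, an sbt-based Spark 2.0 application might pin its versions as in the `build.sbt` fragment below. It is a sketch: the exact patch versions (`2.11.8`, `2.0.0`) are assumptions and should match the Spark release actually deployed on the cluster.

```scala
// build.sbt (sketch): compile the application with the same Scala line as Spark 2.0.
scalaVersion := "2.11.8"

// %% appends the Scala binary version, so this resolves to spark-sql_2.11.
// "provided": the cluster already ships Spark, so it stays out of the application jar.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
```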

Read full article from Spark 2.0: Important Updates and Improvements - Jianshu

