If, as a data scientist, you could learn only one framework in your lifetime, learn Spark first! In addition, this release includes over 2500 patches from over 300 contributors.

02 Environment Support

The default build is now using Scala 2.11 rather than Scala 2.10. The environment Spark is built against has moved from Scala 2.10 to 2.11, a sign that your own Scala programs are now best compiled with 2.11 as well.

Deprecated versions: Java 7 and Python 2.6. Also, Spark's support for Python 3 is already quite good; if you use PySpark, it is recommended to go straight to Python 3 and save yourself some hassle.

Spark 2.0 no longer requires a fat assembly jar for production deployment. Deploying to production no longer involves that bloated assembly file (seemingly a perk for Scala development).

03 Spark-Core

Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface (a sketch illustrating this follows below).

SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
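A minimal Scala sketch of the new SparkSession entry point, assuming a local-mode application (the app name and master setting are illustrative, not from the article):

    import org.apache.spark.sql.SparkSession

    // SparkSession is the single entry point in Spark 2.0,
    // replacing SQLContext and HiveContext.
    val spark = SparkSession.builder()
      .appName("Spark2Demo")   // hypothetical app name
      .master("local[*]")      // assumption: local mode, just for this sketch
      .enableHiveSupport()     // optional: restores HiveContext functionality
      .getOrCreate()

    // The old contexts remain reachable for backward compatibility:
    val sqlContext = spark.sqlContext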
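And a sketch of the DataFrame/Dataset unification described above: since Scala's DataFrame is now just a type alias for Dataset[Row], an untyped frame can be given a typed view via as[T]. The Person case class and the people.json path are hypothetical:

    import org.apache.spark.sql.{DataFrame, Dataset}

    // Hypothetical schema for the example.
    case class Person(name: String, age: Long)

    // Untyped API: DataFrame is a type alias for Dataset[Row].
    val df: DataFrame = spark.read.json("people.json")  // placeholder path

    // Typed API: the same data viewed as a Dataset of Person.
    import spark.implicits._
    val ds: Dataset[Person] = df.as[Person]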
Read full article from Spark 2.0: Important Updates and Improvements - 简书