If, as a data scientist, you could learn only one framework in your lifetime, learn Spark first! In addition, this release includes over 2500 patches from over 300 contributors.

02 Environment Support

The default build is now using Scala 2.11 rather than Scala 2.10. The environment Spark is built against has moved from Scala 2.10 to 2.11, a sign that your own Scala programs are now best compiled with 2.11 as well.

Deprecated versions: Java 7 and Python 2.6. Also, Spark's support for Python 3 is already quite good; if you use PySpark, it is recommended to go straight to Python 3 and save yourself some hassle.

Spark 2.0 no longer requires a fat assembly jar for production deployment. Deploying to production no longer involves that bloated assembly file (seemingly a perk for Scala development).

03 Spark-Core

Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface (a sketch illustrating this follows below).

SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
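A minimal Scala sketch of the new SparkSession entry point, assuming a local-mode application (the app name and master setting are illustrative, not from the article):

    import org.apache.spark.sql.SparkSession

    // SparkSession is the single entry point in Spark 2.0,
    // replacing SQLContext and HiveContext.
    val spark = SparkSession.builder()
      .appName("Spark2Demo")   // hypothetical app name
      .master("local[*]")      // assumption: local mode, just for this sketch
      .enableHiveSupport()     // optional: restores HiveContext functionality
      .getOrCreate()

    // The old contexts remain reachable for backward compatibility:
    val sqlContext = spark.sqlContext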
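And a sketch of the DataFrame/Dataset unification described above: since Scala's DataFrame is now just a type alias for Dataset[Row], an untyped frame can be given a typed view via as[T]. The Person case class and the people.json path are hypothetical:

    import org.apache.spark.sql.{DataFrame, Dataset}

    // Hypothetical schema for the example.
    case class Person(name: String, age: Long)

    // Untyped API: DataFrame is a type alias for Dataset[Row].
    val df: DataFrame = spark.read.json("people.json")  // placeholder path

    // Typed API: the same data viewed as a Dataset of Person.
    import spark.implicits._
    val ds: Dataset[Person] = df.as[Person]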
Read full article from Spark 2.0: Important Updates and Improvements - 简书