apache - How to read multiple text files into a single RDD? - Stack Overflow
You can specify whole directories, use wildcards, and even pass a comma-separated list of directories and wildcards. For example:
sc.textFile("/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file")
As Nick Chammas points out, this behavior comes from Hadoop's FileInputFormat, which Spark exposes here,
so the same path syntax also works with Hadoop (and Scalding).
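To make the path syntax concrete, here is a rough local sketch in plain Python of how such a comma-separated string might resolve to individual files. It is only an approximation under stated assumptions: `expand_paths` and `build_demo_tree` are hypothetical helpers, not part of Spark or Hadoop, the directory layout is made up, and the sketch ignores `{a,b}` alternation, hidden-file filtering, and non-local filesystems.

```python
import glob
import os
import tempfile


def expand_paths(path_spec):
    """Approximate how a comma-separated path string resolves to files.

    Illustration only: this is not Hadoop's code. It splits on commas,
    expands each glob, and treats a directory as the files directly in it.
    """
    files = []
    for pattern in path_spec.split(","):
        for match in sorted(glob.glob(pattern)):
            if os.path.isdir(match):
                # A directory stands for the regular files directly inside it.
                for name in sorted(os.listdir(match)):
                    child = os.path.join(match, name)
                    if os.path.isfile(child):
                        files.append(child)
            else:
                files.append(match)
    return files


def build_demo_tree(root):
    """Create a small, made-up layout loosely mirroring the example paths."""
    for rel in ("dir1/a.txt", "dir1/b.txt",
                "paths/part-0010", "paths/part-0055", "paths/part-0060",
                "single/file.txt"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as handle:
            handle.write(rel + "\n")


root = tempfile.mkdtemp()
build_demo_tree(root)

# One string mixing a directory, a wildcard, and a specific file.
spec = ",".join([
    os.path.join(root, "dir1"),
    os.path.join(root, "paths", "part-00[0-5]*"),
    os.path.join(root, "single", "file.txt"),
])

resolved = [os.path.relpath(p, root) for p in expand_paths(spec)]
print(resolved)
```

In this sketch the pattern `part-00[0-5]*` picks up `part-0010` and `part-0055` but skips `part-0060`, because the character after `part-00` must fall in the range 0 to 5.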