How does Hadoop handle split input records? - Quora
Hadoop uses your InputFormat and RecordReader to figure out how to (1) create splits and (2) parse the data within each split into records (or key/value objects) that can be passed to the mapper. If an InputSplit (which you get to create in your InputFormat) doesn't map exactly onto an HDFS block, Hadoop's FileInputFormat (and the classes that extend it) will Do The Right Thing(tm) by performing a partial network read, pulling the first few bytes from the next block to complete the record. If you want to see the gory details, the source for TextInputFormat (which extends FileInputFormat) and the LineRecordReader it uses is where that logic lives. You almost certainly want to extend FileInputFormat so you get this behavior for free.
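To make that concrete, here is a minimal sketch of a custom input format that leans on FileInputFormat for split creation and on LineRecordReader for boundary handling. MyLineInputFormat is a hypothetical class name chosen for illustration; it assumes the newer org.apache.hadoop.mapreduce API and is essentially what TextInputFormat already does, stripped down.

```java
// A minimal sketch (not the stock Hadoop implementation) showing how extending
// FileInputFormat gives you split creation and cross-block record completion for free.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MyLineInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // LineRecordReader skips a partial first line (the previous split's
        // reader finishes it) and reads past the end of its own split to
        // complete the last line, which may mean a partial network read
        // into the next HDFS block.
        return new LineRecordReader();
    }
}
```

You would wire it into a job with job.setInputFormatClass(MyLineInputFormat.class); the getSplits() logic inherited from FileInputFormat decides where the split boundaries fall, and the record reader worries about records that straddle them.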