How does Hadoop handle split input records? - Quora
Hadoop uses your InputFormat and RecordReader to figure out how to (1) create splits and (2) parse the data within each split into records (or key/value objects) that can be passed to the mapper. If an InputSplit (which you get to create in your InputFormat) doesn't map exactly onto an HDFS block, Hadoop's FileInputFormat (and the classes that extend it) will Do The Right Thing(tm) by performing a partial network read, pulling the first few bytes from the next block to complete the record. If you want to see the gory details, the source for TextInputFormat (which extends FileInputFormat) and the LineRecordReader it uses is where that logic lives. You almost certainly want to extend FileInputFormat so you get this behavior for free.
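To make that concrete, here is a minimal sketch of a custom input format that leans on FileInputFormat for split creation and on LineRecordReader for boundary handling. MyLineInputFormat is a hypothetical class name chosen for illustration; it assumes the newer org.apache.hadoop.mapreduce API and is essentially what TextInputFormat already does, stripped down.

```java
// A minimal sketch (not the stock Hadoop implementation) showing how extending
// FileInputFormat gives you split creation and cross-block record completion for free.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MyLineInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // LineRecordReader skips a partial first line (the previous split's
        // reader finishes it) and reads past the end of its own split to
        // complete the last line, which may mean a partial network read
        // into the next HDFS block.
        return new LineRecordReader();
    }
}
```

You would wire it into a job with job.setInputFormatClass(MyLineInputFormat.class); the getSplits() logic inherited from FileInputFormat decides where the split boundaries fall, and the record reader worries about records that straddle them.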