All About Programming: Hadoop Wins TeraSort

Hadoop Wins TeraSort | Perspectives

Jim Gray proposed the original sort benchmark in his famous Anon et al. paper, "A Measure of Transaction Processing Power", originally published in Datamation on April 1, 1985. TeraSort is one of the benchmarks that Jim evolved from this original proposal.

TeraSort is essentially a sequential I/O benchmark, and the best way to get lots of I/O capacity is to have many servers. The mainframe engineer-a-bigger-bus technique has produced some nice I/O rates, but it doesn't scale. There have been some very good designs but, in the end, commodity parts in volume always win. The trick is coming up with a programming model simple enough that thousands of nodes can be harnessed. MapReduce takes some heat for not being innovative and for not having learned enough from the database community (see "MapReduce – A Major Step Backwards"). However, Google, Microsoft, and Yahoo all run the model over thousands of nodes, and all three have written higher-level language layers above MapReduce, some of which look very SQL-like.
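To make the "many servers" idea concrete, here is a minimal single-process sketch of a range-partitioned sort, the general shape a large sort takes when spread over many nodes. It is an illustration under stated assumptions, not the code behind any benchmark entry: the function names `choose_split_points` and `partition_sort` are hypothetical, the "nodes" are just Python lists, and nothing here uses a Hadoop API.

```python
import random
from bisect import bisect_right


def choose_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the keys.

    Hypothetical helper: evenly spaced elements of a sorted sample
    approximate the quantiles of the full data set.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_sort(keys, num_partitions):
    """Sort keys by range-partitioning them, then sorting each partition."""
    if not keys:
        return []
    splits = choose_split_points(keys, num_partitions)

    # "Map" side: route every key to the partition whose key range holds it.
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[bisect_right(splits, key)].append(key)

    # "Reduce" side: each partition is sorted independently. On a cluster,
    # each of these sorts would run on a different node in parallel.
    sorted_parts = [sorted(part) for part in partitions]

    # Every key in partition i is <= every key in partition i + 1, so plain
    # concatenation yields a globally sorted result with no merge step.
    return [key for part in sorted_parts for key in part]
```

Because the partitions cover disjoint key ranges, each node reads and writes its own data sequentially, which is why adding servers adds usable I/O capacity.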

Owen O'Malley of the Yahoo Grid team took a moderate-sized Hadoop cluster of 910 nodes and won the TeraSort benchmark. Owen blogged the result in "Apache Hadoop Wins Terabyte Sort Benchmark" and provided more details in a short paper, "TeraByte Sort on Apache Hadoop". Great result, Owen.
