All About Programming: Exactly-once Spark Streaming from Apache Kafka

Exactly-once Spark Streaming from Apache Kafka - Cloudera Engineering Blog

Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4.

Apache Spark 1.3 includes new experimental RDD and DStream implementations for reading data from Apache Kafka. As the primary author of those features, I'd like to explain their implementation and usage. You may be interested if you would benefit from:

- More uniform usage of Spark cluster resources when consuming from Kafka
- Control of message delivery semantics
- Delivery guarantees without reliance on a write-ahead log in HDFS
- Access to message metadata

I'll assume you're familiar with the Spark Streaming docs and Kafka docs. All code examples are in Scala, but there are Java-friendly methods in the API.

Read full article from Exactly-once Spark Streaming from Apache Kafka - Cloudera Engineering Blog
