Exactly-once Spark Streaming from Apache Kafka - Cloudera Engineering Blog
Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4.
Apache Spark 1.3 includes new experimental RDD and DStream implementations for reading data from Apache Kafka. As the primary author of those features, I'd like to explain their implementation and usage. You may be interested if you would benefit from:
- More uniform usage of Spark cluster resources when consuming from Kafka
- Control of message delivery semantics
- Delivery guarantees without reliance on a write-ahead log in HDFS
- Access to message metadata
I'll assume you're familiar with the Spark Streaming docs and Kafka docs. All code examples are in Scala, but there are Java-friendly methods in the API.