Spark & Kafka - Achieving zero data-loss
Spark Streaming can connect to Kafka using two approaches described in the Kafka Integration Guide. The first approach, which uses a receiver, is less than ideal in terms of parallelism: it forces you to create multiple DStreams to increase throughput. As a matter of fact, most people tend to deprecate it in favor of the Direct Stream approach that appeared in Spark 1.3 (see the post on the Databricks blog and a post by the main contributor).
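As a rough illustration, here is a minimal sketch of the direct approach using the Spark 1.3-era createDirectStream API (Kafka 0.8 integration). The broker list, topic name, batch interval, and application name are placeholders, not values from the article.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(5)) // batch interval is an arbitrary choice

    // Placeholder broker list and topic; adjust for your cluster.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("events")

    // Direct stream: no receiver, one RDD partition per Kafka partition,
    // offsets tracked by Spark itself rather than by a receiver via ZooKeeper.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // The stream yields (key, value) pairs; here we just count values per batch.
    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because each Kafka partition maps directly to an RDD partition, parallelism scales with the topic's partition count instead of with the number of receiver DStreams you create by hand.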