KIP-95: Incremental Batch Processing for Kafka Streams - Apache Kafka - Apache Software Foundation
Kafka's vision is to unify stream and batch processing, with the log as the central data structure (the ground truth). With this KIP, we want to enlarge the scope that Kafka Streams covers with the most basic batch processing pattern: incremental processing.
By incremental processing, we refer to the case where data is collected over some time frame and an application is started periodically to process all the data newly collected so far, similar to a "batch job" in Hadoop. For example, some data pipeline creates a new file of collected data each hour; whenever a new file is available, a new batch job is started to process it. Carried over to Kafka, producers continuously write data into a topic, and the user wants to schedule a recurring "batch" job that processes everything written "so far", i.e., up to the current end of the log ("EOL").
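The KIP itself proposes adding such "stop at EOL" semantics to Kafka Streams; purely as an illustration of the pattern (not the KIP's actual API), the following is a minimal sketch using the plain Java consumer: it captures the end offsets at startup, processes everything up to that point, commits its progress so the next run resumes where this one stopped, and then exits. The broker address, topic name, and group id are placeholders.

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class IncrementalBatchJob {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "incremental-batch-job");      // committed offsets carry progress between runs
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");
        props.put("auto.offset.reset", "earliest");           // first run starts from the beginning of the log

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manual assignment keeps this short-lived job simple (no rebalance handling).
            List<TopicPartition> partitions = consumer.partitionsFor("input-topic").stream()  // placeholder topic
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);

            // Capture the current end of the log ("EOL") once; anything written after this
            // point is left for the next run of the job.
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            boolean reachedEol = false;
            while (!reachedEol) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                consumer.commitSync();

                // Stop once every partition's position has reached the captured end offset.
                reachedEol = endOffsets.entrySet().stream()
                        .allMatch(e -> consumer.position(e.getKey()) >= e.getValue());
            }
        }
    }

    // Application-specific processing; printing is just a stand-in.
    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s -> %s%n", record.key(), record.value());
    }
}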
Read full article from KIP-95: Incremental Batch Processing for Kafka Streams - Apache Kafka - Apache Software Foundation