KIP-95: Incremental Batch Processing for Kafka Streams - Apache Kafka - Apache Software Foundation



KIP-95: Incremental Batch Processing for Kafka Streams - Apache Kafka - Apache Software Foundation

Kafka has the vision to unify stream and batch processing with the log as central data structure (ground truth). With this KIP, we want to enlarge the scope Kafka Streams covers, with the most basic batch processing pattern: incremental processing.


By incremental processing, we refer to the case that data is collected for some time frame, and an application is being started periodically to process all the newly collected data so far, similar to a "batch job" in Hadoop. For example, some data pipeline creates a new file of collected data each hour. Thus, whenever a new file is available, a new batch job is started to process the file. To carry over this scenario to Kafka, producers would continuously write data into a topic, and the user want to schedule a recurring "batch" job, that processes everything written "so far" (here, "EOL" stands for "end-of-log"):


Read full article from KIP-95: Incremental Batch Processing for Kafka Streams - Apache Kafka - Apache Software Foundation


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts