Kafka Real-time Stream Multi-topic Catch up Trick | Sematext Blog



Kafka Real-time Stream Multi-topic Catch up Trick | Sematext Blog

Half of the world, Sematext included, seems to be using Kafka. Kafka is the spinal cord that connects various components in SPM , Site Search Analytics , and Logsene .  If Kafka breaks, we're in trouble (but we have anomaly detection all over the place to catch issues early).  In many Kafka deployments, ours included, the most recent data is the most valuable.  Consider the case of Kafka in SPM, which processes massive amounts of performance metrics for monitoring applications and servers.  Clearly, in a performance monitoring system you primarily care about current performance numbers.  Thus, if SPM's Kafka pipeline were to break and we restore it, what we'd really like to avoid is processing all data sequentially, oldest to newest.  What we'd prefer is processing new metrics data first and then processing older data using any spare capacity we have in order to "fill the gap" caused by Kafka downtime. Here's a very quick "video" that show this in action:

Read full article from Kafka Real-time Stream Multi-topic Catch up Trick | Sematext Blog


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts