Kafka Real-time Stream Multi-topic Catch up Trick | Sematext Blog
Half of the world, Sematext included, seems to be using Kafka. Kafka is the spinal cord that connects various components in SPM, Site Search Analytics, and Logsene. If Kafka breaks, we're in trouble (but we have anomaly detection all over the place to catch issues early). In many Kafka deployments, ours included, the most recent data is the most valuable. Consider the case of Kafka in SPM, which processes massive amounts of performance metrics for monitoring applications and servers. Clearly, in a performance monitoring system you primarily care about current performance numbers. Thus, if SPM's Kafka pipeline were to break and we then restored it, what we'd really like to avoid is processing all data sequentially, oldest to newest. What we'd prefer is to process new metrics data first and then process older data using any spare capacity we have, in order to "fill the gap" caused by Kafka downtime. Here's a very quick "video" that shows this in action:

Read full article from Kafka Real-time Stream Multi-topic Catch up Trick | Sematext Blog
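
The "new data first, older data with spare capacity" idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Sematext's actual implementation: the excerpt does not describe the mechanism, so two in-memory queues stand in for a stream of fresh messages and a stream of messages missed during downtime, and `process_tick` and `capacity` are hypothetical names invented for this example.

```python
from collections import deque


def process_tick(live, backlog, capacity):
    """Process up to `capacity` messages in one tick, fresh data first.

    `live` and `backlog` are deques standing in for two message streams
    (an assumption of this sketch; no Kafka client is involved).
    Returns the processed messages as (source, message) tuples.
    """
    processed = []
    # Fresh messages always get first claim on the capacity budget.
    while len(processed) < capacity and live:
        processed.append(("live", live.popleft()))
    # Whatever capacity is left over goes to filling the gap.
    while len(processed) < capacity and backlog:
        processed.append(("backlog", backlog.popleft()))
    return processed


if __name__ == "__main__":
    backlog = deque(range(1, 7))  # six messages missed during downtime
    live = deque()
    next_id = 100
    for tick in range(4):
        # Two fresh messages arrive on every tick.
        live.extend([next_id, next_id + 1])
        next_id += 2
        print(tick, process_tick(live, backlog, capacity=4))
```

With a budget of four messages per tick and two fresh messages arriving each tick, the fresh messages are never delayed, and the backlog shrinks by two per tick until the gap is filled.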