[KAFKA-3758] KStream job fails to recover after Kafka broker stopped - ASF JIRA
We've been testing a fairly complex Kafka Streams job, and under load the job appears to fail to rebalance and recover when we shut down one of the Kafka brokers. The test ran against a 3-node Kafka cluster in which every topic had a replication factor of at least 2, and we terminated one of the nodes.
The full log is attached. The root exception appears to be contention on the state directory lock. The job keeps trying to recover but repeatedly throws lock-related errors. Restarting the job itself resolves the problem.
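To make the symptom concrete, here is a minimal sketch of how a file lock that is never released causes every retry to fail until the holder lets go. It is not the Kafka Streams implementation and makes no claim about the actual root cause in this ticket. The class name, the method names, and the `.lock` file name are all hypothetical, and the only assumption is that the state directory is guarded by a standard `java.nio` file lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from the Kafka Streams code base.
class StateDirLockSketch {

    // Opens (creating if needed) a writable channel on the lock file.
    static FileChannel open(Path lockFile) throws IOException {
        return FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Non-blocking exclusive lock attempt. Returns null when the lock is
    // already held, either by another process or by another channel in this JVM.
    static FileLock tryLock(FileChannel channel) throws IOException {
        try {
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            return null;
        }
    }

    // Counts how many of `attempts` retries fail while a first owner holds the lock.
    static int failedRetriesWhileHeld(int attempts) {
        try {
            Path dir = Files.createTempDirectory("state-dir-sketch");
            Path lockFile = dir.resolve(".lock");
            int failures = 0;
            try (FileChannel owner = open(lockFile);
                 FileChannel contender = open(lockFile)) {
                FileLock held = tryLock(owner);
                for (int i = 0; i < attempts; i++) {
                    FileLock lock = tryLock(contender);
                    if (lock == null) {
                        failures++;
                    } else {
                        lock.release();
                    }
                }
                if (held != null) {
                    held.release();
                }
            } finally {
                Files.deleteIfExists(lockFile);
                Files.deleteIfExists(dir);
            }
            return failures;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Checks that a contender can take the lock once the owner has released it.
    static boolean canLockAfterRelease() {
        try {
            Path dir = Files.createTempDirectory("state-dir-sketch");
            Path lockFile = dir.resolve(".lock");
            try (FileChannel owner = open(lockFile);
                 FileChannel contender = open(lockFile)) {
                FileLock held = tryLock(owner);
                if (held != null) {
                    held.release();
                }
                FileLock lock = tryLock(contender);
                boolean acquired = lock != null;
                if (acquired) {
                    lock.release();
                }
                return acquired;
            } finally {
                Files.deleteIfExists(lockFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("failed retries while held: " + failedRetriesWhileHeld(3));
        System.out.println("lock acquired after release: " + canLockAfterRelease());
    }
}
```

In this sketch, retrying alone never succeeds while the first owner keeps the lock, and the lock becomes available as soon as it is released. That would be consistent with a restart clearing the problem, since a process that exits gives up its file locks.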