Consider a scenario where a partition has three replicas and two of them go down, leaving the leader as the only in-sync replica. The leader keeps receiving messages from the producers and writing them to its log. During this time, though, we risk data loss and inconsistency if the leader fails before the followers come back up and catch up to it. This is because if either of the two followers becomes the leader after the original leader goes down, the new leader will not have the latest messages that the previous leader received. To complicate matters further, some consumers may already have read newer messages from the old leader that aren't available on the new leader.
Let’s take a concrete example. Say the two follower brokers have replicated messages from offset 0 to 10 and then go down. The leader, now the only in-sync replica, receives and commits messages with offsets 11 through 20. Suddenly, the leader goes down just as a prior network glitch heals and the previously down followers come back online. Now, if one of these followers becomes the new leader, it can only serve messages from offset 0 to 10. Later, when the original leader also comes back online, it will erase the messages from offset 11 to 20 that it originally had, so that its log matches the new leader's. We can experience these situations when:
A partition is left with a single in-sync replica.
A network partition causes all followers to fall out of sync with the leader.
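The offset example above can be sketched as a toy simulation. This is a minimal sketch under simplifying assumptions: each log is a plain list of strings indexed by offset, the helpers `replicate` and `rejoin_as_follower` are hypothetical, and the truncation point is taken as the first offset where the two logs differ. Real Kafka brokers use leader epochs to decide where to truncate, so this only illustrates the outcome, not the actual mechanism.

```python
def replicate(source: list[str], up_to: int) -> list[str]:
    """Copy the first `up_to` messages (offsets 0..up_to-1) of a log."""
    return list(source[:up_to])


def rejoin_as_follower(old_log: list[str], leader_log: list[str]) -> list[str]:
    """Make a returning replica match the leader; return what it discards.

    Simplification (assumption): truncate at the first offset where the two
    logs differ or where one of them ends. Kafka itself uses leader epochs.
    """
    common = 0
    while (common < len(old_log) and common < len(leader_log)
           and old_log[common] == leader_log[common]):
        common += 1
    discarded = old_log[common:]
    # Drop the divergent tail, then copy the leader's remaining messages.
    old_log[:] = old_log[:common] + leader_log[common:]
    return discarded


# The leader committed offsets 0..20; the followers only reached offset 10.
leader_log = [f"msg-{i}" for i in range(21)]
follower_log = replicate(leader_log, 11)  # offsets 0..10

# The leader crashes and the out-of-sync follower is elected (unclean election).
new_leader_log = follower_log
print("new leader serves offsets 0 to", len(new_leader_log) - 1)

# Producers keep writing, so offset 11 now holds a different message.
new_leader_log.append("new-msg-11")

# The old leader returns as a follower and must match the new leader's log.
lost = rejoin_as_follower(leader_log, new_leader_log)
print("discarded:", lost[0], "...", lost[-1], f"({len(lost)} messages)")
```

Running it reports that the new leader serves offsets 0 to 10 and that the returning leader discards msg-11 through msg-20. A consumer that already read msg-11 from the old leader would now find a different message, new-msg-11, at the same offset, which is the inconsistency described earlier.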
The obvious mitigation is to never let an out-of-sync replica become the new leader. However, this solution comes at the cost of availability. The configuration parameter unclean.leader.election.enable, when set to true, allows Kafka to elect an out-of-sync broker as the leader. This may be a suitable choice for topics that track user behavior on a website, such as clicks or likes. However, for topics such as credit card payments, which absolutely can’t tolerate data loss or data inconsistency, the parameter unclean.leader.election.enable is set to false. When it is set to false, we choose to wait for the original leader, the last in-sync replica, to come back online, resulting in lower availability.
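The trade-off controlled by unclean.leader.election.enable can be modeled with a small function. This is a hedged sketch, not Kafka's controller code: `elect_leader` and its arguments are hypothetical, and the Python parameter `unclean_leader_election_enable` merely stands in for the real configuration setting. It assumes replicas are tried in their assignment order and that the ISR still lists the failed leader, as in the scenario above.

```python
from typing import Optional


def elect_leader(replicas: list[int], isr: set[int], alive: set[int],
                 unclean_leader_election_enable: bool) -> Optional[int]:
    """Toy leader election for one partition (hypothetical helper).

    Prefers a live in-sync replica. Only if none exists and the flag is
    true does it fall back to a live out-of-sync replica. Returns None when
    the partition must stay offline until an in-sync replica returns.
    """
    # Clean election: first live replica, in assignment order, that is in the ISR.
    for broker in replicas:
        if broker in alive and broker in isr:
            return broker
    if unclean_leader_election_enable:
        # Unclean election: any live replica, accepting possible data loss.
        for broker in replicas:
            if broker in alive:
                return broker
    return None  # unavailable: wait for an in-sync replica to come back


# Scenario from above: broker 1 (the last in-sync replica) is down,
# while out-of-sync brokers 2 and 3 are back online.
replicas, isr, alive = [1, 2, 3], {1}, {2, 3}
print(elect_leader(replicas, isr, alive, unclean_leader_election_enable=True))   # 2
print(elect_leader(replicas, isr, alive, unclean_leader_election_enable=False))  # None
```

With the flag on, broker 2 takes over immediately, at the cost of the messages only broker 1 held. With the flag off, the partition stays unavailable until broker 1 returns, after which a clean election picks it again.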