The leader can fail too, and for a number of reasons. Handling a leader failure is more complex than handling a follower failure and involves several subtleties. When the leader fails, the following actions must take place:
Detecting that the leader has failed. If the failover isn’t manual, a timeout is usually used to decide whether the leader is down: if the leader doesn’t send out or respond to a heartbeat message within the timeout, it is assumed to be dead.
Promoting one of the followers to be the new leader. The follower with the most up-to-date changes from the old leader is usually the preferred choice, as promoting it loses the least data. Leaders can be chosen in a variety of ways, and getting all the followers to agree on a leader is a consensus problem. The new leader can be chosen by an *election* process in which a majority decides on it, or it can be appointed by a previously elected controller node.
Configuring clients to direct writes to the new leader.
Configuring followers to receive changes from the new leader.
Ensuring that in case the old leader comes back up, it assumes the role of a follower and doesn’t consider itself the leader anymore.
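The steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions, not how any particular database implements failover: a single coordinator makes the decision (so the consensus problem is sidestepped), and the `Node` fields, the `term` counter, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    role: str                      # "leader" or "follower"
    last_heartbeat: float          # time (seconds) this node was last heard from
    replicated_offset: int         # how far this node has got in the replication log
    term: int = 0                  # hypothetical leadership epoch this node knows about
    follows: Optional[str] = None  # name of the leader this node replicates from


def looks_dead(node: Node, now: float, timeout: float) -> bool:
    # Detection: no heartbeat within the timeout means the node is assumed dead.
    return now - node.last_heartbeat > timeout


def fail_over(nodes: List[Node], now: float, timeout: float = 10.0) -> Optional[Node]:
    """Promote a follower if the leader looks dead; return the new leader or None."""
    old = next(n for n in nodes if n.role == "leader")
    if not looks_dead(old, now, timeout):
        return None

    # Promotion: prefer the live follower with the most up-to-date changes.
    live = [n for n in nodes if n is not old and not looks_dead(n, now, timeout)]
    if not live:
        raise RuntimeError("no live follower to promote")
    new = max(live, key=lambda n: n.replicated_offset)

    new_term = max(n.term for n in nodes) + 1
    new.role, new.term, new.follows = "leader", new_term, None

    # Reconfiguration: the remaining live followers replicate from the new leader.
    for n in live:
        if n is not new:
            n.term, n.follows = new_term, new.name
    return new


def current_leader(nodes: List[Node]) -> Node:
    # Clients send writes to the leader with the highest term; a stale leader
    # that has not yet been demoted has a lower term and is ignored.
    return max((n for n in nodes if n.role == "leader"), key=lambda n: n.term)


def rejoin(node: Node, leader: Node) -> None:
    # A returning old leader sees a higher term and steps down to follower.
    if node.term < leader.term:
        node.role, node.term, node.follows = "follower", leader.term, leader.name
```

In this sketch the old leader is left untouched by `fail_over`, since it is unreachable at that point. It only learns that it has been replaced when `rejoin` is called after it comes back, which is why `current_leader` has to compare terms rather than trust the `role` field alone.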
Together, the steps above are referred to as leader *failover*. A number of things can go wrong during a failover, including the following: