Choosing a suitable timeout before declaring a leader dead is crucial. If the timeout is too long, the system can be unavailable for that whole period, which also prolongs recovery. If the timeout is too short, it causes unnecessary failovers. For instance, if write requests spike or the network slows down for some reason, the leader may respond to health checks late, triggering a failover even though the leader is still alive.
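To make the tradeoff concrete, here is a minimal sketch that replays heartbeat arrival times against two timeout values. The function name `count_suspicions`, the arrival times, and the timeout values are all made up for illustration and do not correspond to any particular database's failure detector.

```python
def count_suspicions(arrival_times_s, timeout_s):
    """Count gaps between consecutive heartbeats that exceed the timeout.

    Each such gap is a moment at which a follower using this timeout
    would have declared the leader dead (hypothetical, simplified model).
    """
    gaps = (
        later - earlier
        for earlier, later in zip(arrival_times_s, arrival_times_s[1:])
    )
    return sum(1 for gap in gaps if gap > timeout_s)


# Illustrative data: the leader is healthy the whole time, but one
# heartbeat arrives late (3.5 s gap) because of a load spike.
arrivals = [0.0, 1.0, 2.0, 3.0, 6.5, 7.5, 8.5]

short_timeout_failovers = count_suspicions(arrivals, timeout_s=2.0)
long_timeout_failovers = count_suspicions(arrivals, timeout_s=5.0)

print(short_timeout_failovers)  # 1: the delayed heartbeat triggers a failover
print(long_timeout_failovers)   # 0: the delay is tolerated
```

The longer timeout avoids the unneeded failover here, but if the leader had really crashed, followers using it would wait up to 5 seconds instead of 2 before reacting.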
There is no silver bullet for any of the problems we discussed above. Rather, there are tradeoffs to be made when tuning the replica consistency, durability, availability, and latency characteristics of a distributed system.