Zookeeper is a crucial piece of any enterprise-scale Big Data deployment. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services, all of which are used by distributed applications. According to the official website, Zookeeper got its name because coordinating distributed systems is a zoo.
At its core, Zookeeper is simple to understand. Think of it as a hierarchical filesystem or tree. The basic building block of Zookeeper is a znode. A znode can store data (like a file) or have child znodes (like a directory). The overall design of Zookeeper provides a highly available system consisting of znodes that make up a hierarchical namespace. The following is a representation of znodes:
Zookeeper node hierarchy
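For readers who prefer code to pictures, the following minimal sketch builds a tiny znode tree with the standard Java client. The connect string, paths, and payloads are illustrative assumptions, and error handling is omitted.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTreeSketch {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Connect string and session timeout are illustrative assumptions.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await(); // wait until the session is established

        // A znode used like a directory: it exists mainly to hold children.
        zk.create("/services", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // A child znode used like a file: it stores a small data payload.
        zk.create("/services/web", "host1:8080".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Listing children and reading data mirror a directory listing and a file read.
        System.out.println(zk.getChildren("/services", false));                   // [web]
        System.out.println(new String(zk.getData("/services/web", false, null))); // host1:8080

        zk.close();
    }
}
```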
Zookeeper can run as a single server in standalone mode or on a cluster of machines in replicated mode, called an ensemble. High availability in replicated mode is achieved by ensuring that modifications to the znode tree are replicated to a majority of the ensemble. If a minority of machines in the ensemble fail, at least one live machine will have the latest state. Let's consider an example. Suppose we have five machines (A, B, C, D, and E) running a Zookeeper ensemble. An update is committed once a majority of the machines, called a quorum, have applied it; say machines A, C, and E get the update. Now, if a minority of the machines fail (two in this case), the service should continue to function correctly. If machines A and E fail, there is still at least one machine, C, that has the latest state. The other two surviving machines, B and D, can catch up to it. This seemingly simple scheme for maintaining high availability is notoriously hard to implement correctly.
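The quorum arithmetic behind this example is easy to sketch in a few lines of Java; the ensemble sizes shown are only illustrations.

```java
// A minimal sketch of the quorum arithmetic described above.
public class QuorumMath {
    // The smallest majority of an ensemble of the given size.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many machines can fail while a majority stays alive.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.printf("ensemble=%d quorum=%d tolerates %d failure(s)%n",
                    n, quorumSize(n), tolerableFailures(n));
        }
        // ensemble=5 quorum=3 tolerates 2 failure(s): the A/C/E example above.
    }
}
```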
Zookeeper uses a protocol called Zab to implement this. The finer details of the protocol are outside the scope of this text, but at a high level it operates in two phases. In the first phase, a leader is elected while the remaining machines become followers and synchronize their state with the leader. In the second phase, all write requests are forwarded to the leader, which broadcasts each update to the followers. Once the change is committed by a majority of the machines, the client that requested the change is notified of the successful commit. All machines in the ensemble write updates to disk before updating their in-memory copies of the znode tree. A client can direct read requests to any machine; these requests are served from memory, resulting in fast reads.
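This division of labour between writes and reads is visible from the client API. The following sketch uses the standard Java client; the ensemble hostnames and the /config path are assumptions, and error handling is omitted.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WriteThenRead {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // The client may connect to any member of the ensemble; hostnames are assumptions.
        ZooKeeper zk = new ZooKeeper("hostA:2181,hostB:2181,hostC:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        Stat stat = zk.exists("/config", false);
        if (stat == null) {
            // A write: forwarded to the leader and acknowledged once a
            // majority of the ensemble has persisted it.
            zk.create("/config", "v1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            // Also a write, so it takes the same leader-broadcast path.
            zk.setData("/config", "v2".getBytes(), stat.getVersion());
        }

        // A read: answered from the in-memory znode tree of whichever server
        // this client happens to be connected to.
        System.out.println(new String(zk.getData("/config", false, null)));

        zk.close();
    }
}
```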
If the leader fails, a new election is held. If the former leader comes back up, it rejoins as a follower. Zab is similar to another well-known consensus protocol, Paxos, but differs from it in several respects.
Zookeeper also plays a central role in managing and maintaining a Kafka cluster:
- It coordinates the Kafka brokers.
- It enables Kafka's master-less cluster architecture.
- It keeps track of the brokers that are alive in the cluster (see the sketch that follows this list).
- It takes care of the broker lifecycle.
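As one concrete illustration of the last two points, Kafka brokers in Zookeeper-based deployments register themselves as ephemeral znodes under /brokers/ids. The following sketch lists those registrations with the standard Java client; the connect string is an assumption, and error handling is omitted.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ListKafkaBrokers {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Each live broker keeps an ephemeral child znode here; if a broker
        // dies, its session expires and the znode vanishes, which is how the
        // rest of the cluster learns about the failure.
        for (String id : zk.getChildren("/brokers/ids", false)) {
            byte[] meta = zk.getData("/brokers/ids/" + id, false, null);
            System.out.println("broker " + id + " -> " + new String(meta));
        }

        zk.close();
    }
}
```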