Replication in Kafka
Broker replication config
Replication factor
Unclean leader election
Minimum in-sync replicas
A reliability guarantee is a behavior that a system promises to uphold and that users of the system can depend on. Take, for instance, relational databases that offer ACID guarantees: atomicity, consistency, isolation, and durability. An ACID-compliant database guarantees certain behaviors for transactions. Because users know how the database will behave, including in case of failures, they can write safe, reliable applications on top of those guarantees.
Kafka also makes certain reliability guarantees, including:
If the same producer first writes message A and then message B to the same partition, consumers are guaranteed to see message A before message B, and the offset of message B is guaranteed to be higher than that of message A (see the producer sketch after this list).
A message sent to the broker is considered committed once it has been written to the partition on the leader and on all of its in-sync replicas. This does not imply the message has been flushed to disk; it may well still be in memory.
Committed messages will not be lost as long as at least one replica remains alive.
Consumers only read committed messages.
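To make the first two guarantees concrete, here is a minimal producer sketch, assuming a hypothetical broker address and topic name. It sends two messages with the same key, so both land in the same partition, and requests acks=all, so each send is acknowledged only once the message is committed.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the send is acknowledged only once the message is committed,
        // i.e. written to the leader and all in-sync replicas.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records use the same key, so they go to the same partition.
            // Consumers of that partition will see A before B, and B's offset
            // will be higher than A's.
            producer.send(new ProducerRecord<>("events", "order-42", "message A"));
            producer.send(new ProducerRecord<>("events", "order-42", "message B"));
            producer.flush();
        }
    }
}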
Kafka exposes configuration parameters that let you trade off the reliability and consistency of message storage against considerations like availability, high throughput, low latency, and hardware costs. Kafka provides reliability by replicating messages across brokers, which we'll delve into next.
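As a preview of the replication-related parameters covered in the following sections (replication factor, unclean leader election, minimum in-sync replicas), here is a minimal sketch, again with hypothetical broker and topic names, that creates a topic with these settings using Kafka's AdminClient:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class ReliableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Three replicas per partition: a committed message survives the
            // loss of up to two brokers.
            NewTopic topic = new NewTopic("events", 6, (short) 3);
            topic.configs(Map.of(
                // Require at least two in-sync replicas before a write with
                // acks=all is considered committed.
                "min.insync.replicas", "2",
                // Never elect an out-of-sync replica as leader, trading
                // availability for consistency.
                "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}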