Topics
A stream of messages belonging to a particular category is called a topic. Data is stored in topics.
Topics are split into partitions, and Kafka keeps at least one partition for each topic. Each partition
contains messages in an immutable, ordered sequence. A partition is implemented as a set of segment files;
a new segment is started once the current one reaches a configured size or age limit.
● Partition
A topic may have many partitions, so it can handle an arbitrary amount of data.
● Partition offset
Each message in a partition has a unique sequence ID called an offset.
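
The relationship between topics, partitions and offsets can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka code: the class names Partition and Topic, the topic name "orders" and the message strings are all assumptions made for the example.

```python
class Partition:
    """Append-only, ordered log. A message's offset is its position in the log."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset):
        return self.messages[offset]


class Topic:
    """Named category of messages, split into one or more partitions."""

    def __init__(self, name, num_partitions=1):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


# Hypothetical topic and messages, for illustration only.
topic = Topic("orders", num_partitions=2)
first = topic.partitions[0].append("order-1")
second = topic.partitions[0].append("order-2")
other = topic.partitions[1].append("order-3")
```

Note that offsets are counted separately in each partition, so the first message written to the second partition also gets offset 0.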
● Replicas of partition
Replicas are backups of a partition. Clients do not normally read data from or write data to these
backup copies; they exist to prevent data loss.
● Kafka Cluster
A Kafka deployment with more than one broker is called a Kafka cluster. A Kafka cluster can be expanded
without downtime. These clusters are used to manage the persistence and replication of message data.
● Producers
Producers publish messages to one or more Kafka topics by sending data to Kafka brokers. Every time a
producer publishes a message, the broker appends it to the last segment file of the target partition.
A producer can also send messages to a partition of its choice.
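
How a producer might pick a partition can be sketched as follows. This is a toy model under stated assumptions, not the Kafka client: SketchProducer is a hypothetical class, zlib.crc32 stands in for the hash used by Kafka's real partitioner, and the round-robin fallback for messages without a key is a simplification.

```python
import zlib


class SketchProducer:
    """Toy producer over in-memory partition logs (not a real Kafka client)."""

    def __init__(self, num_partitions):
        self.logs = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor for messages without a key

    def _choose_partition(self, key, partition):
        if partition is not None:
            # The producer named a partition explicitly.
            if not 0 <= partition < len(self.logs):
                raise ValueError("no such partition")
            return partition
        if key is not None:
            # Assumption: crc32 stands in for the real partitioner's hash.
            return zlib.crc32(key.encode("utf-8")) % len(self.logs)
        chosen = self._next
        self._next = (self._next + 1) % len(self.logs)
        return chosen

    def send(self, message, key=None, partition=None):
        p = self._choose_partition(key, partition)
        self.logs[p].append(message)  # append to the end of the partition
        return p, len(self.logs[p]) - 1  # (partition, offset)


producer = SketchProducer(num_partitions=3)
explicit = producer.send("audit-event", partition=2)
keyed_a = producer.send("login", key="user-42")
keyed_b = producer.send("logout", key="user-42")
```

Messages sent with the same key land in the same partition, which keeps them in order relative to each other.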
Apache Kafka - Fundamentals
● Consumers
Consumers read data from brokers. A consumer subscribes to one or more topics and consumes published
messages by pulling data from the brokers.
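
The pull model can be illustrated with a small sketch in which the consumer, not the broker, decides when to fetch and remembers how far it has read. The SketchConsumer class, its poll method and the max_records parameter are assumptions for this example and do not mirror a real client API.

```python
class SketchConsumer:
    """Toy consumer that pulls from a partition log and tracks its own position."""

    def __init__(self, log):
        self.log = log      # the partition's message list, owned by the broker
        self.position = 0   # offset of the next message to read

    def poll(self, max_records=2):
        # Pull up to max_records messages starting at the current position.
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch


partition_log = ["a", "b", "c"]       # hypothetical published messages
consumer = SketchConsumer(partition_log)
batch_1 = consumer.poll()             # first two messages
batch_2 = consumer.poll()             # the remaining message
batch_3 = consumer.poll()             # nothing new yet
partition_log.append("d")             # a producer publishes another message
batch_4 = consumer.poll()             # the consumer picks it up on its next pull
```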
● Leader
The leader is the node responsible for all reads and writes for a given partition. Every partition has one
server acting as its leader.
● Follower
A node that follows the instructions of the leader is called a follower. If the leader fails, one of the
followers automatically becomes the new leader. A follower acts like a normal consumer: it pulls messages
and updates its own data store.
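
The leader and follower roles can be sketched as below. This is a simplified model under stated assumptions: Replica and ReplicatedPartition are hypothetical names, replication is triggered by hand, and the new leader is simply the follower with the longest log, which is a stand-in for Kafka's actual election procedure.

```python
class Replica:
    """One copy of a partition's log, hosted on a broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    def fetch_from(self, leader):
        # Like a consumer, pull only the messages this replica does not have yet.
        self.log.extend(leader.log[len(self.log):])


class ReplicatedPartition:
    def __init__(self, broker_ids):
        self.replicas = [Replica(b) for b in broker_ids]
        self.leader = self.replicas[0]

    def followers(self):
        return [r for r in self.replicas if r is not self.leader]

    def write(self, message):
        # All writes go through the leader.
        self.leader.log.append(message)
        return len(self.leader.log) - 1

    def replicate(self):
        for follower in self.followers():
            follower.fetch_from(self.leader)

    def fail_leader(self):
        # Assumption: promote the most caught-up follower.
        self.replicas.remove(self.leader)
        self.leader = max(self.replicas, key=lambda r: len(r.log))


partition = ReplicatedPartition(broker_ids=[1, 2, 3])  # hypothetical broker ids
partition.write("m1")
partition.write("m2")
partition.replicate()      # followers pull m1 and m2 from the leader
partition.fail_leader()    # broker 1 goes away
new_leader = partition.leader
```

Because the followers had already pulled both messages, the new leader holds the full log and no data is lost.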