Kafka topics are used to store and organize events in a Kafka cluster. A topic is a named stream of data within the cluster, and a Kafka cluster can have many topics. Each topic is identified by its name, which must be unique across the entire cluster. For example, a topic could be named “payments,” “orders,” or “logs.”
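As a sketch, a topic such as “payments” can be created with Kafka's bundled CLI tooling. This assumes the Kafka scripts are on the PATH and a broker is reachable at localhost:9092; the partition and replication values are illustrative:

```shell
# Create a topic named "payments" with 3 partitions
# (assumes a broker listening at localhost:9092)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --partitions 3 \
  --replication-factor 1
```

Running the same script with `--list` instead of `--create` shows all topic names in the cluster. The commands require a live broker, so they are shown here as a sketch rather than a verifiable run.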
A producer sends a message to a Kafka topic, and a consumer reads data from the topic. Topics in Kafka are multi-producer and multi-subscriber, meaning that a topic can have one or more producers that write data to it and one or more consumers that read from it.
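The multi-producer, multi-subscriber behavior is easy to see with Kafka's console tools. The sketch below assumes the topic from the cluster already exists and a broker runs at localhost:9092 (both hypothetical values for illustration):

```shell
# Terminal 1: act as a producer — each line typed becomes one message
kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic payments

# Terminal 2: act as a consumer — read the topic from the beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic payments --from-beginning
```

Starting a second console consumer in a third terminal demonstrates the multi-subscriber property: both consumers receive the same messages independently.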
A topic is similar to a table in a database, but without any constraints: Kafka imposes no schema on messages, so producers can send data to a topic in any format, such as JSON, plain text, binary data, or anything else. The sequence of messages in a topic is called a data stream, and messages in a topic can be read as often as needed. By default, Kafka retains all published messages for a limited amount of time, whether or not they have been consumed. The default retention time is 168 hours (i.e., one week), but this can be changed in the broker configuration using the log.retention.hours setting.
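The retention period mentioned above is a broker-level setting. A minimal configuration fragment, assuming the standard server.properties file, might look like this:

```properties
# server.properties (broker configuration)
# Retain messages for one week (168 hours); this is the default.
log.retention.hours=168
```

Retention can also be overridden per topic with the retention.ms topic-level setting, which takes precedence over the broker default for that topic.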
In a Kafka cluster, topics can be divided into multiple partitions. A topic can have one or more partitions, and these partitions can be spread across different Kafka brokers. This placement matters for scalability, because it allows clients to read and write data through many brokers at the same time. The number of partitions in a topic is set when the topic is created (it can be increased later, but not decreased), and partitions are numbered 0, 1, 2, and so on.
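When a message carries a key, the producer deterministically maps that key to one of the topic's numbered partitions, so all messages with the same key land in the same partition and keep their relative order. The sketch below illustrates the idea with an MD5 hash for reproducibility; Kafka's actual default partitioner uses a murmur2 hash, so the partition numbers produced here will not match a real cluster:

```python
# Illustrative sketch of key-based partition assignment.
# NOTE: real Kafka producers use a murmur2 hash, not MD5; this is
# only meant to show the "hash(key) mod num_partitions" pattern.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    digest = hashlib.md5(key).digest()
    # Interpret the first 4 bytes as an integer, then reduce modulo
    # the partition count to get a stable partition number.
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, preserving
# per-key ordering within that partition.
p1 = partition_for(b"order-42", 3)
p2 = partition_for(b"order-42", 3)
assert p1 == p2
assert 0 <= p1 < 3
```

Messages sent without a key are instead spread across partitions (older clients round-robin; newer clients batch by "sticky" partition), which balances load but gives up per-key ordering.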