Next, we’ll discuss an example of when it makes sense to use the onPartitionsAssigned method. Once a rebalance happens, the consumer that has been newly assigned a partition starts reading records from the last committed offset. However, what if we want to skip ahead a few records or start reading from a few records back? Kafka allows us to start reading from either the beginning or the very end of a partition using the methods seekToBeginning(Collection<TopicPartition> partitions) and seekToEnd(Collection<TopicPartition> partitions), respectively. We can also position a consumer at a specific offset within a partition and have it read records starting from that offset. This may be required for applications that work with time-series data and want to skip records that have become stale. More generally, if a consumer has lagged behind and the application can tolerate skipping messages, it can jump over several of them and resume from more recent messages. The seek(TopicPartition tp, long offset) method on the consumer object sets the offset from which the next poll() call will fetch records for that partition. Within the onPartitionsAssigned method, we can position the consumer at an offset of our choosing just before it starts processing records from that partition.
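As a minimal sketch of this idea, the listener below skips ahead on assignment when the consumer has fallen too far behind. The class name SkipStaleRecordsListener, the helper resumeOffset, and the maxLag threshold are hypothetical names chosen for illustration; only the Kafka calls (endOffsets, position, seek) come from the consumer API, and the "keep at most maxLag records of backlog" policy is an assumption, not a recommendation.

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Hypothetical listener: on assignment, skip ahead if the consumer lags too far behind.
public class SkipStaleRecordsListener implements ConsumerRebalanceListener {

    private final Consumer<?, ?> consumer;
    private final long maxLag; // assumed application-specific tolerance, in records

    public SkipStaleRecordsListener(Consumer<?, ?> consumer, long maxLag) {
        this.consumer = consumer;
        this.maxLag = maxLag;
    }

    // Pure helper: never move backwards, and leave at most maxLag records unread.
    public static long resumeOffset(long position, long endOffset, long maxLag) {
        return Math.max(position, endOffset - maxLag);
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Nothing to do in this sketch.
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Log end offsets for the partitions we were just handed.
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
        for (TopicPartition tp : partitions) {
            long position = consumer.position(tp); // where we would resume by default
            long target = resumeOffset(position, endOffsets.get(tp), maxLag);
            if (target > position) {
                consumer.seek(tp, target); // the next poll() starts from target
            }
        }
    }
}
```

The listener would be registered when subscribing, for example consumer.subscribe(List.of("my-topic"), new SkipStaleRecordsListener(consumer, 1000)), where the topic name and the threshold are placeholders.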
Consider another scenario: as you read records from a partition, you also write them to a database. The pseudocode for the poll loop resembles the following:
1. The poll loop {
       ...
2.     poll for records
3.     while (process all records) {
4.         commit record to external DB
5.         commit offset
       }
       ...
   }
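For readers who prefer concrete code, one pass of the loop above might look roughly like the following Java sketch. The class DbWritingLoop, the RecordStore interface (a stand-in for the external database), and the helper offsetToCommit are hypothetical; the sketch assumes String keys and values and commits synchronously after every record, mirroring steps 4 and 5 of the pseudocode.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class DbWritingLoop {

    // Hypothetical stand-in for the external database.
    public interface RecordStore {
        void save(String key, String value);
    }

    // The committed offset is the offset of the next record to read, hence offset + 1.
    public static Map<TopicPartition, OffsetAndMetadata> offsetToCommit(
            String topic, int partition, long offset) {
        return Collections.singletonMap(
                new TopicPartition(topic, partition),
                new OffsetAndMetadata(offset + 1));
    }

    // One iteration of the poll loop: steps 2 to 5 of the pseudocode.
    public static void pollOnce(Consumer<String, String> consumer, RecordStore store) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            store.save(record.key(), record.value());   // step 4: write to the database
            consumer.commitSync(offsetToCommit(         // step 5: commit the offset
                    record.topic(), record.partition(), record.offset()));
        }
    }
}
```

Note that the database write and the offset commit remain two separate operations in this sketch, exactly as in the pseudocode.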