The consumer can commit offsets automatically when the configuration enable.auto.commit=true is set. The configuration auto.commit.interval.ms controls how frequently the commits happen; by default, the interval is 5 seconds (5000 ms). Automatic commits happen when the method poll() is invoked. The method checks whether auto.commit.interval.ms milliseconds have elapsed since the last commit. If so, it commits the offsets from the previous poll() invocation (not the current one). Below is a pictorial representation of how the automatic commit works. We assume that each poll() returns four records and that the auto commit interval is left at its default of 5 seconds.
Automatic commits with auto commit interval set to 5 seconds
In the diagram above, the first commit takes place at 9 seconds. A second commit takes place at 16 seconds, when the third call to poll() is made. The second poll() invocation doesn’t result in an offset commit because auto.commit.interval.ms milliseconds have not elapsed since the previous commit.
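The poll-time check described above can be sketched as a small simulation. This is a simplified model, not the Kafka client's actual implementation: the class and method names are hypothetical, the poll times of 9, 12, and 16 seconds are assumed values chosen to be consistent with the description, and the commit timer is assumed to start at time zero.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoCommitSimulation {
    // Assumed values: the default interval and four records per poll, as in the text.
    static final long AUTO_COMMIT_INTERVAL_MS = 5_000;
    static final int RECORDS_PER_POLL = 4;

    /** Returns one log line per simulated poll() call. */
    static List<String> simulate(long[] pollTimesMs) {
        List<String> log = new ArrayList<>();
        long lastCommitMs = 0; // assumption: the commit timer starts at t = 0
        long position = 0;     // offset of the next record to be returned

        for (long now : pollTimesMs) {
            String action = "no commit";
            // The check runs inside poll(), before new records are returned,
            // so a commit covers only records from earlier polls.
            if (now - lastCommitMs >= AUTO_COMMIT_INTERVAL_MS) {
                lastCommitMs = now;
                action = "commit offset " + position;
            }
            position += RECORDS_PER_POLL; // this poll returns four more records
            log.add("t=" + (now / 1000) + "s: " + action);
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical poll times, in milliseconds.
        simulate(new long[] {9_000, 12_000, 16_000}).forEach(System.out::println);
        // t=9s: commit offset 0
        // t=12s: no commit
        // t=16s: commit offset 8
    }
}
```

In this model the poll at 12 seconds is skipped because only 3 seconds have passed since the commit at 9 seconds, and the commit at 16 seconds covers the eight records returned by the first two polls, not the four returned by the third.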
The problem with this approach arises if a rebalance occurs after a few records have been processed but before the next 5-second commit mark. In that case, the consumer that takes over the partition starts reading from the last committed offset and re-reads some of the records that the previous consumer had already processed. In the above diagram, this could happen if the consumer crashes after making the first poll() call and processing two of the four returned records. We can shorten the interval between commits to lower the number of duplicates encountered in the event of a rebalance, but this possibility can’t be completely eliminated.
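To make the effect of the commit interval concrete, here is a rough sketch that counts how many records a replacement consumer would re-read after a crash. Everything in it is an illustrative assumption rather than Kafka behavior taken from the text: the class and method names are hypothetical, polls are assumed to happen every 2 seconds starting at time zero, and the four records of each batch are assumed to be processed evenly over those 2 seconds.

```java
public class RebalanceDuplicates {
    /**
     * Counts records processed by the crashed consumer but not covered by its
     * last automatic commit. All timing parameters are hypothetical.
     */
    static long duplicates(long commitIntervalMs, long pollEveryMs,
                           int recordsPerPoll, long crashAtMs) {
        long lastCommitMs = 0; // assumption: the commit timer starts at t = 0
        long lastPollMs = 0;
        long position = 0;     // offset of the next record to be returned
        long committed = 0;    // last committed offset

        for (long now = 0; now <= crashAtMs; now += pollEveryMs) {
            // Commits only happen inside poll(), and only cover earlier polls.
            if (now - lastCommitMs >= commitIntervalMs) {
                committed = position;
                lastCommitMs = now;
            }
            position += recordsPerPoll;
            lastPollMs = now;
        }

        // Records of the latest batch that were processed before the crash.
        long processedInBatch = Math.min(recordsPerPoll,
                (crashAtMs - lastPollMs) * recordsPerPoll / pollEveryMs);
        long processed = position - recordsPerPoll + processedInBatch;

        // The new consumer resumes at `committed` and re-reads up to `processed`.
        return processed - committed;
    }

    public static void main(String[] args) {
        // Crash 9 seconds in, with the default 5-second interval and a 1-second one.
        System.out.println("5000 ms interval: " + duplicates(5_000, 2_000, 4, 9_000) + " duplicates");
        System.out.println("1000 ms interval: " + duplicates(1_000, 2_000, 4, 9_000) + " duplicates");
        // 5000 ms interval: 6 duplicates
        // 1000 ms interval: 2 duplicates
    }
}
```

Under these assumptions the shorter interval cuts the duplicates from six to two but not to zero: a commit can only happen on a poll() call, so records processed since the most recent poll are always at risk of being read again.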