The retries parameter sets the number of times a producer retries sending a message before giving up and declaring a failure to the client. There are two kinds of failures a producer can encounter.
The first are failures that can’t be retried, such as “message too large” errors.
The second are failures that can be retried, such as a write failure caused by the absence of a partition leader. The producer retries these automatically, so application logic should not handle them. Instead, the application should only handle the case in which the retries for a transient failure have been exhausted.
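The split between the two kinds of failure can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not the Kafka client's actual code: the exception classes `RetriableError` and `NonRetriableError`, the function `send_with_retries`, and the `flaky_send` stub are all hypothetical names introduced for illustration.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a transient failure, e.g. no partition leader."""


class NonRetriableError(Exception):
    """Hypothetical stand-in for a permanent failure, e.g. message too large."""


def send_with_retries(send, message, retries, backoff_ms=0):
    """Call send(message), retrying only transient failures up to `retries` times."""
    attempt = 0
    while True:
        try:
            return send(message)
        except RetriableError:
            if attempt >= retries:
                # Retries exhausted: only now does the caller see the failure.
                raise
            attempt += 1
            time.sleep(backoff_ms / 1000.0)
        # NonRetriableError is deliberately not caught, so it surfaces at once.


# Usage: a stubbed send that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_send(message):
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RetriableError("leader not available")
    return "ack"


result = send_with_retries(flaky_send, "hello", retries=3)
```

In this sketch the caller never sees the two transient failures. It would see an exception only if the retry budget ran out or if the send raised the non-retriable error.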
Another relevant setting is retry.backoff.ms, which specifies how many milliseconds the producer waits before re-attempting a failed send. Generally, the total time spent retrying (roughly the number of retries multiplied by the wait between them) should be greater than the time it takes for a broker to recover from a crash; otherwise the producer will declare a failure too soon.
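As a rough check of this rule of thumb, the retry window can be estimated from the two settings. The sketch below assumes a fixed wait between attempts and ignores the time each attempt itself takes. The helper names `retry_window_ms` and `covers_recovery` are hypothetical, and the numeric values, including the 10-second broker recovery time, are illustrative assumptions rather than recommendations.

```python
def retry_window_ms(retries, retry_backoff_ms):
    """Approximate total time spent waiting between retries, in milliseconds."""
    return retries * retry_backoff_ms


def covers_recovery(retries, retry_backoff_ms, broker_recovery_ms):
    """True if the retry window is longer than the assumed broker recovery time."""
    return retry_window_ms(retries, retry_backoff_ms) > broker_recovery_ms


# Illustrative producer settings, keyed by the configuration names from the text.
producer_config = {"retries": 5, "retry.backoff.ms": 100}

# Assumed time for a broker to recover from a crash (illustrative only).
broker_recovery_ms = 10_000

window = retry_window_ms(producer_config["retries"],
                         producer_config["retry.backoff.ms"])
long_enough = covers_recovery(producer_config["retries"],
                              producer_config["retry.backoff.ms"],
                              broker_recovery_ms)
```

With these numbers the window is 5 × 100 = 500 ms, well short of the assumed 10 000 ms recovery time, so the producer would give up too early. Raising either the number of retries or the backoff until the product exceeds the recovery time would address that.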