When an application publishes events to a Kafka topic, failure scenarios introduce the risk that duplicate events are written and that message ordering is lost. Both risks can be avoided by configuring the Kafka Producer to be idempotent. This article describes how duplicate events come to be published and how to make the Producer idempotent.
Duplicate Messages
Duplicate messages can occur in the scenario where:
- A Producer attempts to write a message to a topic partition.
- The broker does not acknowledge the write due to some transient failure scenario.
- The Producer retries as it does not know whether the write succeeded or not.
- If the original write did in fact succeed and the Producer is not idempotent, the retried message is written a second time, resulting in a duplicate.
By configuring the Producer to be idempotent, each Producer is assigned a unique Id (PID) and each message is given a monotonically increasing sequence number. The broker tracks the PID + sequence number combination for each partition, rejecting any duplicate write requests it receives.
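To illustrate the broker-side deduplication, the following is a minimal sketch (a hypothetical model, not the actual broker implementation) that tracks the last sequence number accepted per producer Id for a single partition and rejects any replayed write:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of broker-side deduplication for one partition:
// the broker remembers the last sequence number accepted per producer
// Id (PID) and rejects any write it has already acknowledged.
public class DedupSketch {

    static final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Returns true if the write is accepted, false if it is a duplicate.
    static boolean tryAppend(long pid, int sequence) {
        Integer lastSeq = lastSeqByPid.get(pid);
        if (lastSeq != null && sequence <= lastSeq) {
            return false; // retry of an already-acknowledged write: rejected
        }
        lastSeqByPid.put(pid, sequence);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tryAppend(42L, 0)); // first write: accepted
        System.out.println(tryAppend(42L, 1)); // next write: accepted
        System.out.println(tryAppend(42L, 1)); // retried write: rejected
    }
}
```

This is why a Producer retry that arrives after a lost acknowledgment does not result in a second copy of the message on the partition.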
Idempotent Producer Configuration
The Kafka Producer configuration parameter enable.idempotence determines whether the Producer is permitted to write duplicates of a retried message to the topic partition following a retryable error.
To ensure idempotent behavior, acks must be set to all. The leader waits until the minimum required number of in-sync replicas acknowledge the message before responding.
If retries is set to 0 the Producer never retries a failed write, so transient errors can cause sends to fail unnecessarily. This is not recommended.
Unlike implementing an idempotent consumer, enabling an idempotent producer requires no code changes—only configuration.
Producer & Consumer Timeouts
It is recommended to leave retries at its default (Integer.MAX_VALUE) and instead bound retrying by time using delivery.timeout.ms.
In a consume-process-produce flow, if the delivery timeout exceeds the consumer poll timeout (max.poll.interval.ms), the consumer may be removed from the group while the produce is still retrying; its messages are then redelivered, causing duplicate downstream events.
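The relationship between the two timeouts can be sketched as below; the property names are the standard Kafka config keys, but the values are illustrative assumptions, not recommendations:

```java
import java.util.Properties;

public class TimeoutCheck {

    // Returns true when the Producer's delivery timeout fits inside the
    // consumer's poll interval, so a slow, retrying produce cannot cause
    // the consumer to be removed from its group mid-processing.
    static boolean fitsWithinPollInterval(long deliveryTimeoutMs, long maxPollIntervalMs) {
        return deliveryTimeoutMs < maxPollIntervalMs;
    }

    public static void main(String[] args) {
        Properties producerProps = new Properties();
        // retries is left at its default (Integer.MAX_VALUE); retrying is
        // bounded by time instead. Assumed value for illustration only.
        producerProps.setProperty("delivery.timeout.ms", "120000");

        Properties consumerProps = new Properties();
        consumerProps.setProperty("max.poll.interval.ms", "300000");

        System.out.println(fitsWithinPollInterval(
                Long.parseLong(producerProps.getProperty("delivery.timeout.ms")),
                Long.parseLong(consumerProps.getProperty("max.poll.interval.ms"))));
    }
}
```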
Guaranteed Message Ordering
The max.in.flight.requests.per.connection setting increases throughput by allowing multiple unacknowledged requests.
- If the Producer is not idempotent and this value is greater than 1, ordering can be lost: a retried write may land after a later write that was sent on the same connection.
- If the Producer is idempotent, ordering is guaranteed for values up to 5.
Recommended Configuration
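Pulling the above together, a sketch of a recommended Producer configuration is shown below. The values mirror the recommendations in this article; the bootstrap address is an assumption, and literal config keys are used in place of the ProducerConfig constants so the snippet stands alone:

```java
import java.util.Properties;

public class RecommendedProducerConfig {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("enable.idempotence", "true"); // no duplicates on retry
        props.setProperty("acks", "all"); // required for idempotence
        // retries is left at its default (Integer.MAX_VALUE);
        // retrying is bounded by time instead:
        props.setProperty("delivery.timeout.ms", "120000");
        // ordering remains guaranteed for an idempotent Producer up to 5:
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```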
Client Library Support
The Kafka Java client changed its defaults in version 3.0.0 to enable.idempotence=true and acks=all.
KafkaJS marks idempotence as experimental. librdkafka added full support in v1.4.0.
Problem with Retries
Retrying a message can cause duplicates if the broker wrote the message but the acknowledgment was lost.
Kafka Idempotent Producer
When enable.idempotence=true, each producer gets a PID and each message gets a sequence number. The broker tracks the highest sequence number per PID and discards duplicates.
Java example:
Properties properties = new Properties();
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
Overall, enabling idempotence is recommended for all Kafka producers.