Sunday, March 23, 2025
Kafka Idempotent Producer
When an application publishes events to a Kafka topic, there is a risk in failure scenarios that duplicate events are written and, consequently, that message ordering is lost. Both can be avoided by configuring the Kafka Producer to be idempotent. This article describes how duplicate events come to be published and how to make the Producer idempotent.
Duplicate Messages
Duplicate messages can occur in the scenario where:
- A Producer attempts to write a message to a topic partition.
- The broker does not acknowledge the write due to some transient failure scenario.
- The Producer retries as it does not know whether the write succeeded or not.
- If the Producer is not idempotent and the original write did succeed then the message would be duplicated.
By configuring the Producer to be idempotent, each Producer is assigned a unique Id (PID) and each message is given a monotonically increasing sequence number. The broker tracks the PID + sequence number combination for each partition, rejecting any duplicate write requests it receives.
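The broker-side bookkeeping can be illustrated with a toy simulation (this is not the actual broker code, which tracks a window of recent batches): keep the highest sequence number seen per PID and partition, and reject anything at or below it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the broker's duplicate check for an idempotent producer.
// The real broker tracks recent batches per (PID, partition); this sketch
// keeps only the highest sequence number accepted so far.
public class SequenceTracker {
    private final Map<String, Integer> lastSeq = new HashMap<>();

    // Returns true if the write is accepted, false if it is a duplicate.
    public boolean accept(long pid, int partition, int sequence) {
        String key = pid + "-" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (sequence <= last) {
            return false; // already persisted: a retry of an acked write
        }
        lastSeq.put(key, sequence);
        return true;
    }

    public static void main(String[] args) {
        SequenceTracker tracker = new SequenceTracker();
        System.out.println(tracker.accept(1L, 0, 0)); // true  - first write
        System.out.println(tracker.accept(1L, 0, 1)); // true  - next in sequence
        System.out.println(tracker.accept(1L, 0, 1)); // false - retry of an acked write
        System.out.println(tracker.accept(2L, 0, 0)); // true  - different producer (PID)
    }
}
```

A retried write carries the same sequence number as the original attempt, so if the first attempt did succeed, the retry falls into the "already persisted" branch and is discarded.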
Idempotent Producer Configuration
The Kafka Producer configuration enable.idempotence determines whether retrying a message after a retryable error can result in duplicates being written to the topic partition. When set to true, retries cannot introduce duplicates.
To ensure idempotent behavior, acks must be set to all. The leader waits until the minimum required number of in-sync replicas acknowledge the message before responding.
If retries = 0, the Producer never retries, so transient failures surface as failed sends and messages may be dead-lettered or lost unnecessarily. This is not recommended; in fact, idempotence requires retries to be greater than zero.
Unlike implementing an idempotent consumer, enabling an idempotent producer requires no code changes—only configuration.
Producer & Consumer Timeouts
It is recommended to leave retries at its default (Integer.MAX_VALUE) and instead bound retrying by time using delivery.timeout.ms.
In a consume-process-produce flow, if the time spent retrying exceeds the consumer's poll timeout (max.poll.interval.ms), the consumer may be removed from the group and its partitions reassigned, causing the same events to be redelivered and duplicated downstream.
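As a sketch (property names are from the standard Java clients; the values are illustrative only), the relationship between the two timeouts looks like this:

```java
// Producer: bound retrying by total elapsed time rather than by retry count.
producerProps.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
producerProps.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // 2 minutes total

// Consumer (in a consume-process-produce flow): allow more time between polls
// than the worst-case processing + delivery time, or the group coordinator
// evicts the consumer and triggers a rebalance, redelivering its events.
consumerProps.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000); // 5 minutes
```

The key invariant is that max.poll.interval.ms comfortably exceeds the worst-case time a single poll loop iteration can spend, including the producer's full delivery.timeout.ms.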
Guaranteed Message Ordering
The max.in.flight.requests.per.connection setting increases throughput by allowing multiple unacknowledged requests.
- If the Producer is not idempotent and this value > 1 → a retried request can land after a later one, so ordering may break.
- If the Producer is idempotent → ordering is guaranteed for values up to 5.
Recommended Configuration
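Putting the settings discussed above together, a commonly recommended baseline looks like the following (Java client property names; the delivery timeout value is illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");          // no duplicates on retry
props.put(ProducerConfig.ACKS_CONFIG, "all");                         // required for idempotence
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);          // leave at default; bound by time instead
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");       // illustrative: limit retries by time
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // ordering preserved up to 5
```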
Client Library Support
Kafka Java client defaults changed in 3.0.0 to enable.idempotence=true and acks=all.
KafkaJS marks idempotence as experimental. librdkafka added the idempotent producer in v1.0.0 and full exactly-once (transactional) support in v1.4.0.
Problem with Retries
Retrying a message can cause duplicates if the broker wrote the message but the acknowledgment was lost.
How Idempotence Works
When enable.idempotence=true, each producer gets a PID and each message gets a sequence number. The broker tracks the highest sequence number per PID and discards duplicates.
Java example:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties properties = new Properties();
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
Overall, enabling idempotence is recommended for all Kafka producers.
Saturday, March 22, 2025
Difference Between Kafka Idempotent and Transactional Producer Setup
Update for Apache Kafka 3.0: according to the Apache Kafka 3.0 announcement, the producer now enables the strongest delivery guarantees by default (acks=all, enable.idempotence=true). This means users get ordering and durability by default.
The Idempotent Producer's guarantees only hold within the life of a single Producer process. If it crashes, the replacement Producer is assigned a different ProducerId (PID) and starts its own sequence. The sequence number starts at 0 and monotonically increases for each record; it is tracked per producer and per partition. If a record fails to be delivered, it is retried with its existing sequence number so the brokers can deduplicate it. Kafka currently offers no way to "continue" an Idempotent Producer session: each time a Producer starts, the cluster assigns it a new, unique ProducerId.
Idempotence alone, then, only protects against duplicates while the producer does not crash. With transactions, however, you can send data across different partitions exactly once. You configure a transactional.id; the producer id is still created automatically, but it is bound to that transactional.id. If a new producer instance starts with the same transactional.id, the broker fences off the old one, and the records are written exactly once.
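The transactional flow can be sketched as follows (a sketch only: it assumes a broker at localhost:9092, and the topic names and transactional.id are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A stable transactional.id lets the broker fence off old instances.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-tx-1");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions(); // registers the transactional.id, fencing older instances
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "key", "value"));
            // Writes to different topics/partitions commit atomically together.
            producer.send(new ProducerRecord<>("payments", "key", "value"));
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction(); // read_committed consumers never see these records
            throw e;
        } finally {
            producer.close();
        }
    }
}
```

Setting transactional.id implicitly enables idempotence, so the per-session guarantees described above still apply within each transaction.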
This is a feature that is sorely missing from Kafka, and I don't see an elegant and efficient way to solve it without modifying Kafka itself. As a preliminary: if you want true idempotency across any failure (producer or broker), you absolutely need some kind of id in the business layer, rather than at the lower-level transport layer.

What you could do with such an id in Kafka is this: your producer writes to a topic at-least-once, and a Kafka Streams process deduplicates messages from that topic using your business-layer id, publishing the remaining unique messages to another topic. To be efficient, you should use a monotonically increasing id (a sequence number); otherwise you would have to keep around (and persist) every id you have ever seen, which amounts to a memory leak, unless you restrict deduplication to the last x days/hours/minutes and retain only the latest ids.

Or you could give Apache Pulsar a try, which, besides addressing other sore spots of Kafka (the costly, manual, and error-prone rebalance needed to scale out a topic, to name just one), has this feature built in.