Saturday, January 3, 2026

Kafka Partition

🧩 What Exactly Is a Partition in Kafka?

A partition is the fundamental unit of storage, parallelism, and scalability in Kafka.

Think of a Kafka topic as a folder, and partitions as the files inside that folder.
Each partition is:

  • An ordered, append‑only log
  • Stored on a single broker (leader replica)
  • Replicated to other brokers (follower replicas)
  • The unit of parallelism for producers and consumers
  • The unit of fault tolerance (via replication)

📌 A Partition Is an Ordered Log

Inside a partition, messages are stored in strict order:

offset 0 → offset 1 → offset 2 → offset 3 → ...

Kafka guarantees ordering only within a partition, not across partitions.

This is why key‑based partitioning matters:
all messages with the same key go to the same partition → ordering preserved.
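
The idea can be sketched in a few lines. Kafka's DefaultPartitioner actually uses murmur2 over the serialized key bytes; this sketch substitutes a simple polynomial hash purely to illustrate the deterministic key → partition mapping (the key and partition count below are made up):

```java
import java.nio.charset.StandardCharsets;

public class KeyPartitioner {
    // Deterministic key -> partition mapping. Kafka's DefaultPartitioner
    // uses murmur2 over the key bytes; this sketch uses a simple
    // polynomial hash to show the idea, not Kafka's exact algorithm.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h = 0;
        for (byte b : bytes) h = 31 * h + b;
        return (h & 0x7fffffff) % numPartitions; // mask the sign bit, then mod
    }

    public static void main(String[] args) {
        // the same key always maps to the same partition
        System.out.println(partitionFor("order-42", 3) == partitionFor("order-42", 3)); // true
    }
}
```

Because the mapping depends only on the key and the partition count, all records for a key land in one partition, which is exactly what preserves their relative order.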


📦 A Topic Is Split Into Multiple Partitions

Example:

topic: payments
partitions: 3

You get:

  • payments‑0
  • payments‑1
  • payments‑2

Each is an independent log.


๐Ÿ” Replication Happens Per Partition

If replication factor = 3, then each partition has 3 copies:

payments-0 → leader on broker 1, followers on broker 2 and 3
payments-1 → leader on broker 2, followers on broker 3 and 1
payments-2 → leader on broker 3, followers on broker 1 and 2

This is why we say:

“A replication factor of 3 means each partition has 3 replicas.”

Not the topic — the partition.


⚙️ Why Partitions Matter

1. Scalability

More partitions → more parallelism.

  • Producers can write in parallel
  • Consumers in a group can read in parallel
  • Throughput scales with partition count, up to broker, disk, and network limits

2. Fault Tolerance

Replication is done per partition, so Kafka can survive broker failures.

3. Ordering Guarantees

Kafka only guarantees ordering within a partition, not across the topic.


🧠 A Simple Visual

Topic: orders
Partition 0: [msg1, msg2, msg3]
Partition 1: [msg4, msg5]
Partition 2: [msg6, msg7, msg8, msg9]

Each partition is its own log, stored on its own leader broker, with follower replicas elsewhere.


๐Ÿ” Want to Go Deeper?

Given your background in OS internals and distributed logs, I can also explain:

  • How partitions map to segment files on disk
  • How page cache interacts with partition reads
  • How leader/follower replication works at the protocol level
  • How partition count affects consumer group rebalancing
  • How to choose the right number of partitions for a topic
  • How partitioning interacts with exactly‑once semantics

Just tell me which direction you want to explore


Let’s go deep — this is one of those areas where your OS‑internals intuition (page cache, mmap, disk layout) gives you a huge advantage. Kafka’s storage engine is beautifully simple and mechanically sympathetic, and understanding how partitions map to segment files is the key to understanding Kafka’s performance model.

Below is the clearest, most accurate explanation you’ll find — the kind you’d use to teach a team.


🧩 How Kafka Partitions Map to Segment Files on Disk

A partition is stored as a directory on disk.

Inside that directory, Kafka stores the partition’s data as a sequence of segment files.


๐Ÿ“ 1. Partition → Directory

For a topic named orders with partition 2, Kafka creates:

/var/lib/kafka/data/orders-2/

This directory contains all data for that partition.


📄 2. A Partition Is Split Into Multiple Segment Files

Kafka does not store the entire partition in one huge file.

Instead, it splits the log into segments, capped by log.segment.bytes (1 GiB by default).

Example contents of orders-2:

00000000000000000000.log
00000000000000000000.index
00000000000000000000.timeindex
00000000000001000000.log
00000000000001000000.index
00000000000001000000.timeindex
00000000000002000000.log
00000000000002000000.index
00000000000002000000.timeindex

Each segment is identified by its base offset.


🧱 3. What’s Inside a Segment?

Each segment consists of three files:

a) .log file

The actual message data (binary records).

b) .index file

Maps relative offsets → byte positions inside the .log file.

c) .timeindex file

Maps timestamps → offsets for time‑based lookups.

Kafka uses sparse indexing, meaning it doesn’t index every message — only periodic entries.

This keeps index files tiny.


๐Ÿ” 4. How Kafka Uses Segments

Appending

Kafka always writes to the active segment (the last one).

When the segment reaches the configured size (e.g., 1 GB), Kafka:

  • closes it
  • creates a new segment with a new base offset
  • continues writing
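
A minimal in-memory model of this roll-on-size behavior (the class and the tiny 16-byte segment cap are invented for illustration; real Kafka rolls on log.segment.bytes, segment age, and index fullness):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedLog {
    static final int SEGMENT_BYTES = 16;      // tiny for illustration; default log.segment.bytes is 1 GiB
    final List<List<byte[]>> segments = new ArrayList<>();
    final List<Long> baseOffsets = new ArrayList<>();
    long nextOffset = 0;
    int activeBytes = 0;

    SegmentedLog() { roll(); }

    // Close the active segment and start a new one whose base offset
    // is the next offset that will be written.
    private void roll() {
        segments.add(new ArrayList<>());
        baseOffsets.add(nextOffset);
        activeBytes = 0;
    }

    // Append to the active (last) segment, rolling first if the record
    // would push it past the size cap.
    long append(byte[] record) {
        if (activeBytes + record.length > SEGMENT_BYTES && activeBytes > 0) roll();
        segments.get(segments.size() - 1).add(record);
        activeBytes += record.length;
        return nextOffset++;
    }
}
```

Note how the base offset of each new segment is simply the offset of the first record it will hold, which is exactly the number encoded in Kafka's segment filenames.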

Reading

Consumers read sequentially:

  1. Use .index to find the byte position
  2. Jump into the .log file
  3. Sequentially scan forward

This is extremely efficient because of:

  • sequential disk access
  • the OS page cache and read‑ahead
  • memory‑mapped index files
  • zero‑copy transfer (sendfile) from the page cache to the network

⚙️ 5. Why Kafka Uses Segment Files

This design gives Kafka:

✔ Infinite log size

Old segments can be deleted or compacted without touching newer ones.

✔ Fast recovery

On restart, Kafka only loads segment metadata, not the whole log.

✔ Efficient retention

Kafka can delete entire segment files when retention is exceeded.

✔ High throughput

Sequential writes + page cache + zero‑copy reads = near‑disk‑limit performance.


🧠 6. How Offsets Map to Segments

Offsets are global within a partition, not per segment.

Example:

  • Segment 0: offsets 0–999,999
  • Segment 1: offsets 1,000,000–1,999,999
  • Segment 2: offsets 2,000,000–2,999,999

Kafka finds the right segment by comparing the offset to the segment’s base offset.
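
Conceptually this is a floor lookup over the sorted base offsets. In Java, a TreeMap makes it a one-liner (the class is a sketch; segment names match the earlier example):

```java
import java.util.TreeMap;

public class SegmentIndex {
    // Greatest base offset <= target picks the segment containing it.
    static String findSegment(TreeMap<Long, String> segments, long offset) {
        return segments.floorEntry(offset).getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> segments = new TreeMap<>();
        segments.put(0L, "00000000000000000000.log");
        segments.put(1000000L, "00000000000001000000.log");
        segments.put(2000000L, "00000000000002000000.log");

        // offset 1573000 falls in the segment whose base offset is 1000000
        System.out.println(findSegment(segments, 1573000L));
    }
}
```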


🧩 7. How This Relates to Replication

Each replica (leader or follower) stores its own copy of the same segment files.

Replication is simply:

  • follower fetches from leader
  • appends to its own segment files
  • updates its own indexes

This is why replication is fast and cheap.


🔥 8. The Big Picture

Kafka’s storage design is essentially:

A distributed, replicated, append‑only log built on top of the OS page cache, with memory‑mapped index files and zero‑copy reads.

It’s elegant because it lets the OS do the heavy lifting.





Let’s walk through this step by step, exactly how Kafka does it internally, using a concrete example:

You want offset 1573.

This is a perfect example to illustrate how Kafka finds data inside a partition that is split across multiple segment files.


🧩 Step 1 — Kafka lists all segment base offsets

Suppose the partition directory contains these segment files:

00000000000000000000.log   (base offset = 0)
00000000000000100000.log   (base offset = 100000)
00000000000000200000.log   (base offset = 200000)

But for simplicity, let’s use smaller numbers:

Segment A → base offset 0
Segment B → base offset 1000
Segment C → base offset 2000

These base offsets are encoded in the filenames.

Kafka loads only the base offsets, not the entire file.


🧠 Step 2 — Kafka chooses the correct segment

Kafka applies this rule:

Pick the segment whose base offset is ≤ target offset, and whose next segment’s base offset is > target offset.

For offset 1573:

  • Segment A: base = 0 → 0 ≤ 1573 ✔
  • Segment B: base = 1000 → 1000 ≤ 1573 ✔
  • Segment C: base = 2000 → 2000 ≤ 1573 ✘ (too high)

So Kafka picks:

Segment B (base offset 1000)

Because:

  • 1000 ≤ 1573
  • Next segment starts at 2000, which is > 1573

This is exactly how Kafka narrows down the search.


๐Ÿ“ Step 3 — Kafka opens the .index file for Segment B

Each segment has:

1000.log
1000.index
1000.timeindex

(simplified names; real filenames are zero‑padded 20‑digit base offsets, e.g. 00000000000000001000.log)

Kafka uses the .index file to map:

relative offset → byte position in the .log file

Relative offset = target offset − base offset

So:

1573 − 1000 = 573

Kafka now needs to find relative offset 573 inside the index.


🧭 Step 4 — Kafka uses sparse index lookup

Kafka’s index is sparse, meaning it doesn’t store every offset.

Example index entries:

relative offset → file position
0               → 0 bytes
200             → 8123 bytes
400             → 16200 bytes
600             → 24100 bytes

Kafka finds the largest relative offset ≤ 573.

That is:

400 → 16200 bytes

So Kafka jumps to byte 16200 in the .log file.


📖 Step 5 — Kafka reads sequentially from that byte

Now Kafka scans forward:

  • read record at offset 1400 (base 1000 + relative 400, where the index entry points)
  • read record at offset 1401
  • …
  • read record at offset 1573 ← target found

Kafka stops when it reaches the desired offset.

This sequential scan is extremely fast because:

  • the log is append‑only
  • records are stored contiguously
  • the OS page cache + read‑ahead optimize the scan

🧨 Putting it all together

Here’s the full flow in one shot:

  1. List segment base offsets
    → 0, 1000, 2000

  2. Pick segment where base ≤ 1573 < next base
    → Segment B (base 1000)

  3. Compute relative offset
    → 1573 − 1000 = 573

  4. Use sparse index to find nearest entry ≤ 573
    → relative offset 400 → byte position 16200

  5. Jump to byte 16200 in the .log file
    → sequentially scan until offset 1573

This is exactly how Kafka performs fast lookups without loading entire files.
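
The five steps above fit in a few lines. This sketch reuses the toy numbers from the walkthrough (base offsets 0/1000/2000, the sparse index entries shown earlier) and a TreeMap floor lookup for both the segment choice and the index probe; the class and method names are invented:

```java
import java.util.TreeMap;

public class OffsetLookup {
    // Returns {segment base offset, relative offset, byte position to scan from}.
    static long[] locate(TreeMap<Long, TreeMap<Long, Long>> segments, long target) {
        long base = segments.floorKey(target);                              // step 2: floor over base offsets
        long relative = target - base;                                      // step 3: relative offset
        long bytePos = segments.get(base).floorEntry(relative).getValue();  // step 4: sparse index floor
        return new long[] { base, relative, bytePos };                      // step 5: scan forward from bytePos
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> indexB = new TreeMap<>();   // sparse index of segment B (base 1000)
        indexB.put(0L, 0L);
        indexB.put(200L, 8123L);
        indexB.put(400L, 16200L);
        indexB.put(600L, 24100L);

        TreeMap<Long, TreeMap<Long, Long>> segments = new TreeMap<>();
        segments.put(0L, new TreeMap<>());              // segment A
        segments.put(1000L, indexB);                    // segment B
        segments.put(2000L, new TreeMap<>());           // segment C

        long[] r = locate(segments, 1573);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 1000 573 16200
    }
}
```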




Kafka Broker mmap File

 





Kafka Idempotence, Transactions, Offsets, and Exactly-Once Semantics

This document ties together the concepts we’ve been discussing: the idempotence protocol, sequence numbers, transactional producers, committing consumer offsets, and the classic failure scenarios that motivate Kafka’s exactly-once semantics (EOS). For a deeper background on EOS and how Kafka implements it with idempotent producers and transactions, see the Confluent article on exactly-once semantics and other EOS guides.

1. Idempotent producer: what it is and what it guarantees

1.1. Core idea

The idempotent producer is a Kafka feature that guarantees:

  • No duplicates: the same message is never written twice, even if the producer retries.
  • Ordering preserved (per partition): messages are written in the order the producer sends them.
  • Message loss is still possible: if an earlier message fails and later ones succeed, the earlier one can be dropped.

Idempotence is one of the building blocks Kafka uses to implement exactly-once semantics, but by itself it only guarantees “no duplicates + ordering,” not “no loss”.

1.2. The idempotence protocol: PID and sequence numbers

The idempotence protocol is implemented via:

  • Producer ID (PID): assigned by the broker when the producer starts. It uniquely identifies a producer instance.
  • Per-partition sequence numbers: for each partition, the producer assigns a monotonically increasing sequence number to each batch of records.
  • Broker-side tracking: the broker tracks, for each (PID, partition), the highest sequence number it has accepted.

The broker then enforces strict rules:

  • Accept new message: if incoming_seq == highest_seq + 1.
  • Accept duplicate retry: if incoming_seq == highest_seq (duplicate is ignored or safely deduplicated).
  • Reject late retry: if incoming_seq < highest_seq (too old; would break ordering and idempotence).
  • Reject gaps: if incoming_seq > highest_seq + 1 (illegal gap in sequence).

Key insight: idempotence depends on strict ordering. The broker must reject any sequence that would violate monotonic progression, even if the application “doesn’t care” about ordering.
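
The four rules are a simple comparison against the highest sequence the broker has tracked. A sketch of the broker-side check (the enum and method names are invented; the real broker also caches metadata for the last few in-flight batches, so its duplicate detection is slightly wider than the single highest_seq shown here):

```java
public class SequenceCheck {
    enum Decision { ACCEPT_NEW, ACCEPT_DUPLICATE, REJECT_LATE_RETRY, REJECT_GAP }

    // highestSeq is the last sequence accepted for this (PID, partition);
    // -1 means nothing has been accepted yet.
    static Decision check(int highestSeq, int incomingSeq) {
        if (incomingSeq == highestSeq + 1) return Decision.ACCEPT_NEW;
        if (incomingSeq == highestSeq)     return Decision.ACCEPT_DUPLICATE;
        if (incomingSeq < highestSeq)      return Decision.REJECT_LATE_RETRY;
        return Decision.REJECT_GAP;        // incomingSeq > highestSeq + 1
    }
}
```

With highestSeq = 2, a late retry of seq=0 lands in REJECT_LATE_RETRY, which is exactly the seq=0 loss scenario discussed below.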

1.3. The seq=0 loss scenario

Consider this sequence on a single partition:

A → seq=0
B → seq=1
C → seq=2

Now imagine:

  1. A(seq=0) fails due to a transient network issue.
  2. B(seq=1) and C(seq=2) are successfully written and acknowledged.
  3. The producer retries A(seq=0) later.

At this point, the broker has:

highest_seq = 2
incoming_seq = 0

According to the idempotence rules:

  • incoming_seq < highest_seq → the broker must reject this message.

Result: A(seq=0) is lost. This is not a bug; it is the correct behavior for an idempotent producer. The broker cannot accept a late retry of an earlier sequence without breaking ordering and duplicate detection.

1.4. Why you cannot “fix” seq=0 by renumbering

A natural thought is: “If ordering doesn’t matter to my application, can I just resend A as a new message with a new sequence number (e.g., seq=3)?” The answer is no:

  • Sequence numbers are protocol metadata: they are not part of the Kafka log and are not controlled by your application.
  • The client library manages them: you cannot manually assign or override them.
  • Renumbering would break idempotence: the broker would treat the message as a new one, and you could end up with duplicates if the original write had partially succeeded.

Therefore, with idempotence alone, the seq=0 loss scenario is unavoidable in the presence of certain failure patterns.

2. Transactions: fixing the limitations of idempotence

2.1. What transactions add on top of idempotence

Kafka transactions build on the idempotent producer to provide full exactly-once semantics (EOS):

  • No duplicates (via idempotence).
  • Ordering preserved (per partition).
  • No message loss within a transaction.
  • Atomic multi-message, multi-partition, multi-topic writes.
  • Atomic commit of consumer offsets with output messages.

Transactions are enabled by configuring a transactional.id on the producer and using the transactional API: initTransactions(), beginTransaction(), commitTransaction(), and abortTransaction().

2.2. Transactional producer flow (high level)

initTransactions()
beginTransaction()

// 1. Read input (via consumer)
// 2. Process
// 3. Produce output records
// 4. Optionally send consumer offsets to the transaction

commitTransaction()   // or abortTransaction() on failure

All writes and offset commits inside a transaction are either fully committed or fully aborted. There are no partial writes visible to consumers in read_committed mode.

2.3. Why transactions eliminate the seq=0 loss scenario

Revisit the earlier example with A, B, C. With a transactional producer:

  1. The producer starts a transaction.
  2. It sends A, B, C as part of the same transaction.
  3. If A fails but B and C succeed, the transaction cannot be committed until A is successfully retried.
  4. If A never succeeds, the transaction is aborted, and none of A, B, or C become visible.

Result:

  • No partial writes.
  • No gaps like “B and C visible, A missing.”
  • No seq=0 loss within a committed transaction.

3. Committing consumer offsets and why it matters

3.1. What is a consumer offset?

A consumer offset is a “bookmark” that indicates how far a consumer has read in a partition. Committing an offset means:

“I have successfully processed all messages up to offset X. If I crash, resume from X+1.”

Kafka stores these offsets in an internal topic called __consumer_offsets. This is how Kafka tracks progress for each consumer group.

3.2. The classic failure problem: read → process → write

Consider this pipeline:

Input Topic A → Consumer → Processor → Producer → Output Topic B

And this sequence:

  1. Consumer reads message A from topic A.
  2. Processor generates output message B for topic B.
  3. Consumer commits the offset for A (marking it as processed).
  4. Producer crashes before writing B.

Result:

  • A is marked as processed and will not be re-read.
  • B was never written.
  • Message A is effectively lost.

This is the fundamental exactly-once problem: if you commit offsets before writing output, you risk loss; if you write output before committing offsets, you risk duplicates. This is why idempotence alone is not enough for end-to-end exactly-once processing.

3.3. Transactions + sendOffsetsToTransaction()

Kafka solves this by allowing the producer to include the consumer’s offsets in the same transaction as the output writes:

beginTransaction()

// 1. Consumer reads A
// 2. Processor generates B
// 3. Producer sends B to output topic
// 4. Producer calls sendOffsetsToTransaction(offset of A, consumerGroupId)

commitTransaction()   // or abortTransaction()

Now:

  • If the transaction commits:
    • B is visible in the output topic.
    • The offset for A is committed.
  • If the transaction aborts:
    • B is discarded (never visible).
    • The offset for A is not committed; A will be re-read.

This atomicity is what gives you true exactly-once semantics in a read → process → write pipeline.

4. Failure scenarios in the read → process → write pipeline

4.1. Non-transactional pipeline (unsafe)

Scenario A — Commit offset before writing output (loss)

1. Consumer reads A
2. Processor generates B
3. Consumer commits offset for A
4. Producer fails before writing B

Result:

  • A is marked processed.
  • B is never written.
  • A is lost forever.

Scenario B — Write output before committing offset (duplicates)

1. Consumer reads A
2. Processor generates B
3. Producer writes B
4. Consumer crashes before committing offset for A
5. Consumer restarts and re-reads A
6. Processor generates B again
7. Producer writes B again

Result:

  • A is processed twice.
  • B is written twice (duplicate output).

Without transactions, you are forced to choose between possible loss or possible duplicates. You cannot avoid both simultaneously.

4.2. Transactional pipeline (safe)

Scenario C — Transaction commits successfully

1. beginTransaction()
2. Consumer reads A
3. Processor generates B
4. Producer writes B
5. Producer sendOffsetsToTransaction(offset of A)
6. commitTransaction()

Result:

  • B is visible in the output topic.
  • Offset for A is committed.
  • A is processed exactly once.
  • No duplicates, no loss.

Scenario D — Failure before commit → transaction aborts

1. beginTransaction()
2. Consumer reads A
3. Processor generates B
4. Producer writes B (pending, not visible)
5. Producer or broker fails before commitTransaction()
6. Transaction coordinator aborts the transaction

Result:

  • B is discarded (never visible to consumers in read_committed mode).
  • Offset for A is not committed.
  • Consumer will re-read A.
  • No duplicates, no loss.

4.3. Summary table: non-transactional vs transactional

Aspect                             | Non-Transactional (Idempotence Only)       | Transactional (Exactly-Once)
No duplicates                      | Yes (producer retries), not end-to-end     | Yes, end-to-end
Ordering preserved (per partition) | Yes                                        | Yes
No message loss                    | No (seq=0 loss, offset-commit race)        | Yes (within transactions)
Atomic write + offset commit       | No                                         | Yes (via sendOffsetsToTransaction)
Exactly-once semantics             | No                                         | Yes

5. Transaction coordinator and broker crash behavior (high level)

5.1. Transaction coordinator state machine (conceptual)

The transaction coordinator manages transactional state for producers. Conceptually, it moves through states like:

  • NO_TXN / READY: no active transaction; producer has a valid PID + epoch.
  • ONGOING: a transaction is in progress; messages are written as pending/uncommitted.
  • PREPARE_COMMIT: coordinator writes a prepare-commit marker to the transaction log.
  • COMPLETE_COMMIT: coordinator writes a commit marker; messages become visible.
  • PREPARE_ABORT / COMPLETE_ABORT: similar flow for abort; pending messages are discarded.

On failures (producer crash, timeouts, fencing), the coordinator will typically move the transaction to ABORT, ensuring no partial writes are visible.

5.2. Broker crash and recovery

When a broker crashes and restarts, it:

  • Replays log segments to restore committed data and high watermarks.
  • Replays the transaction log to determine the state of in-flight transactions.
  • For each transaction:
    • If a prepare-commit marker is present but no final commit, it completes the commit.
    • If a prepare-abort marker or incomplete transaction is found, it aborts the transaction.

This ensures that after recovery, the broker presents a consistent view: no partial transactions, no duplicates, and no reordering for committed data.

6. The core use case: exactly-once processing, not “reading A and B in order”

In the canonical pipeline:

Input Topic A → Consumer → Processor → Producer → Output Topic B

The goal is not for a consumer to read both A and B in order. Typically:

  • One consumer group reads A from the input topic.
  • Another consumer group reads B from the output topic.

The real goals are:

  • Every A is processed exactly once.
  • Every A produces exactly one corresponding B (if that’s your business logic).
  • No A is lost.
  • No B is duplicated.

Committing consumer offsets inside the same transaction as writing B is what guarantees that you never end up in the “A marked processed but B never written” or “B written twice because A was re-read” situations. This is the essence of Kafka’s exactly-once semantics.

A follow-up question pushes into the deepest part of Kafka’s idempotence protocol: what happens if seq‑0 is lost, seq‑1 and seq‑2 succeed, and then the producer crashes and restarts? Can the new producer session send seq‑0 again?

⭐ Short Answer

No. Even after the producer crashes and restarts, Kafka will not accept the original seq‑0; it is permanently lost. The reason is subtle but fundamental: a new producer session gets a new Producer ID (PID) and resets sequence numbers to 0, but the broker does not treat a retry of the old seq‑0 as part of the new session.

🧩 Step 1 — Original producer session

Producer sends:

A → seq=0 (fails)
B → seq=1 (success)
C → seq=2 (success)

Broker state for this PID:

PID = 123
highest_seq = 2

Retry of seq‑0 arrives:

incoming_seq = 0
highest_seq = 2
→ reject (late retry)

So seq‑0 is lost.

🧩 Step 2 — Producer crashes

The producer process dies. The PID is tied to the old producer session, and the broker keeps the PID state for a while (until it expires).

🧩 Step 3 — Producer restarts

When the producer restarts, Kafka assigns:

NEW PID = 456
NEW epoch = 0
Sequence numbers reset to 0

This is a completely new identity. The new producer session cannot reuse the old PID or old sequence numbers.

🧩 Step 4 — Can the new producer send the old seq‑0 message?

The new producer can send a message with sequence number 0, but Kafka treats it as a new message, not as a retry of the old seq‑0. The PID and epoch are different, and the broker’s sequence tracking is per PID + partition:

Old PID = 123 → highest_seq = 2
New PID = 456 → highest_seq = -1 (no messages yet)

So if the new producer sends A' → seq=0, Kafka accepts it, but it is not the original seq‑0. It is a new message, unrelated to the lost one.

🟥 Critical insight

Kafka cannot resurrect the original seq‑0 message after the producer restarts. Even though the new producer can send a message with seq=0, it carries a different PID, a different session, and a different sequence space. Kafka has no way to know it is “the same” message.

🟦 Why Kafka cannot accept the old seq‑0 after restart

Accepting it would break duplicate detection, ordering guarantees, zombie-producer protection, and the correctness of the idempotence protocol. The broker could not tell whether the message is a retry of the old one or a brand-new message, and would not know whether to dedupe, reorder, or reject. This ambiguity is fatal for correctness, so Kafka’s rule is: a new producer session starts a new sequence space, and old messages cannot be retried.

🟩 Timeline summary

Original session (PID=123): seq=0 → lost, seq=1 → committed, seq=2 → committed
Producer crashes
New session (PID=456): sequence numbers reset; seq=0 → accepted, but treated as a NEW message

⭐ Final answer

  • Kafka will not accept the original seq‑0 after the producer restarts.
  • The new producer can send a new message with seq=0, but it is not the same message.
  • The original seq‑0 is permanently lost once seq‑1 and seq‑2 are committed.
  • This is why idempotence alone cannot guarantee no message loss; transactions are required to prevent this scenario.

7. Quick mental model recap

  • Idempotence protocol: PID + per-partition sequence numbers; guarantees no duplicates and ordering, but can drop messages (e.g., seq=0) in certain failure patterns.
  • Transactions: build on idempotence to add atomic commit/abort, no loss, and exactly-once semantics across reads, writes, and offset commits.
  • Committing consumer offsets: is how Kafka tracks progress; doing it inside a transaction with output writes is how you avoid both loss and duplicates.
  • Exactly-once semantics: achieved by combining idempotent producers, transactional writes, and atomic offset commits.

Wednesday, April 16, 2025

Kafka Consumer Fetch Configs

Kafka Consumer Fetch Settings - Interview Prep


Purpose of a Kafka consumer group: to parallelize message consumption and achieve fault tolerance.

⚙️ Core Settings

  • fetch.min.bytes – Minimum data size (bytes) broker should collect before replying to consumer.
  • fetch.max.wait.ms – Maximum time broker will wait to meet the fetch.min.bytes requirement.
  • fetch.max.bytes – Maximum bytes the consumer is willing to receive in a single fetch response.

Kafka sends a fetch response if it has ≥ fetch.min.bytes data available OR if fetch.max.wait.ms expires.

fetch.max.bytes caps the total response size per fetch across all partitions. It is a soft cap: the first record batch is returned even if it alone exceeds the limit.
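
A broker-side sketch of how the three settings interact when assembling one fetch response (pure logic, no Kafka APIs; class and method names invented, and constants set to the example values used below):

```java
import java.util.ArrayList;
import java.util.List;

public class FetchResponseSketch {
    static final int FETCH_MIN_BYTES = 1000;
    static final long FETCH_MAX_WAIT_MS = 3000;
    static final int FETCH_MAX_BYTES = 1500;

    // Should the broker answer now, or park the fetch and keep waiting?
    static boolean shouldRespond(int availableBytes, long elapsedMs) {
        return availableBytes >= FETCH_MIN_BYTES || elapsedMs >= FETCH_MAX_WAIT_MS;
    }

    // Fill the response up to fetch.max.bytes; whatever does not fit is
    // left for the next fetch cycle. The first message is always included
    // (even if oversized) so the consumer can make progress.
    static List<byte[]> fillResponse(List<byte[]> available) {
        List<byte[]> out = new ArrayList<>();
        int used = 0;
        for (byte[] msg : available) {
            if (used + msg.length > FETCH_MAX_BYTES && !out.isEmpty()) break;
            out.add(msg);
            used += msg.length;
        }
        return out;
    }
}
```

With an 800-byte message A and a 900-byte message B queued, fillResponse returns only A, matching the scenario below: A+B = 1700 bytes would exceed fetch.max.bytes = 1500.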

📈 Example Scenario

fetch.min.bytes = 1000
fetch.max.wait.ms = 3000
fetch.max.bytes = 1500

✔ Message A = 800 bytes
✔ Message B = 900 bytes

Case:
- Broker receives Message A → waits for Message B
- A+B = 1700 bytes → exceeds fetch.max.bytes
- Kafka will only return Message A in this fetch cycle

🧠 Diagram - Fetch Behavior Matrix

+--------------------+--------------------------+----------------------+------------------------------+
| Scenario           | fetch.min.bytes          | fetch.max.wait.ms    | fetch.max.bytes              |
+--------------------+--------------------------+----------------------+------------------------------+
| Total data = 400   | < 1000 → wait            | Waits up to 3s       | Not exceeded                 |
| Total data = 1200  | >= min.bytes → ✅ send   | Sends immediately    | Not exceeded                 |
| Message = 2000     | >= min.bytes             | Sends immediately    | Oversized first batch is     |
|                    |                          |                      | still returned (KIP-74)      |
| Total = 1600       | >= min.bytes             | Time ok              | Too large → partial response |
+--------------------+--------------------------+----------------------+------------------------------+

🎯 Interview Questions & Answers

Q1. What does fetch.max.bytes do in Kafka consumer configuration?

It defines the maximum number of bytes a consumer receives in a single fetch response, applied across all partitions in a fetch cycle. It is a soft cap: a single batch larger than the limit is still returned.

Q2. What happens if a record is larger than fetch.max.bytes?

Since Kafka 0.10.1 (KIP-74), the limit is not absolute: if the first record batch in the first non-empty partition is larger than fetch.max.bytes, it is still returned so the consumer can make progress. On older versions, an oversized record could stall the consumer until the limit was raised.

Q3. Will Kafka throw an exception if a message is too large for fetch.max.bytes?

No. Modern consumers simply receive the oversized batch on its own; very old consumers would receive nothing and keep polling.

Q4. What if multiple messages together exceed fetch.max.bytes?

Kafka includes only as many messages as it can fit within the limit. The rest will be delivered in the next fetch call.

Q5. How does fetch.max.bytes differ from max.partition.fetch.bytes?

  • fetch.max.bytes – Total limit across all partitions
  • max.partition.fetch.bytes – Limit per partition
  • Both must be adjusted if your message size increases

📘 Bonus: Java Config Snippet

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumes a local broker
props.put("group.id", "fetch-demo");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("fetch.min.bytes", "1000");
props.put("fetch.max.wait.ms", "3000");
props.put("fetch.max.bytes", "1500");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("payments"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000));

if (records.isEmpty()) {
    System.out.println("No records fetched — possibly due to fetch limits.");
}

💡 Interview Tips

  • Understand that these fetch configs trade off latency against throughput.
  • Mention KIP-74: since 0.10.1 an oversized first batch is still returned, so large messages no longer starve the consumer.
  • Be ready to talk about how fetch.max.bytes and max.partition.fetch.bytes should be tuned together.

Monday, April 14, 2025

Kafka endOffset

endOffsets()

🔢 1. Understanding endOffsets()

When you call:

long endOffset = consumer.endOffsets(List.of(tp)).get(tp);

You get the next offset after the last available message in that partition.

Think of endOffset as a marker:
“If a new message arrives, this is the offset it will be assigned.”


🧠 Example:

Message | Offset
A       | 0
B       | 1
C       | 2
  • endOffset = 3 (no message at offset 3 yet)

  • So to read last message, we must seek to endOffset - 1 = 2
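
Since endOffset is exclusive, computing the last readable offset needs a guard for empty partitions (where beginningOffset == endOffset). A small helper (the class is invented; with a real consumer you would feed it the values from consumer.beginningOffsets(...) and consumer.endOffsets(...), then seek to the result before polling):

```java
import java.util.OptionalLong;

public class LastOffset {
    // endOffset is the offset the NEXT record will get, so the last
    // existing record (if any) lives at endOffset - 1. An empty
    // partition has beginningOffset == endOffset.
    static OptionalLong lastOffset(long beginningOffset, long endOffset) {
        return endOffset > beginningOffset
                ? OptionalLong.of(endOffset - 1)
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastOffset(0, 3)); // OptionalLong[2]
    }
}
```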


๐Ÿ” Why Not Use endOffset Directly?

If you call:

consumer.seek(tp, endOffset); // BAD!

It will position the consumer after the last message, and poll() will return nothing, because there's nothing at that offset yet.


✅ Summary

Offset        | Meaning
currentOffset | Where the next poll() will start
endOffset     | The next available offset, NOT the last existing record
endOffset - 1 | Points to the actual last message in the topic/partition

🧠 Interview Line:

“Kafka offsets are forward-pointing. The endOffset indicates where the next message will be written, not where the last message lives. So to retrieve the last message, we seek to endOffset - 1.”



Saturday, April 12, 2025

Kafka Group Coordinator

Kafka Group Coordinator – Interview Perspective

Interview Question:
“What is the Group Coordinator in Kafka? How is it selected, what are its responsibilities, and where can we find its code in Kafka’s source?”

1. What is a Group Coordinator?

The Group Coordinator in Kafka is a designated Kafka broker responsible for managing a consumer group. It handles consumer group membership, rebalancing, and offset commits for the group.

2. How is the Group Coordinator Selected?

  • Every broker in Kafka is capable of being a group coordinator.
  • When a consumer joins a group, it sends a FindCoordinatorRequest to any broker.
  • That broker hashes the group.id modulo the number of partitions of the internal __consumer_offsets topic (offsets.topic.num.partitions, 50 by default).
  • The broker that is the leader of that __consumer_offsets partition becomes the Group Coordinator for the group.
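
The mapping itself is just a modulo over the partition count of __consumer_offsets. A sketch of the computation (class name invented; this mirrors the broker's Utils.abs(groupId.hashCode) % partitionCount, where abs masks the sign bit):

```java
public class CoordinatorLookup {
    // Partition of __consumer_offsets that owns this group; the broker
    // leading that partition acts as the group's coordinator.
    static int coordinatorPartition(String groupId, int offsetsTopicPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // same group.id -> same partition -> same coordinator broker
        System.out.println(coordinatorPartition("payments-service", 50));
    }
}
```

Because the mapping is deterministic, every broker resolves the same coordinator for a given group, and different groups spread across different coordinators.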

3. Responsibilities of the Group Coordinator

The Group Coordinator performs several key functions:

  • Manage Consumer Membership: Tracks all consumers in a group.
  • Coordinate Rebalances: When consumers join/leave, it triggers a rebalance and assigns partitions.
  • Assign Leader: Appoints one consumer as the group leader to help coordinate partition assignment.
  • Track Offsets: Stores committed offsets in the __consumer_offsets topic.
  • Handle Heartbeats: Keeps track of active consumers to detect failures quickly.

4. Services Provided by the Group Coordinator

It acts as a centralized service point for:

  • Tracking active consumers in the group
  • Reassigning partitions during consumer join/leave events
  • Persisting committed offsets
  • Coordinating heartbeat checks and group timeouts
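The heartbeat and timeout behavior above is driven by consumer configuration. A sketch of the relevant settings (the config keys are real Kafka consumer properties; the values and group name are illustrative, not recommendations):

```java
// Illustrative consumer settings that govern how the Group Coordinator
// detects failed members and triggers rebalances.
import java.util.Properties;

public class GroupTimeouts {
    public static Properties groupConfig() {
        Properties props = new Properties();
        props.setProperty("group.id", "payments-processor"); // assumed group name
        // A background thread sends heartbeats to the coordinator at this interval
        props.setProperty("heartbeat.interval.ms", "3000");
        // The coordinator evicts a member that misses heartbeats for this long,
        // then triggers a rebalance; typically ~3x the heartbeat interval
        props.setProperty("session.timeout.ms", "10000");
        // Max gap between poll() calls before the member is considered failed
        props.setProperty("max.poll.interval.ms", "300000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(groupConfig());
    }
}
```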

5. Internal Communication

  • Consumers communicate with the group coordinator using RPC-style requests:
    • JoinGroupRequest
    • SyncGroupRequest
    • HeartbeatRequest
    • LeaveGroupRequest
    • OffsetCommitRequest

6. Where Is This in Kafka’s Source Code?

The core logic for group coordination lives in the Kafka broker codebase:

  • Class: GroupCoordinator
  • Location: core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala
  • Handles all consumer group membership and offset commit logic.
  • Works closely with the __consumer_offsets internal topic.

Additional Code References:

  • kafka.server.KafkaApis – entry point that handles requests from consumers
  • FindCoordinatorRequest – initiates coordinator discovery
  • GroupMetadataManager – helps manage group metadata and offset storage

7. What If There Are Multiple Consumer Groups?

Kafka is designed to handle many consumer groups efficiently. Each consumer group is managed independently by potentially different group coordinators, depending on the group ID.

How Kafka Handles Multiple Groups:

  • Each group.id is hashed to a partition of the __consumer_offsets topic; the broker leading that partition coordinates the group.
  • This ensures that different consumer groups can be coordinated by different brokers for load distribution.
  • Even if multiple groups consume from the same topic, they operate independently and can have different offsets and members.

Example Scenario:

  • Group A might be coordinated by Broker 1.
  • Group B might be coordinated by Broker 2.
  • Both could be consuming from the same topic orders, but maintain their own committed offsets.
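The only thing separating the two groups is their group.id. A toy sketch (group names and bootstrap address are assumptions) showing that two consumers of the same topic become independent groups purely through configuration:

```java
// Illustrative only: two groups reading the same topic stay isolated
// because their group.id differs, so their committed offsets differ too.
import java.util.Properties;

public class TwoGroups {
    static Properties configFor(String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        p.setProperty("group.id", groupId); // this alone determines group membership
        return p;
    }

    public static void main(String[] args) {
        // Both would subscribe to "orders", yet commit offsets independently
        System.out.println(configFor("group-a").getProperty("group.id"));
        System.out.println(configFor("group-b").getProperty("group.id"));
    }
}
```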

Advantages of This Design:

  • Isolation: Each group is logically separated — no interference or data duplication.
  • Scalability: Multiple brokers help distribute coordination load.
  • Flexibility: You can have many groups with different configurations and processing logic.

Interview Summary:
“Kafka supports multiple consumer groups natively. Each group is handled independently, and group coordinators are selected using the group ID’s hash. This ensures that different groups are isolated in terms of membership and offset tracking, even when reading from the same topic. It's a powerful feature that enables multi-team, multi-application consumption from a single Kafka topic.”

8. Summary (Interview Style)

“The Group Coordinator is a Kafka broker responsible for managing a specific consumer group. It is selected by hashing the group ID to a partition of the internal __consumer_offsets topic, and it coordinates group membership, partition assignments, and offset tracking. It plays a critical role in ensuring consumer groups are balanced and fault-tolerant. You can find the implementation in GroupCoordinator.scala within the Kafka broker code.”

Kafka Brokers vs Partitions

Kafka Brokers vs Partitions - Interview Perspective

Kafka Brokers vs Partitions

1. What is a Kafka Partition?

  • A Kafka partition is a unit of parallelism within a topic.
  • Each partition is an ordered, immutable sequence of records (like a log file).
  • Partitions allow Kafka to scale horizontally by distributing data across brokers and parallelizing consumption across consumers.
  • Each partition has exactly one leader and zero or more replicas.

2. What is a Kafka Broker?

  • A Kafka broker is a Kafka server instance.
  • A Kafka cluster consists of multiple brokers (usually 3+ for production).
  • Each broker is responsible for storing and serving partitions.
  • For each consumer group, one broker in the cluster also acts as its Group Coordinator.

3. Key Differences Between Broker and Partition

  • Nature: a broker is a Kafka server (a process running in the cluster); a partition is a data structure (log) inside a topic.
  • Role: a broker stores and manages partitions; a partition contains the actual messages (data).
  • Interaction: a broker communicates with producers and consumers; a partition is the unit of parallelism for writing and reading data.
  • Leadership: a broker can act as leader or follower for many partitions; each partition has a single leader, with the rest as replicas.

4. How Are They Related?

Partitions are stored on brokers. Kafka distributes partitions across brokers for:

  • Scalability: More brokers allow more partitions and greater throughput.
  • Fault Tolerance: Replicas of partitions are kept on different brokers.
  • Load Balancing: Kafka tries to evenly spread partitions among brokers.
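A toy sketch of this distribution (not Kafka's actual assignment algorithm, which also accounts for racks and replicas): placing partition leaders round-robin across brokers, as in the replication example earlier in the post.

```java
// Toy sketch: round-robin placement of partition leaders across brokers.
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPlacement {
    static Map<String, Integer> assignLeaders(String topic, int partitions, int brokers) {
        Map<String, Integer> leaders = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            // broker ids 1..N, so each broker leads roughly partitions/brokers logs
            leaders.put(topic + "-" + p, p % brokers + 1);
        }
        return leaders;
    }

    public static void main(String[] args) {
        System.out.println(assignLeaders("payments", 3, 3));
        // {payments-0=1, payments-1=2, payments-2=3}
    }
}
```

With 3 partitions on 3 brokers, each broker leads exactly one partition, which is what spreads both storage and client traffic.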

5. Common Misunderstanding (and Clarification)

It's common to think: "more partitions = more brokers." While more partitions may require more brokers for performance, it's not a strict 1:1 requirement. Partitions and brokers are related, but multiple brokers do not exist simply because there are multiple partitions; they serve different roles in Kafka’s architecture.

You can have:

  • 1 broker with many partitions (not scalable or fault tolerant)
  • Multiple brokers with a few partitions (more scalable and resilient)

Conclusion (Interview Style):
Multiple brokers don't exist just because there are multiple partitions. Instead, Kafka allows topics to be split into partitions for horizontal scaling, and those partitions are distributed across brokers. Brokers are essential for scalability and fault tolerance, while partitions enable parallel processing. Together, they form Kafka's foundation for high throughput and distributed messaging.
