Saturday, January 3, 2026

Kafka Partition

🧩 What Exactly Is a Partition in Kafka?

A partition is the fundamental unit of storage, parallelism, and scalability in Kafka.

Think of a Kafka topic as a folder, and partitions as the files inside that folder.
Each partition is:

  • An ordered, append‑only log
  • Led by a single broker (the leader replica)
  • Replicated to other brokers (follower replicas)
  • The unit of parallelism for producers and consumers
  • The unit of fault tolerance (via replication)

📌 A Partition Is an Ordered Log

Inside a partition, messages are stored in strict order:

offset 0 → offset 1 → offset 2 → offset 3 → ...

Kafka guarantees ordering only within a partition, not across partitions.
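A toy in-memory sketch of what "an ordered, append-only log" means (illustration only — Kafka's real log lives in segment files on disk): each append gets the next sequential offset, and reads by offset come back in write order.

```python
# Toy model of one partition as an append-only log.
# Illustration only -- not Kafka's storage code.
class PartitionLog:
    def __init__(self):
        self.records = []           # append-only; never reordered or mutated

    def append(self, message):
        offset = len(self.records)  # offsets are assigned sequentially
        self.records.append(message)
        return offset

    def read(self, offset):
        return self.records[offset]

log = PartitionLog()
assert log.append("payment-1") == 0
assert log.append("payment-2") == 1
assert log.read(0) == "payment-1"   # reads return records in write order
```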

This is why key‑based partitioning matters:
all messages with the same key go to the same partition → ordering preserved.
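The key-to-partition mapping boils down to hashing the key modulo the partition count. Kafka's default partitioner actually uses murmur2 over the key bytes; the md5 below is a stand-in purely to show that the mapping is deterministic, so a given key always lands on the same partition.

```python
import hashlib

# Hedged sketch: Kafka's default partitioner uses murmur2 over the key bytes.
# md5 is used here only to illustrate "same key -> same partition".
def partition_for(key: bytes, num_partitions: int) -> int:
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for(b"user-42", 3)
p2 = partition_for(b"user-42", 3)
assert p1 == p2          # same key, same partition -> ordering preserved
assert 0 <= p1 < 3
```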


📦 A Topic Is Split Into Multiple Partitions

Example:

topic: payments
partitions: 3

You get:

  • payments‑0
  • payments‑1
  • payments‑2

Each is an independent log.


๐Ÿ” Replication Happens Per Partition

If replication factor = 3, then each partition has 3 copies:

payments-0 → leader on broker 1, followers on broker 2 and 3
payments-1 → leader on broker 2, followers on broker 3 and 1
payments-2 → leader on broker 3, followers on broker 1 and 2

This is why we say:

“A replication factor of 3 means each partition has 3 replicas.”

Not the topic — the partition.
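The leader/follower layout above can be produced by simple round-robin placement across brokers. This is only a sketch of the idea — Kafka's real assignment adds a randomized starting broker (and, in newer versions, rack awareness).

```python
# Illustrative round-robin replica placement, mirroring the payments example.
# Not Kafka's actual assignment code.
def assign_replicas(num_partitions, brokers, replication_factor):
    assignment = {}
    n = len(brokers)
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % n] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment

layout = assign_replicas(3, [1, 2, 3], 3)
assert layout[0] == {"leader": 1, "followers": [2, 3]}  # payments-0
assert layout[1] == {"leader": 2, "followers": [3, 1]}  # payments-1
assert layout[2] == {"leader": 3, "followers": [1, 2]}  # payments-2
```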


⚙️ Why Partitions Matter

1. Scalability

More partitions → more parallelism.

  • Producers can write in parallel
  • Consumers in a group can read in parallel
  • Throughput scales with partition count (up to broker, disk, and network limits)

2. Fault Tolerance

Replication is done per partition, so Kafka can survive broker failures.

3. Ordering Guarantees

Kafka only guarantees ordering within a partition, not across the topic.


🧠 A Simple Visual

Topic: orders
Partition 0: [msg1, msg2, msg3]
Partition 1: [msg4, msg5]
Partition 2: [msg6, msg7, msg8, msg9]

Each partition is its own log, stored on its own leader broker, with follower replicas elsewhere.


๐Ÿ” Want to Go Deeper?

Given your background in OS internals and distributed logs, I can also explain:

  • How partitions map to segment files on disk
  • How page cache interacts with partition reads
  • How leader/follower replication works at the protocol level
  • How partition count affects consumer group rebalancing
  • How to choose the right number of partitions for a topic
  • How partitioning interacts with exactly‑once semantics

Just tell me which direction you want to explore


This is one of those areas where OS‑internals intuition (page cache, mmap, disk layout) gives you a huge advantage. Kafka’s storage engine is beautifully simple and mechanically sympathetic, and understanding how partitions map to segment files is the key to understanding Kafka’s performance model.


🧩 How Kafka Partitions Map to Segment Files on Disk

A partition is stored as a directory on disk.

Inside that directory, Kafka stores the partition’s data as a sequence of segment files.


๐Ÿ“ 1. Partition → Directory

For a topic named orders with partition 2, Kafka creates:

/var/lib/kafka/data/orders-2/

This directory contains all data for that partition.


📄 2. A Partition Is Split Into Multiple Segment Files

Kafka does not store the entire partition in one huge file.

Instead, it splits the log into segments, capped at 1 GiB each by default (segment.bytes).

Example contents of orders-2:

00000000000000000000.log
00000000000000000000.index
00000000000000000000.timeindex
00000000000001000000.log
00000000000001000000.index
00000000000001000000.timeindex
00000000000002000000.log
00000000000002000000.index
00000000000002000000.timeindex

Each segment is identified by its base offset.
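The filenames above are nothing more than the segment's base offset, zero-padded to 20 digits plus a suffix:

```python
# Segment filenames are the base offset, zero-padded to 20 digits.
def segment_filename(base_offset: int, suffix: str = "log") -> str:
    return f"{base_offset:020d}.{suffix}"

assert segment_filename(0) == "00000000000000000000.log"
assert segment_filename(1000000) == "00000000000001000000.log"
assert segment_filename(1000000, "index") == "00000000000001000000.index"
```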


🧱 3. What’s Inside a Segment?

Each segment consists of three files:

a) .log file

The actual message data (binary records).

b) .index file

Maps relative offsets → byte positions inside the .log file.

c) .timeindex file

Maps timestamps → offsets for time‑based lookups.

Kafka uses sparse indexing, meaning it doesn’t index every message — only periodic entries.

This keeps index files tiny.
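A hedged sketch of how a sparse index could be built — the real broker adds an entry roughly every index.interval.bytes of appended data; here we track bytes per record and emit an entry whenever the interval is crossed (function and variable names are illustrative):

```python
# Hedged sketch of sparse index construction (names are illustrative).
# Kafka adds an index entry roughly every index.interval.bytes of data.
def build_sparse_index(record_sizes, interval_bytes=8192):
    """record_sizes[i] = size in bytes of the record at relative offset i."""
    index = [(0, 0)]              # always know where the segment starts
    pos = 0                       # byte position in the .log file
    bytes_since_entry = 0
    for rel_offset, size in enumerate(record_sizes):
        if rel_offset > 0 and bytes_since_entry >= interval_bytes:
            index.append((rel_offset, pos))
            bytes_since_entry = 0
        pos += size
        bytes_since_entry += size
    return index

# 30 records of 100 bytes, one entry per ~1000 bytes -> 3 entries, not 30
idx = build_sparse_index([100] * 30, interval_bytes=1000)
assert idx == [(0, 0), (10, 1000), (20, 2000)]
```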


๐Ÿ” 4. How Kafka Uses Segments

Appending

Kafka always writes to the active segment (the last one).

When the segment reaches the configured size (e.g., 1 GB), Kafka:

  • closes it
  • creates a new segment with a new base offset
  • continues writing
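The roll decision above can be sketched like this (illustrative names; the real broker also rolls on segment age and index fullness, not only size):

```python
# Hedged sketch of segment rolling -- not broker code.
class Segment:
    def __init__(self, base_offset):
        self.base_offset = base_offset  # first offset stored in this segment
        self.size = 0                   # bytes written so far
        self.count = 0                  # records written so far

def append(segments, record_bytes, max_bytes):
    active = segments[-1]               # writes always go to the last segment
    if active.size + len(record_bytes) > max_bytes:
        # roll: the new segment's base offset is the next offset to be written
        segments.append(Segment(active.base_offset + active.count))
        active = segments[-1]
    active.size += len(record_bytes)
    active.count += 1

segments = [Segment(0)]
for _ in range(3):
    append(segments, b"x" * 60, max_bytes=100)
# three 60-byte records under a 100-byte cap -> a roll before each record after the first
assert [s.base_offset for s in segments] == [0, 1, 2]
```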

Reading

Consumers read sequentially:

  1. Use .index to find the byte position
  2. Jump into the .log file
  3. Sequentially scan forward

This is extremely efficient because of:

  • sequential disk access
  • the OS page cache and read‑ahead
  • memory‑mapped index files
  • zero‑copy transfer (sendfile) from page cache to the network

⚙️ 5. Why Kafka Uses Segment Files

This design gives Kafka:

✔ Unbounded log size

Old segments can be deleted or compacted without touching newer ones.

✔ Fast recovery

On restart, Kafka only loads segment metadata, not the whole log.

✔ Efficient retention

Kafka can delete entire segment files when retention is exceeded.

✔ High throughput

Sequential writes + page cache + zero‑copy reads = near‑disk‑limit performance.


🧠 6. How Offsets Map to Segments

Offsets are global within a partition, not per segment.

Example:

  • Segment 0: offsets 0–999,999
  • Segment 1: offsets 1,000,000–1,999,999
  • Segment 2: offsets 2,000,000–2,999,999

Kafka finds the right segment by comparing the offset to the segment’s base offset.
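Because base offsets are sorted, "largest base offset ≤ target" is a binary search. A minimal sketch using Python's bisect:

```python
import bisect

# Segment lookup rule: pick the largest base offset <= target offset.
def find_segment(base_offsets, target_offset):
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    return base_offsets[i]

bases = [0, 1_000_000, 2_000_000]
assert find_segment(bases, 1_573) == 0           # falls in segment 0
assert find_segment(bases, 1_500_000) == 1_000_000
assert find_segment(bases, 2_500_000) == 2_000_000
```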


🧩 7. How This Relates to Replication

Each replica (leader or follower) stores its own copy of the same segment files.

Replication is simply:

  • follower fetches from leader
  • appends to its own segment files
  • updates its own indexes

This is why replication is fast and cheap.


🔥 8. The Big Picture

If you have a background in:

  • page cache
  • mmap
  • disk I/O
  • log‑structured storage
  • distributed messaging

…you’ll appreciate that Kafka’s design is essentially:

A distributed, replicated, append‑only log built on top of the OS page cache and segment files on disk.

It’s elegant because it lets the OS do the heavy lifting.


🚀 Related topics worth exploring

  • How Kafka uses memory‑mapped files
  • How page cache affects throughput
  • How compaction rewrites segments
  • How index lookups work internally
  • How segment deletion interacts with consumer offsets
  • How to tune segment size for performance



Now let’s walk through, step by step, exactly how Kafka does a lookup internally, using a concrete example:

A consumer wants offset 1573.

This is a perfect example to illustrate how Kafka finds data inside a partition that is split across multiple segment files.


🧩 Step 1 — Kafka lists all segment base offsets

Suppose the partition directory contains these segment files:

00000000000000000000.log   (base offset = 0)
00000000000000100000.log   (base offset = 100000)
00000000000000200000.log   (base offset = 200000)

But for simplicity, let’s use smaller numbers:

Segment A → base offset 0
Segment B → base offset 1000
Segment C → base offset 2000

These base offsets are encoded in the filenames.

Kafka loads only the base offsets, not the entire file.


🧠 Step 2 — Kafka chooses the correct segment

Kafka applies this rule:

Pick the segment whose base offset is ≤ target offset, and whose next segment’s base offset is > target offset.

For offset 1573:

  • Segment A: base = 0 → 0 ≤ 1573 ✔
  • Segment B: base = 1000 → 1000 ≤ 1573 ✔
  • Segment C: base = 2000 → 2000 ≤ 1573 ✘ (too high)

So Kafka picks:

Segment B (base offset 1000)

Because:

  • 1000 ≤ 1573
  • Next segment starts at 2000, which is > 1573

This is exactly how Kafka narrows down the search.


๐Ÿ“ Step 3 — Kafka opens the .index file for Segment B

Each segment has:

00000000000000001000.log
00000000000000001000.index
00000000000000001000.timeindex

Kafka uses the .index file to map:

relative offset → byte position in the .log file

Relative offset = target offset − base offset

So:

1573 − 1000 = 573

Kafka now needs to find relative offset 573 inside the index.


🧭 Step 4 — Kafka uses sparse index lookup

Kafka’s index is sparse, meaning it doesn’t store every offset.

Example index entries:

relative offset → file position
0               → 0 bytes
200             → 8123 bytes
400             → 16200 bytes
600             → 24100 bytes

Kafka finds the largest relative offset ≤ 573.

That is:

400 → 16200 bytes

So Kafka jumps to byte 16200 in the .log file.


📖 Step 5 — Kafka reads sequentially from that byte

Now Kafka scans forward:

  • read record at offset 1400
  • read record at offset 1401
  • read record at offset 1573 ← target found

Kafka stops when it reaches the desired offset.

This sequential scan is extremely fast because:

  • the log is append‑only
  • records are stored contiguously
  • the OS page cache + read‑ahead optimize the scan

🧨 Putting it all together

Here’s the full flow in one shot:

  1. List segment base offsets
    → 0, 1000, 2000

  2. Pick segment where base ≤ 1573 < next base
    → Segment B (base 1000)

  3. Compute relative offset
    → 1573 − 1000 = 573

  4. Use sparse index to find nearest entry ≤ 573
    → relative offset 400 → byte position 16200

  5. Jump to byte 16200 in the .log file
    → sequentially scan until offset 1573

This is exactly how Kafka performs fast lookups without loading entire files.
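The five steps above, combined into one sketch using the same numbers as the walkthrough (locate and the index layout are illustrative, not Kafka API):

```python
import bisect

# End-to-end lookup sketch: segment selection, relative offset,
# sparse-index probe. Names and data layout are illustrative.
def locate(target, segment_bases, sparse_indexes):
    # Steps 1-2: pick the segment with the largest base offset <= target
    i = bisect.bisect_right(segment_bases, target) - 1
    base = segment_bases[i]
    # Step 3: relative offset inside that segment
    rel = target - base
    # Step 4: largest sparse-index entry <= relative offset
    entries = sparse_indexes[base]            # list of (rel_offset, byte_pos)
    j = bisect.bisect_right([e[0] for e in entries], rel) - 1
    entry_rel, byte_pos = entries[j]
    # Step 5: the caller scans the .log file forward from byte_pos
    return base, rel, byte_pos

indexes = {1000: [(0, 0), (200, 8123), (400, 16200), (600, 24100)]}
base, rel, pos = locate(1573, [0, 1000, 2000], indexes)
assert (base, rel, pos) == (1000, 573, 16200)  # Segment B, scan from byte 16200
```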


🚀 Further directions

  • How sparse indexing is built
  • How time‑based lookups use .timeindex
  • How compaction rewrites segments while preserving offsets
  • How page cache + mmap make this blazing fast
  • How retention deletes segments without breaking offsets

