🧩 What Exactly Is a Partition in Kafka?
A partition is the fundamental unit of storage, parallelism, and scalability in Kafka. Concretely, a partition is:
- An ordered, append‑only log
- Stored on a single broker (leader replica)
- Replicated to other brokers (follower replicas)
- The unit of parallelism for producers and consumers
- The unit of fault tolerance (via replication)
📜 A Partition Is an Ordered Log
Inside a partition, messages are stored in strict order:
offset 0 → offset 1 → offset 2 → offset 3 → ...
Kafka guarantees ordering only within a partition, not across partitions.
📦 A Topic Is Split Into Multiple Partitions
Example:
topic: payments
partitions: 3
You get:
- payments‑0
- payments‑1
- payments‑2
Each is an independent log.
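Which of these partitions a record lands in is determined by the record key. A rough sketch of the idea (Kafka's real producer hashes keys with murmur2; md5 is used here only to keep the example dependency-free and deterministic):

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition, Kafka-style: hash(key) mod N.
    Kafka's default partitioner uses murmur2; md5 is a stand-in here."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All records with the same key land in the same partition,
# which is what preserves per-key ordering.
assert pick_partition(b"customer-42", 3) == pick_partition(b"customer-42", 3)
```

Because the mapping is a pure function of the key, per-key ordering survives as long as the partition count doesn't change.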
🔁 Replication Happens Per Partition
If replication factor = 3, then each partition has 3 copies:
payments-0 → leader on broker 1, followers on brokers 2 and 3
payments-1 → leader on broker 2, followers on brokers 3 and 1
payments-2 → leader on broker 3, followers on brokers 1 and 2
This is why we say:
“A replication factor of 3 means each partition has 3 replicas.”
Not the topic — the partition.
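The round-robin placement shown above can be sketched as follows — a simplified version of Kafka's default assignment (the real algorithm also randomizes the starting broker and staggers follower offsets across partitions):

```python
def assign_replicas(num_partitions: int, brokers: list[int], rf: int) -> dict[int, list[int]]:
    """Ring-style replica placement: partition p's leader is brokers[p % n],
    and its followers are the next rf-1 brokers in ring order.
    A simplified sketch, not Kafka's exact algorithm."""
    n = len(brokers)
    return {p: [brokers[(p + r) % n] for r in range(rf)] for p in range(num_partitions)}

# payments with 3 partitions, brokers 1-3, replication factor 3;
# the first broker in each list is the leader:
print(assign_replicas(3, [1, 2, 3], 3))
# {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
```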
⚙️ Why Partitions Matter
1. Scalability
More partitions → more parallelism.
- Producers can write in parallel
- Consumers in a group can read in parallel
- Throughput scales roughly linearly with partition count, up to broker, disk, and network limits
2. Fault Tolerance
Replication is done per partition, so Kafka can survive broker failures.
3. Ordering Guarantees
Kafka only guarantees ordering within a partition, not across the topic.
🧠 A Simple Visual
Topic: orders
Partition 0: [msg1, msg2, msg3]
Partition 1: [msg4, msg5]
Partition 2: [msg6, msg7, msg8, msg9]
Each partition is its own log, stored on its own leader broker, with follower replicas elsewhere.
🚀 Want to Go Deeper?
Beyond the basics, there is plenty more to cover:
- How partitions map to segment files on disk
- How page cache interacts with partition reads
- How leader/follower replication works at the protocol level
- How partition count affects consumer group rebalancing
- How to choose the right number of partitions for a topic
- How partitioning interacts with exactly‑once semantics
Let's start with the first of these: how partitions map to segment files on disk. Kafka's storage engine is simple and mechanically sympathetic, and understanding its on-disk layout is the key to understanding Kafka's performance model.
🧩 How Kafka Partitions Map to Segment Files on Disk
A partition is stored as a directory on disk.
Inside that directory, Kafka stores the partition’s data as a sequence of segment files.
📁 1. Partition → Directory
For a topic named orders with partition 2, Kafka creates:
/var/lib/kafka/data/orders-2/
This directory contains all data for that partition.
📂 2. A Partition Is Split Into Multiple Segment Files
Kafka does not store the entire partition in one huge file.
Instead, it splits the log into segments, each capped at 1 GiB by default (the segment.bytes setting).
Example contents of orders-2:
00000000000000000000.log
00000000000000000000.index
00000000000000000000.timeindex
00000000000001000000.log
00000000000001000000.index
00000000000001000000.timeindex
00000000000002000000.log
00000000000002000000.index
00000000000002000000.timeindex
Each segment is identified by its base offset.
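The naming convention is easy to reproduce: each filename is simply the segment's base offset, zero-padded to 20 digits.

```python
def segment_filename(base_offset: int, suffix: str = "log") -> str:
    """Kafka names each segment file after its base offset,
    zero-padded to 20 digits."""
    return f"{base_offset:020d}.{suffix}"

assert segment_filename(0) == "00000000000000000000.log"
assert segment_filename(1_000_000, "index") == "00000000000001000000.index"
```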
🧱 3. What’s Inside a Segment?
Each segment consists of three files:
a) .log file
The actual message data (binary records).
b) .index file
Maps relative offsets → byte positions inside the .log file.
c) .timeindex file
Maps timestamps → offsets for time‑based lookups.
Kafka uses sparse indexing, meaning it doesn’t index every message — only periodic entries.
This keeps index files tiny.
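A minimal sketch of how such a sparse index could be built — one entry whenever at least index.interval.bytes (default 4096) of data has been appended since the last entry. The record sizes here are hypothetical; real Kafka writes these entries as it appends batches:

```python
def build_sparse_index(record_sizes: list[int], interval_bytes: int = 4096) -> list[tuple[int, int]]:
    """Build a sparse index of (relative_offset, byte_position) pairs:
    add an entry each time at least interval_bytes have been appended
    since the last entry. Mirrors the spirit of index.interval.bytes."""
    index = [(0, 0)]
    pos = 0
    since_last = 0
    for rel_offset, size in enumerate(record_sizes):
        if since_last >= interval_bytes:
            index.append((rel_offset, pos))
            since_last = 0
        pos += size
        since_last += size
    return index

# 30 records of 100 bytes each, indexed every ~1000 bytes:
print(build_sparse_index([100] * 30, interval_bytes=1000)[:3])
# [(0, 0), (10, 1000), (20, 2000)]
```

With the 4096-byte default, one index entry covers dozens of typical records, which is why index files stay tiny.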
🔄 4. How Kafka Uses Segments
Appending
Kafka always writes to the active segment (the last one).
When the segment reaches the configured size (e.g., 1 GB), Kafka:
- closes it
- creates a new segment with a new base offset
- continues writing
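The roll logic above can be sketched like this, modeling only the size trigger (segment.bytes); real Kafka also rolls on segment age and index fullness:

```python
class PartitionLog:
    """Minimal sketch of appending with segment rolls.
    Only the size-based roll is modeled."""

    def __init__(self, segment_bytes: int = 1024):
        self.segment_bytes = segment_bytes
        self.next_offset = 0
        # The last segment in the list is the active one.
        self.segments = [{"base_offset": 0, "bytes": 0}]

    def append(self, record_size: int) -> int:
        active = self.segments[-1]
        if active["bytes"] + record_size > self.segment_bytes:
            # Close the active segment; the new segment's base offset
            # is the offset of the next record to be written.
            self.segments.append({"base_offset": self.next_offset, "bytes": 0})
            active = self.segments[-1]
        active["bytes"] += record_size
        offset = self.next_offset
        self.next_offset += 1
        return offset

log = PartitionLog(segment_bytes=250)
for _ in range(5):
    log.append(100)
print([s["base_offset"] for s in log.segments])  # [0, 2, 4]
```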
Reading
Consumers read sequentially:
- Use the .index file to find the byte position
- Jump to that position in the .log file
- Scan forward sequentially
This is extremely efficient because:
- sequential disk access
- page cache
- OS read‑ahead
- memory‑mapped index files
⚙️ 5. Why Kafka Uses Segment Files
This design gives Kafka:
✔ Infinite log size
Old segments can be deleted or compacted without touching newer ones.
✔ Fast recovery
On restart, Kafka only loads segment metadata, not the whole log.
✔ Efficient retention
Kafka can delete entire segment files when retention is exceeded.
✔ High throughput
Sequential writes + page cache + memory‑mapped indexes = near‑disk‑limit performance.
🧠 6. How Offsets Map to Segments
Offsets are global within a partition, not per segment.
Example:
- Segment 0: offsets 0–999,999
- Segment 1: offsets 1,000,000–1,999,999
- Segment 2: offsets 2,000,000–2,999,999
Kafka finds the right segment by comparing the offset to the segment’s base offset.
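That comparison is a classic floor lookup over the sorted base offsets — for example, via binary search:

```python
import bisect

def find_segment(base_offsets: list[int], target: int) -> int:
    """Return the base offset of the segment containing `target`:
    the largest base offset <= target. base_offsets must be sorted,
    as segment filenames on disk naturally are."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise ValueError("offset precedes the earliest retained segment")
    return base_offsets[i]

assert find_segment([0, 1_000_000, 2_000_000], 1_234_567) == 1_000_000
```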
🧩 7. How This Relates to Replication
Each replica (leader or follower) stores its own copy of the same segment files.
Replication is simply:
- follower fetches from leader
- appends to its own segment files
- updates its own indexes
This is why replication is fast and cheap.
🔥 8. Why This Design Matters
If you think in OS‑internals terms — page cache, mmap, disk I/O, log‑structured storage, distributed messaging — you’ll appreciate that Kafka’s design is essentially:
A distributed, replicated, append‑only log built on top of OS page cache and memory‑mapped segment files.
It’s elegant because it lets the OS do the heavy lifting.
🚀 Want to go deeper?
There is still more under the hood:
- How Kafka uses memory‑mapped files
- How page cache affects throughput
- How compaction rewrites segments
- How index lookups work internally
- How segment deletion interacts with consumer offsets
- How to tune segment size for performance
To make the lookup path concrete, let’s walk through, step by step, exactly how Kafka finds a record by offset.
Suppose a consumer asks for offset 1573.
This is a perfect example to illustrate how Kafka finds data inside a partition that is split across multiple segment files.
🧩 Step 1 — Kafka lists all segment base offsets
Suppose the partition directory contains these segment files:
00000000000000000000.log (base offset = 0)
00000000000000100000.log (base offset = 100000)
00000000000000200000.log (base offset = 200000)
But for simplicity, let’s use smaller numbers:
Segment A → base offset 0
Segment B → base offset 1000
Segment C → base offset 2000
These base offsets are encoded in the filenames.
Kafka loads only the base offsets, not the entire file.
🧠 Step 2 — Kafka chooses the correct segment
Kafka applies this rule:
Pick the segment whose base offset is ≤ target offset, and whose next segment’s base offset is > target offset.
For offset 1573:
- Segment A: base = 0 → 0 ≤ 1573 ✔
- Segment B: base = 1000 → 1000 ≤ 1573 ✔
- Segment C: base = 2000 → 2000 ≤ 1573 ✘ (too high)
So Kafka picks:
✅ Segment B (base offset 1000)
Because:
- 1000 ≤ 1573
- Next segment starts at 2000, which is > 1573
This is exactly how Kafka narrows down the search.
📂 Step 3 — Kafka opens the .index file for Segment B
Each segment has:
00000000000000001000.log
00000000000000001000.index
00000000000000001000.timeindex
(base offset 1000, zero‑padded to 20 digits)
Kafka uses the .index file to map:
relative offset → byte position in the .log file
Relative offset = target offset − base offset
So:
1573 − 1000 = 573
Kafka now needs to find relative offset 573 inside the index.
🧭 Step 4 — Kafka uses sparse index lookup
Kafka’s index is sparse, meaning it doesn’t store every offset.
Example index entries:
relative offset → file position
0 → 0 bytes
200 → 8123 bytes
400 → 16200 bytes
600 → 24100 bytes
Kafka finds the largest relative offset ≤ 573.
That is:
400 → 16200 bytes
So Kafka jumps to byte 16200 in the .log file.
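This floor lookup over the sparse index can be sketched with the same numbers as above:

```python
import bisect

# Hypothetical sparse index for the segment with base offset 1000:
# (relative_offset, byte_position) pairs, sorted by relative offset.
index = [(0, 0), (200, 8123), (400, 16200), (600, 24100)]

def floor_entry(index, rel_offset):
    """Return the index entry with the largest relative offset <= rel_offset."""
    i = bisect.bisect_right([e[0] for e in index], rel_offset) - 1
    return index[i]

assert floor_entry(index, 573) == (400, 16200)
```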
📖 Step 5 — Kafka reads sequentially from that byte
Now Kafka scans forward from byte 16200 (index entry 400 corresponds to absolute offset 1000 + 400 = 1400):
- read record at offset 1400
- read record at offset 1401
- …
- read record at offset 1573 ← target found
Kafka stops when it reaches the desired offset.
This sequential scan is extremely fast because:
- the log is append‑only
- records are stored contiguously
- the OS page cache + read‑ahead optimize the scan
🧨 Putting it all together
Here’s the full flow in one shot:
1. List segment base offsets → 0, 1000, 2000
2. Pick the segment where base ≤ 1573 < next base → Segment B (base 1000)
3. Compute the relative offset → 1573 − 1000 = 573
4. Use the sparse index to find the nearest entry ≤ 573 → relative offset 400 → byte position 16200
5. Jump to byte 16200 in the .log file → scan sequentially until offset 1573
This is exactly how Kafka performs fast lookups without loading entire files.
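All five steps can be condensed into one small simulation; the segment layout and index entries below are hypothetical stand-ins for what Kafka reads from the .index files:

```python
import bisect

def locate(segments: dict[int, list[tuple[int, int]]], target_offset: int) -> tuple[int, int]:
    """End-to-end lookup sketch. `segments` maps base_offset -> sparse index
    of (relative_offset, byte_position) pairs. Returns (base_offset,
    scan_start_byte): which segment to open and where to start scanning."""
    bases = sorted(segments)
    base = bases[bisect.bisect_right(bases, target_offset) - 1]   # steps 1-2: pick segment
    rel = target_offset - base                                    # step 3: relative offset
    idx = segments[base]
    rel_offsets = [e[0] for e in idx]
    _, byte_pos = idx[bisect.bisect_right(rel_offsets, rel) - 1]  # step 4: sparse floor lookup
    return base, byte_pos                                         # step 5: scan from byte_pos

segments = {
    0:    [(0, 0), (200, 8000), (400, 16000), (600, 24000), (800, 32000)],
    1000: [(0, 0), (200, 8123), (400, 16200), (600, 24100), (800, 32050)],
    2000: [(0, 0)],
}
print(locate(segments, 1573))  # (1000, 16200): open segment 1000, scan from byte 16200
```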
🚀 Going further, related topics include:
- How sparse indexing is built
- How time‑based lookups use .timeindex
- How compaction rewrites segments while preserving offsets
- How page cache + mmap make this blazing fast
- How retention deletes segments without breaking offsets