Wednesday, April 16, 2025

Kafka Consumer Fetch Configs

Purpose of a Kafka consumer group -> To parallelize message consumption and achieve fault tolerance.

⚙️ Core Settings

  • fetch.min.bytes – Minimum amount of data (in bytes) the broker should accumulate before answering a fetch request.
  • fetch.max.wait.ms – Maximum time the broker will wait to satisfy fetch.min.bytes before responding anyway.
  • fetch.max.bytes – Maximum number of bytes the consumer is willing to receive in a single fetch response.

Kafka sends a fetch response if it has ≥ fetch.min.bytes data available OR if fetch.max.wait.ms expires.

fetch.max.bytes caps the total response size per fetch across all partitions. Since KIP-74 (Kafka 0.10.1) it is a soft cap: the first record batch is returned even if it alone exceeds the limit, so the consumer can always make progress.

📈 Example Scenario

fetch.min.bytes = 1000
fetch.max.wait.ms = 3000
fetch.max.bytes = 1500

✔ Message A = 800 bytes
✔ Message B = 900 bytes

Case:
- Broker receives Message A → waits for Message B
- A+B = 1700 bytes → exceeds fetch.max.bytes
- Kafka will only return Message A in this fetch cycle

🧠 Diagram - Fetch Behavior Matrix

+----------------------+--------------------------+----------------------+--------------------------------------+
| Scenario             | fetch.min.bytes          | fetch.max.wait.ms    | fetch.max.bytes                      |
+----------------------+--------------------------+----------------------+--------------------------------------+
| Total data = 400     | < 1000 → wait            | Waits up to 3s       | Not exceeded                         |
| Total data = 1200    | >= min.bytes → ✅ send   | Sent immediately     | Not exceeded                         |
| Message = 2000       | >= min.bytes             | Sent immediately     | Exceeded, but first batch still sent |
| Total = 1600         | >= min.bytes             | Time ok              | Too large → partial (what fits)      |
+----------------------+--------------------------+----------------------+--------------------------------------+

🎯 Interview Questions & Answers

Q1. What does fetch.max.bytes do in Kafka consumer configuration?

It defines the maximum number of bytes the consumer accepts in a single fetch response, summed across all partitions in the fetch. Since Kafka 0.10.1 (KIP-74) it is a soft limit rather than a hard cap: the first record batch in the response is delivered even if it exceeds this value, so the consumer can always make progress.

Q2. What happens if a record is larger than fetch.max.bytes?

On modern Kafka (0.10.1+, KIP-74), the record is still returned: fetch.max.bytes is a soft limit, and the first record batch in the first non-empty partition of the fetch is delivered even when it is larger than the limit. Only very old clients could get stuck on an oversized message (surfacing as RecordTooLargeException against max.partition.fetch.bytes), in which case the config had to be raised.

Q3. Will Kafka throw an exception if a message is too large for fetch.max.bytes?

No exception is thrown by current clients; the oversized first batch is simply returned despite the limit. (Pre-0.10.1 consumers could throw RecordTooLargeException when a message exceeded max.partition.fetch.bytes.)

Q4. What if multiple messages together exceed fetch.max.bytes?

Kafka includes only as many messages as it can fit within the limit. The rest will be delivered in the next fetch call.
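One way to model how messages are packed into a single fetch response is a small greedy sketch. This is a toy model, not Kafka's actual batching code (real fetches operate on record batches per partition, not individual messages), but it captures both rules: messages are included until fetch.max.bytes is hit, and the first one is always included so the consumer can make progress:

```java
import java.util.ArrayList;
import java.util.List;

public class FetchFit {
    // Greedily include message sizes until fetch.max.bytes would be exceeded.
    // The first message is always included (KIP-74 soft-limit behavior).
    static List<Integer> fit(List<Integer> messageSizes, int fetchMaxBytes) {
        List<Integer> included = new ArrayList<>();
        int total = 0;
        for (int size : messageSizes) {
            if (!included.isEmpty() && total + size > fetchMaxBytes) {
                break; // remaining messages wait for the next fetch
            }
            included.add(size);
            total += size;
        }
        return included;
    }

    public static void main(String[] args) {
        // Mirrors the example scenario: A = 800, B = 900, fetch.max.bytes = 1500
        System.out.println(fit(List.of(800, 900), 1500)); // only A fits
        // An oversized first message is still returned
        System.out.println(fit(List.of(2000), 1500));
    }
}
```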

Q5. How does fetch.max.bytes differ from max.partition.fetch.bytes?

  • fetch.max.bytes – Total limit across all partitions
  • max.partition.fetch.bytes – Limit per partition
  • Both must be adjusted if your message size increases
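The advice in Q5 to tune the two limits together can be captured in a small config builder. The numbers are illustrative assumptions, not recommendations; the one rule of thumb encoded here is that fetch.max.bytes should be at least as large as max.partition.fetch.bytes, otherwise the per-partition cap is meaningless in a multi-partition fetch:

```java
import java.util.Properties;

// Illustrative fetch tuning for large messages (class name is hypothetical).
public class FetchTuning {
    static Properties tunedProps(int maxExpectedRecordBytes) {
        Properties props = new Properties();
        // Per-partition cap: should cover the largest record you expect
        props.put("max.partition.fetch.bytes", String.valueOf(maxExpectedRecordBytes));
        // Total cap across all partitions: keep it >= the per-partition cap
        props.put("fetch.max.bytes", String.valueOf(2 * maxExpectedRecordBytes));
        return props;
    }

    public static void main(String[] args) {
        // Assume records up to ~5 MB (an assumption for illustration)
        System.out.println(tunedProps(5 * 1024 * 1024));
    }
}
```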

📘 Bonus: Java Config Snippet

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "fetch-demo");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("fetch.min.bytes", "1000");
props.put("fetch.max.wait.ms", "3000");
props.put("fetch.max.bytes", "1500");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("demo-topic"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000));

if (records.isEmpty()) {
    System.out.println("No records fetched — possibly due to fetch limits.");
}

💡 Interview Tips

  • Understand that these fetch configs optimize between latency and throughput.
  • Mention KIP-74: since 0.10.1, fetch.max.bytes is a soft limit, so a single oversized record batch cannot permanently stall (starve) a consumer.
  • Be ready to talk about how fetch.max.bytes and max.partition.fetch.bytes should be tuned together.

Monday, April 14, 2025

Kafka endOffset

🔢 1. Understanding endOffsets()

When you call:

long endOffset = consumer.endOffsets(List.of(tp)).get(tp);

You get the next offset after the last available message in that partition.

Think of endOffset as a marker:
“If a new message arrives, this is the offset it will be assigned.”


🧠 Example:

Message   Offset
A         0
B         1
C         2
  • endOffset = 3 (no message at offset 3 yet)

  • So to read the last message, we must seek to endOffset - 1 = 2


🔍 Why Not Use endOffset Directly?

If you call:

consumer.seek(tp, endOffset); // BAD!

It will position the consumer after the last message, and poll() will return nothing, because there's nothing at that offset yet.
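The seek arithmetic above can be wrapped in a tiny helper that also guards against the empty-partition case, where beginningOffset == endOffset and there is no last message to read. The class and method names are illustrative; in real code the two inputs would come from consumer.beginningOffsets() and consumer.endOffsets():

```java
import java.util.OptionalLong;

public class LastOffset {
    // endOffset is the *next* offset to be assigned; the last record (if any)
    // lives at endOffset - 1. When beginning == end the partition is empty
    // (or fully truncated), so there is nothing to seek to.
    static OptionalLong lastOffset(long beginningOffset, long endOffset) {
        return endOffset > beginningOffset
                ? OptionalLong.of(endOffset - 1)
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastOffset(0, 3)); // offsets 0..2 exist; last is 2
        System.out.println(lastOffset(5, 5)); // empty partition; nothing to read
    }
}
```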


✅ Summary

Offset          Meaning
currentOffset   Where the next poll() will start
endOffset       The next offset to be assigned, NOT the last existing record
endOffset - 1   The actual last message in the topic/partition

🧠 Interview Line:

“Kafka offsets are forward-pointing. The endOffset indicates where the next message will be written, not where the last message lives. So to retrieve the last message, we seek to endOffset - 1.”



Saturday, April 12, 2025

Kafka Group Coordinator

Kafka Group Coordinator – Interview Perspective

Interview Question:
“What is the Group Coordinator in Kafka? How is it selected, what are its responsibilities, and where can we find its code in Kafka’s source?”

1. What is a Group Coordinator?

The Group Coordinator in Kafka is a designated Kafka broker responsible for managing a consumer group. It handles consumer group membership, rebalancing, and offset commits for the group.

2. How is the Group Coordinator Selected?

  • Every broker in Kafka is capable of being a group coordinator.
  • When a consumer joins a group, it sends a FindCoordinatorRequest to any broker.
  • That broker hashes the group.id modulo the number of partitions of the internal __consumer_offsets topic (50 by default) to locate the group's offsets partition.
  • The broker that is the leader of that __consumer_offsets partition becomes the Group Coordinator for that group.
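The selection rule can be sketched in plain Java. This mirrors the broker's abs(groupId.hashCode()) % offsetsTopicPartitionCount computation; 50 is only the default value of offsets.topic.num.partitions, and the class and method names here are illustrative:

```java
public class CoordinatorPartition {
    // Map a group.id to its __consumer_offsets partition.
    // Masking the sign bit (instead of Math.abs) keeps the result
    // non-negative even for Integer.MIN_VALUE hash codes.
    static int coordinatorPartitionFor(String groupId, int offsetsTopicPartitionCount) {
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitionCount;
    }

    public static void main(String[] args) {
        // The leader of this partition is the group's coordinator broker
        System.out.println("group-A -> partition " + coordinatorPartitionFor("group-A", 50));
        System.out.println("group-B -> partition " + coordinatorPartitionFor("group-B", 50));
    }
}
```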

3. Responsibilities of the Group Coordinator

The Group Coordinator performs several key functions:

  • Manage Consumer Membership: Tracks all consumers in a group.
  • Coordinate Rebalances: When consumers join/leave, it triggers a rebalance and assigns partitions.
  • Assign Leader: Appoints one consumer as the group leader to help coordinate partition assignment.
  • Track Offsets: Stores committed offsets in the __consumer_offsets topic.
  • Handle Heartbeats: Keeps track of active consumers to detect failures quickly.

4. Services Provided by the Group Coordinator

It acts as a centralized service point for:

  • Tracking active consumers in the group
  • Reassigning partitions during consumer join/leave events
  • Persisting committed offsets
  • Coordinating heartbeat checks and group timeouts

5. Internal Communication

  • Consumers communicate with the group coordinator using RPC-style requests:
    • JoinGroupRequest
    • SyncGroupRequest
    • HeartbeatRequest
    • LeaveGroupRequest
    • OffsetCommitRequest

6. Where Is This in Kafka’s Source Code?

The core logic for group coordination lives in the Kafka broker codebase:

  • Class: GroupCoordinator
  • Location: core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala
  • Handles all consumer group membership and offset commit logic.
  • Works closely with the __consumer_offsets internal topic.

Additional Code References:

  • kafka.server.KafkaApis – entry point that handles requests from consumers
  • FindCoordinatorRequest – initiates coordinator discovery
  • GroupMetadataManager – helps manage group metadata and offset storage

7. What If There Are Multiple Consumer Groups?

Kafka is designed to handle many consumer groups efficiently. Each consumer group is managed independently by potentially different group coordinators, depending on the group ID.

How Kafka Handles Multiple Groups:

  • Each group.id is mapped (via hashing) to a broker using the __consumer_offsets topic.
  • This ensures that different consumer groups can be coordinated by different brokers for load distribution.
  • Even if multiple groups consume from the same topic, they operate independently and can have different offsets and members.

Example Scenario:

  • Group A might be coordinated by Broker 1.
  • Group B might be coordinated by Broker 2.
  • Both could be consuming from the same topic orders, but maintain their own committed offsets.

Advantages of This Design:

  • Isolation: Each group is logically separated — no interference or data duplication.
  • Scalability: Multiple brokers help distribute coordination load.
  • Flexibility: You can have many groups with different configurations and processing logic.

Interview Summary:
“Kafka supports multiple consumer groups natively. Each group is handled independently, and group coordinators are selected using the group ID’s hash. This ensures that different groups are isolated in terms of membership and offset tracking, even when reading from the same topic. It's a powerful feature that enables multi-team, multi-application consumption from a single Kafka topic.”

8. Summary (Interview Style)

“The Group Coordinator is a Kafka broker responsible for managing a specific consumer group. It is selected via consistent hashing based on the group ID and coordinates group membership, partition assignments, and offset tracking. It plays a critical role in ensuring consumer groups are balanced and fault-tolerant. You can find the implementation in GroupCoordinator.scala within the Kafka broker code.”

Kafka Brokers vs Partitions

Kafka Brokers vs Partitions - Interview Perspective


1. What is a Kafka Partition?

  • A Kafka partition is a unit of parallelism within a topic.
  • Each partition is an ordered, immutable sequence of records (like a log file).
  • Partitions allow Kafka to scale horizontally by distributing data across consumers.
  • Each partition has exactly one leader and zero or more replicas.

2. What is a Kafka Broker?

  • A Kafka broker is a Kafka server instance.
  • A Kafka cluster consists of multiple brokers (usually 3+ for production).
  • Each broker is responsible for storing and serving partitions.
  • One broker in the cluster also acts as the Group Coordinator for consumer group management.

3. Key Differences Between Broker and Partition

+--------------------------------------------------+--------------------------------------------------------+
| Kafka Broker                                     | Kafka Partition                                        |
+--------------------------------------------------+--------------------------------------------------------+
| A Kafka server (process running in the cluster)  | A data structure (log) inside a topic                  |
| Stores and manages partitions                    | Contains the actual messages (data)                    |
| Communicates with producers and consumers        | Unit of parallelism for writing/reading data           |
| Can host leader or follower replicas             | Has one leader replica; the rest are follower replicas |
+--------------------------------------------------+--------------------------------------------------------+

4. How Are They Related?

Partitions are stored on brokers. Kafka distributes partitions across brokers for:

  • Scalability: More brokers allow more partitions and greater throughput.
  • Fault Tolerance: Replicas of partitions are kept on different brokers.
  • Load Balancing: Kafka tries to evenly spread partitions among brokers.
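The load-balancing idea can be illustrated with a toy round-robin assignment. This is not Kafka's actual replica-placement algorithm (which also considers racks and shifts replicas between brokers), just a sketch of how partitions spread evenly across brokers; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpread {
    // Assign partition p to broker p % brokers: a simplified round-robin model.
    static Map<Integer, List<Integer>> assign(int partitions, int brokers) {
        Map<Integer, List<Integer>> byBroker = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            byBroker.computeIfAbsent(p % brokers, b -> new ArrayList<>()).add(p);
        }
        return byBroker;
    }

    public static void main(String[] args) {
        // 6 partitions over 3 brokers: each broker ends up with 2 partitions
        System.out.println(assign(6, 3));
    }
}
```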

5. Common Misunderstanding (and Clarification)

It's common to think: "more partitions = more brokers." While more partitions may call for more brokers for performance, there is no strict 1:1 relationship. Brokers and partitions are related, but multiple brokers do not exist simply because there are multiple partitions; they serve different roles in Kafka's architecture.

You can have:

  • 1 broker with many partitions (not scalable or fault tolerant)
  • Multiple brokers with a few partitions (more scalable and resilient)

Conclusion (Interview Style):
Multiple brokers don't exist just because there are multiple partitions. Instead, Kafka allows topics to be split into partitions for horizontal scaling, and those partitions are distributed across brokers. Brokers are essential for scalability and fault tolerance, while partitions enable parallel processing. Together, they form Kafka's foundation for high throughput and distributed messaging.

Kafka Partition

🧩 What Exactly Is a Partition in Kafka? A partition is the fundamental unit of storage, parallelism, and scalability in Kafka. Think of ...