System Design

Why Use Kafka?

Understand when Kafka and similar messaging systems become useful: moving from direct service calls to durable event-driven communication.

Tags: Kafka, Event-Driven Architecture, Messaging, System Design

The Short Version

Kafka becomes useful when one business event needs to trigger many downstream actions, and direct service-to-service calls start making the system slow, fragile, and tightly coupled.

Before Kafka: A Small, Simple System

Imagine a small online store. At first, the checkout service does everything directly.

Initial checkout flow

User places order
Checkout service saves order
Checkout service charges payment
Checkout service sends confirmation email

This is simple and completely reasonable when the system is small. There may be only one backend service, one database, and a small amount of traffic.

Kafka is not automatically needed just because a system exists. A small synchronous system is often the correct starting point.

The System Starts Growing

Over time, more features get added around the same order event.

More things now happen after an order

Charge payment
Send email
Update inventory
Notify warehouse
Update loyalty points
Send analytics event
Trigger fraud review
Notify shipping service

The checkout service now has to know about many other systems. Every new feature adds another dependency to checkout.

The Problem With Direct Calls

A direct synchronous design may start looking like this:

Synchronous checkout

Checkout Service

↓

Payment Service

Email Service

Inventory Service

Warehouse Service

Analytics Service

Fraud Service

This creates several problems:

The checkout request becomes slower as more calls are added.
If one downstream service is slow, checkout may become slow.
If one downstream service is down, checkout must decide whether to fail the order, retry, or continue without that step.
The checkout service becomes tightly coupled to many systems.
Adding a new consumer requires changing checkout again.
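
The first three problems can be sketched in a few lines. This is a toy model, not real service code: the function names are hypothetical, and each call returns a made-up latency in milliseconds instead of doing network I/O.

```python
# Hypothetical downstream calls. Each returns a simulated latency in
# milliseconds instead of performing a real network request.
def charge_payment(order):
    return 120

def send_email(order):
    return 80

def update_inventory(order):
    return 60

def notify_warehouse(order):
    # Simulates a downstream outage.
    raise ConnectionError("warehouse service is down")

def checkout_sync(order, steps):
    """Call every step in sequence; total latency is the sum of all calls."""
    total_ms = 0
    for step in steps:
        total_ms += step(order)  # an exception here fails the whole checkout
    return total_ms

order = {"order_id": "o-1", "total": 42.0}

# Every added step makes the user-facing request slower.
latency_ms = checkout_sync(order, [charge_payment, send_email, update_inventory])

# One unavailable dependency breaks checkout unless extra handling is added.
try:
    checkout_sync(order, [charge_payment, notify_warehouse])
    checkout_failed = False
except ConnectionError:
    checkout_failed = True
```

In this sketch the only way to keep checkout working during the outage is to add error handling for each dependency inside checkout itself, which is the coupling the list above describes.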

The Kafka Version

With Kafka, checkout can publish an event instead of directly calling every downstream system.

One Event, Many Consumers

Checkout Service

↓

Kafka Topic: order-created

↓

Inventory Consumer

Email Consumer

Warehouse Consumer

Analytics Consumer

Fraud Consumer

Loyalty Consumer

Now checkout only needs to say:

“An order was created.”

Each service that subscribes to the order-created topic receives the event and independently decides what to do with it.
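
A minimal in-memory sketch of this fan-out follows. It is not Kafka client code: InMemoryTopic is a hypothetical stand-in that keeps an append-only list and one read position per consumer group, which is roughly how a topic lets several groups read the same event independently.

```python
from collections import defaultdict

class InMemoryTopic:
    """Toy stand-in for a topic: an append-only log plus one offset per group."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        """Append an event and return the offset it was stored at."""
        self.log.append(event)
        return len(self.log) - 1

    def poll(self, group):
        """Return the events this group has not read yet and advance its offset."""
        start = self.offsets[group]
        events = self.log[start:]
        self.offsets[group] = len(self.log)
        return events

order_created = InMemoryTopic("order-created")

# Checkout publishes one fact and does not know who is listening.
order_created.publish({"order_id": "o-1", "total": 42.0})

# Each consumer group reads the same event at its own pace.
handled = {}
for group in ["inventory", "email", "warehouse"]:
    handled[group] = [event["order_id"] for event in order_created.poll(group)]

# A group added later still sees the event; the producer code is unchanged.
loyalty_events = order_created.poll("loyalty")
```

Adding the loyalty group required no change to the publishing side, which is the property the diagram above is meant to show.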

Why This Helps

Loose Coupling

The producer does not need to know every service that reacts to the event.

Better Responsiveness

Checkout can finish faster instead of waiting for every downstream action.

Independent Consumers

Email, inventory, analytics, and warehouse logic can evolve separately.

Replay

A consumer can reprocess old events if it needs to rebuild state or recover from a bug.
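
As a rough illustration of replay, the sketch below rebuilds loyalty points from a retained list of events. The event fields, the rebuild_points helper, and the points rule are all assumptions made for the example; a real consumer would reset its offset on the topic rather than slice a Python list.

```python
# Events still retained in the log, oldest first (hypothetical fields).
events = [
    {"order_id": "o-1", "customer": "ana", "total": 40.0},
    {"order_id": "o-2", "customer": "ben", "total": 25.0},
    {"order_id": "o-3", "customer": "ana", "total": 10.0},
]

def rebuild_points(log, points_per_dollar, from_offset=0):
    """Recompute loyalty points by re-reading the log starting at from_offset."""
    points = {}
    for event in log[from_offset:]:
        earned = int(event["total"] * points_per_dollar)
        points[event["customer"]] = points.get(event["customer"], 0) + earned
    return points

# First deployment had a bug: the rate was zero, so nobody earned points.
buggy_state = rebuild_points(events, points_per_dollar=0)

# After the fix, the consumer replays from offset 0 and rebuilds its state.
fixed_state = rebuild_points(events, points_per_dollar=1)
```

The replay works only because the events are still in the log; with direct calls, the original requests would be gone once they were handled.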

Scalability

High-volume event streams can be split into partitions and processed in parallel by multiple consumers.
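
One way to picture partitioning is key-based routing: events with the same key always land in the same partition, so they stay in order relative to each other while different keys spread across partitions. The sketch below uses CRC32 purely as a stand-in hash, and the partition count and keys are made up; Kafka's own partitioner uses a different hash function.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the example

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition; the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

partitions = [[] for _ in range(NUM_PARTITIONS)]

# Key each order event by customer so one customer's orders stay in order.
orders = [("o-1", "ana"), ("o-2", "ben"), ("o-3", "ana"), ("o-4", "cho")]
for order_id, customer in orders:
    partitions[partition_for(customer)].append(order_id)

ana_partition = partitions[partition_for("ana")]
```

Each partition can then be read by a different consumer in the same group, which is where the parallelism comes from.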

Durability

Events are stored in Kafka for a retention period instead of disappearing immediately.
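
A simplified model of time-based retention is shown below. The seven-day window, the timestamps, and the apply_retention helper are assumptions for illustration, and the sketch expires events one at a time, whereas Kafka removes data in larger chunks. The point is that an event is kept because of its age, not because of whether anyone has read it.

```python
RETENTION_SECONDS = 7 * 24 * 60 * 60  # assumed 7-day retention window

def apply_retention(log, now, retention_seconds=RETENTION_SECONDS):
    """Keep only events younger than the retention window."""
    return [event for event in log if now - event["timestamp"] <= retention_seconds]

now = 1_000_000  # arbitrary "current time" in seconds for the example

log = [
    {"order_id": "o-1", "timestamp": now - 8 * 24 * 60 * 60},  # 8 days old
    {"order_id": "o-2", "timestamp": now - 60},                # 1 minute old
]

# Reading an event does not delete it; only age does.
kept = apply_retention(log, now)
kept_ids = [event["order_id"] for event in kept]
```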

When Kafka Is Probably Overkill

Kafka is powerful, but it is not automatically the right answer. It is probably overkill when:

The system is small and synchronous calls are simple.
You only have one producer and one consumer.
You do not need replay or durable event history.
The operational and development complexity would outweigh the benefits.
A simple queue or direct call would solve the problem.

A strong interview answer does not say “always use Kafka.” It says when Kafka helps and when it is unnecessary complexity.

What Operational Complexity Means

Adopting Kafka brings ongoing work that a small system may not need:

You must run and monitor Kafka brokers.
You must manage topics, partitions, retention, and disk usage.
You must track consumer lag to know if consumers are falling behind.
Consumers may process the same message more than once, so handlers must be idempotent, which adds complexity to the code.
Failures become asynchronous and harder to debug than a direct API call.
Developers must understand offsets, retries, dead letter queues, ordering, and consumer groups.
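
Two items from this list, idempotency and consumer lag, can be sketched briefly. The LoyaltyConsumer class, the event_id field, and the in-memory set of seen IDs are hypothetical; a real consumer would usually keep the processed IDs in a durable store. Lag is shown as the gap between the newest offset in a partition and the offset the consumer has committed.

```python
class LoyaltyConsumer:
    """Toy consumer that ignores events it has already processed."""

    def __init__(self):
        self.points = {}
        self.seen_event_ids = set()  # assumption: in-memory, lost on restart

    def handle(self, event):
        """Apply the event once; return False if it was a duplicate delivery."""
        if event["event_id"] in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event["event_id"])
        customer = event["customer"]
        self.points[customer] = self.points.get(customer, 0) + event["points"]
        return True

def consumer_lag(log_end_offset, committed_offset):
    """Number of events written to a partition that the consumer has not committed."""
    return log_end_offset - committed_offset

consumer = LoyaltyConsumer()
event = {"event_id": "evt-1", "customer": "ana", "points": 40}

first_delivery = consumer.handle(event)
redelivery = consumer.handle(event)  # same event delivered again after a retry

lag = consumer_lag(log_end_offset=120, committed_offset=100)
```

Without the duplicate check, the redelivered event would credit the customer twice. A lag that keeps growing over time suggests the consumer cannot keep up with the rate of new events.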

For a small system, a REST call is often easier to build, debug, monitor, and operate. Kafka becomes worth it when decoupling, replay, durability, and scalability outweigh that added complexity.

What Interviewers Are Looking For

You Understand the Why

Kafka is not just an API. It solves coupling, fan-out, replay, buffering, and high-volume event processing problems.

You Know Event-Driven Design

Services publish facts about things that happened, and other services react independently.

You Know the Tradeoff

Kafka improves decoupling but introduces async behavior, eventual consistency, and operational complexity.

You Can Explain Growth

You can start with a simple system and explain the point where direct calls become painful.

Final Takeaway

Kafka usually proves useful when a system grows from “do one thing directly” to “many independent systems need to react to the same event.”