Circuit Breaker Pattern in System Design
A circuit breaker protects your service from repeatedly calling a failing dependency. It fails fast, gives the dependency time to recover, and helps prevent cascading failures.
The Short Answer
A circuit breaker protects your service from repeatedly calling a dependency that is already failing.
Instead of letting every request wait on a slow or broken service, the circuit breaker can temporarily stop calls and fail fast.
The Real Problem
Imagine an Order Service calling a Payment Service. The Payment Service becomes slow or unavailable.
[Diagrams: the Order Service calling the Payment Service, without a circuit breaker and with a circuit breaker]
The circuit breaker does not fix the Payment Service. It protects the caller and gives the failing dependency breathing room.
Why Timeouts and Retries Are Not Enough
Timeouts and retries are useful, but they do not fully solve the problem.
If the dependency is only briefly flaky, retries may help. But if the dependency is consistently failing, retries can make the situation worse by sending even more traffic to something already overloaded.
Dependency starts failing
↓
Timeouts happen
↓
Retries increase traffic
↓
Dependency gets more overloaded
↓
More failures
↓
Caller starts failing too

Circuit breakers are defensive: “it is probably still broken, so do not keep calling it right now.”
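The amplification in the flow above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the worst case where every attempt fails and every request uses all of its retries; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper: worst-case load on a dependency when every attempt fails.
final class RetryAmplification {
    private RetryAmplification() {}

    /**
     * Attempts per second that reach the dependency if each incoming request
     * fails and is retried maxRetries times (1 original call + maxRetries retries).
     */
    static long worstCaseAttemptsPerSecond(long incomingPerSecond, int maxRetries) {
        if (incomingPerSecond < 0 || maxRetries < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return incomingPerSecond * (1L + maxRetries);
    }
}
```

Under these assumptions, 1,000 incoming requests per second with 3 retries each become 4,000 attempts per second against a dependency that is already struggling.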
The Three Circuit Breaker States
A circuit breaker is usually explained using three states:
Closed
Normal state. Requests are allowed through. The breaker watches failures.
Open
Failure threshold reached. Requests fail fast without calling the dependency.
Half-Open
After a wait period, allow a small number of trial requests to see if the dependency recovered.
This gives the system a controlled way to stop traffic, wait, test recovery, and then return to normal.
Mental Model: State Transitions
Closed: calls pass through.
Open: calls fail fast.
Half-Open: trial calls allowed.
Closed → Open: too many failures or timeouts.
Open → Half-Open: wait period expires.
Half-Open → Closed: trial calls succeed.
Half-Open → Open: trial calls fail.
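The four transitions above can be written as a small pure function. This is a sketch rather than a full breaker: the enum and event names are assumptions chosen for illustration, and any event that does not match one of the listed transitions leaves the state unchanged.

```java
// Hypothetical names: the states come from the text, the events are illustrative.
enum BreakerState { CLOSED, OPEN, HALF_OPEN }

enum BreakerEvent { FAILURE_THRESHOLD_REACHED, WAIT_EXPIRED, TRIAL_SUCCEEDED, TRIAL_FAILED }

final class BreakerTransitions {
    private BreakerTransitions() {}

    /** Returns the next state, or the same state if the event does not apply. */
    static BreakerState next(BreakerState state, BreakerEvent event) {
        switch (state) {
            case CLOSED:
                // Closed -> Open: too many failures or timeouts.
                return event == BreakerEvent.FAILURE_THRESHOLD_REACHED ? BreakerState.OPEN : state;
            case OPEN:
                // Open -> Half-Open: wait period expires.
                return event == BreakerEvent.WAIT_EXPIRED ? BreakerState.HALF_OPEN : state;
            case HALF_OPEN:
                // Half-Open -> Closed on trial success, back to Open on trial failure.
                if (event == BreakerEvent.TRIAL_SUCCEEDED) return BreakerState.CLOSED;
                if (event == BreakerEvent.TRIAL_FAILED) return BreakerState.OPEN;
                return state;
            default:
                throw new IllegalStateException("Unknown state: " + state);
        }
    }
}
```

Keeping the transition logic separate from the remote call makes it easy to unit test each arrow in the list on its own.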
Simple Pseudocode
A circuit breaker usually wraps a remote call.
    if circuit is OPEN:
        fail fast

    try:
        call dependency
        record success
    catch failure:
        record failure
        maybe open circuit

The important part is that in the open state, the service avoids even making the remote call.
Simple Java Example
This is a deliberately simplified example. Real systems usually use a library like resilience4j, but the basic idea is easier to understand in plain Java.
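    import java.time.Duration;
    import java.time.Instant;
    import java.util.function.Supplier;

    // Simplified for teaching: not thread-safe, and it counts consecutive failures only.
    public class SimpleCircuitBreaker {

        enum State {
            CLOSED,
            OPEN,
            HALF_OPEN
        }

        private State state = State.CLOSED;
        private int failureCount = 0;
        private final int failureThreshold = 3;                      // consecutive failures before opening
        private final Duration openDuration = Duration.ofSeconds(5); // how long to fail fast
        private Instant openedAt;                                    // set each time the circuit opens

        public String call(Supplier<String> dependencyCall) {
            if (state == State.OPEN) {
                // Still inside the wait period: skip the remote call entirely.
                if (Instant.now().isBefore(openedAt.plus(openDuration))) {
                    throw new RuntimeException("Circuit is open. Failing fast.");
                }
                // Wait period expired: let this call through as a trial.
                state = State.HALF_OPEN;
            }

            try {
                String result = dependencyCall.get();
                recordSuccess();
                return result;
            } catch (RuntimeException ex) {
                recordFailure();
                throw ex;
            }
        }

        // Any success resets the count and closes the circuit.
        private void recordSuccess() {
            failureCount = 0;
            state = State.CLOSED;
        }

        // A failed half-open trial reopens the circuit immediately, because
        // failureCount is still at or above the threshold from before it opened.
        private void recordFailure() {
            failureCount++;
            if (failureCount >= failureThreshold) {
                state = State.OPEN;
                openedAt = Instant.now();
            }
        }
    }

This example tracks consecutive failures. Once three occur in a row, the breaker opens and future calls fail fast until the 5-second wait period expires. The next call after that is treated as a trial: success closes the circuit, failure reopens it.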
What Happens When the Circuit Is Open?
When the circuit is open, your service has a few options:
Fail fast
Return an error immediately instead of waiting on a dependency that is probably failing.
Use fallback
Return cached data, default data, or reduced functionality if that is safe for the product.
Degrade gracefully
Disable non-critical features while preserving the main user flow.
Queue for later
For background work, store the task and retry asynchronously later.
Example: Graceful Degradation
Suppose the recommendation service is failing on an ecommerce site.
Product Page
↓
Recommendation Service fails
↓
Circuit breaker opens
↓
Hide recommendations
↓
Still allow product view and checkout

That is much better than failing the entire product page just because recommendations are unavailable.
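One way to sketch the "hide recommendations" step is a wrapper that turns any failure of the guarded call into an empty list. This assumes the guarded call throws a RuntimeException both when the dependency fails and when the breaker fails fast; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for a non-critical feature such as recommendations.
final class RecommendationFallback {
    private RecommendationFallback() {}

    /** Returns the recommendations, or an empty list if the guarded call throws. */
    static List<String> recommendationsOrEmpty(Supplier<List<String>> guardedCall) {
        try {
            return guardedCall.get();
        } catch (RuntimeException ex) {
            // Covers a real dependency failure and a fail-fast rejection alike:
            // the page renders without the recommendations section.
            return Collections.emptyList();
        }
    }
}
```

The product page can then render the recommendations section only when the list is non-empty, so product view and checkout are unaffected. Swallowing the exception is only reasonable here because the feature is non-critical; a payment call would need a different fallback.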
Important Configuration Choices
Circuit breakers are powerful, but the configuration matters.
Failure threshold
How many failures or what failure percentage should open the circuit?
Sliding window
Are you measuring failures over the last 10 requests, last 100 requests, or last 30 seconds?
Open duration
How long should the breaker stay open before testing recovery?
Half-open trial count
How many requests should be allowed through before deciding the dependency is healthy again?
Failure types
Should timeouts count? What about 500s, 429s, validation errors, or authentication failures?
Fallback behavior
What should the user see when the dependency is unavailable?
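To make the failure threshold and sliding window choices more concrete, here is a minimal sketch of a count-based window that tracks the failure rate over the last N calls. It is not thread-safe, the names are hypothetical, and the minimumCalls parameter is an added assumption (not mentioned above) that prevents the breaker from opening on a tiny sample.

```java
// Hypothetical count-based sliding window: remembers the last windowSize outcomes.
final class SlidingWindowFailureRate {
    private final boolean[] outcomes; // true = failure; used as a ring buffer
    private int next = 0;             // index where the next outcome is written
    private int recorded = 0;         // outcomes currently in the window
    private int failures = 0;         // failures currently in the window

    SlidingWindowFailureRate(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.outcomes = new boolean[windowSize];
    }

    void record(boolean failed) {
        if (recorded == outcomes.length) {
            // Window is full: the slot at 'next' holds the oldest outcome, so evict it.
            if (outcomes[next]) failures--;
        } else {
            recorded++;
        }
        outcomes[next] = failed;
        if (failed) failures++;
        next = (next + 1) % outcomes.length;
    }

    double failureRate() {
        return recorded == 0 ? 0.0 : (double) failures / recorded;
    }

    /** True when enough calls have been seen and the failure rate is at or above the threshold. */
    boolean shouldOpen(double thresholdRate, int minimumCalls) {
        return recorded >= minimumCalls && failureRate() >= thresholdRate;
    }
}
```

A percentage over a window reacts to sustained trouble while tolerating an occasional failure, which a simple consecutive-failure counter cannot distinguish as well. A time-based window (for example, the last 30 seconds) would need timestamps instead of a fixed-size buffer.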
What Counts as a Failure?
Not every error should trip a circuit breaker.
Usually count
Timeouts, connection failures, dependency 5xx errors, and clear service-unavailable responses.
Usually do not count
Bad user input, validation errors, authentication errors, or authorization errors.
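The split above could be expressed as a small classifier. This sketch assumes the dependency is called over HTTP and that timeouts and connection failures surface as the standard java.net and java.util.concurrent exception types; the class name is hypothetical. It treats every 4xx response, including 429, as not counting, which is a choice a real system may want to revisit.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeoutException;

// Hypothetical classifier: decides which outcomes the breaker should record as failures.
final class FailureClassifier {
    private FailureClassifier() {}

    /** Dependency 5xx responses count; 4xx responses (validation, auth) do not. */
    static boolean countsAsFailure(int httpStatus) {
        return httpStatus >= 500 && httpStatus <= 599;
    }

    /** Timeouts and connection failures count; other exceptions do not. */
    static boolean countsAsFailure(Throwable error) {
        return error instanceof SocketTimeoutException
                || error instanceof TimeoutException
                || error instanceof ConnectException;
    }
}
```

Counting a 400 or 401 as a breaker failure would let bad client input open the circuit for every other caller, even though the dependency itself is healthy.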
Circuit Breakers and Retries Together
Circuit breakers and retries are often used together, but they must be coordinated.
Safe pattern:
timeout per call
limited retries with backoff
circuit breaker around dependency
fallback or graceful degradation

If the circuit is open, do not keep retrying the same failing call. That defeats the purpose of the circuit breaker.
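A minimal sketch of that coordination follows, assuming the breaker signals fail-fast with a dedicated exception type. CircuitOpenException and RetryWithBreaker are hypothetical names introduced here; the simplified breaker shown earlier throws a plain RuntimeException instead, so it would need a distinguishable exception for this to work.

```java
import java.util.function.Supplier;

// Hypothetical exception a breaker could throw when it rejects a call without trying it.
class CircuitOpenException extends RuntimeException {
    CircuitOpenException(String message) {
        super(message);
    }
}

final class RetryWithBreaker {
    private RetryWithBreaker() {}

    /**
     * Calls guardedCall up to maxAttempts times, doubling the backoff after each failure.
     * An open circuit is never retried: it is rethrown on the first occurrence.
     */
    static <T> T callWithRetries(Supplier<T> guardedCall, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMillis = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return guardedCall.get();
            } catch (CircuitOpenException ex) {
                throw ex; // breaker is open: retrying would defeat its purpose
            } catch (RuntimeException ex) {
                if (attempt >= maxAttempts) {
                    throw ex; // retries exhausted: surface the last failure
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
                backoffMillis *= 2; // exponential backoff between attempts
            }
        }
    }
}
```

In this arrangement the retry loop sits outside the breaker, so every attempt is still recorded by the breaker and an open circuit ends the loop immediately. The per-call timeout is assumed to live inside guardedCall.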
When Not to Use a Circuit Breaker
Circuit breakers are not always necessary.
- Simple local function calls
- Very low-volume internal tools
- Cases where a normal timeout is enough
- Operations where fail-fast behavior is worse than waiting
They are most useful around remote dependencies, especially when failure can spread from one service to another.
How to Answer This in an Interview
Common Interview Follow-Ups
Is a circuit breaker the same as a retry?
No. A retry tries the operation again because the failure may be temporary. A circuit breaker stops calling a dependency when it is likely to keep failing.
What are the three circuit breaker states?
Closed, Open, and Half-Open. Closed allows calls, Open fails fast, and Half-Open allows trial calls to check recovery.
Does a circuit breaker fix the downstream service?
No. It protects the caller and reduces pressure on the downstream service, but it does not directly repair the dependency.
What should happen when the circuit is open?
The service can fail fast, return cached/default data, gracefully degrade, or queue work for later depending on the use case.
What mistake do candidates make?
They memorize Closed/Open/Half-Open but do not explain the actual reason: preventing repeated slow calls from consuming resources and causing cascading failures.