System Design

What are common caching strategies in system design?

Caching improves latency and reduces load by storing frequently used data closer to the application or user. Common strategies include cache-aside, read-through, write-through, write-behind, write-around, TTL-based caching, and CDN caching.


The Short Answer

Caching stores frequently used data somewhere faster to access than the original source of truth.

The goal is usually to reduce latency, reduce database load, absorb traffic spikes, and improve user experience.

The hard part is not putting data in a cache. The hard part is deciding when the cache should be read, updated, expired, refreshed, or ignored.

The Real Problem Caching Solves

Imagine a product page that gets thousands of requests per minute. Each request needs product details, pricing, inventory hints, review summaries, and recommendation data.

Without caching, every request may hit the database or multiple downstream services.

Without Cache

User request → App server → Database hit every time

With Cache

User request → Cache hit → Database avoided

But caching creates new questions: how fresh does the data need to be, what happens when data changes, and what happens if the cache is down?

Problem Context 1: Read-Heavy Product Details

Suppose product details are read constantly but updated relatively rarely.

This is a perfect fit for cache-aside, also called lazy loading. In cache-aside, the application checks the cache first; on a miss, it reads from the database, stores the result in cache, and returns it. This is one of the most common database caching strategies.

java

Product getProduct(String productId) {
    Product cached = cache.get(productId);

    if (cached != null) {
        return cached;
    }

    Product product = database.findProduct(productId);
    cache.set(productId, product, ttl);

    return product;
}

Why It Works Here

Only products that are actually requested enter the cache. Products that nobody views never take up cache memory, and rarely viewed ones expire through the TTL.

Main Tradeoff

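A request that misses the cache is slower because it has to query the database and then populate the cache. Cached entries can also be stale until the TTL expires or the entry is invalidated.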

Cache-aside is often the first strategy to mention in interviews because it is simple, flexible, and common for read-heavy systems.

Problem Context 2: Data Should Be Fresh After Writes

Suppose users update their profile, and the next read should usually see the updated data.

One option is write-through caching. The application writes to the database and immediately updates the cache.

java

void updateUserProfile(UserProfile profile) {
    database.update(profile);
    cache.set(profile.id(), profile, ttl);
}

This works well when the cache is a shared distributed cache such as Redis because all application servers read from the same cache.

User Request → App Server A → Redis Cache → Database

However, many systems introduce a second cache layer inside each application server:

text

App Server Local Memory Cache (L1)

            ↓

    Redis Distributed Cache (L2)

            ↓

    Database

Local caches are extremely fast because they avoid a network call to Redis, but they introduce a new challenge: stale data.

Imagine Server A updates a user profile:

text

Server A:
    database.update(...)
    redis.set(...)

Server B:
    still has old value in local memory

Now Server B may continue serving stale data until its local cache is refreshed or invalidated.

Why It Works Here

Reads after updates are likely to see fresh data because the cache is updated immediately after the database write.

Main Tradeoff

Writes become slower because each one updates both the database and the cache, and local caches may require additional invalidation mechanisms to avoid stale data.

Common solutions include:

Short TTLs on local cache entries
Redis Pub/Sub invalidation messages
Kafka or event-driven cache invalidation
Versioned cache keys
Avoiding local caches for highly dynamic data
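Of the options above, versioned cache keys are the least obvious, so here is a minimal sketch. It assumes a per-user version counter that is bumped after every write. `VersionedKeys`, `keyFor`, and `bumpVersion` are hypothetical names, and the counter is kept in memory only for illustration; a real system would need to store it somewhere all servers can read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: builds cache keys that embed a per-user version.
// Assumption: in production the counter lives in a shared store, not in memory.
class VersionedKeys {
    private final Map<String, AtomicLong> versions = new ConcurrentHashMap<>();

    // Current key for a user profile, for example "user:42:v0".
    String keyFor(String userId) {
        long version = versions.computeIfAbsent(userId, id -> new AtomicLong()).get();
        return "user:" + userId + ":v" + version;
    }

    // Call after a successful database write. Readers switch to a new key,
    // so the entry under the old key is never read again and ages out via TTL.
    void bumpVersion(String userId) {
        versions.computeIfAbsent(userId, id -> new AtomicLong()).incrementAndGet();
    }
}
```

The old entry is not deleted explicitly, so this approach relies on TTLs or eviction to reclaim memory.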

In simple systems, write-through usually means updating a shared cache such as Redis after updating the database.

In larger distributed systems, the harder problem is keeping multiple local caches synchronized after the update.

Keeping Multiple Servers Consistent

Once a system grows beyond a single application server, cache consistency becomes more challenging.

Suppose we have:

text

Users
  ↓
Load Balancer
  ↓
Server A
Server B
Server C
  ↓
Redis
  ↓
Database

If Server A updates a user profile, how do Servers B and C know their cached copy is now stale?

Several approaches are commonly used.

Option 1: Redis Only (No Local Cache)

Application servers do not keep local copies of cached data. Every cache lookup goes directly to Redis.

text

App Server
     ↓
Redis
     ↓
Database

When Server A updates Redis, all other servers immediately see the updated value because everyone reads from the same shared cache.

Why it works

Simple architecture
No cache synchronization problems
Fresh data visible immediately

Tradeoffs

Every cache hit requires a network call
Redis latency becomes part of every request
Redis availability becomes critical

Option 2: Local Cache + Redis

Each application server maintains a small in-memory cache in addition to Redis.

text

Local Cache (L1)
       ↓
Redis (L2)
       ↓
Database

Requests first check local memory. Only if the data is missing do they query Redis.
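The lookup order can be sketched as follows. This is an illustration under simplifying assumptions: `TwoTierCache` is a hypothetical name, the shared cache is modeled as a plain `Map` instead of a real Redis client, and the local layer has no TTL or invalidation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of an L1 (local) + L2 (shared) read path.
// The shared cache is modeled as a Map; in practice it would be a Redis client.
class TwoTierCache<V> {
    private final Map<String, V> local = new ConcurrentHashMap<>(); // L1, one per server
    private final Map<String, V> shared;                            // L2, stand-in for Redis
    private final Function<String, V> database;                     // source of truth

    TwoTierCache(Map<String, V> shared, Function<String, V> database) {
        this.shared = shared;
        this.database = database;
    }

    V get(String key) {
        V value = local.get(key);
        if (value != null) {
            return value;                 // L1 hit: no network call at all
        }
        value = shared.get(key);          // L2 lookup: one network hop in practice
        if (value == null) {
            value = database.apply(key);  // miss in both layers
            if (value == null) {
                return null;              // nothing to cache
            }
            shared.put(key, value);
        }
        local.put(key, value);            // populate L1 for the next request
        return value;
    }
}
```

Because nothing ever removes entries from `local` in this sketch, two servers can hold different values for the same key after an update, which is the staleness problem this option has to manage.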

Why it works

Extremely fast reads
Reduced Redis traffic
Lower latency for hot data

Tradeoffs

Servers can hold different versions of data
Stale reads become possible
Additional invalidation mechanisms required

Option 3: Local Cache + Short TTL

Instead of synchronizing caches immediately, each server accepts a small amount of staleness.

text

User Profile
TTL = 30 seconds

If Server B has stale data, it naturally expires after a short period and is refreshed from Redis.
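A local cache with a fixed TTL might look like the sketch below. `TtlLocalCache` is a hypothetical class, and the injectable clock is an assumption made so that expiry can be exercised without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Sketch of a per-server cache whose entries expire after a fixed TTL.
class TtlLocalCache<V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlLocalCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null when the key is absent or its TTL has elapsed.
    // The caller then reloads from the shared cache or the database.
    V get(String key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // drop only the entry we inspected
            return null;
        }
        return entry.value;
    }
}
```

With `new TtlLocalCache<>(30_000, System::currentTimeMillis)`, a stale profile on one server is served for at most 30 seconds after it was cached.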

Why it works

Simple implementation
No messaging infrastructure required
Good enough for many systems

Tradeoffs

Users may briefly see old data
Updates are not immediately visible everywhere
Choosing the right TTL can be difficult

Option 4: Local Cache + Pub/Sub Invalidation

When a server updates data, it also publishes an invalidation event.

text

Server A updates profile
          ↓
Publish invalidation event
          ↓
Servers B and C receive event
          ↓
Evict local cache entry

Redis Pub/Sub, Kafka, or another message broker can be used to distribute invalidation events.
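The flow can be illustrated with an in-process stand-in for the broker. `InvalidationBus` and `AppServer` are hypothetical classes; a real deployment would publish over Redis Pub/Sub, Kafka, or similar, and delivery would be asynchronous and could fail.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-process stand-in for a broker channel. Delivery here is synchronous
// and reliable, which a real broker does not guarantee.
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    void publish(String key) {
        subscribers.forEach(listener -> listener.accept(key));
    }
}

class AppServer {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> database;
    private final InvalidationBus bus;

    AppServer(Map<String, String> database, InvalidationBus bus) {
        this.database = database;
        this.bus = bus;
        bus.subscribe(localCache::remove); // evict the key on every event
    }

    String read(String key) {
        // Serve from local memory, loading from the database on a miss.
        return localCache.computeIfAbsent(key, database::get);
    }

    void update(String key, String value) {
        database.put(key, value); // 1. write the source of truth
        bus.publish(key);         // 2. tell every server, including this one, to evict
    }
}
```

Publishing only the key, not the new value, keeps the event small and lets each server reload the latest state on its next read.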

Why it works

Very fast local cache reads
Near real-time cache consistency
Scales well across many servers

Tradeoffs

More moving parts
More operational complexity
Lost invalidation events can cause stale data

In interviews, a strong answer is:

"If freshness matters, I would start with Redis as a shared cache. If latency becomes a concern, I would introduce local caches and use Pub/Sub invalidation or short TTLs to keep them synchronized."

Problem Context 3: Very High Write Volume

Suppose a system records lots of events, counters, or activity logs. Writing synchronously to the database on every change may be too expensive.

A write-behind strategy writes to cache first and persists to the database asynchronously later.

java

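void recordEvent(Event event) {
    // Update the cache only; the request returns without touching the database.
    cache.increment(event.counterKey());

    // An async worker later flushes accumulated updates to the database.
}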

Why It Works Here

Writes feel fast because the request does not wait for the database every time.

Main Tradeoff

If the cache or async pipeline fails before flushing, data may be delayed or lost unless the pipeline is durable.

Write-behind can improve write performance, but it needs careful durability and failure handling.
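A minimal sketch of the buffering and flushing side is shown below. `WriteBehindCounters` is a hypothetical class, the database is modeled as a `Map`, and `flush` is assumed to be called periodically by a single background worker. The buffer lives only in memory, so it has exactly the durability gap described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of write-behind for counters: increments are buffered in memory
// and written to the database in batches.
class WriteBehindCounters {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();
    private final Map<String, Long> database; // stand-in for the durable store

    WriteBehindCounters(Map<String, Long> database) {
        this.database = database;
    }

    // Fast path: called on every request, never touches the database.
    void increment(String counterKey) {
        pending.merge(counterKey, 1L, Long::sum);
    }

    // Slow path: assumed to run on one background worker at a fixed interval.
    void flush() {
        for (String key : pending.keySet()) {
            Long delta = pending.remove(key); // take the buffered amount atomically
            if (delta != null) {
                database.merge(key, delta, Long::sum);
            }
        }
    }
}
```

Increments that arrive while a flush is running land in a fresh buffer entry and are picked up by the next flush. Anything still in `pending` when the process dies is lost.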

Problem Context 4: Avoid Polluting the Cache

Suppose users upload large reports or rarely accessed documents. Writing every new object into cache may waste memory.

A write-around strategy writes directly to the database and does not immediately populate the cache.

java

void saveReport(Report report) {
    database.save(report);

    // do not cache immediately
    // cache later only if users actually read it
}

Why It Works Here

Rarely read data does not fill the cache.

Main Tradeoff

The first read after write may be slower because the cache does not have the data yet.

Write-around is useful when many writes are unlikely to be read soon.

Problem Context 5: Fast Content Near the User

Suppose you serve images, JavaScript bundles, CSS files, videos, or public article pages to users around the world.

A CDN cache (Content Delivery Network) stores content at edge locations closer to users. This reduces latency and reduces origin server load.

User → Nearby CDN edge → Origin server → Database / storage

Why It Works Here

Static or mostly static content can be served from a nearby edge cache.

Main Tradeoff

Cache invalidation and stale content become important when content changes.
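CDNs commonly take their caching instructions from the HTTP `Cache-Control` header that the origin sends. The sketch below shows one way an origin might choose that header. The URL prefixes and the lifetimes are assumptions for illustration, not recommendations, and `CachePolicy` is a hypothetical helper.

```java
// Sketch: picking a Cache-Control header value per response path.
class CachePolicy {
    static String cacheControlFor(String path) {
        if (path.startsWith("/assets/")) {
            // Assumes asset file names contain a content hash, so a changed
            // file gets a new URL and the old one can be cached for a year.
            return "public, max-age=31536000, immutable";
        }
        if (path.startsWith("/articles/")) {
            // Browsers may reuse the page for 60 seconds; shared caches such
            // as a CDN may reuse it for 300 seconds (s-maxage).
            return "public, max-age=60, s-maxage=300";
        }
        // Anything else is treated as personalized and is not stored.
        return "no-store";
    }
}
```

Content-hashed asset names sidestep invalidation entirely, while the short lifetimes on article pages bound how long stale content can be served after an edit.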

Problem Context 6: Rapidly Changing Data

Suppose you cache comments, activity feeds, leaderboards, or inventory-like data that changes frequently.

One practical approach is TTL-based caching. Every cache key gets an expiration time, and the system accepts that data may be slightly stale for a short period.

java

cache.set(
    "leaderboard:daily",
    leaderboard,
    Duration.ofSeconds(5)
);

AWS recommends applying TTLs to cache keys in most cases, and notes that short TTLs can be a practical way to protect a hammered database query while evaluating a more elegant solution.

Why It Works Here

A short TTL reduces database load while limiting how stale the data can become.

Main Tradeoff

Users may briefly see old data.

Problem Context 7: Expensive Data That Must Stay Warm

Suppose a homepage, recommendation block, or pricing summary is very expensive to compute and gets requested often.

A refresh-ahead strategy refreshes cached data before it expires, so users are less likely to experience a slow cache miss.

Cache entry exists → Near expiration → Refresh in background → User sees warm cache

Why It Works Here

Users avoid slow misses for very hot or expensive data.

Main Tradeoff

The system may refresh data that nobody ends up requesting.
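One possible shape for refresh-ahead is sketched below. `RefreshAheadCache` is a hypothetical class, and the two thresholds (`refreshAfterMillis` shorter than `ttlMillis`) are an assumption about how "near expiration" is defined. This variant refreshes only when a read arrives inside the refresh window; a scheduler-driven variant would refresh without waiting for a read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

class RefreshAheadCache<V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;
        Entry(T value, long loadedAtMillis) { this.value = value; this.loadedAtMillis = loadedAtMillis; }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Map<String, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Function<String, V> loader;
    private final long refreshAfterMillis;
    private final long ttlMillis;
    private final Executor executor;
    private final LongSupplier clock;

    RefreshAheadCache(Function<String, V> loader, long refreshAfterMillis, long ttlMillis,
                      Executor executor, LongSupplier clock) {
        this.loader = loader;
        this.refreshAfterMillis = refreshAfterMillis;
        this.ttlMillis = ttlMillis;
        this.executor = executor;
        this.clock = clock;
    }

    V get(String key) {
        Entry<V> entry = entries.get(key);
        long now = clock.getAsLong();
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            return load(key); // cold or expired: this caller pays for the load
        }
        boolean nearExpiry = now - entry.loadedAtMillis >= refreshAfterMillis;
        if (nearExpiry && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            // Only one refresh per key at a time; the caller does not wait for it.
            executor.execute(() -> {
                try { load(key); } finally { refreshing.remove(key); }
            });
        }
        return entry.value; // still valid, served immediately
    }

    private V load(String key) {
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, clock.getAsLong()));
        return value;
    }
}
```

In production the executor would typically be a small thread pool, so the read that triggers the refresh returns the current value without waiting for the reload.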

Common Cache Layers

In-Memory Local Cache

Fastest access, but each application instance has its own copy. Good for small reference data.

Distributed Cache

Redis or Memcached-style cache shared by many application servers. Good for data that every instance needs to see consistently.

Database Query Cache

Stores expensive query results, but invalidation can become tricky when underlying rows change.

CDN / Edge Cache

Stores public content near users. Great for static assets and cacheable pages.

The Hard Part: Cache Invalidation

Cache invalidation means removing or refreshing old data when the source of truth changes. Redis describes invalidation as removing old data from the cache so the system can avoid serving outdated data and improve cache usefulness.

Common invalidation approaches include:

expire keys with TTL
delete cache keys after database writes
update cache immediately after writes
publish events that tell services to evict keys
version cache keys when data models change
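As a small illustration of the second approach in the list, deleting the key after a database write, consider the sketch below. `ProfileStore` is a hypothetical class, and both the cache and the database are modeled as in-memory maps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache-aside reads combined with delete-on-write invalidation.
class ProfileStore {
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String read(String userId) {
        String cached = cache.get(userId);
        if (cached != null) {
            return cached;
        }
        String fromDatabase = database.get(userId);
        if (fromDatabase != null) {
            cache.put(userId, fromDatabase); // populate on miss
        }
        return fromDatabase;
    }

    void update(String userId, String profile) {
        database.put(userId, profile); // write the source of truth first
        cache.remove(userId);          // then evict; the next read repopulates
    }
}
```

Deleting instead of overwriting means the cache is only ever filled from the database on the read path. A read that overlaps a write can still re-cache the old value, which is one reason TTLs are usually kept as a backstop.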

A cache can make a system faster, but a stale cache can make the system confusing or incorrect.

Cache Stampede / Thundering Herd

A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.

text

Popular key expires
        ↓
1,000 requests miss cache
        ↓
1,000 database queries
        ↓
database spike

Common protections include:

request coalescing or single-flight loading
locks around cache rebuilds
jittered TTLs so many keys do not expire together
refresh-ahead for very hot keys
serving stale data briefly while refreshing in the background
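The first protection, single-flight loading, can be sketched with a map of in-flight futures. `SingleFlightLoader` is a hypothetical class, and the loader function stands in for the expensive database query. Callers that arrive while a load for the same key is running receive the same future instead of starting another query.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Sketch: at most one concurrent load per key.
class SingleFlightLoader<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, V> loader; // the expensive lookup
    private final Executor executor;          // where the lookup runs

    SingleFlightLoader(Function<String, V> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    CompletableFuture<V> load(String key) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing; // join the load that is already running
        }
        executor.execute(() -> {
            try {
                created.complete(loader.apply(key));
            } catch (Throwable failure) {
                created.completeExceptionally(failure); // waiters see the same error
            } finally {
                inFlight.remove(key, created); // later callers start a fresh load
            }
        });
        return created;
    }
}
```

This coalesces requests within one process only. Coordinating across servers would need a distributed lock or a similar mechanism, which this sketch does not attempt.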

Choosing the Right Strategy

Strategy	Problem Context	Main Risk
Cache-aside	Read-heavy data loaded on demand	First miss is slower
Write-through	Data should be fresh after writes	Slower writes
Write-behind	High write volume	Data loss risk without durable async pipeline
Write-around	Avoid caching rarely read writes	First read after write is slower
Short TTL	Fast-changing data	Brief stale reads
Refresh-ahead	Hot expensive data	Unneeded refresh work

The Interview-Friendly Explanation

Caching improves latency and reduces load, but each strategy has a consistency tradeoff. Cache-aside loads data on demand and is common for read-heavy systems. Write-through keeps cache fresher but slows writes. Write-behind improves write latency but needs durable failure handling. Write-around avoids cache pollution. TTLs and invalidation control staleness. For distributed systems, also discuss Redis, cache stampedes, stale data, and what happens if the cache fails.

Common Interview Follow-Ups

What is the most common caching strategy?

Cache-aside is one of the most common strategies. The application checks the cache first, reads from the database on a miss, stores the result in cache, and returns it.

Why is cache invalidation hard?

Because the cache is a copy of data. When the source of truth changes, every cached representation that depends on that data may need to be updated or removed.

What is a cache stampede?

A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.

Should every piece of data be cached?

No. Cache data that is expensive to fetch or compute and is read often enough to justify cache memory and invalidation complexity.

What happens if Redis is down?

Usually the system should degrade gracefully. Depending on the product, it may bypass cache and hit the database, serve stale data, shed load, or return an error for noncritical features.
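The "bypass cache and hit the database" option might look like the sketch below. `SimpleCache` and `ResilientReader` are hypothetical types, and the sketch assumes the cache client signals an outage by throwing a runtime exception.

```java
import java.util.function.Function;

// Hypothetical minimal cache interface for this sketch.
interface SimpleCache<T> {
    T get(String key);
    void put(String key, T value);
}

// Sketch: treat a cache failure like a cache miss.
class ResilientReader<V> {
    private final SimpleCache<V> cache;
    private final Function<String, V> database;

    ResilientReader(SimpleCache<V> cache, Function<String, V> database) {
        this.cache = cache;
        this.database = database;
    }

    V read(String key) {
        try {
            V cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException cacheFailure) {
            // Cache unavailable: skip it entirely and read the source of truth.
            return database.apply(key);
        }
        V value = database.apply(key);
        if (value != null) {
            try {
                cache.put(key, value);
            } catch (RuntimeException ignored) {
                // Best effort: failing to populate the cache must not fail the read.
            }
        }
        return value;
    }
}
```

Bypassing the cache sends the full read load to the database, so in practice this is usually paired with timeouts, a circuit breaker, or load shedding so that a cache outage does not become a database outage.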

Final Takeaway

Caching is not one pattern. It is a set of tradeoffs. A good system design answer explains what data is being cached, why it is safe to cache, how it is invalidated, how stale it can be, and what happens when the cache fails.