System Design
What are common caching strategies in system design?
Caching improves latency and reduces load by storing frequently used data closer to the application or user. Common strategies include cache-aside, read-through, write-through, write-behind, write-around, TTL-based caching, and CDN caching.
The Short Answer
Caching stores frequently used data somewhere faster to access than the original source of truth.
The goal is usually to reduce latency, reduce database load, absorb traffic spikes, and improve user experience.
The Real Problem Caching Solves
Imagine a product page that gets thousands of requests per minute. Each request needs product details, pricing, inventory hints, review summaries, and recommendation data.
Without caching, every request may hit the database or multiple downstream services.
Without Cache
With Cache
But caching creates new questions: how fresh does the data need to be, what happens when data changes, and what happens if the cache is down?
Problem Context 1: Read-Heavy Product Details
Suppose product details are read constantly but updated relatively rarely.
This is a good fit for cache-aside, also called lazy loading. In cache-aside, the application checks the cache first; on a miss, it reads from the database, stores the result in the cache, and returns it. This is one of the most common database caching strategies.
    Product getProduct(String productId) {
        // 1. Try the cache first.
        Product cached = cache.get(productId);
        if (cached != null) {
            return cached;
        }
        // 2. On a miss, read the source of truth and populate the cache.
        Product product = database.findProduct(productId);
        cache.set(productId, product, ttl);
        return product;
    }

Why It Works Here
Main Tradeoff
Problem Context 2: Data Should Be Fresh After Writes
Suppose users update their profile, and the next read should usually see the updated data.
One option is write-through caching: every write updates the database and the cache as part of the same operation, so the next read finds the new value already cached.
    void updateUserProfile(UserProfile profile) {
        database.update(profile);
        cache.set(profile.id(), profile, ttl);
    }

This works well when the cache is a shared distributed cache such as Redis because all application servers read from the same cache.
However, many systems introduce a second cache layer inside each application server:
    App Server Local Memory Cache (L1)
    ↓
    Redis Distributed Cache (L2)
    ↓
    Database

Local caches are extremely fast because they avoid a network call to Redis, but they introduce a new challenge: stale data.
Imagine Server A updates a user profile:
    Server A:
        database.update(...)
        redis.set(...)
    Server B:
        still has old value in local memory

Now Server B may continue serving stale data until its local cache is refreshed or invalidated.
Why It Works Here
Main Tradeoff
Common solutions include:
- Short TTLs on local cache entries
- Redis Pub/Sub invalidation messages
- Kafka or event-driven cache invalidation
- Versioned cache keys
- Avoiding local caches for highly dynamic data
In larger distributed systems, the harder problem is keeping multiple local caches synchronized after the update.
Keeping Multiple Servers Consistent
Once a system grows beyond a single application server, cache consistency becomes more challenging.
Suppose we have:
    Users
    ↓
    Load Balancer
    ↓
    Server A | Server B | Server C
    ↓
    Redis
    ↓
    Database

If Server A updates a user profile, how do Servers B and C know their cached copy is now stale?
Several approaches are commonly used.
Option 1: Redis Only (No Local Cache)
Application servers do not keep local copies of cached data. Every cache lookup goes directly to Redis.
    App Server
    ↓
    Redis
    ↓
    Database

When Server A updates Redis, all other servers immediately see the updated value because everyone reads from the same shared cache.
Why it works
- Simple architecture
- No cache synchronization problems
- Fresh data visible immediately
Tradeoffs
- Every cache hit requires a network call
- Redis latency becomes part of every request
- Redis availability becomes critical
Option 2: Local Cache + Redis
Each application server maintains a small in-memory cache in addition to Redis.
    Local Cache (L1)
    ↓
    Redis (L2)
    ↓
    Database

Requests first check local memory. Only if the data is missing do they query Redis.
Why it works
- Extremely fast reads
- Reduced Redis traffic
- Lower latency for hot data
Tradeoffs
- Servers can hold different versions of data
- Stale reads become possible
- Additional invalidation mechanisms required
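The lookup order described above can be sketched as follows. This is a minimal illustration, not a production design: `TwoLevelCache` is a hypothetical class, the shared `Map` stands in for Redis, and the `database` function stands in for a real query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Two-level read path: local memory (L1), then a shared cache (L2), then the database. */
final class TwoLevelCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>(); // L1, private to this server
    private final Map<K, V> shared;                            // L2, stand-in for Redis
    private final Function<K, V> database;                     // stand-in for the source of truth

    TwoLevelCache(Map<K, V> shared, Function<K, V> database) {
        this.shared = shared;
        this.database = database;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                 // L1 hit: no network call
        }
        value = shared.get(key);
        if (value == null) {              // L2 miss: fall through to the database
            value = database.apply(key);
            if (value == null) {
                return null;              // nothing to cache
            }
            shared.put(key, value);
        }
        local.put(key, value);            // populate L1 for the next read
        return value;
    }
}
```

Note that nothing here ever removes an entry from `local`, which is exactly why two servers can end up holding different versions of the same key.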
Option 3: Local Cache + Short TTL
Instead of synchronizing caches immediately, each server accepts a small amount of staleness.
    User Profile
    TTL = 30 seconds

If Server B has stale data, its local entry expires after a short period and is refreshed from Redis.
Why it works
- Simple implementation
- No messaging infrastructure required
- Good enough for many systems
Tradeoffs
- Users may briefly see old data
- Updates are not immediately visible everywhere
- Choosing the right TTL can be difficult
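A local cache with a short TTL might look like the sketch below. `TtlLocalCache` and its injectable `clock` are assumptions made for illustration; in practice a library cache with expiry support would usually be used instead of hand-rolled code.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Per-server cache whose entries expire after a fixed TTL. */
final class TtlLocalCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injectable for tests

    TtlLocalCache(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns the cached value, or reloads it from the next layer once the entry has expired. */
    V get(K key, Function<K, V> loadFromNextLayer) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now < entry.expiresAtMillis) {
            return entry.value;                    // possibly stale, but within the accepted window
        }
        V fresh = loadFromNextLayer.apply(key);    // e.g. a Redis lookup
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

With a 30-second TTL, a server can serve an old profile for at most 30 seconds after another server changes it.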
Option 4: Local Cache + Pub/Sub Invalidation
When a server updates data, it also publishes an invalidation event.
    Server A updates profile
    ↓
    Publish invalidation event
    ↓
    Servers B and C receive event
    ↓
    Evict local cache entry

Redis Pub/Sub, Kafka, or another message broker can be used to distribute invalidation events.
Why it works
- Very fast local cache reads
- Near real-time cache consistency
- Scales well across many servers
Tradeoffs
- More moving parts
- More operational complexity
- Lost invalidation events can cause stale data
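The flow above can be sketched with an in-process stand-in for the broker. `InvalidationBus` and `AppServer` are hypothetical names; a real system would publish through Redis Pub/Sub, Kafka, or a similar broker, and delivery would be asynchronous and possibly lossy.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** In-process stand-in for a broker channel that carries invalidated keys. */
final class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    void publish(String key) {
        subscribers.forEach(listener -> listener.accept(key));
    }
}

/** One application server with its own local cache. */
final class AppServer {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> database;   // shared source of truth (stand-in)
    private final InvalidationBus bus;

    AppServer(Map<String, String> database, InvalidationBus bus) {
        this.database = database;
        this.bus = bus;
        bus.subscribe(localCache::remove);        // evict the local entry on every event
    }

    String read(String key) {
        return localCache.computeIfAbsent(key, database::get);
    }

    void update(String key, String value) {
        database.put(key, value);                 // 1. write the source of truth
        bus.publish(key);                         // 2. tell every server to evict its copy
    }
}
```

Because a dropped event would leave a stale entry in place indefinitely, this approach is often combined with a TTL as a safety net.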
"If freshness matters, I would start with Redis as a shared cache. If latency becomes a concern, I would introduce local caches and use Pub/Sub invalidation or short TTLs to keep them synchronized."
Problem Context 3: Very High Write Volume
Suppose a system records lots of events, counters, or activity logs. Writing synchronously to the database on every change may be too expensive.
A write-behind strategy writes to cache first and persists to the database asynchronously later.
    void recordEvent(Event event) {
        cache.increment(event.counterKey());
        // an async worker later flushes the accumulated updates to the database
    }

Why It Works Here
Main Tradeoff
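The main risk with write-behind is durability: updates that have been accepted but not yet flushed can be lost. The sketch below illustrates where that risk sits, assuming a hypothetical `WriteBehindCounters` class and a `databaseWriter` callback that stands in for the real persistence call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Buffers counter increments in memory and persists them in batches. */
final class WriteBehindCounters {
    private final Map<String, Long> pending = new ConcurrentHashMap<>();
    private final BiConsumer<String, Long> databaseWriter; // stand-in for the real database write

    WriteBehindCounters(BiConsumer<String, Long> databaseWriter) {
        this.databaseWriter = databaseWriter;
    }

    /** Fast path: only touches memory. */
    void increment(String counterKey) {
        pending.merge(counterKey, 1L, Long::sum);
    }

    /** Intended to be called periodically by a background worker. */
    void flush() {
        for (String key : pending.keySet()) {
            Long delta = pending.remove(key);      // atomically take the accumulated delta
            if (delta != null) {
                // If the process dies or this call fails, the delta is gone:
                // that is the data-loss window of write-behind.
                databaseWriter.accept(key, delta);
            }
        }
    }
}
```

A more durable variant would stage the pending updates in a persistent queue or log before acknowledging them.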
Problem Context 4: Avoid Polluting the Cache
Suppose users upload large reports or rarely accessed documents. Writing every new object into cache may waste memory.
A write-around strategy writes directly to the database and does not immediately populate the cache.
    void saveReport(Report report) {
        database.save(report);
        // do not cache immediately
        // cache later only if users actually read it
    }

Why It Works Here
Main Tradeoff
Problem Context 5: Fast Content Near the User
Suppose you serve images, JavaScript bundles, CSS files, videos, or public article pages to users around the world.
A CDN cache (Content Delivery Network) stores content at edge locations closer to users. This reduces latency and reduces origin server load.
Why It Works Here
Main Tradeoff
Problem Context 6: Rapidly Changing Data
Suppose you cache comments, activity feeds, leaderboards, or inventory-like data that changes frequently.
One practical approach is TTL-based caching. Every cache key gets an expiration time, and the system accepts that data may be slightly stale for a short period.
    cache.set(
        "leaderboard:daily",
        leaderboard,
        Duration.ofSeconds(5)
    );

AWS recommends applying TTLs to cache keys in most cases, and notes that short TTLs can be a practical way to protect a hammered database query while evaluating a more elegant solution.
Why It Works Here
Main Tradeoff
Problem Context 7: Expensive Data That Must Stay Warm
Suppose a homepage, recommendation block, or pricing summary is very expensive to compute and gets requested often.
A refresh-ahead strategy refreshes cached data before it expires, so users are less likely to experience a slow cache miss.
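One way to sketch refresh-ahead is to reload an entry in the background once it passes a refresh threshold that is shorter than its TTL. The class name `RefreshAheadCache`, the two thresholds, and the injectable `clock` and `refreshExecutor` are assumptions made for this illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Serves cached values and refreshes them in the background shortly before they expire. */
final class RefreshAheadCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;

        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;       // the expensive computation
    private final long ttlMillis;              // entry is unusable after this age
    private final long refreshAfterMillis;     // background refresh starts at this age
    private final LongSupplier clock;
    private final Executor refreshExecutor;

    RefreshAheadCache(Function<K, V> loader, long ttlMillis, long refreshAfterMillis,
                      LongSupplier clock, Executor refreshExecutor) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.refreshAfterMillis = refreshAfterMillis;
        this.clock = clock;
        this.refreshExecutor = refreshExecutor;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            return load(key, now);             // cold or expired: the caller pays for the load
        }
        if (now - entry.loadedAtMillis >= refreshAfterMillis) {
            // Close to expiry: refresh off the request path. A fuller version
            // would also make sure only one refresh per key runs at a time.
            refreshExecutor.execute(() -> load(key, clock.getAsLong()));
        }
        return entry.value;                    // answer from the current entry
    }

    private V load(K key, long now) {
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, now));
        return value;
    }
}
```

As long as the key keeps receiving traffic, readers rarely hit the slow path, at the cost of some refreshes that nobody ends up needing.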
Why It Works Here
Main Tradeoff
Common Cache Layers
In-Memory Local Cache
Distributed Cache
Database Query Cache
CDN / Edge Cache
The Hard Part: Cache Invalidation
Cache invalidation means removing or refreshing old data when the source of truth changes. Redis describes invalidation as removing old data from the cache so the system can avoid serving outdated data and improve cache usefulness.
Common invalidation approaches include:
- expire keys with TTL
- delete cache keys after database writes
- update cache immediately after writes
- publish events that tell services to evict keys
- version cache keys when data models change
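Two of these approaches, deleting the key after a database write and versioning the key, can be sketched together. `ProfileRepository` and the `profile:v2:` prefix are hypothetical; the maps stand in for a real database and cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Cache-aside reads with delete-on-write invalidation and a versioned key prefix. */
final class ProfileRepository {
    // Bumping the version makes every old entry unreachable after a data model change.
    private static final String KEY_PREFIX = "profile:v2:";

    private final Map<String, String> database;                       // stand-in for the database
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the cache

    ProfileRepository(Map<String, String> database) {
        this.database = database;
    }

    String read(String userId) {
        return cache.computeIfAbsent(KEY_PREFIX + userId, key -> database.get(userId));
    }

    void update(String userId, String profile) {
        database.put(userId, profile);        // write the source of truth first
        cache.remove(KEY_PREFIX + userId);    // then delete; the next read repopulates the cache
    }
}
```

Deleting rather than updating the cached value keeps the write path simple, at the cost of one slower read after each write.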
Cache Stampede / Thundering Herd
A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.
    Popular key expires
    ↓
    1,000 requests miss cache
    ↓
    1,000 database queries
    ↓
    database spike

Common protections include:
- request coalescing or single-flight loading
- locks around cache rebuilds
- jittered TTLs so many keys do not expire together
- refresh-ahead for very hot keys
- serving stale data briefly while refreshing in the background
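The first and third protections can be sketched as follows. `SingleFlightLoader` is a hypothetical helper: concurrent callers asking for the same key share one in-flight load, and `jitteredTtlMillis` spreads expirations around a base TTL. This is an illustration under those assumptions rather than a hardened implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** Coalesces concurrent loads of the same key into a single call to the loader. */
final class SingleFlightLoader<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightLoader(Function<K, V> loader) {
        this.loader = loader;
    }

    V load(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join();            // someone else is loading: wait for their result
        }
        try {
            V value = loader.apply(key);       // only one caller per key reaches the backend
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);     // waiters see the failure too
            throw e;
        } finally {
            inFlight.remove(key, mine);        // later misses start a new load
        }
    }

    /** Returns baseMillis adjusted by a random offset of up to +/- fraction of the base. */
    static long jitteredTtlMillis(long baseMillis, double fraction) {
        long spread = (long) (baseMillis * fraction);
        return baseMillis + ThreadLocalRandom.current().nextLong(-spread, spread + 1);
    }
}
```

For example, `jitteredTtlMillis(1000, 0.1)` yields a TTL between 900 and 1,100 ms, so keys written together do not all expire in the same instant.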
Choosing the Right Strategy
| Strategy | Problem Context | Main Risk |
|---|---|---|
| Cache-aside | Read-heavy data loaded on demand | First miss is slower |
| Write-through | Data should be fresh after writes | Slower writes |
| Write-behind | High write volume | Data loss risk without durable async pipeline |
| Write-around | Avoid caching rarely read writes | First read after write is slower |
| Short TTL | Fast-changing data | Brief stale reads |
| Refresh-ahead | Hot expensive data | Unneeded refresh work |
The Interview-Friendly Explanation
Common Interview Follow-Ups
What is the most common caching strategy?
Cache-aside is one of the most common strategies. The application checks the cache first, reads from the database on a miss, stores the result in cache, and returns it.
Why is cache invalidation hard?
Because the cache is a copy of data. When the source of truth changes, every cached representation that depends on that data may need to be updated or removed.
What is a cache stampede?
A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.
Should every piece of data be cached?
No. Cache data that is expensive to fetch or compute and is read often enough to justify cache memory and invalidation complexity.
What happens if Redis is down?
Usually the system should degrade gracefully. Depending on the product, it may bypass cache and hit the database, serve stale data, shed load, or return an error for noncritical features.
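The "bypass cache and hit the database" option might be sketched like this. `ResilientReader` and its `CacheClient` interface are hypothetical, and the sketch assumes the cache client signals an outage by throwing a `RuntimeException`.

```java
import java.util.function.Function;

/** Reads through a cache but falls back to the database when the cache is unreachable. */
final class ResilientReader {
    /** Hypothetical cache client; methods throw RuntimeException when the cache is down. */
    interface CacheClient {
        String get(String key);
        void set(String key, String value);
    }

    private final CacheClient cache;
    private final Function<String, String> database; // stand-in for the source of truth

    ResilientReader(CacheClient cache, Function<String, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String read(String key) {
        try {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException cacheUnavailable) {
            return database.apply(key);        // cache is down: bypass it for this request
        }
        String value = database.apply(key);    // ordinary cache miss
        try {
            cache.set(key, value);
        } catch (RuntimeException cacheUnavailable) {
            // Failing to populate the cache must not fail the request.
        }
        return value;
    }
}
```

If the database cannot absorb the full uncached load, this fallback would need to be paired with timeouts, a circuit breaker, or load shedding.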