Cache Stampede (Thundering Herd) and Operational Defenses

A correctly built cache lowers cost, cuts latency, and keeps the system calm. A badly built one does the opposite: a TTL expires, one hot key “goes cold,” thousands of requests rush the backend at once, and the cache, instead of shielding the backend, is what takes it down.

This post treats the stampede (thundering herd) not as theory but through control mechanisms you can actually apply in production.

1) What does a stampede look like?

Typical signals:

Cache hit ratio collapses suddenly
Backend QPS / DB connection counts spike
p95/p99 latency rises, then a timeout wave follows
Error codes: 5xx + “upstream timeout” + “connection pool exhausted”

The most dangerous scenario: a single hot key (homepage payload, campaign price, auth policy) requested simultaneously by thousands of clients the moment its TTL expires.

2) Root cause: synchronized TTLs and concurrent refresh

A stampede usually starts from this combination:

Every node uses the same TTL (synchronized expiry)
On a cache miss every caller hits the backend (no coalescing)
The backend has no protection (no rate limit / circuit breaker)
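To make that combination concrete, here is a minimal Python sketch of a naive cache-aside read path with one fixed TTL, no coalescing, and no backend protection. The backend is simulated (`load_from_backend` and its 50 ms delay are assumptions for illustration), and a `threading.Barrier` releases every caller at the moment the key is cold.

```python
import threading
import time

TTL_SECONDS = 300  # one fixed TTL for every key on every node (synchronized expiry)


def run_naive_stampede(concurrency=50):
    """Naive cache-aside: no coalescing, no backend protection.

    Returns how many backend calls `concurrency` simultaneous misses caused.
    """
    cache = {}  # key -> (value, expires_at)
    calls = {"n": 0}
    calls_lock = threading.Lock()

    def load_from_backend(key):
        # Hypothetical backend: counts calls and simulates a slow query.
        with calls_lock:
            calls["n"] += 1
        time.sleep(0.05)
        return f"value-for-{key}"

    def get(key):
        entry = cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        # Miss: every concurrent caller reaches this line and hits the backend.
        value = load_from_backend(key)
        cache[key] = (value, time.monotonic() + TTL_SECONDS)
        return value

    barrier = threading.Barrier(concurrency)

    def caller():
        barrier.wait()  # all callers arrive the moment the key is cold
        get("homepage")

    threads = [threading.Thread(target=caller) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return calls["n"]


print(run_naive_stampede())
```

On a typical run the returned count is close to the number of callers: each concurrent miss turns into its own backend query.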

3) First defense: TTL jitter (break the synchronization)

Don’t keep TTLs constant. Apply a small jitter even within the same “product class”:

For example, instead of a fixed 300 seconds, randomize between 240 and 360
A deterministic jitter per key (via hashing) also works

But it isn’t enough by itself: jitter spreads the wave; it doesn’t deduplicate the refresh.
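Both jitter variants can be sketched in a few lines of Python. The ±20% ratio reproduces the 240 to 360 second example; the function names and the use of SHA-256 for the deterministic variant are assumptions, and any stable hash would do.

```python
import hashlib
import random

BASE_TTL = 300      # seconds: the fixed TTL from the example
JITTER_RATIO = 0.2  # ±20% gives the 240..360 s window


def random_ttl(base=BASE_TTL, ratio=JITTER_RATIO):
    """Uniformly random TTL in [base*(1-ratio), base*(1+ratio)]."""
    return random.uniform(base * (1 - ratio), base * (1 + ratio))


def deterministic_ttl(key, base=BASE_TTL, ratio=JITTER_RATIO):
    """Stable per-key TTL: the same key always gets the same offset,
    while different keys are spread across the jitter window."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a fraction in [0, 1].
    fraction = int.from_bytes(digest[:8], "big") / (2**64 - 1)
    return base * (1 - ratio) + fraction * (2 * base * ratio)
```

The deterministic variant gives each key a stable offset, which makes expiry times reproducible and easier to debug, while different keys still land at different points in the window.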

4) Second defense: request coalescing (singleflight)

When a cache miss happens, attach all concurrent requests for the same key to a single “refresh job”:

The first caller goes to the backend
The rest wait for the same result (or take the previous value)

In application code this is typically implemented as singleflight / memoize / a “promise cache.”

Operational note: put an upper bound on how long coalesced callers wait. If the backend itself is broken, don’t leave every request blocked behind it.
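A minimal singleflight can be built from a lock and a per-key `threading.Event`. This Python sketch is an illustration, not any specific library’s API. The first caller for a key runs the loader and concurrent callers wait for its result. The hypothetical `wait_timeout` parameter implements the bounded wait from the operational note.

```python
import threading


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    class _Call:
        def __init__(self):
            self.done = threading.Event()
            self.value = None
            self.error = None

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call currently being loaded

    def do(self, key, loader, wait_timeout=1.0):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = self._Call()
                self._inflight[key] = call
        if leader:
            # The first caller goes to the backend.
            try:
                call.value = loader()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                call.done.set()  # wake every waiting follower
        elif not call.done.wait(wait_timeout):
            # Bounded wait: don't pile up behind a stuck backend.
            raise TimeoutError(f"coalesced wait for {key!r} timed out")
        if call.error is not None:
            raise call.error  # followers see the leader's failure too
        return call.value
```

Go code gets the same pattern from `golang.org/x/sync/singleflight` (`Group.Do`). On a `TimeoutError`, a caller can fall back to the previous value if it still has one.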

5) Third defense: stale-while-revalidate (serve stale)

For critical keys, returning “acceptably old” instead of “perfectly fresh” can end the incident.

The model:

When the TTL expires, return the previous value during a short “stale window”
Kick off the refresh in the background
If the refresh fails, keep using the old value for a bounded extension

The practical outcome: the backend rides through short bursts “without collapsing.”
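That model can be sketched as a cache whose entries carry two deadlines, `fresh_until` and `stale_until`. In this Python sketch, `fresh_ttl`, `stale_ttl`, and the injectable `clock` are assumptions. A hit inside the stale window returns the old value and starts at most one background refresh per key. A failed refresh leaves the old entry in place until `stale_until`, which is the bounded extension. The cold-miss path is kept simple here. In practice it would go through a coalescing layer.

```python
import threading
import time


class SWRCache:
    """Stale-while-revalidate with a bounded stale window."""

    def __init__(self, loader, fresh_ttl=60.0, stale_ttl=30.0, clock=time.monotonic):
        self._loader = loader
        self._fresh_ttl = fresh_ttl  # served as fresh for this long
        self._stale_ttl = stale_ttl  # then served stale for at most this long
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, fresh_until, stale_until)
        self._refreshing = set()     # keys with a background refresh in flight

    def _store(self, key, value):
        # Caller must hold self._lock.
        fresh_until = self._clock() + self._fresh_ttl
        self._entries[key] = (value, fresh_until, fresh_until + self._stale_ttl)

    def get(self, key):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fresh_until, stale_until = entry
                if now < fresh_until:
                    return value  # fresh hit
                if now < stale_until:
                    if key not in self._refreshing:
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
                    return value  # stale hit; refresh runs in the background
        # Cold miss, or past the stale window: load synchronously.
        value = self._loader(key)
        with self._lock:
            self._store(key, value)
        return value

    def _refresh(self, key):
        try:
            value = self._loader(key)
            with self._lock:
                self._store(key, value)
        except Exception:
            pass  # keep serving the old value until stale_until
        finally:
            with self._lock:
                self._refreshing.discard(key)
```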

6) Fourth defense: backend protections (limit and shed)

To protect the backend mid-stampede:

Rate limit: per key or per endpoint
Bulkhead: isolate cache-miss traffic in a separate pool/queue
Circuit breaker: fail fast when errors/timeouts climb
Load shedding: drop low-priority requests
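Two of these protections fit in a short Python sketch: a per-key token bucket for the rate limit and a bounded semaphore as the bulkhead for miss traffic. The rates, the pool size of 8, and the `guarded_backend_call` helper are hypothetical. A rejected call is the point where the caller serves stale or sheds the request.

```python
import threading
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self._clock = rate, capacity, clock
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self):
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False


class PerKeyLimiter:
    """One bucket per cache key, so one hot key cannot saturate the backend."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}
        self._lock = threading.Lock()

    def allow(self, key):
        with self._lock:
            bucket = self._buckets.get(key)
            if bucket is None:
                bucket = TokenBucket(self._rate, self._capacity, self._clock)
                self._buckets[key] = bucket
        return bucket.allow()


# Bulkhead: at most 8 concurrent backend loads for cache-miss traffic (hypothetical size).
MISS_POOL = threading.BoundedSemaphore(8)


def guarded_backend_call(key, limiter, loader):
    if not limiter.allow(key):
        raise RuntimeError("rate limited")  # caller may serve stale or shed
    if not MISS_POOL.acquire(timeout=0.1):
        raise RuntimeError("bulkhead full")
    try:
        return loader(key)
    finally:
        MISS_POOL.release()
```

One caveat: `PerKeyLimiter` keeps a bucket for every key it has seen, so a production version needs eviction for idle keys.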

The aim here isn’t to “answer every request”; it is to keep the system standing.
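A circuit breaker can be sketched as a small state machine: closed, open after a run of consecutive failures, and half-open once a cool-down has passed. This Python version is single-threaded for clarity. A breaker shared across threads would need a lock and should admit only one trial call while half-open. The thresholds are placeholders.

```python
import time


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures.
    After `reset_timeout` seconds, a trial call goes through (half-open):
    success closes the circuit, failure re-opens it."""

    def __init__(self, failure_threshold=5, reset_timeout=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: half-open, let this call through as a trial.
        try:
            result = fn(*args)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open after a failed trial)
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```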

7) Runbook: what do I do once the stampede starts?

A practical order of operations for the first 15–30 minutes:

Which key/endpoint is exploding? (top keys / top routes)
Cache layer: hit ratio, evictions, TTL behavior
Backend: pool saturation, timeouts, error rate
Quick intervention:
- Turn on serve-stale (if available)
- Apply a rate limit on miss traffic
- Temporarily extend the TTL (or warm the cache)
- Precompute / prewarm the most critical keys
Permanent fix:
- Push coalescing + jitter + stale into the code path
- Add an operations test: “resilience after a cache flush”
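For the prewarm step, the main risk is that the warm-up itself becomes a second stampede. A minimal Python sketch, assuming a hypothetical `loader` and a `cache_set` function for your cache, caps concurrency with a thread pool and reports which keys failed:

```python
from concurrent.futures import ThreadPoolExecutor


def prewarm(keys, loader, cache_set, max_workers=4):
    """Load critical keys into the cache with bounded concurrency.

    Returns (warmed, failed) lists of keys.
    """
    warmed, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # At most `max_workers` backend loads run at the same time.
        futures = {pool.submit(loader, key): key for key in keys}
        for future, key in futures.items():
            try:
                cache_set(key, future.result())
                warmed.append(key)
            except Exception:
                failed.append(key)  # leave it cold; retry later or serve stale
    return warmed, failed
```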

8) Test: a small but very valuable “cache flush chaos”

The practical test I like:

In a canary environment, purge a defined key set
Simultaneously generate N requests (in stages)
Measure how the backend behaves

This test moves your cache strategy from “fast on a good day” to “safe on a bad day.”
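These three steps can be turned into a small harness. In this Python sketch, `purge`, `get`, and `backend_calls` are hypothetical hooks into your own cache and backend metrics. The demo at the bottom runs the harness against an in-process cache that coalesces with a per-key lock, so each stage should cost exactly one backend call per purged key.

```python
import threading
import time


def flush_chaos(purge, get, keys, stages, backend_calls):
    """For each stage N: purge `keys`, fire N concurrent requests at once,
    and record how many backend calls that stage caused."""
    results = []
    for n in stages:
        purge(keys)
        before = backend_calls()
        barrier = threading.Barrier(n)

        def worker(i):
            barrier.wait()  # release all N requests at the same moment
            get(keys[i % len(keys)])

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
        start = time.monotonic()
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # all workers finish before the next stage starts
        results.append({"requests": n,
                        "backend_calls": backend_calls() - before,
                        "seconds": round(time.monotonic() - start, 3)})
    return results


# Hypothetical system under test: a cache that coalesces with a per-key lock.
calls = {"n": 0}
counter_lock = threading.Lock()
store = {}
key_locks = {k: threading.Lock() for k in ("home", "price")}


def load(key):
    with counter_lock:
        calls["n"] += 1
    time.sleep(0.05)  # simulated slow backend
    return key.upper()


def get(key):
    if key in store:
        return store[key]
    with key_locks[key]:      # only one loader per key at a time
        if key not in store:  # re-check: another thread may have loaded it
            store[key] = load(key)
    return store[key]


report = flush_chaos(lambda ks: [store.pop(k, None) for k in ks], get,
                     ["home", "price"], [10, 50], lambda: calls["n"])
print(report)
```

Swapping in a read path without coalescing should make `backend_calls` grow with the stage size instead of staying flat, which is exactly the difference this test is meant to expose.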

9) Closing thought

A cache stampede is usually more expensive than having no cache at all: the cache appears to be there, but it doesn’t protect the system in a crisis. When you apply TTL jitter, coalescing, and stale strategies together, the cache turns from a performance tool into an operational seat belt.
