A well-built cache lowers cost, cuts latency, and keeps the system calm. A badly built one does the opposite: a TTL expires, one key “goes cold,” thousands of requests rush the backend at once, and the cache that was supposed to shield the backend becomes the thing that takes it down.
This post treats the cache stampede (thundering herd) not as theory, but through control mechanisms you can actually apply in production.
1) What does a stampede look like?
Typical signals:
- Cache hit ratio collapses suddenly
- Backend QPS / DB connection counts spike
- p95/p99 latency rises, then a timeout wave follows
- Errors: 5xx responses, “upstream timeout,” “connection pool exhausted”
The most dangerous scenario: a single hot key (homepage payload, campaign price, auth policy) requested simultaneously by thousands of clients the moment its TTL expires.
2) Root cause: synchronized TTLs and concurrent refresh
A stampede usually starts from this combination:
- Every node uses the same TTL (synchronized expiry)
- On a cache miss every caller hits the backend (no coalescing)
- The backend has no protection (no rate limit / circuit breaker)
3) First defense: TTL jitter (break the synchronization)
Don’t keep TTLs constant. Apply a small jitter even within the same “product class”:
- For example, instead of a fixed 300 seconds, randomize between 240 and 360
- A deterministic jitter per key (via hashing) also works
But it isn’t enough by itself: jitter spreads the wave; it doesn’t deduplicate the refresh.
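Both jitter variants fit in a few lines. This is a minimal sketch, not a drop-in implementation: the names `random_ttl`, `hashed_ttl`, `BASE_TTL`, and `JITTER` are made up for illustration, and the 300 ± 60 second range simply mirrors the example above.

```python
import hashlib
import random

BASE_TTL = 300  # seconds; the fixed TTL from the example above (assumed value)
JITTER = 60     # +/- seconds, giving the 240..360 range (assumed value)


def random_ttl(base: int = BASE_TTL, jitter: int = JITTER) -> int:
    """Pick a fresh random TTL on every cache write."""
    return random.randint(base - jitter, base + jitter)


def hashed_ttl(key: str, base: int = BASE_TTL, jitter: int = JITTER) -> int:
    """Derive a stable TTL from a hash of the key.

    The same key always gets the same TTL, but different keys land on
    different points of the range, so they do not all expire together.
    """
    low, high = base - jitter, base + jitter
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return low + bucket % (high - low + 1)
```

Random jitter also desynchronizes nodes that cache the same key, because each node draws its own value. The hashed variant is reproducible, which helps debugging, but every node computes the same TTL for a given key, so it only spreads expiry across keys.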
4) Second defense: request coalescing (singleflight)
When a cache miss happens, attach all concurrent requests for the same key to a single “refresh job”:
- The first caller goes to the backend
- The rest wait for the same result (or take the previous value)
In application code this is typically implemented as singleflight / memoize / a “promise cache.”
Operational note: put an upper bound on how long coalesced callers wait. If the backend itself is broken, an unbounded wait leaves every request for that key blocked behind one stuck refresh.
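A thread-based sketch of coalescing with a bounded wait could look like the following. The `SingleFlight` class, its `do` method, and the `wait_timeout` parameter are names invented for this example (loosely modeled on the singleflight idea), not the API of any particular library.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight refresh shared by every caller of the same key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, load: Callable[[], Any], wait_timeout: float = 2.0) -> Any:
        # Decide under the lock whether this caller leads or follows.
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call

        if leader:
            # Only the first caller goes to the backend.
            try:
                call.value = load()
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._calls.pop(key, None)
                call.done.set()
        elif not call.done.wait(wait_timeout):
            # Bounded wait: followers give up instead of hanging forever.
            raise TimeoutError(f"refresh for {key!r} exceeded {wait_timeout}s")

        if call.error is not None:
            raise call.error
        return call.value
```

A caller would wrap its miss path as `flight.do(key, lambda: fetch_from_backend(key))`, where `fetch_from_backend` stands for whatever loader the service already has. On `TimeoutError` the caller can fall back to the previous value if one is still available.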
5) Third defense: stale-while-revalidate (serve stale)
For critical keys, returning “acceptably old” instead of “perfectly fresh” can end the incident.
The model:
- When the TTL expires, return the previous value during a short “stale window”
- Kick off the refresh in the background
- If the refresh fails, keep using the old value for a bounded extension
The practical outcome: the backend rides through short bursts “without collapsing.”
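The three steps of the model can be sketched with an in-memory cache. Everything named here (`StaleCache`, `ttl`, `stale_window`, `max_stale`, the injectable `clock`) is an assumption made for the example; a real deployment would more likely lean on whatever stale-serving support the cache layer or CDN already offers.

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set


@dataclass
class Entry:
    value: Any
    fresh_until: float  # before this, the value counts as fresh
    stale_until: float  # before this, serve stale and refresh in the background
    hard_limit: float   # stale_until is never extended past this point


class StaleCache:
    def __init__(self, ttl: float, stale_window: float, max_stale: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.stale_window = stale_window
        self.max_stale = max_stale  # assumed to be >= stale_window
        self.clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Entry] = {}
        self._refreshing: Set[str] = set()

    def _store(self, key: str, value: Any) -> None:
        fresh = self.clock() + self.ttl
        with self._lock:
            self._entries[key] = Entry(
                value, fresh, fresh + self.stale_window, fresh + self.max_stale)

    def _refresh(self, key: str, load: Callable[[], Any]) -> None:
        try:
            self._store(key, load())
        except Exception:
            # Refresh failed: keep the old value a little longer, but bounded.
            with self._lock:
                entry = self._entries.get(key)
                if entry is not None:
                    extended = max(entry.stale_until, self.clock() + self.stale_window)
                    entry.stale_until = min(extended, entry.hard_limit)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self.clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and now < entry.stale_until:
                expired = now >= entry.fresh_until
                if expired and key not in self._refreshing:
                    # At most one background refresh per key at a time.
                    self._refreshing.add(key)
                    threading.Thread(target=self._refresh, args=(key, load),
                                     daemon=True).start()
                return entry.value
        # Nothing usable cached: load synchronously (not coalesced in this sketch).
        value = load()
        self._store(key, value)
        return value
```

The cold-miss path at the end of `get` is deliberately left plain. In practice it would be combined with request coalescing, since stale serving only helps once a previous value exists.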
6) Fourth defense: backend protections (limit and shed)
To protect the backend mid-stampede:
- Rate limit: per key or per endpoint
- Bulkhead: isolate cache-miss traffic in a separate pool/queue
- Circuit breaker: fail fast when errors/timeouts climb
- Load shedding: drop low-priority requests
The aim here isn’t to “answer every request”; it is to keep the system standing.
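Of the four protections, the circuit breaker is the easiest to show in isolation. The sketch below is a deliberately simplified illustration: `CircuitBreaker`, `CircuitOpen`, the consecutive-failure threshold, and the cooldown are all assumptions chosen for the example, and production breakers usually track error rates over a window rather than a plain counter.

```python
import threading
import time
from typing import Any, Callable, Optional


class CircuitOpen(Exception):
    """Raised instead of calling the backend while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any]) -> Any:
        with self._lock:
            if (self._opened_at is not None
                    and self.clock() - self._opened_at < self.cooldown):
                # Fail fast: do not add load to a backend that is struggling.
                raise CircuitOpen("backend calls suspended")
            # Otherwise the breaker is closed, or the cooldown has elapsed
            # and this call acts as a trial request.

        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    # Open (or re-open) and restart the cooldown.
                    self._opened_at = self.clock()
            raise

        with self._lock:
            # Any success closes the breaker and clears the failure count.
            self._failures = 0
            self._opened_at = None
        return result
```

Wrapping only the cache-miss path in `breaker.call(...)` keeps cache hits unaffected. When `CircuitOpen` is raised, the caller can serve a stale value or return a fast error instead of queueing behind a failing backend. One simplification to be aware of: after the cooldown, several concurrent trial calls may pass at once.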
7) Runbook: what do I do once the stampede starts?
A practical order of operations for the first 15–30 minutes:
- Which key/endpoint is exploding? (top keys / top routes)
- Cache layer: hit ratio, evictions, TTL behavior
- Backend: pool saturation, timeouts, error rate
- Quick intervention:
  - Turn on serve-stale (if available)
  - Apply a rate limit on miss traffic
  - Temporarily extend the TTL (or warm the cache)
  - Precompute / prewarm the most critical keys
- Permanent fix:
  - Push coalescing + jitter + stale into the code path
  - Add an operations test: “resilience after a cache flush”
8) Test: a small but very valuable “cache flush chaos”
The practical test I like:
- In a canary environment, purge a defined key set
- Simultaneously generate N requests (in stages)
- Measure how the backend behaves
This test moves your cache strategy from “fast on a good day” to “safe on a bad day.”
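The shape of that test can be sketched in-process before pointing it at a real canary. Everything below is a stand-in: `CountingBackend` fakes the backend, and `flush_chaos` expects `get` and `purge` callables that you would wire to your own cache client. The stage sizes are arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence


class CountingBackend:
    """Fake backend that counts calls and tracks peak concurrency."""

    def __init__(self, latency: float = 0.02) -> None:
        self.latency = latency
        self.calls = 0
        self.peak = 0
        self._active = 0
        self._lock = threading.Lock()

    def fetch(self, key: str) -> str:
        with self._lock:
            self.calls += 1
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            time.sleep(self.latency)  # simulated backend work
            return f"value-for-{key}"
        finally:
            with self._lock:
                self._active -= 1


def flush_chaos(get: Callable[[str], Any], purge: Callable[[str], None],
                backend: CountingBackend, keys: Sequence[str],
                stages: Sequence[int] = (10, 50, 200)) -> List[Dict[str, int]]:
    """Purge the key set, fire N concurrent requests per stage, and report."""
    report = []
    for n in stages:
        for key in keys:
            purge(key)
        before = backend.calls
        backend.peak = 0
        with ThreadPoolExecutor(max_workers=n) as pool:
            # Spread the N requests round-robin over the purged keys.
            list(pool.map(get, [keys[i % len(keys)] for i in range(n)]))
        report.append({
            "requests": n,
            "backend_calls": backend.calls - before,
            "peak_concurrency": backend.peak,
        })
    return report
```

The number to watch is `backend_calls` relative to the number of purged keys. With working coalescing it should stay close to one call per key at every stage; if it grows with `requests`, the miss path is not protected.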
9) Closing thought
A cache stampede is often more expensive than having no cache at all: the cache appears to be there, but it fails to protect the system exactly when a crisis hits. When you apply TTL jitter, coalescing, and stale strategies together, the cache turns from a performance tool into an operational seat belt.