Mustafa Erbay
Technology · 12 min read

Cache Stampede (Thundering Herd) and Operational Defenses

A guide to taming the stampede (thundering herd) risk that can crush a backend after TTL expiry or a cache flush — using jitter, singleflight, and stale…


A correctly built cache lowers cost, cuts latency, and keeps the system calm. A wrongly built one does the opposite: a TTL expires, one key “goes cold,” thousands of requests rush the backend at once, and the cache kills the backend before the backend dies on its own.

This post tackles stampede (thundering herd) not as theory, but through control mechanisms you can actually apply in production.

1) What does a stampede look like?

Typical signals:

  • Cache hit ratio collapses suddenly
  • Backend QPS / DB connection counts spike
  • p95/p99 latency rises, then a timeout wave follows
  • Error codes: 5xx + “upstream timeout” + “connection pool exhausted”

The most dangerous scenario: a single hot key (homepage payload, campaign price, auth policy) requested simultaneously by thousands of clients the moment its TTL expires.
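The first signal in the list, a collapsing hit ratio, can be watched with a small sliding window. This is a minimal sketch, not a monitoring product: the class name `HitRatioWindow`, the window size, and the 0.5 alarm threshold are all assumptions chosen for illustration.

```python
from collections import deque


class HitRatioWindow:
    """Track the hit ratio over the last `size` cache lookups (hypothetical helper)."""

    def __init__(self, size=1000, alarm_below=0.5):
        self._events = deque(maxlen=size)  # True = hit, False = miss
        self._alarm_below = alarm_below

    def record(self, hit):
        self._events.append(bool(hit))

    def ratio(self):
        # With no data yet, report a healthy ratio rather than dividing by zero.
        if not self._events:
            return 1.0
        return sum(self._events) / len(self._events)

    def collapsed(self):
        # Only alarm once the window is full, to avoid noise right after startup.
        window_full = len(self._events) == self._events.maxlen
        return window_full and self.ratio() < self._alarm_below
```

In practice the same idea is usually expressed as an alert rule over existing cache metrics; the point is to alarm on a sudden drop, not on a slowly declining average.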

2) Root cause: synchronized TTLs and concurrent refresh

A stampede usually starts from this combination:

  • Every node uses the same TTL (synchronized expiry)
  • On a cache miss every caller hits the backend (no coalescing)
  • The backend has no protection (no rate limit / circuit breaker)

3) First defense: TTL jitter (break the synchronization)

Don’t keep TTLs constant. Apply a small jitter even within the same “product class”:

  • For example, instead of a fixed 300 seconds, randomize between 240 and 360
  • A deterministic jitter per key (via hashing) also works

But it isn’t enough by itself: jitter spreads the wave; it doesn’t deduplicate the refresh.
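The two jitter variants above can be sketched as follows. The function names and the constants are assumptions for illustration; the 300-second base with a 60-second spread simply reproduces the 240 to 360 range from the example.

```python
import hashlib
import random

BASE_TTL = 300  # seconds, the fixed TTL from the example
JITTER = 60     # seconds, so TTLs land between 240 and 360


def random_jitter_ttl(base=BASE_TTL, jitter=JITTER, rng=random):
    """Pick a fresh TTL uniformly in [base - jitter, base + jitter] on every write."""
    return rng.uniform(base - jitter, base + jitter)


def deterministic_jitter_ttl(key, base=BASE_TTL, jitter=JITTER):
    """Derive a stable per-key TTL in the same range from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a fraction between 0 and 1.
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    low = base - jitter
    return low + fraction * (2 * jitter)
```

One design note: the hashed variant spreads different keys apart, but the same key still gets the same TTL on every node. If nodes populate a key at the same moment, mixing a node identifier into the hash input (an assumption, not something the text prescribes) or using the random variant avoids that.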

4) Second defense: request coalescing (singleflight)

When a cache miss happens, attach all concurrent requests for the same key to a single “refresh job”:

  • The first caller goes to the backend
  • The rest wait for the same result (or take the previous value)

In application code this is typically implemented as singleflight / memoize / a “promise cache.”

Operational note: put an upper bound on the coalescing “wait.” If the backend itself is broken, waiting requests should time out and fall back rather than all block behind one stuck refresh.
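A minimal thread-based sketch of coalescing with a bounded wait is shown here. The `SingleFlight` class and its `do` method are written for this post and only loosely modeled on the pattern's usual shape; `wait_timeout` is an assumed parameter name.

```python
import threading


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> in-flight _Call

    def do(self, key, fn, wait_timeout=2.0):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call

        if leader:
            # The first caller runs fn and publishes the outcome.
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._calls.pop(key, None)
                call.done.set()
        elif not call.done.wait(wait_timeout):
            # Bounded wait: followers give up instead of piling up forever.
            raise TimeoutError(f"coalesced wait for {key!r} exceeded {wait_timeout}s")

        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap its miss path as `sf.do(key, lambda: load_from_backend(key))`, where `load_from_backend` is a placeholder for the real loader. On `TimeoutError` the caller decides whether to serve a previous value or fail fast.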

5) Third defense: stale-while-revalidate (serve stale)

For critical keys, returning an “acceptably old” value instead of a “perfectly fresh” one can end the incident.

The model:

  • When the TTL expires, return the previous value during a short “stale window”
  • Kick off the refresh in the background
  • If the refresh fails, keep using the old value for a bounded extension

The practical outcome: the backend rides through short bursts “without collapsing.”
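The model above could look roughly like this in application code. It is a sketch under assumptions: `StaleCache`, `loader`, `ttl`, and `stale_window` are hypothetical names, the cache is in-process, and the "bounded extension" is implemented simply as the end of the stale window.

```python
import threading
import time


class StaleCache:
    """Serve a stale value while one background thread refreshes the key."""

    def __init__(self, loader, ttl=300.0, stale_window=30.0, clock=time.monotonic):
        self._loader = loader            # hypothetical: key -> value, hits the backend
        self._ttl = ttl
        self._stale_window = stale_window
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}               # key -> (value, fresh_until)
        self._refreshing = set()         # keys with a refresh in flight

    def get(self, key):
        now = self._clock()
        spawn = False
        with self._lock:
            entry = self._entries.get(key)
            usable = entry is not None and now < entry[1] + self._stale_window
            is_stale = usable and now >= entry[1]
            if is_stale and key not in self._refreshing:
                self._refreshing.add(key)
                spawn = True
        if spawn:
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        if usable:
            return entry[0]              # fresh hit, or stale hit inside the window
        value = self._loader(key)        # cold miss or past the stale window
        self._store(key, value)
        return value

    def _store(self, key, value):
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)

    def _refresh(self, key):
        try:
            self._store(key, self._loader(key))
        except Exception:
            pass  # keep the old entry; it stays usable until the stale window ends
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

Note that the synchronous path at the bottom of `get` is not coalesced here; in a real code path it would be combined with request coalescing so a cold key still produces a single backend call.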

6) Fourth defense: backend protections (limit and shed)

To protect the backend mid-stampede:

  • Rate limit: per key or per endpoint
  • Bulkhead: isolate cache-miss traffic in a separate pool/queue
  • Circuit breaker: fail fast when errors/timeouts climb
  • Load shedding: drop low-priority requests

The aim here isn’t to “answer every request”; it is to keep the system standing.
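Two of the protections listed above, the bulkhead and the circuit breaker, can be sketched in a few lines. The class names, the `Shed` exception, and the thresholds are assumptions for illustration; production systems typically get these from a proxy, a service mesh, or a resilience library rather than hand-rolled code.

```python
import threading
import time


class Shed(Exception):
    """Raised when a request is rejected to protect the backend."""


class MissBulkhead:
    """Cap how many cache-miss loads may run against the backend at once."""

    def __init__(self, max_concurrent=8):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn):
        # Do not queue: if the miss pool is full, shed immediately.
        if not self._slots.acquire(blocking=False):
            raise Shed("miss pool full")
        try:
            return fn()
        finally:
            self._slots.release()


class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=10.0, clock=time.monotonic):
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at = None

    def call(self, fn):
        with self._lock:
            still_open = (
                self._opened_at is not None
                and self._clock() - self._opened_at < self._cooldown
            )
        if still_open:
            raise Shed("circuit open")
        # Closed, or cooldown elapsed: let the call through.
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()  # open, or re-open after a failed probe
            raise
        with self._lock:
            self._failures = 0
            self._opened_at = None
        return result
```

Wrapping only the cache-miss path (for example `breaker.call(lambda: bulkhead.run(load))`, with `load` as a placeholder) keeps cache hits unaffected while miss traffic is limited and shed.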

7) Runbook: what do I do once the stampede starts?

A practical order of operations for the first 15–30 minutes:

  1. Which key/endpoint is exploding? (top keys / top routes)
  2. Cache layer: hit ratio, evictions, TTL behavior
  3. Backend: pool saturation, timeouts, error rate
  4. Quick intervention:
    • Turn on serve-stale (if available)
    • Apply a rate limit on miss traffic
    • Temporarily extend the TTL (or warm the cache)
    • Precompute / prewarm the most critical keys
  5. Permanent fix:
    • Push coalescing + jitter + stale into the code path
    • Add an operations test: “resilience after a cache flush”

8) Test: a small but very valuable “cache flush chaos”

The practical test I like:

  • In a canary environment, purge a defined key set
  • Simultaneously generate N requests (in stages)
  • Measure how the backend behaves

This test moves your cache strategy from “fast on a good day” to “safe on a bad day.”
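A scaled-down version of this test can be run locally before trying it in a canary. The sketch below is an assumption-heavy stand-in: `CountingBackend` fakes the backend, `flush_chaos` is a hypothetical harness, and the naive dictionary cache at the bottom exists only to show what an unprotected miss path looks like under the test.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountingBackend:
    """Hypothetical stand-in for the real backend; counts loads and peak concurrency."""

    def __init__(self, delay=0.05):
        self._delay = delay
        self._lock = threading.Lock()
        self.calls = 0
        self.in_flight = 0
        self.peak = 0

    def load(self, key):
        with self._lock:
            self.calls += 1
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(self._delay)  # simulate backend latency
        with self._lock:
            self.in_flight -= 1
        return f"value-for-{key}"


def flush_chaos(get, purge, backend, keys, stages=(10, 50, 200)):
    """Purge `keys`, then fire each stage of N concurrent requests and record backend load."""
    report = []
    for n in stages:
        purge(keys)
        before = backend.calls
        backend.peak = 0
        with ThreadPoolExecutor(max_workers=n) as pool:
            list(pool.map(get, [keys[i % len(keys)] for i in range(n)]))
        report.append({
            "requests": n,
            "backend_calls": backend.calls - before,
            "peak_concurrency": backend.peak,
        })
    return report


# Naive cache with no coalescing, used here only as the system under test.
cache = {}
backend = CountingBackend()


def naive_get(key):
    if key not in cache:
        cache[key] = backend.load(key)
    return cache[key]


def purge(keys):
    for k in keys:
        cache.pop(k, None)


report = flush_chaos(naive_get, purge, backend, ["homepage"], stages=(5, 20))
for row in report:
    print(row)
```

The number to watch is `backend_calls` per stage. With coalescing in the read path it should stay close to the number of purged keys regardless of N; with a naive cache it tends to grow with the request count.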

9) Closing thought

Cache stampedes are usually more expensive than the “no cache at all” problem; the cache appears to exist, but it doesn’t protect the system in a crisis. When you apply TTL jitter, coalescing, and stale strategies together, the cache turns from a performance tool into an operational seat belt.

Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have been working across system architecture, networking, server infrastructure, the deployment of large-scale environments, software, and system security. On this blog I share technical experience that has proven itself in the field.