Mustafa Erbay
Technology · 4 min read

DDoS Response Runbook with BGP RTBH and FlowSpec

A controlled approach to reducing DDoS impact during operations using an RTBH/FlowSpec decision tree, verification steps, and a rollback plan.

The biggest mistake in DDoS events is treating the technical fix as “a single move.” In reality, a good response combines a decision tree, verification, and rollback steps. RTBH (Remote Triggered Black Hole) and BGP FlowSpec, designed correctly, take effect very quickly during an event; designed incorrectly, they cut the wrong traffic and cause a second incident.

Rather than “the big ISP-level story,” this post focuses on a practical runbook approach applied in the field at the corporate network and edge layer.

Prerequisite: topology and ownership boundaries must be clear

Before talking about RTBH and FlowSpec, the answers to three questions must be on paper:

  1. Who is speaking BGP? (Edge router, transit/peering device, or cloud router?)
  2. Where does the black hole go? (Null route inside the device, scrubbing center, or upstream blackhole?)
  3. Who decides? (NOC, NetOps, SecOps, Incident Commander?)

If these answers are not crisp, “who pushes the command?” turns into a debate during the event and burns time.

Decision tree: RTBH or FlowSpec?

The most practical split in the field:

  • If the target service is going down completely and taking it offline has become the better option → RTBH
  • If only part of the traffic is bad and it can be filtered out → FlowSpec

Quick metrics to inform that call:

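| Signal                        | Favors RTBH                 | Favors FlowSpec  |
|-------------------------------|-----------------------------|------------------|
| L7 fully collapsed            | Yes                         | No               |
| Attack on a single target IP  | Yes                         | Yes              |
| Attack on specific port/proto | Partly                      | Yes              |
| False-positive risk           | Low                         | Medium/High      |
| Application tolerance         | “Shut it down” is acceptable | “Stay up” target |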
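The split above can be written down as a small function so the call is not improvised mid-incident. This is a minimal sketch, not a definitive policy: the field names and the "escalate" fallback are assumptions added for illustration and do not come from any real tool.

```python
from dataclasses import dataclass


@dataclass
class AttackSignals:
    """Triage inputs. Field names are illustrative assumptions."""
    service_fully_down: bool        # the target service has already collapsed
    shutdown_acceptable: bool       # taking the target offline is tolerated
    filterable_by_port_proto: bool  # bad traffic is separable by port/protocol


def choose_mitigation(signals: AttackSignals) -> str:
    """Return 'RTBH', 'FlowSpec', or 'escalate' following the split above."""
    # The service is sinking and dropping all traffic to it is acceptable.
    if signals.service_fully_down and signals.shutdown_acceptable:
        return "RTBH"
    # Part of the traffic is bad and can be separated from the rest.
    if signals.filterable_by_port_proto:
        return "FlowSpec"
    # Neither branch fits (assumed fallback): hand the decision to a human.
    return "escalate"


if __name__ == "__main__":
    print(choose_mitigation(AttackSignals(True, True, False)))
```

Checking the RTBH branch first mirrors the order of the two bullets; a team that prefers to try filtering before blackholing would swap the two checks.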

RTBH: minimum safe usage template

The point of RTBH is to advertise a more-specific route for the target prefix with a “blackhole next-hop” so that all traffic to it, legitimate and malicious alike, is dropped upstream before it reaches your links. I recommend three controls:

  1. Trigger only on a specific community
  2. Accept only specific prefix sizes (narrow targets like /32)
  3. Set a time limit (TTL) on every blackhole announcement as the operational rollback standard
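The three controls can be enforced by a pre-flight check before anything is announced. The sketch below is an assumption-laden illustration: `RtbhRequest`, `validate_rtbh`, and the one-hour ceiling are hypothetical, and the community shown is the well-known BLACKHOLE community from RFC 7999, which your upstream may or may not honor.

```python
import ipaddress
from dataclasses import dataclass
from datetime import timedelta
from typing import List

# Assumed values for illustration; replace with what your upstream documents.
ALLOWED_COMMUNITY = "65535:666"   # well-known BLACKHOLE community (RFC 7999)
MAX_TTL = timedelta(hours=1)      # hypothetical operational ceiling


@dataclass
class RtbhRequest:
    prefix: str        # e.g. "203.0.113.10/32"
    community: str     # community the announcement will carry
    ttl: timedelta     # how long the blackhole may stay active


def validate_rtbh(req: RtbhRequest) -> List[str]:
    """Return a list of problems; an empty list means the request passes."""
    errors = []
    # Control 1: trigger only on the agreed community.
    if req.community != ALLOWED_COMMUNITY:
        errors.append("community is not the agreed blackhole community")
    # Control 2: accept only host routes (/32 for IPv4, /128 for IPv6).
    try:
        net = ipaddress.ip_network(req.prefix)
        if net.prefixlen != net.max_prefixlen:
            errors.append("prefix is wider than a single host")
    except ValueError:
        errors.append("prefix is not a valid network")
    # Control 3: every blackhole carries a bounded duration.
    if req.ttl <= timedelta(0) or req.ttl > MAX_TTL:
        errors.append("ttl must be positive and within the allowed ceiling")
    return errors
```

Returning every problem at once, rather than stopping at the first, keeps the operator from fixing one field only to be rejected on the next.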

Verification steps

Verification after RTBH is not just “traffic dropped”:

  • Verify on the edge router that the relevant prefix points to the blackhole
  • Verify on the upstream/IX side that the route propagated (looking glass, if available)
  • Measure that CPU/conntrack/interrupt pressure on the target service has actually dropped
  • Mark the resulting alarm storm in monitoring as “expected” (label the alerts, do not silence them)

FlowSpec: surgical filtering, surgical risk

FlowSpec is very powerful because it lets you match on fields such as port, protocol, and TCP flags. The risk is that a wrong rule cuts production traffic too.

Two safe usage patterns I rely on in the field:

  • Rate-limit (slow down instead of drop)
  • Only a narrow match (single target IP + single port + short duration)
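Both patterns can be captured in a rule template that refuses anything broad. This is a sketch under stated assumptions: `FlowSpecRule`, `is_narrow`, and the 30-minute ceiling are hypothetical names and values. It models only the intent of a rule and does not generate any vendor or daemon configuration.

```python
import ipaddress
from dataclasses import dataclass

MAX_DURATION_MINUTES = 30  # hypothetical ceiling for "short duration"


@dataclass(frozen=True)
class FlowSpecRule:
    dst_prefix: str            # single target IP, e.g. "198.51.100.7/32"
    protocol: str              # "tcp" or "udp"
    dst_port: int              # single destination port
    duration_minutes: int      # planned lifetime of the rule
    rate_bytes_per_sec: float  # 0 means discard; RFC 8955 traffic-rate is in bytes/s


def is_narrow(rule: FlowSpecRule) -> bool:
    """True only for single host + single port + short duration."""
    try:
        net = ipaddress.ip_network(rule.dst_prefix)
    except ValueError:
        return False
    return (
        net.prefixlen == net.max_prefixlen
        and rule.protocol in ("tcp", "udp")
        and 1 <= rule.dst_port <= 65535
        and 0 < rule.duration_minutes <= MAX_DURATION_MINUTES
    )


def describe_action(rule: FlowSpecRule) -> str:
    """Human-readable action, for the change record."""
    if rule.rate_bytes_per_sec == 0:
        return "discard"
    return f"rate-limit to {rule.rate_bytes_per_sec:.0f} bytes/s"
```

For example, `FlowSpecRule("198.51.100.7/32", "udp", 53, 15, 1_000_000.0)` passes `is_narrow`, while the same rule against `198.51.100.0/24` is rejected.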

Verification steps

After applying FlowSpec, watch the following two metrics together:

  • Service metrics: error rate, latency, saturation
  • Network metrics: PPS/BPS drop, drop counters, policer counters

If the network metrics drop but the service metrics do not recover, you are intervening at the wrong place: for example, the attack is at L7 (application layer) and has to be handled there.
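Reading the two metric groups together can be reduced to a simple comparison of before and after snapshots. The sketch below is illustrative only: the `Snapshot` fields, the result labels, and both thresholds (a 50% PPS drop, a 5% error rate) are assumptions to be tuned to your own SLOs.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    pps: float         # packets per second toward the target
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def assess(before: Snapshot, after: Snapshot,
           min_pps_drop: float = 0.5, max_error_rate: float = 0.05) -> str:
    """Classify a mitigation by combining network and service metrics."""
    # Network side: did attack volume fall by at least the expected share?
    network_improved = after.pps <= before.pps * (1 - min_pps_drop)
    # Service side: is the error rate back under the tolerated level?
    service_recovered = after.error_rate <= max_error_rate
    if network_improved and service_recovered:
        return "effective"
    if network_improved:
        # Volume dropped but users still see errors: likely the wrong layer.
        return "wrong_layer"
    return "no_network_effect"
```

A "wrong_layer" result is the signal to move the response toward the application side rather than tightening the network rule further.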

Operational runbook: step by step

During the event, “who does what” must be short and clear:

  1. Triage (5 min): attack vector, target(s), impact (SLO), decision (RTBH/FlowSpec/other)
  2. Change record (2 min): who, when, which rule/prefix, target duration
  3. Apply (1–3 min): push the rule/prefix
  4. Verify (5 min): service + network metrics
  5. Rollback (planned): remove when the duration is up; collect evidence for the postmortem
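Steps 2 and 5 fit together when the change record itself carries the expiry. The following is a minimal sketch assuming hypothetical names (`ChangeRecord`, `due_for_rollback`); a real setup would persist these records and have a scheduler or the on-call engineer act on them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class ChangeRecord:
    operator: str         # who applied the change
    action: str           # "RTBH" or "FlowSpec"
    target: str           # prefix or rule description
    applied_at: datetime  # when it was pushed (timezone-aware)
    duration: timedelta   # target duration agreed at triage

    @property
    def expires_at(self) -> datetime:
        return self.applied_at + self.duration

    def rollback_due(self, now: datetime) -> bool:
        """True once the agreed duration has elapsed."""
        return now >= self.expires_at


def due_for_rollback(records: List[ChangeRecord], now: datetime) -> List[ChangeRecord]:
    """Records whose mitigation should be withdrawn now."""
    return [r for r in records if r.rollback_due(now)]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rec = ChangeRecord("alice", "RTBH", "203.0.113.10/32", t0, timedelta(minutes=30))
    print(rec.expires_at.isoformat())
```

Keeping the expiry on the record means the rollback is planned at apply time and the same record doubles as postmortem evidence.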

Postmortem: a real improvement list after a DDoS

Even when RTBH/FlowSpec succeed, the to-do list after the event is very clear:

  • Edge capacity: PPS/BPS, conntrack, interrupt tuning
  • Application resilience: caching, queue, circuit breaker
  • Observability: netflow/sflow, WAF logs, upstream telemetry
  • Process: rule templates, on-call authority matrix, drills

Conclusion

Designed correctly, RTBH and FlowSpec save time during DDoS events; designed incorrectly, they hurt production traffic. That is why the decision tree and the rollback standard must be part of the runbook, just as much as the technical commands. On the operational leadership side, the biggest win is making “under which condition do we activate which tool?” a decision made in advance, not a question asked in panic.

Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have worked on systems architecture, networking, server infrastructure, large-scale deployments, software, and systems security. On this blog I share technical experience that has held up in the field.