DDoS Response Runbook with BGP RTBH and FlowSpec

The biggest mistake in a DDoS event is treating the technical fix as a single move. In reality, a good response combines a decision tree, verification, and rollback steps. Designed correctly, RTBH (Remote Triggered Black Hole) and BGP FlowSpec take effect very quickly during the event; designed incorrectly, they cut legitimate traffic and create a second incident.

This post skips the ISP-scale story and focuses on a practical runbook approach, applied in the field at the corporate network and edge layer.

Prerequisite: topology and ownership boundaries must be clear

Before talking about RTBH and FlowSpec, the answers to three questions must be on paper:

Who is speaking BGP? (Edge router, transit/peering device, or cloud router?)
Where does the blackholed traffic go? (A null route on the device, a scrubbing center, or an upstream blackhole?)
Who decides? (NOC, NetOps, SecOps, Incident Commander?)

If these answers are not crisp, “who pushes the command?” turns into a debate during the event and burns time.

Decision tree: RTBH or FlowSpec?

The most practical split in the field:

If the target service is going down completely and taking it offline has become the lesser harm → RTBH
If part of the traffic is bad and can be filtered out → FlowSpec

Quick metrics to inform that call:

Signal	Favors RTBH	Favors FlowSpec
L7 fully collapsed	Yes	No
Attack on a single target IP	Yes	Yes
Attack on specific port/proto	Partly	Yes
False-positive risk	Low	Medium/High
Application tolerance	“Shut it down” is acceptable	“Stay up” is the target
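The split above can be written down as a small function so the call is made the same way every time. This is a minimal sketch, not a definitive policy: the `AttackSignals` fields, the `choose_mitigation` name, and the "escalate" fallback are my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AttackSignals:
    """Triage answers for one attacked target (hypothetical field names)."""
    service_fully_down: bool      # the target service has collapsed entirely
    shutdown_acceptable: bool     # the owner accepts taking the target offline
    bad_traffic_separable: bool   # the attack matches a specific port/proto pattern


def choose_mitigation(s: AttackSignals) -> str:
    """Return "rtbh", "flowspec", or "escalate" for one attacked target."""
    # The service is already lost and shutting it down is the lesser harm.
    if s.service_fully_down and s.shutdown_acceptable:
        return "rtbh"
    # Part of the traffic is bad and can be filtered out.
    if s.bad_traffic_separable:
        return "flowspec"
    # Neither tool clearly fits: hand the decision to a human.
    return "escalate"


if __name__ == "__main__":
    print(choose_mitigation(AttackSignals(True, True, False)))
    print(choose_mitigation(AttackSignals(False, False, True)))
```

The explicit "escalate" branch is deliberate: when the signals do not clearly favor either tool, the function refuses to guess.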

RTBH: minimum safe usage template

The point of RTBH is to advertise a more-specific route for the target with a “blackhole next-hop” so that the traffic is dropped upstream, before it reaches your links. I recommend three controls:

Trigger only on a specific community
Accept only specific prefix sizes (narrow targets like /32)
Set an expiry time (a fixed duration) as the operational rollback standard
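The three controls can be enforced by a pre-check that runs before anyone pushes the announcement. The sketch below uses only the Python standard library. The policy values are assumptions: `65535:666` is the well-known BLACKHOLE community from RFC 7999, but your upstream may require its own value, and the two-hour limit and the `validate_rtbh_request` name are hypothetical.

```python
import ipaddress
from datetime import timedelta

# Assumed local policy, adjust to what your upstream actually accepts.
TRIGGER_COMMUNITY = "65535:666"          # RFC 7999 BLACKHOLE community
MIN_PREFIX_LEN = {4: 32, 6: 128}         # host routes only
MAX_DURATION = timedelta(hours=2)        # hypothetical rollback ceiling


def validate_rtbh_request(prefix: str, community: str, duration: timedelta) -> list[str]:
    """Return a list of policy violations; an empty list means the request passes."""
    errors = []
    # Control 1: trigger only on the agreed community.
    if community != TRIGGER_COMMUNITY:
        errors.append(f"community {community} is not the RTBH trigger")
    # Control 2: accept only narrow prefixes.
    try:
        net = ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        return errors + [f"invalid prefix: {prefix}"]
    if net.prefixlen < MIN_PREFIX_LEN[net.version]:
        errors.append(f"prefix {net} is wider than a host route")
    # Control 3: every blackhole carries a bounded duration.
    if duration <= timedelta(0) or duration > MAX_DURATION:
        errors.append(f"duration must be positive and at most {MAX_DURATION}")
    return errors


if __name__ == "__main__":
    print(validate_rtbh_request("192.0.2.10/32", "65535:666", timedelta(minutes=30)))
    print(validate_rtbh_request("192.0.2.0/24", "65000:1", timedelta(hours=5)))
```

Returning every violation at once, rather than failing on the first, saves a round trip when the operator is under time pressure.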

Verification steps

Verification after RTBH goes beyond confirming that traffic dropped:

Verify on the edge router that the relevant prefix points to the blackhole
Verify on the upstream/IX side that the route propagated (looking glass, if available)
Measure that CPU/conntrack/interrupt pressure on the target service has actually dropped
Mark the resulting alarm storm in monitoring as expected (label the alerts, do not silence them)

FlowSpec: surgical filtering, surgical risk

FlowSpec is powerful because it lets you write filters on fields such as port, protocol, and TCP flags. The risk is that a wrong rule cuts production traffic too.

Two safe usage patterns I rely on in the field:

Rate-limit (slow down instead of drop)
Only a narrow match (single target IP + single port + short duration)
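Both patterns can be checked mechanically before a rule is announced. The following is a rough sketch under my own assumptions: the `FlowSpecRule` fields, the `rule_problems` name, and the 60-minute ceiling are hypothetical and do not correspond to any vendor's FlowSpec syntax.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

MAX_DURATION_MINUTES = 60  # assumed meaning of "short duration"


@dataclass
class FlowSpecRule:
    """A simplified rule description (hypothetical, not a vendor format)."""
    dst_prefix: str
    protocol: Optional[str]    # "udp", "tcp", ...; None means any protocol
    dst_port: Optional[int]    # None means any port
    action: str                # "rate-limit" or "discard"
    rate_bps: Optional[int]    # only meaningful for "rate-limit"
    duration_minutes: int


def rule_problems(rule: FlowSpecRule) -> list[str]:
    """Return the reasons a rule falls outside the two safe patterns."""
    if rule.action == "rate-limit":
        # Pattern 1: slowing traffic down is allowed if a rate is given.
        if not rule.rate_bps or rule.rate_bps <= 0:
            return ["rate-limit action needs a positive rate"]
        return []
    if rule.action != "discard":
        return [f"unknown action: {rule.action}"]
    # Pattern 2: a discard must be a narrow match.
    problems = []
    net = ipaddress.ip_network(rule.dst_prefix, strict=False)
    if net.prefixlen != net.max_prefixlen:
        problems.append("destination must be a single host address")
    if rule.protocol is None or rule.dst_port is None:
        problems.append("match must name one protocol and one destination port")
    if not 0 < rule.duration_minutes <= MAX_DURATION_MINUTES:
        problems.append("duration is not short enough")
    return problems


if __name__ == "__main__":
    print(rule_problems(FlowSpecRule("192.0.2.10/32", "udp", 53, "discard", None, 30)))
    print(rule_problems(FlowSpecRule("192.0.2.0/24", None, None, "discard", None, 240)))
```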

Verification steps

After applying FlowSpec, watch the following two groups of metrics together:

Service metrics: error rate, latency, saturation
Network metrics: PPS/BPS drop, drop counters, policer counters

If the network metrics drop but the service metrics do not recover, you are intervening at the wrong layer (for example, the attack is at L7, the application layer, and an L3/L4 filter does not touch it).
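Reading the two metric groups together can be reduced to a simple comparison of a before and an after snapshot. This is an illustrative sketch only: the `Snapshot` fields, the `assess` name, and both thresholds (a 50% PPS drop, a 1% error rate) are assumptions I picked for the example, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading taken before or after the rule is applied (hypothetical)."""
    attack_pps: float   # packets per second toward the target
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def assess(before: Snapshot, after: Snapshot,
           min_pps_drop: float = 0.5, max_error_rate: float = 0.01) -> str:
    """Classify the outcome as "mitigated", "wrong-layer", or "ineffective"."""
    if before.attack_pps > 0:
        pps_drop = (before.attack_pps - after.attack_pps) / before.attack_pps
    else:
        pps_drop = 0.0
    network_dropped = pps_drop >= min_pps_drop
    service_recovered = after.error_rate <= max_error_rate

    if not network_dropped:
        return "ineffective"      # the rule did not remove much traffic
    if not service_recovered:
        return "wrong-layer"      # traffic fell, the service did not recover
    return "mitigated"


if __name__ == "__main__":
    print(assess(Snapshot(1_000_000, 0.40), Snapshot(50_000, 0.35)))
```

A "wrong-layer" result is the cue to roll the rule back and look at the application side instead of adding more network filters.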

Operational runbook: step by step

During the event, “who does what” must be short and clear:

Triage (5 min): attack vector, target(s), impact (SLO), decision (RTBH/FlowSpec/other)
Change record (2 min): who, when, which rule/prefix, target duration
Apply (1–3 min): push the rule/prefix
Verify (5 min): service + network metrics
Rollback (planned): remove when the duration is up; collect evidence for the postmortem
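The change record and the planned rollback fit naturally into one small data structure, so the removal time is fixed at the moment the rule is applied. A minimal sketch, assuming hypothetical field names and a record kept in your own tooling rather than in any specific ticketing system:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """Who, when, which rule/prefix, and for how long (hypothetical schema)."""
    operator: str
    applied_at: datetime
    action: str            # "rtbh" or "flowspec"
    target: str            # prefix or rule identifier
    duration: timedelta    # target duration agreed during triage

    @property
    def rollback_at(self) -> datetime:
        """The moment the rule or prefix should be withdrawn."""
        return self.applied_at + self.duration

    def rollback_due(self, now: datetime) -> bool:
        """True once the agreed duration has elapsed."""
        return now >= self.rollback_at


if __name__ == "__main__":
    applied = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    record = ChangeRecord("alice", applied, "rtbh", "192.0.2.10/32", timedelta(minutes=30))
    print(record.rollback_at.isoformat())
    print(record.rollback_due(applied + timedelta(minutes=45)))
```

Making the record immutable (`frozen=True`) keeps it usable as postmortem evidence: an extension becomes a new record instead of a silent edit.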

Postmortem: a real improvement list after a DDoS

Even when RTBH/FlowSpec succeed, the to-do list after the event is very clear:

Edge capacity: PPS/BPS, conntrack, interrupt tuning
Application resilience: caching, queue, circuit breaker
Observability: netflow/sflow, WAF logs, upstream telemetry
Process: rule templates, on-call authority matrix, drills

Conclusion

Designed correctly, RTBH and FlowSpec save time during DDoS events; designed incorrectly, they hurt production traffic. That is why the decision tree and the rollback standard must be part of the runbook, just as much as the technical commands. On the operational leadership side, the biggest win is making “under which condition do we activate which tool?” a decision made in advance, not a question asked in panic.
