Preventing Edge Outages with BGP Max-Prefix Limits

Every organization that speaks BGP at the edge carries a quiet risk: a wave of bad prefixes. Sometimes that wave is a route leak coming from upstream, sometimes it’s a configuration mistake. The outcome is usually the same:

- RIB/FIB inflate, CPU climbs
- The control plane lags
- BGP sessions flap
- The “real problem” turns into an “internet is down” incident in a heartbeat

In this article I’m walking through one of the highest-leverage guardrails for shrinking that class of incidents: the max-prefix limit.

Why max-prefix is an “operational” control

The max-prefix limit looks like just another BGP knob, but it actually answers an operational question:

“How many prefixes do I expect from this neighbor (peer/upstream), and what will I do if that number deviates?”

So max-prefix gives you:

- Error prevention (an automatic brake during a leak wave)
- Alarm generation (warning thresholds)
- A runbook trigger (who calls whom)

1) First step: Establish a normal prefix baseline

Don’t turn max-prefix on with an arbitrary number. Start with the baseline:

- Measure prefix counts for 7–14 days
- Note weekly variance (trend) and anomalous days like “patch day”

Two numbers matter:

- Normal (median)
- Peak (95th/99th percentile)

The limit must sit above both, so that normal peaks never trip it, yet low enough that a real leak still does.
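A minimal sketch of the baseline step, assuming the prefix counts have already been collected as a list of integers (for example, hourly samples from telemetry). The nearest-rank percentile method, the `headroom_pct` value of 30, and the sample numbers are illustrative assumptions, not measurements from a real peer.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile (pct is an integer 1..100) of a non-empty list."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def suggest_limit(samples, headroom_pct=30):
    """Return baseline statistics and a candidate hard limit.

    headroom_pct is an assumed safety margin above the 99th percentile;
    tune it to your own tolerance for false trips.
    """
    p99 = percentile(samples, 99)
    return {
        "median": statistics.median(samples),
        "p95": percentile(samples, 95),
        "p99": p99,
        "limit": math.ceil(p99 * (100 + headroom_pct) / 100),
    }


# Hypothetical prefix-count samples from one neighbor.
samples = [100_000, 101_000, 102_000, 103_000, 104_000,
           105_000, 106_000, 107_000, 108_000, 110_000]
print(suggest_limit(samples))
```

Basing the limit on the peak rather than the median keeps ordinary busy days from tripping it, while the headroom stays small enough that a leak of tens of thousands of extra prefixes still crosses the line.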

2) Design: A 3-layer guardrail

A model that holds up well in the field has three layers:

- Warning threshold: Alarm at 80–90% of the hard limit
- Hard limit: A specific “trip” count
- Trip behavior: Should the session drop, should the routes go away, or should the device just log?

Trip behavior depends on the organization’s risk appetite:

- On some edges, “tear down the session” is the better choice (don’t accept broken information)
- On others, “keep the session up but stop accepting new prefixes” is preferable (if the vendor supports it)

What matters: this behavior should not be a surprise during an incident.
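The three layers can be written down as a small data structure so the trip behavior is decided ahead of time rather than during an incident. This is a sketch only: the class and field names are hypothetical, and whether a device trips on reaching or on exceeding the limit varies by platform (this sketch assumes "exceeding").

```python
from dataclasses import dataclass
from enum import Enum


class TripAction(Enum):
    TEARDOWN = "tear down the session"
    STOP_ACCEPTING = "keep the session up, stop accepting new prefixes"
    LOG_ONLY = "log only"


@dataclass(frozen=True)
class Guardrail:
    hard_limit: int          # layer 2: the trip count
    warning_pct: int         # layer 1: alarm threshold, percent of hard_limit
    trip_action: TripAction  # layer 3: what happens on a trip

    @property
    def warning_count(self) -> int:
        # Integer math keeps the threshold exact.
        return self.hard_limit * self.warning_pct // 100

    def evaluate(self, prefix_count: int) -> str:
        if prefix_count > self.hard_limit:
            return f"TRIP: {self.trip_action.value}"
        if prefix_count >= self.warning_count:
            return "WARNING"
        return "OK"


transit = Guardrail(hard_limit=140_000, warning_pct=90,
                    trip_action=TripAction.TEARDOWN)
for count in (100_000, 130_000, 140_001):
    print(count, transit.evaluate(count))
```

Keeping one such record per neighbor also gives the runbook a single place to look up what the device is expected to do when the limit trips.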

3) Practical configuration examples (vendor-agnostic)

The examples below illustrate the concept on two platforms; verify the syntax against your software release and adapt them to your own device.

Junos (example)

set protocols bgp group TRANSIT neighbor <peer> family inet unicast prefix-limit maximum 140000
set protocols bgp group TRANSIT neighbor <peer> family inet unicast prefix-limit teardown 90 idle-timeout 60

Here teardown 90 logs a warning at 90% of the maximum and tears the session down once the maximum is exceeded; idle-timeout 60 keeps the session down for 60 minutes before it may re-establish.

Cisco IOS-XR (example)

router bgp <asn>
 neighbor <peer>
  address-family ipv4 unicast
   maximum-prefix 140000 90 restart 1

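The key nuance here: 90 is the warning threshold, expressed as a percentage of the maximum (126,000 prefixes in this example); it drives the “approaching the limit” alarm. restart 1 re-establishes the session automatically one minute after a trip.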

4) Monitoring: Which alarm actually helps?

Set up three distinct alarms:

- Prefix count threshold exceeded (warning)
- Session flap (stability)
- Control-plane CPU / route processing duration (capacity)

When these alarms are read together, the question “is this a leak or normal growth?” gets answered faster.
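One way to read the three alarms together is a small triage function. The mapping below is an assumed heuristic for illustration, not a rule from any vendor or standard: a prefix warning accompanied by instability or CPU pressure is treated as a suspected leak, while a prefix warning on its own is treated as probable growth to check against the baseline.

```python
def triage(prefix_warning: bool, session_flap: bool, cpu_high: bool) -> str:
    """Combine the three alarm states into a first-pass hypothesis."""
    if prefix_warning and (session_flap or cpu_high):
        return "suspected leak: follow the max-prefix runbook"
    if prefix_warning:
        return "probable growth: compare against the baseline"
    if session_flap or cpu_high:
        return "instability without prefix growth: look elsewhere"
    return "no action"


print(triage(prefix_warning=True, session_flap=True, cpu_high=False))
print(triage(prefix_warning=True, session_flap=False, cpu_high=False))
```

The output is only a starting hypothesis for the on-call engineer; the prefix-count graph and the logs still decide the matter.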

5) Incident runbook: When max-prefix trips

Quick verification: Is it really max-prefix?
- A “prefix limit exceeded”-style entry in the logs
- Prefix count spike in NMS/telemetry
Impact: Which services were affected? (internet egress, partner)
Source: Which peer? (transit/IX/partner)
Decision:
- If the leak is upstream: escalate to the upstream and apply a temporary filter
- If it’s our side: review the last config diff, the last maintenance window, and the change ID
Temporary mitigation (with explicit risk acceptance):
- Raise the limit briefly (only if there’s evidence that the growth is legitimate)
- Or shut the session in a controlled way (so the broken routes don’t enter)
Permanent action:
- Tighten the prefix-filter/policy
- Update the max-prefix value against the new baseline
- Postmortem: “why didn’t the alarm fire earlier?”

Conclusion

The max-prefix limit is one of the lowest-cost, highest-impact guardrails at the edge. Its value shows up in the outages that never happen: when a route leak arrives, it protects the control plane, shrinks the incident blast radius, and reduces uncertainty at the moment of decision. What makes the difference in the field isn’t writing the command; it’s establishing the baseline, choosing the alarm thresholds correctly, and actually running the runbook.
