Every organization that speaks BGP at the edge carries a quiet risk: a wave of bad prefixes. Sometimes that wave is a route leak coming from upstream, sometimes it’s a configuration mistake. The outcome is usually the same:
- RIB/FIB inflate, CPU climbs
- The control plane lags
- BGP sessions flap
- The “real problem” turns into an “internet is down” incident in a heartbeat
In this article I’m walking through one of the highest-leverage guardrails for shrinking that class of incidents: the max-prefix limit.
Why max-prefix is an “operational” control
The max-prefix limit looks like just another BGP knob, but it actually answers an operational question:
“How many prefixes do I expect from this neighbor (peer/upstream), and what will I do if that number deviates?”
So max-prefix gives you:
- Error prevention (an automatic brake during a leak wave)
- Alarm generation (warning thresholds)
- A runbook trigger (who calls whom)
1) First step: Establish a normal prefix baseline
Don’t turn max-prefix on with an arbitrary number. Start with the baseline:
- Measure prefix counts for 7–14 days
- Note weekly variance (trend) and anomalous days like “patch day”
Two numbers matter:
- Normal (median)
- Peak (95th/99th percentile)
The limit needs to sit above both, with headroom over the peak for normal growth, but low enough that it still catches a “leak.”
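As a rough illustration of the baseline step, the sketch below computes the median and the 95th/99th percentiles from daily prefix counts and derives a candidate hard limit. The sample counts, the 25% headroom, the rounding to the nearest 1000, and the function names are all assumptions made for the example, not recommendations.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_limit(samples, headroom=0.25, round_to=1000):
    """Return (median, p95, p99, candidate hard limit).

    headroom and round_to are illustrative assumptions; tune them to your edge.
    """
    median = statistics.median(samples)
    p95 = percentile(samples, 95)
    p99 = percentile(samples, 99)
    # Place the limit above the observed peak, rounded up to a tidy number.
    limit = math.ceil(p99 * (1 + headroom) / round_to) * round_to
    return median, p95, p99, limit


# Hypothetical daily peak prefix counts from one transit peer over 14 days.
daily_counts = [
    101200, 101350, 101100, 101900, 102400, 101800, 101500,
    101700, 102100, 103900, 102000, 101600, 101950, 102300,
]

median, p95, p99, limit = suggest_limit(daily_counts)
print(f"median={median} p95={p95} p99={p99} candidate_limit={limit}")
```

With only 14 samples the 95th and 99th percentiles both land on the single highest day, which is one reason a longer measurement window gives a more trustworthy peak.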
2) Design: A 3-layer guardrail
The most stable model in the field:
- Warning threshold: Alarm at 80–90% of the hard limit
- Hard limit: A specific “trip” count
- Trip behavior: Should the session drop, should the routes go away, or just log?
Trip behavior depends on the organization’s risk appetite:
- On some edges, “tear down the session” is better (don’t accept broken information)
- On others, “keep the session up but stop accepting new prefixes” (if the vendor supports it)
What matters: this behavior should not be a surprise during an incident.
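One way to keep the trip behavior from being a surprise is to write the three layers down as data and evaluate them explicitly. The sketch below is a minimal model of that idea; the class and enum names are hypothetical, and treating a trip as "count strictly above the limit" is an assumption, since the exact boundary differs between platforms.

```python
from dataclasses import dataclass
from enum import Enum


class TripAction(Enum):
    TEARDOWN = "tear down the session"
    STOP_ACCEPTING = "keep the session up, stop accepting new prefixes"
    LOG_ONLY = "log only"


@dataclass(frozen=True)
class Guardrail:
    hard_limit: int          # layer 2: the trip count
    warn_pct: int            # layer 1: warning threshold, percent of hard_limit
    trip_action: TripAction  # layer 3: what happens on a trip

    @property
    def warn_count(self):
        return self.hard_limit * self.warn_pct // 100

    def evaluate(self, prefix_count):
        # Assumption: a trip means the count is strictly above the limit.
        if prefix_count > self.hard_limit:
            return f"TRIP: {self.trip_action.value}"
        if prefix_count > self.warn_count:
            return "WARNING"
        return "OK"


transit = Guardrail(hard_limit=140000, warn_pct=90, trip_action=TripAction.TEARDOWN)
for count in (120000, 130000, 150000):
    print(count, "->", transit.evaluate(count))
```

Keeping the action in the same record as the numbers means the runbook and the device config can be reviewed against a single source.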
3) Practical configuration examples (Junos and IOS-XR)
The examples below illustrate the concept; adapt them to your own device.
Junos (example)
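set protocols bgp group TRANSIT neighbor <peer> family inet unicast prefix-limit maximum 140000
set protocols bgp group TRANSIT neighbor <peer> family inet unicast prefix-limit teardown 90 idle-timeout 60
In Junos, the number after teardown is the percentage of the maximum at which warnings are logged, and idle-timeout (in minutes) is an option of teardown that controls how long the session stays down after a trip. Verify the exact behavior against your release.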
Cisco IOS-XR (example)
router bgp <asn>
 neighbor <peer>
  address-family ipv4 unicast
   maximum-prefix 140000 90 restart 1
The key nuance here: 90 is the warning threshold, expressed as a percentage of the maximum; it drives the “approaching the limit” alarm. restart 1 brings the session back up 1 minute after a trip.
4) Monitoring: Which alarm actually helps?
Set three alarms cleanly:
- Prefix count threshold exceeded (warning)
- Session flap (stability)
- Control-plane CPU / route processing duration (capacity)
When these alarms fire together, the question “is this a leak or normal growth?” becomes much quicker to answer.
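To make "fire together" concrete, the sketch below looks for a time window in which all three alarm kinds occurred. The event tuple layout, the alarm kind names, the source names, and the 300-second window are assumptions for illustration; a real NMS would have its own schema.

```python
REQUIRED_KINDS = {"prefix_warning", "session_flap", "cpu_high"}


def first_correlated_window(events, window_s=300):
    """Return the events of the first window containing all three alarm kinds.

    events: iterable of (timestamp_seconds, kind, source) tuples.
    Returns None when the alarms never coincide within window_s seconds.
    """
    ordered = sorted(events)
    for i, (start, _, _) in enumerate(ordered):
        # Collect every event that falls within window_s of this starting event.
        in_window = [e for e in ordered[i:] if e[0] - start <= window_s]
        if REQUIRED_KINDS <= {kind for _, kind, _ in in_window}:
            return in_window
    return None


# Hypothetical alarm feed: three alarms close together, one unrelated flap later.
events = [
    (1000, "prefix_warning", "transit-a"),
    (1060, "cpu_high", "edge-1"),
    (1090, "session_flap", "transit-a"),
    (5000, "session_flap", "ix-b"),
]

window = first_correlated_window(events)
print(window)
```

A hit narrows the investigation to the sources in that window; an isolated flap, like the last event here, does not trigger it.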
5) Incident runbook: When max-prefix trips
- Quick verification: Is it really max-prefix?
  - A “prefix limit exceeded”-style entry in the logs
  - Prefix count spike in NMS/telemetry
- Impact: Which services were affected? (internet egress, partner)
- Source: Which peer? (transit/IX/partner)
- Decision:
  - If the leak is upstream: escalation to the upstream + temporary filter
  - If it’s our side: last config diff, last maintenance, change ID
- Temporary mitigation (with explicit risk acceptance):
  - Raise the limit briefly (only if there’s evidence)
  - Or shut the session in a controlled way (so the broken route doesn’t enter)
- Permanent action:
  - Tighten the prefix-filter/policy
  - Update the max-prefix value against the new baseline
  - Postmortem: “why didn’t the alarm fire earlier?”
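The verification and decision steps of the runbook can be sketched as a small function, which is sometimes easier to review than prose. The function name and arguments are hypothetical, and requiring both the log entry and the telemetry spike before acting is an assumption of this sketch; the mitigation choice is left to a human because it needs explicit risk acceptance.

```python
def runbook_steps(limit_log_seen, count_spike_seen, leak_origin):
    """Return the ordered next steps after a suspected max-prefix trip.

    leak_origin: "upstream" or "local" (our side).
    """
    # Quick verification: assume both signals must agree before we act.
    if not (limit_log_seen and count_spike_seen):
        return ["Unconfirmed: keep checking logs and telemetry before acting"]

    steps = ["Record affected services and the peer that tripped"]
    if leak_origin == "upstream":
        steps.append("Escalate to the upstream and apply a temporary filter")
    elif leak_origin == "local":
        steps.append("Review the last config diff, last maintenance, and change ID")
    else:
        raise ValueError(f"unknown leak_origin: {leak_origin!r}")

    steps.append("Follow up: tighten filters, re-baseline the limit, hold a postmortem")
    return steps


for step in runbook_steps(True, True, "upstream"):
    print("-", step)
```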
Conclusion
Max-prefix limit is one of the lowest-cost, highest-impact guardrails at the edge. Its value shows up in the incidents that stay small: when a route leak arrives, it protects the control plane, shrinks the blast radius, and reduces uncertainty at the moment of decision. What makes a difference in the field isn’t writing the command; it’s establishing the baseline, choosing the alarm thresholds correctly, and actually running the runbook.