BGP Traffic Engineering Runbook for the Enterprise Edge

The day you start speaking BGP at the enterprise edge, you get more than “internet egress”: you also take on a control plane that decides how traffic flows. For most teams, though, BGP Traffic Engineering (TE) amounts to “turning a few knobs,” and the outcome is predictable:

Unintended inbound/outbound shifts
POP/ISP imbalance, capacity surprises
Panic-driven changes during incidents
“Permanent temporary fixes” with no rollback plan

This article presents the most useful TE tools at the enterprise edge (LocalPref, community, prepend, MED) in a runbook format, organized around the question “which one, and when?”

1) The first distinction: outbound or inbound?

To run TE properly, separate two questions clearly:

Outbound (egress): Which ISP/POP should outbound traffic from your organization use?
Inbound (ingress): Which ISP/POP should incoming traffic from the internet arrive on?

This distinction is critical because most knobs only affect one direction:

LocalPref: typically outbound selection (internal policy)
AS-path prepend: typically inbound effect (the path others see)
MED: can affect inbound, but only between multiple links to the same upstream AS
Community: a “do this for me” signal to the upstream (can be inbound/outbound)

2) Minimum observation set (before making a change)

The most expensive mistake in TE is making a change without measuring its effect.

Minimum signals before the change:

BGP session health: session up/down, flap count
Prefix/route counts: accepted/announced prefixes, “expected vs observed”
Traffic distribution: bps/pps/flow per ISP/POP
Service impact: latency/timeout for critical services (internet egress affects them)
DNS/Anycast (if applicable): query/rcode distribution per POP

Operational practice: record a 15–30 minute observation window before the TE change, then read the same metrics over an equal window after the change.
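The before/after comparison can be reduced to a small calculation. The sketch below is illustrative only: it assumes you can export average bps per ISP for each window, and the function names and sample figures are made up for the example.

```python
from typing import Dict


def traffic_share(bps_by_link: Dict[str, float]) -> Dict[str, float]:
    """Return each link's share of total traffic, in percent."""
    total = sum(bps_by_link.values())
    if total == 0:
        return {link: 0.0 for link in bps_by_link}
    return {link: 100.0 * bps / total for link, bps in bps_by_link.items()}


def share_shift(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point change per link between two observation windows."""
    b = traffic_share(before)
    a = traffic_share(after)
    links = sorted(set(b) | set(a))
    return {link: a.get(link, 0.0) - b.get(link, 0.0) for link in links}


# Hypothetical averages over two equal-length windows (bits per second).
before = {"ISP-A": 800e6, "ISP-B": 200e6}
after = {"ISP-A": 700e6, "ISP-B": 300e6}
shift = share_shift(before, after)
for link, points in shift.items():
    print(f"{link}: {points:+.1f} percentage points")
```

Comparing shares rather than raw bps keeps the result meaningful when total traffic differs between the two windows.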

3) Tool selection: which knob, when?

3.1 For outbound: LocalPref (the primary tool)

LocalPref is the most deterministic tool for outbound selection: it is set and evaluated inside your own AS, and it is compared before AS-path length in best-path selection.

When to use it:

“Egress through ISP-A, ISP-B as backup”
“POP-1 egress is saturated; bring POP-2 online”
“Different egress for specific destination ASes”

Runbook steps:

Write the goal: a clear measure like “Outbound 70% ISP-A, 30% ISP-B”
Apply only to a single class of routes: e.g. “default route” or “transit learned”
Start at one POP (ring rollout)
Rollback: keep the previous localpref value at the ready
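To make the LocalPref steps concrete, here is a deliberately simplified model of outbound selection. It assumes only two tie-breakers (higher LocalPref, then shorter AS path), which is far less than a real router evaluates; the `Route` class, the AS numbers (from the documentation range), and the value 200 are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str                   # ISP the route was learned from
    as_path: Tuple[int, ...]
    local_pref: int = 100      # a common vendor default; verify on your platform


def best_outbound(routes: Iterable[Route]) -> Route:
    """Simplified: highest LocalPref wins, shorter AS path breaks ties."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


via_a = Route("0.0.0.0/0", "ISP-A", (64500, 64510, 64511))
via_b = Route("0.0.0.0/0", "ISP-B", (64501, 64511))

baseline = best_outbound([via_a, via_b])          # shorter path decides

previous_local_pref = via_a.local_pref            # kept at the ready for rollback
preferred_a = replace(via_a, local_pref=200)      # the TE change
after_change = best_outbound([preferred_a, via_b])

restored_a = replace(preferred_a, local_pref=previous_local_pref)
rolled_back = best_outbound([restored_a, via_b])

print(baseline.via, after_change.via, rolled_back.via)
```

The point of the sketch is the last three lines of logic: the previous value is captured before the change, so rollback is a single, known assignment rather than a guess made under pressure.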

3.2 For inbound: Community (the cleanest tool, when available)

Upstreams typically provide controls of these kinds via communities:

Prepend to a specific POP/region
Blackhole / RTBH
Localpref manipulation (within the upstream)
Propagation limits like “no-export”

When to use it:

“I want control inside the ISP’s network”
“Temporarily de-prefer a POP on inbound”

Risk: if the community contract isn’t documented or different teams interpret it differently, “one line of config” creates large effects.
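One way to reduce that risk is to keep the community contract as data and refuse anything undocumented before it reaches a router. The following is a minimal sketch: the two well-known values are standardized (RFC 1997 and RFC 7999), while the `64500:...` entries and their meanings are hypothetical and stand in for whatever your upstream actually publishes.

```python
from typing import Dict, List

# Standardized well-known communities.
WELL_KNOWN = {
    "65535:65281": "NO_EXPORT (RFC 1997): do not advertise beyond the receiving AS",
    "65535:666": "BLACKHOLE (RFC 7999): discard traffic to this prefix",
}

# Hypothetical upstream contract; replace with your provider's documented values.
UPSTREAM_CONTRACT = {
    "64500:3001": "prepend 1x toward peers in one region",
    "64500:80": "set local-pref 80 inside the upstream",
}


def review_communities(planned: List[str]) -> Dict[str, str]:
    """Return the documented meaning of each planned community, or fail loudly."""
    known = {**WELL_KNOWN, **UPSTREAM_CONTRACT}
    undocumented = [c for c in planned if c not in known]
    if undocumented:
        raise ValueError(f"undocumented communities: {undocumented}")
    return {c: known[c] for c in planned}


for community, meaning in review_communities(["64500:80", "65535:65281"]).items():
    print(f"{community}: {meaning}")
```

A check like this fits naturally into change review: the reviewer reads the documented meaning next to the value, not just the value.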

3.3 For inbound: AS-path prepend (most common but most uncertain)

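Prepend influences remote selection only indirectly: a remote network applies its own local preference before it compares AS-path length, so it may ignore your prepend entirely. Treat it less as a “fine adjustment” and more as a coarse steering tool.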

When to use it:

When upstream community options are limited
When you want to create “preference” between two ISPs

Operational rules:

Don’t go aggressive in one shot; step it up gradually (e.g. +1, then +2)
Balance capacity with different prepend per POP/ISP
Don’t make it “permanent” without measuring its effect; revisit within 24 hours
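Why prepend is uncertain can be shown with a toy model of a remote network's decision. This sketch assumes the remote AS compares only its own local preference and then AS-path length; the AS numbers come from the documentation range and all names are illustrative.

```python
from typing import List, Tuple

OWN_ASN = 64496  # documentation-range ASN, standing in for your AS


def announced_path(prepend_count: int) -> List[int]:
    """AS path as you announce it: your ASN once, plus any prepends."""
    return [OWN_ASN] * (1 + prepend_count)


def remote_choice(candidates: List[Tuple[str, int, List[int]]]) -> str:
    """Simplified remote decision: its own local-pref first, then shorter path."""
    return max(candidates, key=lambda c: (c[1], -len(c[2])))[0]


def choice_after_prepend(prepend_on_a: int, pref_a: int = 100, pref_b: int = 100) -> str:
    via_a = [64500] + announced_path(prepend_on_a)   # path seen through ISP-A
    via_b = [64501] + announced_path(0)              # path seen through ISP-B
    return remote_choice([("ISP-A", pref_a, via_a), ("ISP-B", pref_b, via_b)])


for step in (1, 2):
    neutral = choice_after_prepend(step)
    prefers_a = choice_after_prepend(step, pref_a=200)
    print(f"+{step}: neutral remote -> {neutral}, remote preferring ISP-A -> {prefers_a}")
```

A remote network with no preference follows the shorter path, while one that prefers ISP-A by policy does not move at any prepend depth. That is why the effect has to be measured rather than assumed.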

3.4 MED: only meaningful in the right context

MED is generally meaningful for selecting between different entry points within the same upstream AS. By default, routers compare MED only between routes learned from the same neighboring AS, so in multi-ISP scenarios it usually doesn’t deliver the effect you expect.

Rule: don’t use MED like a “lifesaver knob”; use it as a bounded signal.
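The “bounded signal” idea can be expressed as a guard: MED only gets a vote when both routes come from the same neighboring AS. This is a sketch of that default comparison rule, not of any vendor's full implementation; the `Entry` class and link names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    link: str
    neighbor_as: int
    med: int


def prefer_by_med(a: Entry, b: Entry) -> Optional[Entry]:
    """Lower MED wins, but only between routes from the same neighboring AS.

    Returns None when MED does not decide (different neighbor AS, or equal MED).
    """
    if a.neighbor_as != b.neighbor_as:
        return None
    if a.med == b.med:
        return None
    return a if a.med < b.med else b


pop1 = Entry("ISP-A via POP-1", neighbor_as=64500, med=50)
pop2 = Entry("ISP-A via POP-2", neighbor_as=64500, med=100)
other = Entry("ISP-B via POP-1", neighbor_as=64501, med=10)

same_upstream = prefer_by_med(pop1, pop2)    # MED decides
across_isps = prefer_by_med(pop2, other)     # MED is not compared
print(same_upstream.link if same_upstream else None, across_isps)
```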

4) Change flow (operational model)

The flow that has worked best for me in the field:

Goal sentence: “20% of inbound traffic will shift to POP-2”
Scope: which prefixes? entire internet, or a specific service?
Observation: which dashboards/metrics are the decision criteria?
Rollback: is one-command rollback possible?
Time box: 30 min observation, then decision
Record: change log (what, why, what happened)
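The flow above maps naturally onto a small change record plus a keep-or-rollback decision at the end of the time box. The sketch below is one possible shape; the field names, the 5-point tolerance, and the sample values are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TEChange:
    goal: str                    # the goal sentence
    scope: str                   # which prefixes or services
    metric: str                  # the decision criterion
    rollback: str                # the one-command rollback, written in advance
    observe_minutes: int = 30    # time box
    log: List[str] = field(default_factory=list)


def decide(target_pct: float, observed_pct: float, tolerance_pct: float = 5.0) -> str:
    """Keep the change if the observed shift is within tolerance of the goal."""
    return "keep" if abs(observed_pct - target_pct) <= tolerance_pct else "rollback"


change = TEChange(
    goal="20% of inbound traffic will shift to POP-2",
    scope="a specific service prefix (hypothetical)",
    metric="inbound bps share per POP",
    rollback="remove the prepend added at POP-1 (hypothetical)",
)
verdict = decide(target_pct=20.0, observed_pct=18.5)
change.log.append(f"after {change.observe_minutes} min: observed 18.5 points, decision={verdict}")
print(change.log[-1])
```

Writing the rollback field before the change is applied is the useful constraint here: if it cannot be filled in, the change is not ready.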

5) Incident triage: when traffic goes the “wrong” way

5.1 Initial checklist (5 minutes)

Are BGP sessions stable? Any flaps?
Has the announced prefix set changed? (missing/extra announcements)
Did route-map/policy ordering change?
Did intra-POP routing break (anycast, ECMP, or IGP)?
Is there maintenance or a policy change on the upstream side?
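The “missing/extra announcements” check is a set difference, and it is worth scripting so it takes seconds during an incident. A minimal sketch follows, assuming you can obtain the observed prefixes as strings (for example from a router export or a looking glass); the prefixes shown are documentation ranges.

```python
import ipaddress
from typing import Dict, Iterable, List, Set


def normalize(prefixes: Iterable[str]) -> Set[str]:
    """Parse and re-render prefixes so formatting differences don't cause false diffs."""
    return {str(ipaddress.ip_network(p.strip())) for p in prefixes}


def diff_announcements(expected: Iterable[str], observed: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the intended announcement set with what is actually announced."""
    e, o = normalize(expected), normalize(observed)
    return {"missing": sorted(e - o), "extra": sorted(o - e)}


expected = ["192.0.2.0/24", "198.51.100.0/24"]
observed = ["192.0.2.0/24 ", "203.0.113.0/25"]   # note the stray space and the unexpected /25

result = diff_announcements(expected, observed)
print("missing:", result["missing"])
print("extra:", result["extra"])
```

An unexpected more-specific in the “extra” list is a common reason inbound traffic moves without any visible policy change, since the longest prefix match wins before any BGP attribute is considered.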

5.2 Quick action (least risky)

The least-risky rollback is usually this:

Outbound problem: revert localpref to the previous value
Inbound problem: undo the community/prepend you added

The reflex of “fix by adding a new setting” extends the crisis. First return the system to its previous stable state, then look for the root cause.

6) Minimum viable TE checklist

Inbound/outbound goals written separately
TE change tested at one POP (ring rollout)
30 min before/after observation window recorded
Rollback command at the ready
Upstream community contract documented
A “TE triage” runbook for incident time exists

At the enterprise edge, BGP TE isn’t about “making the network prettier”; it’s a job done to manage operational risk. Even when the knobs stay the same, the outcome changes when the approach changes: with a goal sentence, measurement, and rollback, TE stops being a source of surprises.
