The day you start speaking BGP at the enterprise edge, you don’t just open up “internet egress”; you also stand up a control plane where you manage the flow of traffic. The problem is this: for most teams, BGP Traffic Engineering (TE) ends up being “turning a few knobs.” The outcome is predictable:
- Unintended inbound/outbound shifts
- POP/ISP imbalance, capacity surprises
- Panic-driven changes during incidents
- “Permanent temporary fixes” with no rollback plan
This article presents the most useful TE tools at the enterprise edge (localpref, communities, AS-path prepend, MED) in runbook format, organized around one question: which one, and when?
1) The first distinction: outbound or inbound?
To run TE properly, separate two questions clearly:
- Outbound (egress): Which ISP/POP should outbound traffic from your organization use?
- Inbound (ingress): Which ISP/POP should incoming traffic from the internet arrive on?
This distinction is critical because most knobs only affect one direction:
- LocalPref: typically outbound selection (internal policy)
- AS-path prepend: typically inbound effect (the path others see)
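- MED: can affect inbound, but only under specific conditions (multiple links to the same upstream AS, and that upstream honors MED)
- Community: a “do this for me” signal to the upstream (mostly inbound; communities the upstream attaches to the routes it sends you can also feed outbound policy)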
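To make the direction split concrete, here is a minimal Python sketch of a simplified best-path comparison. It assumes only three steps of the BGP decision process (highest localpref, then shortest AS path, then lowest MED between routes from the same neighbor AS) and treats every other step as tied. The `Route` type, the ISP labels, and the documentation ASNs (64496–64511) are illustrative. The point it shows: localpref is an input you control on your own routers, while prepend and MED only change the inputs to someone else's comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One candidate path for a prefix, as seen by a single router (simplified)."""
    next_hop_isp: str          # illustrative label, e.g. "ISP-A"
    neighbor_as: int           # AS the route was learned from
    local_pref: int            # set by our own import policy; never sent to eBGP peers
    as_path: tuple[int, ...]   # what prepending makes longer
    med: int = 0


def better(a: Route, b: Route) -> Route:
    """Pick the preferred of two routes using a subset of the BGP decision
    order: highest localpref, then shortest AS path, then lowest MED (MED only
    compared when both routes come from the same neighbor AS). Weight, origin,
    eBGP-over-iBGP, IGP cost and router-ID steps are not modeled."""
    if a.local_pref != b.local_pref:
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):
        return a if len(a.as_path) < len(b.as_path) else b
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a  # remaining tie-breakers not modeled; keep the first candidate


a = Route("ISP-A", 64501, local_pref=200, as_path=(64501, 64510, 64511))
b = Route("ISP-B", 64502, local_pref=100, as_path=(64502, 64511))
print(better(a, b).next_hop_isp)  # ISP-A: localpref wins despite the longer AS path

a_tied = Route("ISP-A", 64501, local_pref=100, as_path=a.as_path)
print(better(a_tied, b).next_hop_isp)  # ISP-B: localpref tied, shorter AS path wins
```

A remote network runs the same kind of comparison on your prefixes using its own localpref values, which is why a prepend only takes effect where that localpref ties.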
2) Minimum observation set (before making a change)
The most expensive mistake in TE: making a change without measuring what it changed.
Minimum signals before the change:
- BGP session health: session up/down, flap count
- Prefix/route counts: accepted/announced prefixes, “expected vs observed”
- Traffic distribution: bps/pps/flow per ISP/POP
- Service impact: latency/timeouts for critical services that depend on internet egress
- DNS/Anycast (if applicable): query/rcode distribution per POP
Operational practice: define a 15–30 minute observation window for the TE change, record it before the change, then record an equal-length window after the change and compare the two.
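A minimal Python sketch of the before/after comparison for the traffic-distribution signal, assuming you already export average bps per ISP for each window (the collection side is not shown). The function names, the ISP labels, and the 5-point tolerance are illustrative choices, not a standard.

```python
def shares(bps_by_isp: dict[str, float]) -> dict[str, float]:
    """Convert per-ISP average bps over one window into percentage shares."""
    total = sum(bps_by_isp.values())
    if total == 0:
        raise ValueError("no traffic in window; cannot compute shares")
    return {isp: 100.0 * bps / total for isp, bps in bps_by_isp.items()}


def compare_windows(before: dict[str, float], after: dict[str, float],
                    tolerance_pct: float = 5.0) -> dict[str, tuple[float, bool]]:
    """For each ISP, return (share change in percentage points, moved), where
    moved is True if the share shifted by more than tolerance_pct points."""
    b, a = shares(before), shares(after)
    report = {}
    for isp in sorted(set(b) | set(a)):
        delta = a.get(isp, 0.0) - b.get(isp, 0.0)
        report[isp] = (round(delta, 1), abs(delta) > tolerance_pct)
    return report


before = {"ISP-A": 6e9, "ISP-B": 4e9}   # 60% / 40%
after = {"ISP-A": 7e9, "ISP-B": 3e9}    # 70% / 30%
print(compare_windows(before, after))
# {'ISP-A': (10.0, True), 'ISP-B': (-10.0, True)}
```

The same shape works for the other signals (latency, rcode distribution per POP); what matters is that both windows are the same length and read from the same dashboards.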
3) Tool selection: which knob, when?
3.1 For outbound: LocalPref (the primary tool)
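LocalPref is the most deterministic tool for outbound selection: it is set by your own import policy, stays inside your AS, and is compared before AS-path length, so your routers follow it regardless of how long the competing paths are.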
When to use it:
- “Egress through ISP-A, ISP-B as backup”
- “POP-1 egress is saturated; move part of it to POP-2”
- “Different egress for specific destination ASes”
Runbook step:
- Write the goal: a clear measure like “Outbound 70% ISP-A, 30% ISP-B”
- Apply only to a single class of routes: e.g. “default route” or “transit learned”
- Start at one POP (ring rollout)
- Rollback: keep the previous localpref value at the ready
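A goal like “Outbound 70% ISP-A, 30% ISP-B” is not a single localpref value: localpref selects one best path per prefix, so a percentage split has to be approximated by preferring ISP-A for a subset of destinations. The Python sketch below assumes you have per-destination-AS outbound volumes from the observation window and uses a simple largest-first greedy heuristic (one possible approach, not the only one). The destination groups, volumes, and the 200/100 localpref values are hypothetical.

```python
def plan_split(traffic_by_group: dict[str, float], target_share_a: float,
               pref_high: int = 200, pref_low: int = 100):
    """Greedily prefer ISP-A for the largest destination groups as long as
    ISP-A stays at or under target_share_a of total traffic; everything else
    prefers ISP-B. Returns (plan, achieved_share), where plan maps each group
    to the localpref to set on routes learned from each ISP."""
    total = sum(traffic_by_group.values())
    if total <= 0:
        raise ValueError("no traffic measured; cannot plan a split")
    plan: dict[str, dict[str, int]] = {}
    carried_a = 0.0
    # Largest groups first; skip any group that would overshoot the target.
    for group, bps in sorted(traffic_by_group.items(), key=lambda kv: kv[1], reverse=True):
        if (carried_a + bps) / total <= target_share_a:
            carried_a += bps
            plan[group] = {"ISP-A": pref_high, "ISP-B": pref_low}
        else:
            plan[group] = {"ISP-A": pref_low, "ISP-B": pref_high}
    return plan, carried_a / total


# Hypothetical outbound Gbps per destination AS, measured over the observation window.
traffic = {"AS64496": 40.0, "AS64497": 30.0, "AS64498": 20.0, "AS64499": 10.0}
plan, achieved = plan_split(traffic, target_share_a=0.70)
for group, prefs in plan.items():
    print(group, prefs)
print(f"ISP-A share: {achieved:.0%}")  # ISP-A share: 70%
```

The split only holds while the per-group volumes stay similar, so re-measure before reusing an old plan.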
3.2 For inbound: Community (the cleanest tool, when available)
Many upstreams publish communities that let you request actions such as:
- Prepend to a specific POP/region
- Blackhole / RTBH
- Localpref manipulation (within the upstream)
- Propagation limits like “no-export”
When to use it:
- “I want control inside the ISP’s network”
- “Temporarily de-prefer a POP on inbound”
Risk: if the community contract isn’t documented, or different teams interpret it differently, a single line of config can have large effects.
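One way to reduce that risk is to keep the contract as data and refuse anything not in it. A minimal Python sketch, assuming a hypothetical upstream AS64510: only NO_EXPORT (65535:65281, RFC 1997) and BLACKHOLE (65535:666, RFC 7999) are real well-known values; the intent names and the other community values are invented placeholders for whatever your upstream actually publishes.

```python
# Documented community contract for a *hypothetical* upstream, AS64510.
# Only the two RFC-defined well-known values are real; the others are
# placeholders to be replaced with what your upstream actually publishes.
CONTRACT: dict[str, str] = {
    "no-export": "65535:65281",           # NO_EXPORT, RFC 1997
    "blackhole": "65535:666",             # BLACKHOLE, RFC 7999 (upstream must support it)
    "prepend-1x-eu": "64510:3101",        # hypothetical
    "prepend-2x-eu": "64510:3102",        # hypothetical
    "lower-localpref": "64510:70",        # hypothetical: below the upstream's default
}


def communities_for(intents: list[str]) -> list[str]:
    """Translate named intents into community values, refusing anything
    that is not in the documented contract."""
    unknown = [i for i in intents if i not in CONTRACT]
    if unknown:
        raise ValueError(f"undocumented intents, check the contract first: {unknown}")
    return [CONTRACT[i] for i in intents]


print(communities_for(["prepend-1x-eu"]))  # ['64510:3101']
try:
    communities_for(["prepend-3x-eu"])
except ValueError as err:
    print(err)
```

Reviewing a change then means reviewing named intents, which different teams are less likely to read differently than raw community numbers.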
3.3 For inbound: AS-path prepend (most common but most uncertain)
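Prepend influences remote selection only indirectly: other networks compare AS-path length only after their own localpref, which they often set by business relationship (customer routes over peer routes over transit). So treat it less as a “fine adjustment” and more as a coarse steering tool.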
When to use it:
- When upstream community options are limited
- When you want to create “preference” between two ISPs
Operational rules:
- Don’t go aggressive in one shot; step it up gradually (e.g. +1, then +2)
- Balance capacity by using different prepend counts per POP/ISP
- Don’t make it “permanent” without measuring its effect; revisit within 24 hours
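The “step it up gradually” rule can be made mechanical: plan one prepend count per observation window and cap the total. A minimal Python sketch, assuming your AS is the documentation ASN 64500 and a local guardrail of at most 3 extra prepends (an arbitrary house limit, not a protocol limit); the function names are hypothetical.

```python
OWN_AS = 64500      # documentation ASN standing in for your AS
MAX_PREPEND = 3     # assumed local guardrail, not a protocol limit


def prepend_steps(current: int, target: int, max_prepend: int = MAX_PREPEND) -> list[int]:
    """Prepend counts to apply one at a time (one per observation window),
    moving from current to target, never exceeding max_prepend."""
    if not (0 <= current <= max_prepend and 0 <= target <= max_prepend):
        raise ValueError(f"prepend counts must stay within 0..{max_prepend}")
    step = 1 if target >= current else -1
    return list(range(current + step, target + step, step))


def announced_path(extra_prepends: int, own_as: int = OWN_AS) -> tuple[int, ...]:
    """AS path of a prefix you originate, as sent to the upstream: your AS
    once, plus one more copy per extra prepend."""
    return (own_as,) * (1 + extra_prepends)


for count in prepend_steps(0, 2):
    print(count, announced_path(count))
# 1 (64500, 64500)
# 2 (64500, 64500, 64500)
```

Stepping back down (for example `prepend_steps(2, 0)`) follows the same one-change-per-window rhythm.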
3.4 MED: only meaningful in the right context
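MED is generally meaningful only for choosing between entry points into your network from the same upstream AS. It is non-transitive (the upstream does not pass it on to other ASes), and by default routers compare MED only between routes learned from the same neighbor AS. In multi-ISP scenarios, it usually doesn’t deliver the effect you expect.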
Rule: don’t reach for MED as a “lifesaver knob”; use it only as a tie-breaker hint between your links to the same upstream.
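A minimal Python sketch of the default MED behavior, assuming the view of a router inside one upstream that holds your prefix over two of your links (your AS is the documentation ASN 64500) plus a path learned from another AS (64511). Only the MED step is modeled; how the per-group winners are then compared (localpref, AS path, and so on) is left out. The `Path` type and labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    entry_point: str   # illustrative label for where traffic would enter
    neighbor_as: int   # AS this path was learned from
    med: int


def med_winners(paths: list[Path]) -> dict[int, Path]:
    """Lowest MED wins, but only within each neighbor-AS group (the default
    behavior; 'always compare MED' style options change this)."""
    best: dict[int, Path] = {}
    for p in paths:
        current = best.get(p.neighbor_as)
        if current is None or p.med < current.med:
            best[p.neighbor_as] = p
    return best


paths = [
    Path("POP-1", neighbor_as=64500, med=10),
    Path("POP-2", neighbor_as=64500, med=50),
    Path("via AS64511", neighbor_as=64511, med=0),
]
winners = med_winners(paths)
print(winners[64500].entry_point)  # POP-1
print(winners[64511].entry_point)  # via AS64511 (its MED 0 never competed with yours)
```

Your MED decides between POP-1 and POP-2 inside that upstream, and nothing more.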
4) Change flow (operational model)
The flow that has worked best for me in the field:
- Goal sentence: “20% of inbound traffic will shift to POP-2”
- Scope: which prefixes? entire internet, or a specific service?
- Observation: which dashboards/metrics are the decision criteria?
- Rollback: is one-command rollback possible?
- Time box: 30 min observation, then decision
- Record: change log (what, why, what happened)
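The flow above can be captured as a record that refuses to be “ready” until every field is filled in. A minimal Python sketch; the `TEChange` class, its field names, and the placeholder rollback string are hypothetical and not tied to any particular platform or change-management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TEChange:
    """One TE change: goal, scope, observation, rollback, time box, record."""
    goal: str                      # e.g. "20% of inbound traffic will shift to POP-2"
    scope: str                     # which prefixes / services
    decision_metrics: list[str]    # dashboards/metrics that decide keep vs revert
    rollback_command: str          # whatever restores the previous state on your platform
    observe_minutes: int = 30      # time box before the keep/revert decision
    log: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Names of required fields that are still empty; apply only when []."""
        required = {
            "goal": self.goal,
            "scope": self.scope,
            "decision_metrics": self.decision_metrics,
            "rollback_command": self.rollback_command,
        }
        return [name for name, value in required.items() if not value]

    def decision_due(self, applied_at: datetime) -> datetime:
        """When the observation window ends and a keep/revert decision is due."""
        return applied_at + timedelta(minutes=self.observe_minutes)

    def record(self, what: str) -> None:
        """Append a UTC-timestamped entry to the change log."""
        self.log.append(f"{datetime.now(timezone.utc).isoformat()} {what}")


change = TEChange(
    goal="20% of inbound traffic will shift to POP-2",
    scope="",  # not decided yet, so the change is not ready
    decision_metrics=["inbound bps per POP", "p95 latency of critical services"],
    rollback_command="<command that restores the previous prepend/community config>",
)
print(change.missing())  # ['scope']
```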
5) Incident triage: when traffic goes the “wrong” way
5.1 Initial checklist (5 minutes)
- Are BGP sessions stable? Any flaps?
- Has the announced prefix set changed? (missing/extra announcements)
- Did route-map/policy ordering change?
- Did intra-POP routing on the anycast/ECMP/IGP side break?
- Is there maintenance or a policy change on the upstream side?
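The first two checks reduce to simple comparisons once the data is in hand. A minimal Python sketch, assuming the expected prefix set comes from your intended export policy and the observed set from a looking glass or route collector, and that flap counts per session are already collected for the triage window (collection is not shown). Session names and prefixes are illustrative documentation values.

```python
def announcement_diff(expected: set[str], observed: set[str]) -> dict[str, list[str]]:
    """Prefixes you meant to announce but that are not visible ('missing'),
    and prefixes visible but not in your intended set ('extra')."""
    return {
        "missing": sorted(expected - observed),
        "extra": sorted(observed - expected),
    }


def flapping_sessions(flap_counts: dict[str, int], threshold: int = 1) -> list[str]:
    """Sessions with at least `threshold` flaps in the triage window."""
    return sorted(name for name, flaps in flap_counts.items() if flaps >= threshold)


expected = {"192.0.2.0/24", "198.51.100.0/24"}   # documentation prefixes
observed = {"192.0.2.0/24", "203.0.113.0/24"}
print(announcement_diff(expected, observed))
# {'missing': ['198.51.100.0/24'], 'extra': ['203.0.113.0/24']}
print(flapping_sessions({"ISP-A/POP-1": 0, "ISP-B/POP-2": 3}))  # ['ISP-B/POP-2']
```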
5.2 Quick action (least risky)
The least-risky rollback is usually this:
- Outbound problem: revert localpref to the previous value
- Inbound problem: undo the community/prepend you added
The reflex to “fix it by adding a new setting” prolongs the crisis. First return the system to its previous stable state, then look for the root cause.
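“Return to the previous stable state” is easiest when stable states are marked explicitly. A minimal Python sketch, assuming TE policy can be summarized as a flat mapping of knob to value; the `PolicyHistory` class and keys such as "prepend:ISP-B" are hypothetical, and pushing the restored state to routers is not shown.

```python
class PolicyHistory:
    """Snapshots of TE policy (knob -> value). Incident response restores the
    last snapshot marked stable instead of layering a new fix on top."""

    def __init__(self, initial: dict):
        self._snapshots = [(dict(initial), True)]  # (state, known_stable)

    @property
    def current(self) -> dict:
        return dict(self._snapshots[-1][0])

    def apply(self, changes: dict) -> dict:
        """Record a new, not-yet-stable state built on the current one."""
        state = self.current
        state.update(changes)
        self._snapshots.append((state, False))
        return dict(state)

    def mark_stable(self) -> None:
        """Call after the observation window confirms the current state."""
        state, _ = self._snapshots[-1]
        self._snapshots[-1] = (state, True)

    def rollback_to_stable(self) -> dict:
        """Drop unconfirmed states; the initial snapshot is always stable."""
        while not self._snapshots[-1][1]:
            self._snapshots.pop()
        return self.current


history = PolicyHistory({"localpref:ISP-A": 200, "prepend:ISP-B": 0})
history.apply({"prepend:ISP-B": 2})       # change under test, not yet confirmed
history.apply({"localpref:ISP-A": 150})   # the "fix by adding a new setting" reflex
print(history.rollback_to_stable())       # {'localpref:ISP-A': 200, 'prepend:ISP-B': 0}
```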
6) Minimum viable TE checklist
- Inbound/outbound goals written separately
- TE change tested at one POP (ring rollout)
- 30 min before/after observation window recorded
- Rollback command at the ready
- Upstream community contract documented
- A “TE triage” runbook for incident time exists
At the enterprise edge, BGP TE isn’t about “making the network prettier”; it’s a job done to manage operational risk. Even when the knobs stay the same, the outcome changes when the approach changes: with a goal sentence, measurement, and rollback, TE stops being a source of surprises.