Mustafa Erbay
Tutorials · 11 min read

BGP Failover Lab Guide with FRRouting

Steps for validating BGP failover behavior in a lab for servers or edge environments using dual uplinks.

Learning BGP failover behavior in production is expensive. For edge servers with dual uplinks, transit routers, or service gateways, the safer path is to observe the behavior in a small lab first. FRRouting is lightweight and flexible enough for this job.

[Figure: technical diagram showing two routers, an FRRouting node, and BGP path changes]
Even with a small lab topology, the goal is clear: see ahead of time how neighbor loss and route changes affect system behavior.

Which scenario are we testing?

In this guide we assume a single FRRouting node establishes BGP neighborships with two separate upstream routers. The goal is to:

  • Prefer the primary path under normal conditions
  • Watch traffic shift to the secondary path when one upstream goes down
  • Measure behavior during failback

This setup is a good starting point for simulating a data center edge layer or an enterprise service egress point.

Simple lab topology

A sample IP plan could look like this:

  • edge01: 10.10.10.10/24
  • rtr-a: 10.10.10.1/24
  • rtr-b: 10.10.10.2/24
  • Local ASN: 65010
  • Upstream ASNs: 65100 and 65200

You can build the lab with containers, virtual machines, or network namespaces. What matters is that the neighborships actually come up and that route preference can be observed.
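
Before building anything, it can help to sanity-check the address plan. The sketch below uses only Python's standard `ipaddress` module and the sample addresses from the plan above. The `check_plan` helper is a hypothetical name introduced here for illustration; it simply confirms that all three nodes share one subnet and that no host address is reused.

```python
import ipaddress

# Addresses copied from the sample IP plan above.
PLAN = {
    "edge01": "10.10.10.10/24",
    "rtr-a": "10.10.10.1/24",
    "rtr-b": "10.10.10.2/24",
}

def check_plan(plan):
    """Return (shared_network, problems) for a name -> 'address/prefix' mapping."""
    interfaces = {name: ipaddress.ip_interface(cidr) for name, cidr in plan.items()}
    networks = {iface.network for iface in interfaces.values()}
    problems = []
    if len(networks) != 1:
        # Directly connected eBGP peers are expected to share one subnet in this lab.
        problems.append("nodes are spread over several subnets: "
                        + ", ".join(sorted(str(n) for n in networks)))
    addresses = [iface.ip for iface in interfaces.values()]
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate host address in plan")
    shared = next(iter(networks)) if len(networks) == 1 else None
    return shared, problems

network, problems = check_plan(PLAN)
print("shared subnet:", network)
print("problems:", problems or "none")
```

If you change the plan for your own lab, rerunning this check catches the typical copy-paste mistakes before any BGP session is configured.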

FRRouting installation and basic configuration

On an Ubuntu-based node, the installation flow looks like this:

sudo apt-get update
sudo apt-get install -y frr frr-pythontools
sudo sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons
sudo systemctl restart frr

Then a basic frr.conf skeleton:

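frr version 10.0
frr defaults traditional
hostname edge01
service integrated-vtysh-config
!
! Note: 192.0.2.0/24 must exist in the routing table (for example as a
! static or connected route) before the network statement announces it.
router bgp 65010
 bgp router-id 10.10.10.10
 neighbor 10.10.10.1 remote-as 65100
 neighbor 10.10.10.2 remote-as 65200
 !
 address-family ipv4 unicast
  network 192.0.2.0/24
  neighbor 10.10.10.1 route-map PREFER_A in
  neighbor 10.10.10.1 route-map EXPORT_LOCAL out
  neighbor 10.10.10.2 route-map BACKUP_B in
  neighbor 10.10.10.2 route-map EXPORT_LOCAL out
 exit-address-family
!
ip prefix-list LOCAL_NETS seq 10 permit 192.0.2.0/24
!
! Routes learned from rtr-a are preferred
route-map PREFER_A permit 10
 set local-preference 200
!
! Routes learned from rtr-b stay as backup
route-map BACKUP_B permit 10
 set local-preference 100
!
! With "frr defaults traditional", eBGP peers also need an outbound policy,
! otherwise nothing is announced. EXPORT_LOCAL and LOCAL_NETS are example names.
route-map EXPORT_LOCAL permit 10
 match ip address prefix-list LOCAL_NETS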

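The logic here is simple: routes learned from rtr-a get local-preference 200 and routes learned from rtr-b get 100, so when both upstreams advertise the same prefix, the path via rtr-a is selected.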
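
To make the preference logic concrete, here is a deliberately simplified Python model. It only compares local-preference and ignores every other step of real BGP best-path selection (weight, AS path length, origin, MED, and so on), so treat it as an illustration of the policy, not of FRRouting's decision process. The `Path` class and `best_path` function are hypothetical names used only in this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    prefix: str
    neighbor: str
    local_pref: int

# Inbound policy from the frr.conf skeleton: neighbor address -> local-preference.
LOCAL_PREF_BY_NEIGHBOR = {"10.10.10.1": 200, "10.10.10.2": 100}

def best_path(prefix, advertising_neighbors):
    """Pick the path with the highest local-preference (simplified model)."""
    candidates = [Path(prefix, n, LOCAL_PREF_BY_NEIGHBOR[n])
                  for n in advertising_neighbors]
    if not candidates:
        return None  # no neighbor advertises the prefix
    return max(candidates, key=lambda p: p.local_pref)

# Both upstreams advertise the prefix: rtr-a wins.
print(best_path("203.0.113.0/24", ["10.10.10.1", "10.10.10.2"]).neighbor)
# rtr-a is gone: only rtr-b is left.
print(best_path("203.0.113.0/24", ["10.10.10.2"]).neighbor)
```

The prefix 203.0.113.0/24 is just an example of something the upstreams might advertise in the lab; it is not part of the configuration above.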

Test steps

For initial verification, these commands are enough:

sudo vtysh -c "show bgp ipv4 unicast summary"
sudo vtysh -c "show bgp ipv4 unicast"
ip route get 203.0.113.10
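
If you want to check neighbor state from a script instead of reading the table by eye, vtysh can emit JSON when `json` is appended to the show command. The sketch below parses that output in Python. The key names `peers` and `state`, and the possibility that the data is nested under an address-family key, are assumptions based on typical FRRouting output; verify them against your version before relying on them.

```python
import json
import subprocess

def find_peers(summary):
    """Locate the 'peers' mapping, whether top-level or nested one level down."""
    if "peers" in summary:
        return summary["peers"]
    for value in summary.values():
        if isinstance(value, dict) and "peers" in value:
            return value["peers"]
    return {}

def peer_states(summary):
    """Map each neighbor address to its session state string."""
    return {addr: info.get("state", "unknown")
            for addr, info in find_peers(summary).items()}

def read_summary():
    """Run vtysh and parse its JSON output (needs FRR and sufficient privileges)."""
    result = subprocess.run(
        ["vtysh", "-c", "show bgp ipv4 unicast summary json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# On the lab node you would call: print(peer_states(read_summary()))
```

In a healthy lab both neighbors should report an established session; anything else means the later failover test would start from the wrong baseline.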

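Then take down the primary neighbor. Depending on your lab environment, you can shut an interface, tear down the BGP session, or block TCP/179 with iptables. Keep in mind that the method affects detection time: an administratively closed session is noticed immediately, whereas silently dropped packets are only detected once the BGP hold timer expires. The expected outcome: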

  1. The rtr-a neighborship goes down.
  2. The best path is selected through rtr-b.
  3. The FIB updates.
  4. After a brief transition, application flow continues.
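
The sequence above can be timed with a small polling loop. This is a minimal sketch, assuming that the kernel's next hop for 203.0.113.10 is a reasonable proxy for "the FIB updated" and that `ip -j route get` returns a JSON list with a `gateway` field on your system. `wait_for_change` and `current_gateway` are hypothetical helper names; the clock and sleep functions are injectable so the loop can be tried without a live lab.

```python
import json
import subprocess
import time

def current_gateway(destination="203.0.113.10"):
    """Return the kernel's next hop for destination, or None if there is no route."""
    result = subprocess.run(["ip", "-j", "route", "get", destination],
                            capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        return None
    routes = json.loads(result.stdout)
    return routes[0].get("gateway") if routes else None

def wait_for_change(probe, initial, timeout=60.0, interval=0.2,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns something other than `initial`.

    Returns (new_value, elapsed_seconds), or (None, elapsed_seconds) on timeout.
    """
    start = clock()
    while True:
        value = probe()
        elapsed = clock() - start
        if value != initial:
            return value, elapsed
        if elapsed >= timeout:
            return None, elapsed
        sleep(interval)

# On the lab node, start this just before taking rtr-a down:
# new_hop, seconds = wait_for_change(current_gateway, "10.10.10.1")
```

The measured value is only as precise as the polling interval, so choose an interval well below the transition time you expect to see.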

The most common issue during failback is connections flapping because the primary route returns too quickly. Rather than route dampening, the answer here is a more controlled preference policy combined with timing handled at the upper layers.
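
One possible reading of upper-layer timing is a hold-down: whatever sits above the routing layer treats the primary as usable again only after it has stayed up for a while. The Python sketch below is an assumption about what such logic could look like, not a description of any FRRouting feature; the `HoldDown` class and its 30-second value are purely illustrative.

```python
class HoldDown:
    """Report the primary as usable only after it has stayed up for hold_seconds."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._up_since = None  # timestamp of the first "up" in the current streak

    def observe(self, primary_up, now):
        """Feed one health observation; return True if the primary may be used."""
        if not primary_up:
            self._up_since = None  # any failure restarts the waiting period
            return False
        if self._up_since is None:
            self._up_since = now
        return now - self._up_since >= self.hold_seconds

gate = HoldDown(hold_seconds=30)
print(gate.observe(True, now=0))    # False: just came back
print(gate.observe(False, now=20))  # False: flapped, streak reset
print(gate.observe(True, now=25))   # False: new streak starts here
print(gate.observe(True, now=55))   # True: stable for 30 seconds
```

The trade-off is straightforward: a longer hold-down means fewer flaps but more time spent on the backup path.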

Metrics worth measuring

Saying “the route changed” isn’t enough. Measure these:

  • BGP neighbor down duration
  • Time to select the new best path
  • Failed request count at the application level
  • How long the reselection takes during failback

These measurements show how the network design actually translates into application behavior.
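
As a rough sketch of how these four numbers could be derived from a test run, the Python snippet below takes a handful of timestamps and a list of request results. The event names (`neighbor_down`, `backup_selected`, `neighbor_up`, `primary_reselected`) and the sample values are hypothetical and exist only for illustration; they are not measurements.

```python
from dataclasses import dataclass

@dataclass
class FailoverReport:
    neighbor_down_seconds: float
    time_to_new_best_path: float
    failback_reselect_seconds: float
    failed_requests: int

def summarize(events, requests):
    """events: event name -> timestamp in seconds; requests: list of (timestamp, ok)."""
    return FailoverReport(
        neighbor_down_seconds=events["neighbor_up"] - events["neighbor_down"],
        time_to_new_best_path=events["backup_selected"] - events["neighbor_down"],
        failback_reselect_seconds=events["primary_reselected"] - events["neighbor_up"],
        failed_requests=sum(1 for _, ok in requests if not ok),
    )

# Illustrative input only, not real lab data.
events = {"neighbor_down": 100, "backup_selected": 102,
          "neighbor_up": 160, "primary_reselected": 163}
requests = [(99, True), (100, False), (101, False), (103, True), (161, True)]
print(summarize(events, requests))
```

How you collect the timestamps is up to your lab: log timestamps from FRRouting, a polling script, and an HTTP client loop are all reasonable sources, as long as they share one clock.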

Which traps to check before going to production?

  • Are both upstreams advertising the same prefix with different communities?
  • Are you carrying a default route or specific prefixes?
  • Do you want ECMP, or do you need a strict primary/secondary flow?
  • Are application connections tolerant of brief interruptions?
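
The default-route question matters because local-preference only compares paths for the same prefix, while forwarding always follows the longest matching prefix. The Python sketch below illustrates this with a hypothetical situation in which rtr-a sends only a default route and rtr-b sends a more specific prefix; the `lookup` helper and the route list are assumptions made for the example.

```python
import ipaddress

def lookup(destination, routes):
    """Longest-prefix match over (prefix, next_hop) pairs; returns the winner or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop

# Hypothetical table: default via the "preferred" rtr-a, a specific prefix via rtr-b.
routes = [("0.0.0.0/0", "10.10.10.1"), ("203.0.113.0/24", "10.10.10.2")]
print(lookup("203.0.113.10", routes))  # ('203.0.113.0/24', '10.10.10.2')
print(lookup("198.51.100.1", routes))  # ('0.0.0.0/0', '10.10.10.1')
```

In this situation traffic to 203.0.113.10 leaves through rtr-b even though rtr-a has the higher local-preference, which is exactly the kind of assumption worth breaking in the lab.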

The lab’s job isn’t to prove theory; it’s to break production assumptions early.

Conclusion

A small BGP lab built with FRRouting is more than enough to understand failover behavior. Especially for edge services, management networks, and enterprise egress points, seeing the practical impact of route selection ahead of time pays off significantly in operations. Network resilience is not just adding a backup link; it’s measuring exactly when and how that backup actually kicks in.
