Mustafa Erbay
Tutorials · 12 min read

A Pre-Validation Pipeline for Network Changes with Batfish

A practical Batfish flow that validates routing/ACL changes with a “snapshot + question set” before they reach production, catching human error early.


The most expensive kind of network mistake is the change that looks correct but quietly creates a reachability or ACL side effect somewhere else. If you only catch it in production, the answer is “roll back,” yet in some environments rolling back is risky too (multiple dependencies, simultaneous changes, state).

In this post I describe a flow that has saved me repeatedly in the field:

  1. take a config snapshot
  2. run a question set with Batfish
  3. if the result is not “as expected,” block the merge

Goal: the same six questions for every PR

My minimum set:

  • With this change, which prefixes became unreachable?
  • Did a new ACL/route-policy cut off any traffic?
  • Has default route / next-hop behavior shifted?
  • Are BGP/OSPF adjacencies behaving as expected?
  • Has any leak between VRFs appeared?
  • Did the permission you opened “only here” expand somewhere else?
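If the six questions live as data rather than as ad-hoc code, every PR really does get the same set. A minimal sketch, assuming a hypothetical `Check` type of my own; the question names are real pybatfish question names, but which question best answers each check is a judgment call for your network:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One entry of the PR question set (hypothetical structure)."""
    name: str
    questions: tuple[str, ...]  # pybatfish question names that back this check


# The minimum set from the list above, mapped to candidate pybatfish questions.
QUESTION_SET = (
    Check("prefixes that became unreachable", ("differentialReachability",)),
    Check("traffic cut by a new ACL/route-policy",
          ("differentialReachability", "filterLineReachability")),
    Check("default route / next-hop shift", ("routes",)),
    Check("BGP/OSPF adjacencies", ("bgpSessionStatus", "ospfSessionCompatibility")),
    Check("route leak between VRFs", ("routes",)),
    Check("permission that widened elsewhere", ("compareFilters",)),
)


def questions_to_run(checks):
    """Return the distinct question names in first-seen order, so each runs once."""
    seen = []
    for check in checks:
        for question in check.questions:
            if question not in seen:
                seen.append(question)
    return seen
```

Deduplicating matters because several checks can share one answer (here `routes` serves both the next-hop and the VRF-leak check).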

Setup: bring up Batfish in a container

The most practical approach is Docker.

docker run --rm -d --name batfish -p 9997:9997 -p 9996:9996 batfish/allinone

A Batfish setup has two parts (the details can vary by version):

  • the service (the analysis engine, running in the container)
  • the client (pybatfish, the Python library you use to talk to the service)
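A CI job that starts the container and immediately connects with pybatfish can fail simply because the service is not listening yet. A minimal stdlib sketch of a readiness wait; the helper name `wait_for_port` and the default timeouts are my assumptions, and 9996 is one of the ports published by the `docker run` command above:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True as soon as a connection is accepted, False on timeout.
    An open port only means the container is listening; the analysis
    engine may still need a moment before it answers questions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", 9996)` before opening the pybatfish session and abort the job if it returns False.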

Snapshot structure: config + environment

A Batfish snapshot fundamentally contains:

  • Device configurations (in the vendor format)
  • (Optional) host, interface-status, and other environment data
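Before a snapshot is uploaded, a quick local check can catch an empty or half-exported directory. A sketch, assuming the `configs/` + `hosts/` repo layout suggested in this post; `check_snapshot` is a hypothetical helper and does not replace Batfish's own parse checks (such as the `fileParseStatus` and `initIssues` questions):

```python
import json
from pathlib import Path


def check_snapshot(snapshot_dir) -> list[str]:
    """Return a list of problems with a snapshot directory (empty list = looks usable).

    Assumes configs/ holds one file per device and an optional hosts/
    directory holds JSON host files.
    """
    root = Path(snapshot_dir)
    problems = []

    configs = root / "configs"
    if not configs.is_dir():
        problems.append(f"{configs}: missing configs/ directory")
    elif not any(p.is_file() for p in configs.iterdir()):
        problems.append(f"{configs}: no device configs found")

    hosts = root / "hosts"
    if hosts.is_dir():
        for host_file in sorted(hosts.glob("*.json")):
            try:
                json.loads(host_file.read_text())
            except json.JSONDecodeError as exc:
                problems.append(f"{host_file}: invalid JSON ({exc.msg})")

    return problems
```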

A suggested repo layout:

network/
  snapshots/
    prod/
      configs/
        r1.cfg
        r2.cfg
      hosts/
        host1.json

This way the prod snapshot stays fixed as the baseline, and each PR change produces a new snapshot to compare against it.
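One way to produce the PR snapshot is to copy the fixed prod snapshot and overlay only the rendered configs the PR changes. A sketch with a hypothetical `build_pr_snapshot` helper; the output directory could be the `network/snapshots/pr` path that the workflow passes to the question script:

```python
import shutil
from pathlib import Path


def build_pr_snapshot(prod_dir, changed_configs: dict, out_dir) -> Path:
    """Copy the prod snapshot and overlay the configs changed in the PR.

    changed_configs maps a device config file name (e.g. "r1.cfg") to its
    new, rendered contents. Files not mentioned stay identical to prod.
    """
    out = Path(out_dir)
    if out.exists():
        shutil.rmtree(out)  # always start from a clean copy of prod
    shutil.copytree(prod_dir, out)
    for filename, text in changed_configs.items():
        target = out / "configs" / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
    return out
```

This whole-file overlay does not handle devices that a PR removes; delete those files explicitly if that case matters in your network.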

Question set: 3 critical tests that block the PR

1) Reachability: does the expected flow exist?

Sample question: “Does TCP/5432 work from the app VLAN to the DB VLAN?”

  • source: app subnet
  • destination: db subnet
  • protocol/port: tcp/5432
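The three fields map naturally onto header constraints. A sketch that validates them and returns keyword arguments for pybatfish's `HeaderConstraints` (`srcIps`, `dstIps`, `ipProtocols`, `dstPorts`); the `FlowSpec` type and the subnets below are hypothetical placeholders:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowSpec:
    src: str       # source subnet, e.g. the app VLAN
    dst: str       # destination subnet, e.g. the DB VLAN
    protocol: str  # "tcp" or "udp"
    port: int


def header_kwargs(spec: FlowSpec) -> dict:
    """Validate a flow spec and return keyword arguments for HeaderConstraints."""
    src = ipaddress.ip_network(spec.src)  # raises ValueError on a malformed prefix
    dst = ipaddress.ip_network(spec.dst)
    proto = spec.protocol.lower()
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unsupported protocol: {spec.protocol}")
    if not 1 <= spec.port <= 65535:
        raise ValueError(f"port out of range: {spec.port}")
    return {
        "srcIps": str(src),
        "dstIps": str(dst),
        "ipProtocols": [proto],
        "dstPorts": str(spec.port),
    }


# Hypothetical subnets for the app VLAN -> DB VLAN example.
app_to_db = FlowSpec("10.10.0.0/24", "10.20.0.0/24", "tcp", 5432)
```

With pybatfish, the dict would typically go into `HeaderConstraints(**header_kwargs(app_to_db))` and be passed to `bf.q.reachability(...)` together with a start location. `ip_network` is strict, so a prefix with host bits set (e.g. `10.10.0.5/24`) is rejected as a probable typo.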

2) ACL: did an unexpected deny appear?

Especially in policies that “deny by default,” rules in the wrong order can cause a major incident.
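To make the ordering problem concrete, here is a toy first-match evaluator: rules are checked top to bottom, the first match decides, and anything unmatched hits the implicit deny. It only looks at destination prefix and port (real ACLs match much more), and the rules are made up:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "permit" or "deny"
    dst: str                    # destination prefix this rule matches
    port: Optional[int] = None  # None matches any port


def evaluate(acl, dst_ip: str, port: int) -> tuple[str, Optional[int]]:
    """Return (action, index of the matching rule); first match wins."""
    addr = ipaddress.ip_address(dst_ip)
    for i, rule in enumerate(acl):
        if addr in ipaddress.ip_network(rule.dst) and rule.port in (None, port):
            return rule.action, i
    return "deny", None  # nothing matched: implicit deny


# Same two rules, different order.
correct = [Rule("permit", "10.20.0.0/24", 5432), Rule("deny", "10.20.0.0/16")]
wrong = [Rule("deny", "10.20.0.0/16"), Rule("permit", "10.20.0.0/24", 5432)]
```

In the `wrong` list the broad `/16` deny shadows the specific permit, so TCP/5432 to the DB subnet is dropped. Batfish's `filterLineReachability` question reports ACL lines that can never match for this kind of reason.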

3) Routing: did next-hop change?

When you touch BGP local-preference, a route-map, or an IGP metric, traffic can take an unexpected path, such as a hairpin.
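The routing check boils down to comparing next hops per prefix between the prod and PR snapshots. pybatfish can answer the `routes` question against both (`answer()` accepts `snapshot` and `reference_snapshot`); the sketch below shows only the comparison itself, on hand-written placeholder tables for one device and VRF:

```python
def next_hop_changes(before: dict, after: dict) -> dict:
    """Compare prefix -> set-of-next-hops maps for one device/VRF.

    Returns prefix -> (old_next_hops, new_next_hops) for every prefix whose
    next hops differ, including prefixes that appeared or disappeared.
    """
    changes = {}
    for prefix in sorted(set(before) | set(after)):
        old = frozenset(before.get(prefix, ()))
        new = frozenset(after.get(prefix, ()))
        if old != new:
            changes[prefix] = (old, new)
    return changes


# Placeholder route tables: prefix -> next-hop IPs.
prod_routes = {"0.0.0.0/0": {"192.0.2.1"}, "10.20.0.0/24": {"10.0.0.2"}}
pr_routes = {"0.0.0.0/0": {"192.0.2.5"}, "10.20.0.0/24": {"10.0.0.2"}}
```

Here the default route's next hop moved from 192.0.2.1 to 192.0.2.5, which is exactly the kind of shift that should fail the PR unless it was intended.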

CI/CD: a PR gate via GitHub Actions

The flow:

  1. On the PR branch, generate a snapshot (config render / export)
  2. Start the Batfish container
  3. Run the question set
  4. If it fails, the workflow fails → no merge

A simple workflow skeleton:

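name: network-precheck
on:
  pull_request:
jobs:
  batfish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Start Batfish
        run: docker run --rm -d --name batfish -p 9997:9997 -p 9996:9996 batfish/allinone
      - name: Wait for the Batfish service
        # the container needs some time before it accepts connections
        run: timeout 180 bash -c 'until (echo > /dev/tcp/127.0.0.1/9996) 2>/dev/null; do sleep 3; done'
      - name: Run questions
        run: |
          python -m pip install --quiet pybatfish
          python scripts/batfish_questions.py --snapshot network/snapshots/pr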

I deliberately don’t include the script itself in this post, because every organization’s question set is different. But the template is the same: snapshot + questions + gate.
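Even so, the gate part of the template is small. A hypothetical skeleton for `scripts/batfish_questions.py`: each check is a callable that returns failure messages, and the return code decides the merge. The single check here is a do-nothing placeholder; real checks would run pybatfish questions against the snapshot:

```python
import argparse
from typing import Callable, Iterable

# A check takes the snapshot path and returns failure messages (empty = pass).
CheckFn = Callable[[str], list]


def run_gate(snapshot: str, checks: Iterable[tuple[str, CheckFn]]) -> int:
    """Run every check, print the results, return an exit code (0 = merge allowed)."""
    failed = 0
    for name, check in checks:
        failures = check(snapshot)
        print(f"[{'FAIL' if failures else 'ok'}] {name}")
        for msg in failures:
            print(f"    {msg}")
        failed += bool(failures)
    return 1 if failed else 0


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Batfish PR gate (sketch)")
    parser.add_argument("--snapshot", required=True)
    args = parser.parse_args(argv)
    checks = [
        # Hypothetical placeholder; replace with pybatfish-backed checks.
        ("example check", lambda snapshot: []),
    ]
    return run_gate(args.snapshot, checks)
```

The script would end with `raise SystemExit(main())`, so any failing check makes the “Run questions” step, and with it the PR gate, fail. Running every check instead of stopping at the first failure gives the reviewer the full picture in one run.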

Operational tips (the things that make a difference in the field)

  • Keep the snapshot “close to reality”: use rendered configs (if templating is in play)
  • Keep the question set small but pick the critical ones (6–10 questions)
  • Format the failure output to drive action: which prefix, which ACL, which device
  • Attach the Batfish report to the change ticket (audit trail and trust)
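For the failure-output tip, one way to structure findings so every line names the device plus the prefix or ACL involved. The `Finding` fields and the `key=value` output format are my assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    device: str
    check: str
    prefix: Optional[str] = None
    acl: Optional[str] = None
    detail: str = ""


def format_finding(f: Finding) -> str:
    """One line per finding: device first, then whatever identifies the object to fix."""
    parts = [f"device={f.device}", f"check={f.check}"]
    if f.prefix:
        parts.append(f"prefix={f.prefix}")
    if f.acl:
        parts.append(f"acl={f.acl}")
    line = " ".join(parts)
    return f"{line}: {f.detail}" if f.detail else line


def format_report(findings) -> str:
    """Sort by device so each device owner sees their items together."""
    ordered = sorted(findings, key=lambda f: (f.device, f.check))
    return "\n".join(format_finding(f) for f in ordered)
```

The same text can be pasted into the change ticket, which covers the audit-trail tip as well.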

Conclusion

What you’re really doing with Batfish is treating the network as code and making each change testable. Especially in large networks, this approach noticeably lowers change risk and cuts down on “we noticed it in prod” moments.

Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security and Software

Since 2006 I have been working across system architecture, networking, server infrastructure, building large-scale systems, software, and system security. On this blog I share technical experience that has proven itself in the field.
