Network Drift with NetBox + Nornir: An Approval-Driven Remediation…

On the network side, “configuration drift” is unavoidable: emergency fixes, vendor differences, on-site pressure… Rather than trying to ban drift outright, the sustainable answer is to detect it and fix it in a controlled way.

In this article I walk through a practical flow:

NetBox (source of truth) → Nornir (execute) → Git PR (approval) → rollout (rings).

Target architecture (minimum viable)

The leanest version of this flow runs on these components:

NetBox: device/interface/IP/VLAN/tenant inventory
Git repo: the “desired state” (templates + variables)
Nornir job:
- pull the inventory from NetBox
- fetch running-config from each device
- render the “expected” output via templates
- produce a diff (the report)
- apply after approval (commit + tag)
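The report half of that job fits in a few functions. The sketch below is a minimal, dependency-free illustration. The inventory and running configs are hardcoded stand-ins for what NetBox and the devices would return through Nornir. `string.Template` stands in for whatever template engine the Git repo uses (Jinja2 is a common choice with Nornir). Every name here (`INVENTORY`, `RUNNING`, `drift_report`, the device names) is hypothetical.

```python
import difflib
from string import Template

# Hypothetical stand-ins: in the real job the inventory comes from NetBox and the
# running configs are fetched from the devices through Nornir.
INVENTORY = {
    "edge-01": {"role": "edge", "ring": "canary", "ntp_server": "10.0.0.1"},
    "edge-02": {"role": "edge", "ring": "pilot", "ntp_server": "10.0.0.1"},
}
RUNNING = {
    "edge-01": "hostname edge-01\nntp server 10.0.0.99\n",  # drifted
    "edge-02": "hostname edge-02\nntp server 10.0.0.1\n",   # in sync
}
# string.Template stands in for the templates kept in the Git "desired state" repo
TEMPLATE = Template("hostname $hostname\nntp server $ntp_server\n")


def pull_inventory():
    return INVENTORY


def fetch_running(name):
    return RUNNING[name]


def render_expected(name, host_vars):
    # substitute() raises KeyError if a variable is missing, so an incomplete
    # desired state fails loudly instead of producing a misleading diff
    return TEMPLATE.substitute(host_vars, hostname=name)


def device_diff(name, running, expected):
    return list(difflib.unified_diff(
        running.splitlines(), expected.splitlines(),
        fromfile=f"{name}/running", tofile=f"{name}/expected", lineterm=""))


def drift_report():
    """Report stage only: devices with a non-empty diff. Applying waits for approval."""
    report = {}
    for name, host_vars in pull_inventory().items():
        diff = device_diff(name, fetch_running(name), render_expected(name, host_vars))
        if diff:
            report[name] = diff
    return report
```

In this sketch, the apply step is deliberately absent. It belongs to a separate, approval-gated stage.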

Step 1 — Make the NetBox inventory “automation-friendly”

Practices that smooth out the drift flow on the NetBox side:

Assign roles to devices (core/edge/access)
Use site/region fields consistently
Align the VLAN/VRF model with what’s actually deployed
Add an “automation ring” custom field (canary/pilot/prod)

The goal: be able to slice the Nornir inventory by these attributes (role, site, ring) as well as by tags.
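In Nornir itself, this slicing is normally done with `nr.filter(...)`, optionally combined with `F` objects from `nornir.core.filter`. The sketch below models the same idea in plain Python so it runs without NetBox or Nornir. The `Device` record, the `ring` attribute (mirroring a hypothetical `automation_ring` custom field), and the sample data are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    role: str   # NetBox device role: core / edge / access
    site: str
    ring: str   # hypothetical "automation_ring" custom field: canary / pilot / prod
    tags: set = field(default_factory=set)


def select(devices, tags=(), **attrs):
    """Return devices matching every attribute and carrying every requested tag."""
    return [
        d for d in devices
        if all(getattr(d, key) == value for key, value in attrs.items())
        and set(tags) <= d.tags
    ]
```

For example, `select(devices, ring="canary")` yields the canary slice, and `select(devices, role="edge", site="ist1")` yields one site's edge devices. Both work only if roles, sites, and rings are filled in consistently in NetBox.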

Step 2 — Nornir inventory: NetBox as the source

Two important details on the Nornir side:

Start with a read-only NetBox API token (for the report stage)
Use a separate token/identity for the “apply after approval” stage

This separation keeps the risk of producing a report apart from the risk of applying changes: the report stage has nothing it can write with, even if its credentials leak.
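One way to enforce the split in the job itself is to resolve credentials per stage and never fall back to another stage's secret. This is a minimal sketch. The environment variable names `DRIFT_REPORT_TOKEN` and `DRIFT_APPLY_TOKEN` are hypothetical. On the NetBox side, the report token would be created with writes disabled.

```python
import os

# Hypothetical environment variable names: one identity per pipeline stage.
TOKEN_ENV = {
    "report": "DRIFT_REPORT_TOKEN",
    "apply": "DRIFT_APPLY_TOKEN",
}


def token_for(stage, env=os.environ):
    """Return the credential for a stage; never reuse another stage's token."""
    if stage not in TOKEN_ENV:
        raise ValueError(f"unknown stage: {stage!r}")
    var = TOKEN_ENV[stage]
    token = env.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set; refusing to fall back to another stage's credential")
    return token
```

Failing hard here prevents a misconfigured apply run from silently borrowing the report identity, and vice versa.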

Step 3 — Drift report: produce a per-device diff

The report stage aims to answer:

Which devices are drifting?
What class of drift is it? (ACL, routing, interface, NTP, SNMP, syslog…)
Is the drift “expected” (a planned change) or a surprise?
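The "which class" and "expected or surprise" questions can be answered mechanically from the diff. Below is a minimal sketch under several assumptions. The keyword patterns are hypothetical, IOS-style line prefixes, and other vendors need their own. Indented child lines lose their parent context and fall into "other". Planned changes are assumed to be available as a per-device set of classes, for example taken from open PRs.

```python
import re

# Hypothetical, IOS-style line prefixes per drift class; other vendors need their own patterns.
DRIFT_CLASSES = [
    ("acl", re.compile(r"(ip )?access-list\b")),
    ("routing", re.compile(r"(router|ip route)\b")),
    ("interface", re.compile(r"interface\b")),
    ("ntp", re.compile(r"ntp\b")),
    ("snmp", re.compile(r"snmp-server\b")),
    ("syslog", re.compile(r"logging\b")),
]


def classify(diff_lines):
    """Map the changed lines of a unified diff to drift classes."""
    classes = set()
    for line in diff_lines:
        # Skip file headers, hunk markers and unchanged context lines
        if line.startswith(("---", "+++", "@@")) or not line.startswith(("+", "-")):
            continue
        # Indented child lines lose their parent context here and land in "other"
        body = line[1:].strip()
        for name, pattern in DRIFT_CLASSES:
            if pattern.match(body):
                classes.add(name)
                break
        else:
            classes.add("other")
    return classes


def is_expected(device, classes, planned):
    """Drift counts as expected only if every class is covered by a planned change."""
    return classes <= planned.get(device, set())
```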

I prefer producing the report in two formats:

For humans: a Markdown summary plus the most critical diffs
For machines: JSON (CI gate / metrics)
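Both formats can come from the same in-memory report. The sketch below assumes a hypothetical report shape, device -> {"classes": [...], "diff": [unified diff lines]}. It uses "number of changed lines" as a crude stand-in for "most critical", which a real pipeline would likely replace with per-class weighting.

```python
import json

FENCE = "`" * 3  # avoids a literal triple backtick inside this snippet


def changed_lines(diff):
    """Count added/removed lines, ignoring the ---/+++ file headers."""
    return sum(1 for line in diff
               if line.startswith(("+", "-")) and not line.startswith(("+++", "---")))


def render_reports(report, top_n=3):
    """Return (markdown_for_humans, json_for_machines) from one report dict."""
    # Rank by changed lines: a crude, hypothetical proxy for "most critical"
    ranked = sorted(report.items(), key=lambda item: changed_lines(item[1]["diff"]), reverse=True)
    md = [f"# Drift report: {len(report)} drifting device(s)", "",
          "| Device | Classes | Changed lines |", "|---|---|---|"]
    for device, entry in ranked:
        md.append(f"| {device} | {', '.join(sorted(entry['classes']))} | {changed_lines(entry['diff'])} |")
    # Only the top_n diffs go into the human summary; the JSON keeps every device
    for device, entry in ranked[:top_n]:
        md += ["", f"## {device}", FENCE + "diff", *entry["diff"], FENCE]
    machine = {
        "drifting_devices": len(report),
        "devices": {
            device: {"classes": sorted(entry["classes"]), "changed_lines": changed_lines(entry["diff"])}
            for device, entry in report.items()
        },
    }
    return "\n".join(md), json.dumps(machine, indent=2, sort_keys=True)
```

A CI gate can then read `drifting_devices` from the JSON and fail the job above a threshold, while the Markdown goes into a job summary or PR comment.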

Step 4 — PR workflow: “an approved drift remediation”

Standardize this information inside the PR:

The list of affected devices (by ring)
Type of change (routing/ACL/…)
Expected impact (risk)
Rollback command/plan
Change window (if any)
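Standardizing the PR is easiest when the job renders the description itself and refuses to render when a required field is empty. A minimal sketch; the field names and section headings are hypothetical choices that map one-to-one onto the list above.

```python
REQUIRED_FIELDS = ("devices_by_ring", "change_type", "risk", "rollback")
RING_ORDER = ("canary", "pilot", "prod")


def render_pr_body(info):
    """Render a standardized PR description; fail if a required field is empty."""
    missing = [key for key in REQUIRED_FIELDS if not info.get(key)]
    if missing:
        raise ValueError("PR description is missing: " + ", ".join(missing))
    lines = ["## Affected devices"]
    for ring in RING_ORDER:
        devices = info["devices_by_ring"].get(ring, [])
        if devices:
            lines.append(f"- {ring}: {', '.join(devices)}")
    lines += [
        "", "## Type of change", info["change_type"],
        "", "## Expected impact (risk)", info["risk"],
        "", "## Rollback plan", info["rollback"],
        # The change window is optional, so state "none" explicitly
        "", "## Change window", info.get("change_window") or "none",
    ]
    return "\n".join(lines)
```

Raising on a missing rollback plan makes "no rollback, no PR" a rule the tooling enforces rather than a reviewer's memory.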

Step 5 — Rollout: ring by ring

For rollout discipline, I follow this sequence:

Canary: 1–3 devices
Pilot: a small site/tenant
Prod: the remainder
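The ring sequence can be expressed as ordered waves with a health check between them. A minimal sketch, assuming a hypothetical {device: ring} mapping and two caller-supplied callbacks: `apply` pushes one device's change, and `healthy` runs the post-ring checks. The 3-device canary limit comes from the list above.

```python
RING_ORDER = ("canary", "pilot", "prod")
MAX_CANARY = 3  # "Canary: 1-3 devices"


def plan_waves(devices):
    """Group {device: ring} into ordered waves; reject unknown rings and oversized canaries."""
    waves = {ring: [] for ring in RING_ORDER}
    for name, ring in devices.items():
        if ring not in waves:
            raise ValueError(f"{name}: unknown ring {ring!r}")
        waves[ring].append(name)
    if len(waves["canary"]) > MAX_CANARY:
        raise ValueError(f"canary ring has {len(waves['canary'])} devices; limit is {MAX_CANARY}")
    return [(ring, sorted(waves[ring])) for ring in RING_ORDER if waves[ring]]


def roll_out(devices, apply, healthy):
    """Apply wave by wave; stop before the next ring if the health check fails."""
    done = []
    for ring, names in plan_waves(devices):
        for name in names:
            apply(name)
            done.append(name)
        if not healthy(ring, names):
            return done, f"halted after {ring}"
    return done, "completed"
```

Passing `apply` and `healthy` in as callbacks keeps the ordering logic independent of how changes are pushed (Nornir tasks, a vendor API) and how health is measured.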

Measure at every stage:

Routing adjacency flaps?
Packet loss or latency increase?
ACL hit-count anomalies?
CPU spikes?
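Those checks become a gate once each is turned into a before/after comparison with a threshold. A minimal sketch: the metric names and every threshold value are hypothetical, and the metrics are assumed to be sampled over equal windows before and after the ring. For ACLs, both a sharp rise and a sharp drop in hits count as anomalies, since either can mean traffic is now matching different rules.

```python
# Hypothetical thresholds; real values depend on the network and the metric source.
MAX_NEW_FLAPS = 0        # any new routing adjacency flap fails the gate
MAX_LOSS_DELTA = 0.5     # packet loss, percentage points
MAX_LATENCY_DELTA = 5.0  # milliseconds
MAX_CPU_DELTA = 20.0     # CPU utilization, percentage points
ACL_RATIO = 3.0          # ACL hits per window may not rise or fall by more than 3x


def gate(before, after):
    """Compare metrics sampled before and after a ring; return the failed checks."""
    failures = []
    if after["adjacency_flaps"] - before["adjacency_flaps"] > MAX_NEW_FLAPS:
        failures.append("routing adjacency flap")
    if after["packet_loss_pct"] - before["packet_loss_pct"] > MAX_LOSS_DELTA:
        failures.append("packet loss increase")
    if after["latency_ms"] - before["latency_ms"] > MAX_LATENCY_DELTA:
        failures.append("latency increase")
    if after["cpu_pct"] - before["cpu_pct"] > MAX_CPU_DELTA:
        failures.append("CPU spike")
    # +1 smoothing keeps the ratio defined when an ACL saw no hits in one window
    ratio = (after["acl_hits"] + 1) / (before["acl_hits"] + 1)
    if ratio > ACL_RATIO or ratio < 1 / ACL_RATIO:
        failures.append("ACL hit-count anomaly")
    return failures
```

An empty list means the next ring may proceed. Anything else is a reason to stop and look before rolling further.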

Step 6 — Rollback has to be real

Rollback can’t exist only on paper; it has to be runnable in practice:

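Treat each applied config change as a transaction: it either lands completely or is reverted
If the vendor supports it, lean on commit-confirm features (for example, commit confirmed on Junos), which revert the change automatically unless it is confirmed in time
Keep the changes small and atomic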
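The commit-confirm pattern can be sketched without a real device. Below, `FakeDevice` is a hypothetical in-memory model of the semantics: a confirmed-commit keeps the previous config around, and if the change is not confirmed, the device restores it on its own. `apply_transaction` confirms only when post-change checks pass. None of this is a vendor or Nornir API. On real gear, the timeout is handled by the device, not by the script.

```python
class FakeDevice:
    """Hypothetical in-memory device modelling commit-confirm semantics."""

    def __init__(self, config):
        self.running = config
        self._previous = None  # config to restore if the commit is not confirmed

    def commit_confirmed(self, new_config):
        self._previous, self.running = self.running, new_config

    def confirm(self):
        self._previous = None

    def timeout_expired(self):
        # What the device does by itself when the confirm window lapses
        if self._previous is not None:
            self.running, self._previous = self._previous, None


def apply_transaction(device, new_config, post_checks):
    """Apply one small change; confirm only if post-checks pass, otherwise let it revert."""
    device.commit_confirmed(new_config)
    if post_checks(device):
        device.confirm()
        return "committed"
    # Not confirming is the rollback: the device restores the previous config itself
    device.timeout_expired()
    return "rolled back"
```

The useful property is that rollback does not depend on the automation staying reachable. If the change cuts off management access, the unconfirmed commit still reverts.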

Closing: make drift visible first, then reduce it

The first win from this flow is not “less drift”; it is making drift visible. What is visible becomes manageable; what is manageable can be standardized.

A natural next step is layering “risk scoring” and “automatic maintenance window selection” on top of this flow: gates that adjust to the class of drift.
