As a firewall rule set grows, security does not increase; usually the opposite happens: visibility shrinks, false positives rise, and change risk inflates. The most dangerous moment is when the rulebase becomes sacred and “no one touches it”. From that moment on, the firewall stops being a control plane and turns into operational debt.
Why is rulebase cleanup as much an “ops job” as a “security job”?
Because deleting a rule carries two risks:
- Production outage (legitimate traffic breaks)
- Security gap (bad traffic gets through)
So the right approach is not “delete the rule” but “prove what the rule actually does”.
1) Inventory: every rule must have an owner and an expiration date
Minimum fields:
- Owner team/service
- Business justification (1 sentence)
- Related ticket/change record
- Duration (e.g. 90 days) and a renewal rule
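A minimal sketch of what such an inventory record could look like, assuming the metadata lives outside the firewall (for example in a CMDB or a YAML file). The class name, field names, and the sample rule are hypothetical; only the four minimum fields and the 90-day default come from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RuleRecord:
    """Inventory metadata for one firewall rule (names are illustrative)."""
    rule_id: str
    owner: str            # owner team/service
    justification: str    # one-sentence business justification
    ticket: str           # related ticket/change record
    approved_on: date     # date of the last approval or renewal
    duration_days: int = 90

    @property
    def expires_on(self) -> date:
        # Renewal is modelled as updating approved_on, which moves this date.
        return self.approved_on + timedelta(days=self.duration_days)

    def is_expired(self, today: date) -> bool:
        return today > self.expires_on

    def missing_fields(self) -> list:
        """Names of mandatory text fields that are empty or blank."""
        required = {
            "owner": self.owner,
            "justification": self.justification,
            "ticket": self.ticket,
        }
        return [name for name, value in required.items() if not value.strip()]


# Hypothetical sample rule
rule = RuleRecord("R-101", "payments", "Payment API to card processor",
                  "CHG-0001", date(2024, 1, 10))
print(rule.expires_on, rule.is_expired(date(2024, 5, 1)), rule.missing_fields())
```

A record with a non-empty `missing_fields()` result is a candidate for follow-up with the owning team before any cleanup decision is made.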
2) Evidence: hitcount alone is not enough, but it’s a good starting point
Three sources of evidence that accelerate rule cleanup in the field:
- Hitcount / rule usage: a rule that didn’t fire at all in a given window
- Log evidence: which actual flows the rule was serving
- Dependency analysis: ports, IPs, and change history of the related services
A critical nuance about hitcount:
- “0 hits” doesn’t always mean “useless” (rare but critical flows exist)
- The measurement window matters (end-of-month batches, campaign periods, etc.)
So my practical approach for a 0-hit rule is:
- 30 days of hitcount + log
- Owner approval
- Disable in waves → observe → delete
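The three conditions above can be expressed as a simple filter. This is a sketch under assumptions: the `UsageEvidence` structure and its field names are hypothetical, and how hitcounts and log flows are collected depends on the firewall platform.

```python
from dataclasses import dataclass

MIN_WINDOW_DAYS = 30  # observation window from the list above


@dataclass
class UsageEvidence:
    rule_id: str
    window_days: int      # how long hitcount and logs were collected
    hits: int             # hitcount over the window
    logged_flows: int     # flows attributed to this rule in the logs
    owner_approved: bool


def disable_candidates(evidence):
    """Return IDs of rules that may enter a disable wave.

    A rule qualifies only if the window is long enough, both hitcount
    and logs show no usage, and the owner has approved.
    """
    return [
        e.rule_id
        for e in evidence
        if e.window_days >= MIN_WINDOW_DAYS
        and e.hits == 0
        and e.logged_flows == 0
        and e.owner_approved
    ]


# Hypothetical evidence for four rules
evidence = [
    UsageEvidence("R-1", 30, 0, 0, True),    # qualifies
    UsageEvidence("R-2", 14, 0, 0, True),    # window too short
    UsageEvidence("R-3", 45, 0, 3, True),    # logs disagree with hitcount
    UsageEvidence("R-4", 30, 0, 0, False),   # no owner approval yet
]
print(disable_candidates(evidence))
```

Requiring hitcount and log evidence to agree is a deliberately conservative choice; a rule where the two disagree deserves investigation rather than disabling.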
3) Shadow rules: you think a rule works, but it’s actually shadowed
A shadow rule arises under top-down, first-match evaluation:
- A broader rule above already permits/denies the traffic
- The rule below “looks like it exists” but is not effective
This class of rules bloats the rulebase and misleads audits. In a cleanup, the fastest wins often come from here.
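As an illustration, shadowing by a single broader rule can be detected by checking, for each rule, whether any earlier rule fully covers it. The sketch below assumes a simplified, hypothetical rule model (IPv4 CIDR source and destination, one protocol, one inclusive destination port range) and first-match evaluation. It does not detect a rule shadowed by the combined effect of several earlier rules, and real platforms add zones, applications, and object groups.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass
class Rule:
    name: str
    src: str        # IPv4 CIDR
    dst: str        # IPv4 CIDR
    proto: str      # "tcp", "udp", or "any"
    ports: tuple    # inclusive (low, high) destination port range
    action: str     # "permit" or "deny"


def covers(upper: Rule, lower: Rule) -> bool:
    """True if every packet matching `lower` also matches `upper`."""
    src_ok = ip_network(lower.src).subnet_of(ip_network(upper.src))
    dst_ok = ip_network(lower.dst).subnet_of(ip_network(upper.dst))
    proto_ok = upper.proto in ("any", lower.proto)
    ports_ok = upper.ports[0] <= lower.ports[0] and lower.ports[1] <= upper.ports[1]
    return src_ok and dst_ok and proto_ok and ports_ok


def find_shadowed(rules):
    """Return (shadowed rule, shadowing rule, same_action) tuples."""
    findings = []
    for i, lower in enumerate(rules):
        for upper in rules[:i]:          # only rules evaluated earlier
            if covers(upper, lower):
                findings.append((lower.name, upper.name, upper.action == lower.action))
                break                    # the first covering rule is enough
    return findings


# Hypothetical rulebase, in evaluation order
rules = [
    Rule("allow-dc", "10.0.0.0/8", "10.20.0.0/16", "tcp", (1, 65535), "permit"),
    Rule("allow-app-db", "10.1.2.0/24", "10.20.5.0/24", "tcp", (5432, 5432), "permit"),
    Rule("allow-dns", "10.0.0.0/8", "192.0.2.53/32", "udp", (53, 53), "permit"),
]
print(find_shadowed(rules))
```

The `same_action` flag separates two cases: a shadowed rule with the same action is merely redundant, while one with the opposite action never takes effect and likely contradicts what its author intended.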
4) Safe deletion waves: disable first, delete later
Don’t run rulebase cleanup as “one big change”; run it in waves:
- Wave 0 (staging/lab): export, analysis, report
- Wave 1 (low risk): shadow + 0 hit + owner approved
- Wave 2 (medium risk): low hit but business criticality unclear
- Wave 3 (high risk): broad permit rules, legacy services
Standard flow within each wave:
- Disable the rule (keep logging on)
- Observe for 7–14 days (depending on context)
- If no incident, delete
- Document the outcome (is rule debt going down?)
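One possible reading of the wave definitions above, written as a classifier. The `Candidate` fields, the `LOW_HIT_THRESHOLD` value, and the order of the checks are assumptions for illustration; the text does not fix a numeric cutoff for "low hit".

```python
from dataclasses import dataclass
from typing import Optional

LOW_HIT_THRESHOLD = 10  # assumed cutoff for "low hit"; tune per environment


@dataclass
class Candidate:
    rule_id: str
    hits: int
    shadowed: bool
    owner_approved: bool
    broad_permit: bool = False
    legacy_service: bool = False


def assign_wave(c: Candidate) -> Optional[int]:
    """Map a rule to cleanup wave 1, 2, or 3; None means not a candidate."""
    # High-risk traits win over everything else.
    if c.broad_permit or c.legacy_service:
        return 3
    # Low risk: no effective traffic and the owner has signed off.
    if (c.shadowed or c.hits == 0) and c.owner_approved:
        return 1
    # Medium risk: little traffic, business criticality still unclear.
    if c.hits <= LOW_HIT_THRESHOLD:
        return 2
    return None


# Hypothetical candidates
candidates = [
    Candidate("R-1", hits=0, shadowed=True, owner_approved=True),
    Candidate("R-2", hits=4, shadowed=False, owner_approved=False),
    Candidate("R-3", hits=0, shadowed=False, owner_approved=True, broad_permit=True),
    Candidate("R-4", hits=5000, shadowed=False, owner_approved=True),
]
print({c.rule_id: assign_wave(c) for c in candidates})
```

Checking the high-risk traits first means a broad permit rule never slips into Wave 1 just because it happens to show zero hits.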
5) Automation: set up export/diff and change discipline
The point of automation in rulebase cleanup is not “let AI write rules”; it is to make changes visible and repeatable:
- Rule set export (daily/weekly)
- Diff report (who changed what)
- Standard tagging (owner, ticket)
- Expiry alerts for rules past their expiration date
Once this discipline is in place, even if the rulebase grows again, it grows under control.
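A minimal sketch of the export/diff and tagging checks, assuming each export has already been parsed into a dictionary keyed by rule ID. The snapshot layout and tag names are hypothetical; answering “who changed what” additionally requires the platform's audit log or a commit history around the exports.

```python
def diff_rulesets(old: dict, new: dict) -> dict:
    """Compare two rule set snapshots keyed by rule ID."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(rid for rid in old.keys() & new.keys() if old[rid] != new[rid])
    return {"added": added, "removed": removed, "changed": changed}


def untagged(ruleset: dict, required=("owner", "ticket")) -> list:
    """Rule IDs missing any of the required tags (absent or empty)."""
    return sorted(
        rid for rid, rule in ruleset.items()
        if any(not rule.get(tag) for tag in required)
    )


# Hypothetical snapshots from two consecutive exports
yesterday = {
    "R-1": {"dst_port": 443, "owner": "web", "ticket": "CHG-1"},
    "R-2": {"dst_port": 22, "owner": "ops", "ticket": "CHG-2"},
}
today = {
    "R-1": {"dst_port": 8443, "owner": "web", "ticket": "CHG-1"},
    "R-3": {"dst_port": 5432, "owner": "", "ticket": "CHG-9"},
}
print(diff_rulesets(yesterday, today))
print(untagged(today))
```

Running this on every scheduled export and sending the result to the change channel turns silent rulebase drift into a reviewable report.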
Conclusion
Firewall rulebase cleanup is not a matter of “brave deletion”; it depends on hitcount, log evidence, and ownership. The model that produces the most stable results for me in the field is: build the inventory and evidence first, then start with shadow rules and simplify in waves. With this, the firewall stops being an “untouchable black box” and becomes a manageable control plane again.