Network Telemetry with IPFIX/NetFlow: A Pipeline for DDoS and Capacity

In network operations there are two extremes: either you stare at a single “interface utilization” graph, or you try to capture every packet and drown in the data. The third path, the one that’s actually sustainable in production, is flow telemetry: with IPFIX/NetFlow/sFlow you can answer “who talked to whom, how much, and when” with enough fidelity to operate.

In this post, I walk through a realistic flow pipeline design that supports DDoS triage and capacity/peering decisions.

What does flow telemetry buy you?

Flow telemetry pays off most in these scenarios:

DDoS: attack vector (protocol/port), top talkers, target prefix/service
Capacity: which applications fill the link, what hours they spike
Anomaly: a new destination country/ASN, an unexpected port, “high fan-out” behavior
Incident: fast evidence for the question “which segment was talking?”

Pipeline components (minimal but sufficient)

The minimum architecture that has worked for me in practice:

Exporter: IPFIX/NetFlow on the router/switch/firewall
Collector: receives UDP, normalizes (HA where possible)
Enrichment: ASN/GeoIP, prefix, application labels
Storage: fast querying (usually a columnar DB)
Dashboard/Alert: ready-made panels for DDoS triage and capacity

On the exporter side: right place, right rate

Where you export flow from is critical:

Edge uplink: DDoS and transit/peering visibility
DC core: east-west density, critical segments
Firewall: correlation with policy/zone context (vendor dependent)

Be deliberate about sampling:

For DDoS and volumetric visibility, sampling (e.g. 1/1000) is usually sufficient.
For low-volume but critical flows (auth/management), aggressive sampling can cause you to miss the signal.
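To make the sampling trade-off concrete, here is a minimal sketch. It assumes simple random 1-in-N packet sampling and counters that have not already been scaled by the exporter; the names SampledFlow, estimate_rates, and detection_probability are illustrative, not part of any collector API.

```python
from dataclasses import dataclass


@dataclass
class SampledFlow:
    bytes: int    # bytes counted in the sampled packets only
    packets: int  # number of sampled packets


def estimate_rates(flows, sampling_rate, window_seconds):
    """Scale sampled counters back up to estimated (bps, pps) for the window."""
    total_bytes = sum(f.bytes for f in flows) * sampling_rate
    total_packets = sum(f.packets for f in flows) * sampling_rate
    return total_bytes * 8 / window_seconds, total_packets / window_seconds


def detection_probability(flow_packets, sampling_rate):
    """Chance that at least one packet of a flow is sampled.

    Assumes every packet is sampled independently with probability
    1/sampling_rate, which is an idealization of real exporters.
    """
    return 1 - (1 - 1 / sampling_rate) ** flow_packets


if __name__ == "__main__":
    flows = [SampledFlow(bytes=1500, packets=1), SampledFlow(bytes=3000, packets=2)]
    bps, pps = estimate_rates(flows, sampling_rate=1000, window_seconds=60)
    print(f"estimated {bps:.0f} bps, {pps:.0f} pps")
    # A 20-packet management session is very likely invisible at 1/1000.
    print(f"{detection_probability(20, 1000):.1%} chance of seeing it")
```

Under these assumptions a volumetric attack with millions of packets is estimated well, while a short auth or management session is sampled only a few percent of the time. That is the argument for a lower sampling rate (or unsampled export) on those segments.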

On the collector side: UDP reality and resilience

Production realities of a flow collector:

UDP packet loss happens; treat it as a “design assumption.”
If the collector is saturated, data loss is silent: datagrams are dropped and the exporter gets no feedback over UDP.
For that reason, instrument the collector itself: ingest_qps, dropped_packets, queue_depth, cpu, disk.
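One way to turn "silent" loss into a metric is the sequence number in the IPFIX message header, which counts the data records sent so far per observation domain. The sketch below parses the 16-byte header and tracks gaps. LossTracker is a hypothetical helper, and record_count is assumed to come from your decoder, since counting records requires template handling that is omitted here.

```python
import struct

# IPFIX message header: version, length, export time, sequence number,
# observation domain ID (network byte order, 16 bytes in total).
IPFIX_HEADER = struct.Struct("!HHIII")


def parse_ipfix_header(datagram: bytes) -> dict:
    if len(datagram) < IPFIX_HEADER.size:
        raise ValueError("datagram shorter than an IPFIX header")
    version, length, export_time, sequence, domain_id = IPFIX_HEADER.unpack_from(datagram)
    if version != 10:
        raise ValueError(f"not IPFIX (version field is {version})")
    return {
        "length": length,
        "export_time": export_time,
        "sequence": sequence,
        "domain_id": domain_id,
    }


class LossTracker:
    """Estimates lost data records per (exporter, observation domain)."""

    def __init__(self):
        self.expected = {}     # (exporter, domain_id) -> next expected sequence
        self.lost_records = 0  # export this as a collector metric

    def observe(self, exporter, domain_id, sequence, record_count):
        key = (exporter, domain_id)
        expected = self.expected.get(key)
        if expected is not None:
            gap = (sequence - expected) % 2**32
            # Very large gaps are treated as reordering or an exporter
            # restart and ignored; a simplification for this sketch.
            if gap < 2**31:
                self.lost_records += gap
        self.expected[key] = (sequence + record_count) % 2**32
```

A rising lost_records next to ingest_qps and queue_depth tells you whether loss happens before the collector (network, socket buffer) or inside it.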

Two practical approaches for HA:

If your exporters can send to two collector targets (active/active), use that.
If not: anycast VIP + stateless collector (though loss/dedup discussions still apply).
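If both collectors feed the same storage, the dedup question can be handled with a bounded "recently seen" set keyed on message header fields. This is only a sketch: MessageDeduper is a hypothetical class, and it assumes the exporter sends messages with identical header fields to both targets, which is vendor dependent and worth verifying before relying on it.

```python
from collections import OrderedDict


class MessageDeduper:
    """Remembers recently seen export messages, evicting the oldest first."""

    def __init__(self, max_entries=100_000):
        self.seen = OrderedDict()
        self.max_entries = max_entries

    def is_duplicate(self, exporter, domain_id, export_time, sequence):
        key = (exporter, domain_id, export_time, sequence)
        if key in self.seen:
            return True
        self.seen[key] = None
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # drop the oldest entry
        return False
```

The bound keeps memory flat during an attack, at the cost of missing duplicates that arrive after their key has been evicted.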

Enrichment: raw flow alone is not enough

Enrichments that increase operational value:

ASN/GeoIP: a change in source/destination ASN produces an anomaly signal
Prefix map: speeds up the “which service/prefix is the target” question
Port map: 443 isn’t always “HTTPS,” but it’s a good baseline
Device/zone tag: which edge/DC/segment
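A minimal enrichment sketch using only the standard library is shown below. The prefixes, labels, exporter names, and flow field names (dst_ip, protocol, dst_port, exporter) are made-up examples. ASN/GeoIP lookup is left out because it needs an external database.

```python
import ipaddress

# Hypothetical maps; in practice these come from IPAM / inventory.
PREFIX_MAP = {
    "203.0.113.0/24": "public-web",
    "203.0.113.128/25": "api-lb",
    "198.51.100.0/24": "dns-anycast",
}
PORT_MAP = {(6, 443): "https", (17, 53): "dns", (17, 123): "ntp"}
DEVICE_ZONES = {"edge-rtr-1": "edge", "dc-core-1": "dc-core"}

# Most specific prefixes first, so the first hit is the longest match.
_NETWORKS = sorted(
    ((ipaddress.ip_network(p), label) for p, label in PREFIX_MAP.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def lookup_prefix(ip):
    """Return (prefix, service label) for the longest matching prefix."""
    addr = ipaddress.ip_address(ip)
    for network, label in _NETWORKS:
        if addr in network:
            return str(network), label
    return None, "unknown"


def enrich(flow):
    """Return a copy of the flow record with prefix, app, and zone labels."""
    enriched = dict(flow)
    prefix, service = lookup_prefix(flow["dst_ip"])
    enriched["dst_prefix"] = prefix
    enriched["dst_service"] = service
    # The port label is a baseline guess, not proof of the application.
    enriched["app_guess"] = PORT_MAP.get((flow["protocol"], flow["dst_port"]), "unknown")
    enriched["zone"] = DEVICE_ZONES.get(flow["exporter"], "unknown")
    return enriched
```

The linear scan is fine for a few hundred prefixes; for a full routing table you would likely want a radix tree instead.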

Query model: design around triage questions

The questions I most often ask during DDoS triage:

Which dst_ip/dst_prefix is receiving the most traffic?
What does the top protocol/port distribution look like?
What are the top src_asn / src_country?
Compared to “normal baseline,” where did the increase begin?

For fast answers, presets like “last 15 min, 1 hour, 24 hours” and pre-built queries are essential.
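The triage questions above mostly reduce to one "top N by some key" aggregation. In production this would be a query against the columnar store. The sketch below shows the same shape in memory, with top_n and the record fields as illustrative names.

```python
from collections import Counter


def top_n(flows, key_fields, n=5, weight="bytes"):
    """Sum `weight` per combination of `key_fields` and return the n largest."""
    totals = Counter()
    for flow in flows:
        key = tuple(flow[field] for field in key_fields)
        totals[key] += flow[weight]
    return totals.most_common(n)


if __name__ == "__main__":
    flows = [
        {"dst_ip": "203.0.113.10", "protocol": 17, "dst_port": 53, "src_asn": 64500, "bytes": 9000, "packets": 10},
        {"dst_ip": "203.0.113.10", "protocol": 17, "dst_port": 53, "src_asn": 64501, "bytes": 6000, "packets": 8},
        {"dst_ip": "203.0.113.20", "protocol": 6, "dst_port": 443, "src_asn": 64500, "bytes": 2000, "packets": 4},
    ]
    print(top_n(flows, ["dst_ip"]))                                 # top targets by bytes
    print(top_n(flows, ["protocol", "dst_port"], weight="packets"))  # vector by pps
    print(top_n(flows, ["src_asn"], n=1))                            # top source ASN
```

Keeping the key fields and weight as parameters is what makes presets cheap: each saved query is just a different (key_fields, weight, time window) combination.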

Alert logic: “fast signal, low noise”

Simple but useful alert examples:

Threshold breach on bps or pps for a specific prefix/service (against baseline)
A newly seen dst_port (traffic rising on a port that was never observed in production before)
Excessive surge from a single src_asn
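The three alert types above can be sketched as small pure functions. The thresholds here (3 standard deviations, a 50% traffic share) are placeholder assumptions to tune against your own baseline, and the function names are illustrative.

```python
from statistics import mean, pstdev


def breaches_baseline(current, history, sigmas=3.0, min_floor=0.0):
    """True if `current` exceeds mean + sigmas * stddev of `history`.

    `min_floor` keeps very quiet prefixes from alerting on tiny absolute values.
    """
    if not history:
        return False
    threshold = max(mean(history) + sigmas * pstdev(history), min_floor)
    return current > threshold


def new_ports(current_ports, known_ports):
    """Destination ports seen now that were never seen in the baseline."""
    return sorted(set(current_ports) - set(known_ports))


def asn_surge(bytes_by_asn, share_threshold=0.5):
    """Source ASNs contributing more than `share_threshold` of total bytes."""
    total = sum(bytes_by_asn.values())
    if total == 0:
        return []
    return sorted(asn for asn, b in bytes_by_asn.items() if b / total > share_threshold)
```

The same breaches_baseline check works for bps and pps; running both helps because small-packet floods show up in pps well before they move bps.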

Runbook: produce a DDoS picture in 5 minutes with flow

My practical “first 5 minutes” sequence:

Identify the target prefix/service (LB VIP, anycast prefix, app subnet)
Pull top dst_port/protocol for the last 5–10 min
Pull top src_asn and top src_country
If you see known reflection vectors like udp/53 (DNS), udp/123 (NTP), or udp/1900 (SSDP), describe the attack to your upstream in those terms: protocol, port, and target prefix
Make the mitigation decision: RTBH/FlowSpec/scrubbing/WAF (depending on service type)
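As a rough illustration of steps 2 to 4, the sketch below labels traffic by vector and produces a short summary you could paste into an upstream ticket. It matches on source port, on the assumption that reflected responses arrive from the service port. KNOWN_REFLECTION_PORTS, label_vector, and triage_summary are hypothetical names, and the port list covers only the three vectors mentioned above.

```python
from collections import Counter

UDP = 17
KNOWN_REFLECTION_PORTS = {53: "DNS", 123: "NTP", 1900: "SSDP"}


def label_vector(protocol, src_port):
    """Name the vector if it matches a known UDP reflection source port."""
    if protocol == UDP and src_port in KNOWN_REFLECTION_PORTS:
        return f"udp/{src_port} ({KNOWN_REFLECTION_PORTS[src_port]} reflection)"
    return f"proto {protocol} port {src_port} (unclassified)"


def triage_summary(target_prefix, flows, top=3):
    """Summarize the byte share per vector for flows hitting the target."""
    by_vector = Counter()
    for flow in flows:
        by_vector[label_vector(flow["protocol"], flow["src_port"])] += flow["bytes"]
    total = sum(by_vector.values())
    lines = [f"target {target_prefix}: {total} bytes observed"]
    if total == 0:
        return lines[0]
    for vector, vector_bytes in by_vector.most_common(top):
        lines.append(f"  {vector}: {vector_bytes / total:.0%}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"protocol": 17, "src_port": 123, "bytes": 7500},
        {"protocol": 6, "src_port": 443, "bytes": 2500},
    ]
    print(triage_summary("203.0.113.0/24", sample))
```

The mitigation choice itself stays a human decision; the point of the summary is that the upstream sees protocol, port, and prefix rather than a vague "we are under attack."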

With this discipline, flow produces “evidence” instead of “I had a feeling there was an attack.”

Conclusion

A telemetry pipeline based on IPFIX/NetFlow lets you make faster and more accurate decisions during DDoS, and strengthens capacity and anomaly visibility in normal times. It isn’t as heavy as packet capture, and it isn’t as blind as an SNMP graph. With the right sampling, good enrichment, and clear triage questions, flow telemetry becomes one of the most efficient signal sources in network operations.
