Mustafa Erbay
Technology · 12 min read

Network Telemetry with IPFIX/NetFlow: A Pipeline for DDoS and Capacity

Build an operational telemetry pipeline by collecting and enriching IPFIX/NetFlow streams for DDoS triage, capacity planning, and anomaly detection.


In network operations there are two extremes: either you stare at a single “interface utilization” graph, or you try to capture every packet and drown in the data. The third path, the one that’s actually sustainable in production, is flow telemetry: with IPFIX/NetFlow/sFlow you can answer “who talked to whom, how much, and when” with enough fidelity to operate.

In this post, I walk through a realistic flow pipeline design that supports DDoS triage and capacity/peering decisions.

What does flow telemetry buy you?

Flow data pays off most in these scenarios:

  • DDoS: attack vector (protocol/port), top talkers, target prefix/service
  • Capacity: which applications fill the link, what hours they spike
  • Anomaly: a new destination country/ASN, an unexpected port, “high fan-out” behavior
  • Incident: fast evidence for the question “which segment was talking?”

Pipeline components (minimal but sufficient)

The minimum architecture that has worked for me in practice:

  1. Exporter: IPFIX/NetFlow on the router/switch/firewall
  2. Collector: receives UDP, normalizes (HA where possible)
  3. Enrichment: ASN/GeoIP, prefix, application labels
  4. Storage: fast querying (usually a columnar DB)
  5. Dashboard/Alert: ready-made panels for DDoS triage and capacity
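To make the "normalizes" step concrete, here is a minimal sketch of mapping decoded records onto one internal schema. The IPFIX names come from the IANA information element registry and the NetFlow v9 names from RFC 3954; the FlowRecord class, the normalize function, and the assumption that an upstream decoder hands you a dict keyed by those names are all illustrative, not part of any specific collector.

```python
from dataclasses import dataclass

# Internal field -> (IPFIX element name, NetFlow v9 field name).
ALIASES = {
    "src_ip": ("sourceIPv4Address", "IPV4_SRC_ADDR"),
    "dst_ip": ("destinationIPv4Address", "IPV4_DST_ADDR"),
    "src_port": ("sourceTransportPort", "L4_SRC_PORT"),
    "dst_port": ("destinationTransportPort", "L4_DST_PORT"),
    "protocol": ("protocolIdentifier", "PROTOCOL"),
    "octets": ("octetDeltaCount", "IN_BYTES"),
    "packets": ("packetDeltaCount", "IN_PKTS"),
}


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical normalized record shared by the rest of the pipeline."""
    exporter: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    octets: int
    packets: int


def normalize(exporter: str, decoded: dict) -> FlowRecord:
    """Map a decoded IPFIX or NetFlow v9 record onto FlowRecord."""
    values = {}
    for field_name, candidates in ALIASES.items():
        for candidate in candidates:
            if candidate in decoded:
                values[field_name] = decoded[candidate]
                break
        else:
            # No alias matched: refuse to emit a partial record.
            raise KeyError(f"no source field found for {field_name}")
    return FlowRecord(exporter=exporter, **values)
```

Keeping a single schema after this point means enrichment, storage, and dashboards do not need to know which export protocol a device speaks.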

On the exporter side: right place, right rate

Where you export flow from is critical:

  • Edge uplink: DDoS and transit/peering visibility
  • DC core: east-west density, critical segments
  • Firewall: correlation with policy/zone context (vendor dependent)

Be deliberate about sampling:

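  • For DDoS and volumetric visibility, sampled export (e.g. 1 in 1000 packets) is usually sufficient, as long as you scale counters back up by the sampling rate.
  • For low-volume but critical flows (auth/management), aggressive sampling can make you miss the signal entirely, because a flow of only a few packets is unlikely to be sampled at all.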
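The trade-off can be put in numbers. The sketch below assumes simple random 1-in-N packet sampling where each packet is selected independently; real exporters may sample deterministically or per flow, so treat the result as an approximation. Both function names are illustrative.

```python
def scale_up(sampled_count: int, sampling_rate: int) -> int:
    """Estimate the real byte or packet count from a sampled counter."""
    return sampled_count * sampling_rate


def detection_probability(flow_packets: int, sampling_rate: int) -> float:
    """Chance that at least one packet of a flow is sampled (1-in-N, independent)."""
    return 1.0 - (1.0 - 1.0 / sampling_rate) ** flow_packets


if __name__ == "__main__":
    # A 10-packet management session versus a million-packet flood at 1/1000.
    print(detection_probability(10, 1000))
    print(detection_probability(1_000_000, 1000))
```

At 1/1000, a 10-packet flow has roughly a 1% chance of appearing in the data at all, while a million-packet flood is effectively certain to show up. That is why volumetric attacks survive heavy sampling and short authentication flows do not.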

On the collector side: UDP reality and resilience

Production realities of a flow collector:

  • UDP packet loss happens; treat it as a design assumption, not an exception.
  • If the collector runs out of capacity, data loss is silent: UDP gives the exporter no feedback.
  • For that reason, instrument the collector itself: ingest_qps, dropped_packets, queue_depth, cpu, disk.
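One way to make silent loss visible is to watch the sequence numbers the exporter already sends. The sketch below parses the 16-byte IPFIX message header defined in RFC 7011 (version, length, export time, sequence number, observation domain ID) and counts gaps. In IPFIX the sequence number counts data records, so the caller has to supply how many data records each message carried; the LossTracker class and that calling convention are assumptions for illustration, and NetFlow v5/v9 number their sequences differently.

```python
import struct
from dataclasses import dataclass, field

# RFC 7011 message header: version, length, export time, sequence, domain ID.
IPFIX_HEADER = struct.Struct("!HHIII")
SEQ_MODULUS = 2 ** 32


def parse_ipfix_header(datagram: bytes) -> tuple:
    """Return (sequence_number, observation_domain_id) from an IPFIX message."""
    if len(datagram) < IPFIX_HEADER.size:
        raise ValueError("datagram shorter than an IPFIX header")
    version, _length, _export_time, sequence, domain_id = IPFIX_HEADER.unpack_from(datagram)
    if version != 10:
        raise ValueError(f"not IPFIX (version {version})")
    return sequence, domain_id


@dataclass
class LossTracker:
    """Hypothetical per-exporter gap counter feeding a dropped-records metric."""
    expected: dict = field(default_factory=dict)
    lost_records: int = 0

    def observe(self, exporter: str, domain_id: int, sequence: int, record_count: int) -> int:
        key = (exporter, domain_id)
        gap = 0
        if key in self.expected:
            gap = (sequence - self.expected[key]) % SEQ_MODULUS
            if gap > SEQ_MODULUS // 2:
                # Sequence moved backwards: reordering or exporter restart.
                # Resynchronize without counting it as loss.
                gap = 0
        self.lost_records += gap
        self.expected[key] = (sequence + record_count) % SEQ_MODULUS
        return gap
```

Exporting lost_records next to ingest_qps and queue_depth turns "data loss is silent" into a graph you can alert on.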

Two practical approaches for HA:

  • If your exporters support two collector targets (active/active), use both.
  • If not: an anycast VIP in front of stateless collectors (though the loss/dedup discussions still apply).
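If both active/active collectors feed the same storage, every export message arrives twice and has to be deduplicated somewhere. A minimal sketch, assuming the message identity is (exporter, observation domain, sequence number, export time) and that a bounded in-memory window is acceptable; the RecentMessageFilter class is hypothetical, and two separate collector processes would need a shared store or a dedup step at write time instead.

```python
from collections import OrderedDict


class RecentMessageFilter:
    """Remembers the most recent message keys and flags repeats."""

    def __init__(self, capacity: int = 100_000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_duplicate(self, exporter: str, domain_id: int, sequence: int, export_time: int) -> bool:
        key = (exporter, domain_id, sequence, export_time)
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            # Evict the oldest key so memory stays bounded.
            self._seen.popitem(last=False)
        return False
```

The capacity bounds how far apart two copies of the same message can arrive and still be caught, so size it from your ingest rate rather than leaving the default.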

Enrichment: raw flow alone is not enough

Enrichments that increase operational value:

  • ASN/GeoIP: a change in source/destination ASN produces an anomaly signal
  • Prefix map: speeds up the “which service/prefix is the target” question
  • Port map: 443 isn’t always “HTTPS,” but it’s a good baseline
  • Device/zone tag: which edge/DC/segment
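As an illustration of the prefix and port maps, here is a minimal enrichment sketch using Python's standard ipaddress module. The prefixes are documentation ranges and the service labels are made up; a real deployment would load them from an IPAM or config source, and would use a radix tree instead of a linear scan once the map grows large.

```python
import ipaddress

# Hypothetical prefix-to-service map (documentation address ranges).
PREFIX_MAP = {
    "203.0.113.0/24": "public-web",
    "203.0.113.128/25": "api-vip",
    "198.51.100.0/24": "dns-anycast",
}

# (protocol number, destination port) -> baseline application guess.
PORT_MAP = {(6, 443): "https", (17, 53): "dns", (17, 123): "ntp"}

# Most specific prefixes first, so the first hit is the longest match.
_NETWORKS = sorted(
    ((ipaddress.ip_network(prefix), label) for prefix, label in PREFIX_MAP.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def lookup_service(ip: str):
    """Return the label of the longest matching prefix, or None."""
    address = ipaddress.ip_address(ip)
    for network, label in _NETWORKS:
        if address in network:
            return label
    return None


def enrich(flow: dict) -> dict:
    """Return a copy of the flow with service and application labels added."""
    enriched = dict(flow)
    enriched["dst_service"] = lookup_service(flow["dst_ip"])
    enriched["app_guess"] = PORT_MAP.get((flow["protocol"], flow["dst_port"]), "unknown")
    return enriched
```

The field is called app_guess on purpose: a port-based label is a baseline, not a classification.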

Query model: design around triage questions

The questions I most often ask during DDoS triage:

  1. Which dst_ip/dst_prefix is receiving the most traffic?
  2. What does the protocol/port distribution of that traffic look like?
  3. What are the top src_asn / src_country?
  4. Compared to the normal baseline, where did the increase begin?

For fast answers, presets like “last 15 min, 1 hour, 24 hours” and pre-built queries are essential.
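In production these questions would normally be saved queries against the columnar store. The shape of such a query can still be sketched in plain Python: filter by a preset window, group by the triage dimension, sum bytes, take the top N. The record layout (dicts with ts, bytes, and dimension fields) and the names below are assumptions for illustration.

```python
from collections import Counter

# Preset windows in seconds, matching the panels an operator reaches for first.
WINDOWS = {"15m": 15 * 60, "1h": 60 * 60, "24h": 24 * 60 * 60}


def top_n(flows, key_fields, window, now, n=10, sampling_rate=1):
    """Top-N groups by estimated bytes within a preset time window."""
    cutoff = now - WINDOWS[window]
    totals = Counter()
    for flow in flows:
        if flow["ts"] < cutoff:
            continue
        key = tuple(flow[name] for name in key_fields)
        # Scale sampled byte counts back up to an estimate of real volume.
        totals[key] += flow["bytes"] * sampling_rate
    return totals.most_common(n)
```

Calling top_n(flows, ("dst_ip",), "15m", now) answers the first triage question, and swapping the key to ("protocol", "dst_port") or ("src_asn",) answers the next two without writing anything new during an incident.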

Alert logic: “fast signal, low noise”

Simple but useful alert examples:

  • Threshold breach on bps or pps for a specific prefix/service (against baseline)
  • A newly seen dst_port (traffic suddenly rising on a port never seen before in production)
  • Excessive surge from a single src_asn
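The three alerts above can be sketched as small pure functions. The multiplier, the absolute floor, and the 50% share cutoff are placeholder values chosen for illustration, not recommendations; tune them against your own baseline.

```python
from statistics import median


def exceeds_baseline(current_bps, history_bps, factor=3.0, floor_bps=10_000_000):
    """True if current traffic is well above the median of recent history."""
    if not history_bps:
        return False  # no baseline yet, so stay quiet rather than guess
    # The floor keeps tiny prefixes from alerting on noise.
    threshold = max(median(history_bps) * factor, floor_bps)
    return current_bps > threshold


def new_ports(current_ports, known_ports):
    """Destination ports seen now that were never seen in the baseline."""
    return set(current_ports) - set(known_ports)


def dominant_asn(bytes_by_asn, max_share=0.5):
    """Return the ASN that carries more than max_share of traffic, else None."""
    total = sum(bytes_by_asn.values())
    if total == 0:
        return None
    asn, octets = max(bytes_by_asn.items(), key=lambda item: item[1])
    return asn if octets / total > max_share else None
```

Using the median rather than the mean keeps one earlier spike in the history window from inflating the baseline and masking the next one.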

Runbook: produce a DDoS picture in 5 minutes with flow

My practical “first 5 minutes” sequence:

  1. Identify the target prefix/service (LB VIP, anycast prefix, app subnet)
  2. Pull top dst_port/protocol for the last 5–10 min
  3. Pull top src_asn and top src_country
  4. If you see known vectors like udp/53, udp/123, udp/1900, describe the attack to your upstream provider in those same terms
  5. Make the mitigation decision: RTBH/FlowSpec/scrubbing/WAF (depending on service type)
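Step 4 can be partly automated by labeling the top ports with the vector names an upstream provider will recognize. The sketch below assumes the ports are matched as UDP source ports, since reflected traffic arrives from the reflector's service port (53 for DNS, 123 for NTP, 1900 for SSDP); the function name and the input shape are illustrative.

```python
UDP = 17

# Well-known UDP reflection sources, keyed by source port.
REFLECTION_VECTORS = {53: "DNS", 123: "NTP", 1900: "SSDP"}


def summarize_vectors(port_totals: list) -> list:
    """Label (protocol, src_port, bytes) rows and return (label, share) by volume."""
    total = sum(octets for _, _, octets in port_totals)
    summary = []
    for protocol, src_port, octets in sorted(port_totals, key=lambda row: row[2], reverse=True):
        if protocol == UDP and src_port in REFLECTION_VECTORS:
            label = f"{REFLECTION_VECTORS[src_port]} reflection (udp/{src_port})"
        else:
            label = f"unclassified (proto {protocol}, port {src_port})"
        share = octets / total if total else 0.0
        summary.append((label, round(share, 3)))
    return summary
```

A line such as "NTP reflection (udp/123), about 70% of bytes toward the target prefix" is something an upstream NOC can act on immediately, which is the point of speaking the same language.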

With this discipline, flow produces “evidence” instead of “I had a feeling there was an attack.”

Conclusion

A telemetry pipeline based on IPFIX/NetFlow lets you make faster and more accurate decisions during DDoS, and strengthens capacity and anomaly visibility in normal times. It isn’t as heavy as packet capture, and it isn’t as blind as an SNMP graph. With the right sampling, good enrichment, and clear triage questions, flow telemetry becomes one of the most efficient signal sources in network operations.


Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have worked across systems architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience grounded in field work.
