Mustafa Erbay
Technology · 10 min read

MTU and PMTUD Blackhole: An Incident Runbook

When some users work and others don't, a frequent cause is broken PMTUD and an MTU blackhole. Diagnosis steps and a permanent fix.

One of the hardest classes of incident to diagnose in production is this: the system looks generally up, yet some users work while others don’t. In particular:

  • TLS handshakes hang for some clients,
  • API calls succeed with a small payload but time out with a large one,
  • the same endpoint behaves well from some networks and poorly from others.

This picture is often read as “an application bug.” Yet there is a frequent root cause at the network layer: an MTU/PMTUD blackhole.

Concept: what does PMTUD solve?

Path MTU Discovery (PMTUD) finds the largest packet that can cross the path between two endpoints without fragmentation, that is, the smallest MTU (maximum transmission unit) of any hop on the path. It typically breaks in this sequence:

  1. The endpoint sends a large packet with the DF (Don’t Fragment) bit set
  2. A device along the path can’t forward the packet (a smaller MTU is required)
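  3. The device should respond with an ICMP error: “fragmentation needed” (ICMPv4 type 3, code 4) or, on IPv6, “packet too big” (ICMPv6 type 2)
  4. If that ICMP is blocked, the endpoint never learns the smaller path MTU and keeps sending oversized packets → they are silently dropped → blackhole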
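
The steps above can be sketched as a toy model. This is only an illustration, not how any kernel implements PMTUD; the function name first_blocking_hop and the hop MTU values in the usage line are made up for the example.

```shell
#!/bin/sh
# Toy model of steps 1-2: walk the per-hop MTUs in path order and report the
# first hop that cannot forward a DF-marked packet of the given size.
# Usage: first_blocking_hop <packet_size> <hop_mtu>...
# Prints "hop <n> mtu <m>" for the first hop whose MTU is below the packet
# size, or "ok" if every hop can forward the packet.
first_blocking_hop() {
    size=$1
    shift
    n=0
    for mtu in "$@"; do
        n=$((n + 1))
        if [ "$mtu" -lt "$size" ]; then
            # In a healthy network this hop returns the ICMP error carrying
            # its MTU. If that ICMP is filtered, the sender never learns the
            # limit and the flow blackholes.
            echo "hop $n mtu $mtu"
            return 0
        fi
    done
    echo "ok"
}
```

For example, first_blocking_hop 1500 1500 1400 1500 reports the second hop, which is the device that should send the ICMP error in step 3.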

The most common triggers

The transitions where this most often shows up in the field:

  • IPsec/GRE tunnels (overlay header + crypto overhead)
  • SD-WAN/MPLS edge transitions
  • Cloud interconnect / transit gateway / firewall zone transitions
  • Mixing jumbo-frame segments with 1500 MTU segments
  • “MSS clamp” applied but not covering all endpoints, leaving the fix half-done

Incident triage: a 15-minute fast diagnosis

Goal: quickly answer “is this an application issue, or path MTU/PMTUD?”

1) Tie the symptom to packet size

  • If small payloads work and large payloads break, an MTU problem becomes more likely.
  • Protocols like HTTP/2 or gRPC can show a similar pattern for other reasons; check whether the failures really track packet size.

2) Ping with DF (mind the Linux/macOS/Windows differences)

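Run a DF test toward the remote endpoint (or a test endpoint near the suspect hop). The option that sets DF differs by platform: -M do on Linux, -D on macOS, and -f on Windows (where -l sets the payload size). Example (Linux):

ping -M do -s 1472 -c 3 <target-ip>
ping -M do -s 1400 -c 3 <target-ip>
  • 1472 bytes of ICMP payload + 28 bytes of headers (20 IPv4 + 8 ICMP) = 1500 bytes, the standard Ethernet MTU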
  • If the large packet fails and the small packet works, you have a “smaller MTU on the path” signal.
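
To turn the two manual pings into a number, the payload size can be binary-searched. The following is a minimal sketch, assuming Linux iputils ping; the names probe_df, PROBE and find_path_mtu are hypothetical, and the 500 to 1472 search range is an arbitrary default.

```shell
#!/bin/sh
# Default probe: one Linux iputils ping with DF set; succeeds if a reply
# arrives. Point PROBE at another function for other platforms or for testing.
probe_df() {
    ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1
}
PROBE=${PROBE:-probe_df}

# find_path_mtu <target> [low_payload] [high_payload]
# Binary-searches the largest ICMP payload that passes with DF set and
# prints payload + 28 (20-byte IPv4 header + 8-byte ICMP header).
find_path_mtu() {
    target=$1
    lo=${2:-500}
    hi=${3:-1472}
    best=0
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$PROBE" "$target" "$mid"; then
            best=$mid
            lo=$((mid + 1))
        else
            hi=$((mid - 1))
        fi
    done
    if [ "$best" -eq 0 ]; then
        echo "no payload in range passed" >&2
        return 1
    fi
    echo $((best + 28))
}
```

Run it as find_path_mtu <target-ip>. A failed probe can also mean ordinary packet loss or filtered ICMP echo, so treat the result as a hint and repeat it before acting on it.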

3) tracepath / traceroute for “pmtu” hints

tracepath <target-ip> | head -n 20

If you see a pmtu line or messages like “too big,” the case is even stronger.
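
When the output is long, the pmtu hints can be pulled out mechanically. This is a small sketch that assumes the Linux tracepath text format, where a discovered value appears as the token pmtu followed by a number; the function name min_pmtu is hypothetical.

```shell
#!/bin/sh
# Reads tracepath output on stdin and prints the smallest "pmtu <n>" value
# seen. Prints nothing and returns 1 if no pmtu field was found.
min_pmtu() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "pmtu") {
                    v = $(i + 1) + 0
                    if (min == "" || v < min) min = v
                }
            }
        }
        END {
            if (min == "") exit 1
            print min
        }
    '
}
```

Usage: tracepath -n <target-ip> | min_pmtu. A value below the local interface MTU points at a smaller hop somewhere on the path.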

4) TCP signals on the affected host: retransmissions + stalls

If you have access, take a short capture on the affected host:

sudo tcpdump -nn -i any -c 200 'host <target-ip> and tcp'

Look for:

  • The same segment being sent over and over (retransmission)
  • The TCP handshake (SYN/SYN-ACK/ACK) completes, but the connection stalls as soon as a full-size segment is sent, for example during the TLS handshake (a sign of MSS/MTU breakage)
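
Spotting repeated segments by eye is error-prone, so the capture can be piped through a small counter. This is a rough sketch that assumes tcpdump's default one-line text output; count_retransmits is a hypothetical helper, not a tcpdump feature.

```shell
#!/bin/sh
# Reads tcpdump text output on stdin and prints how many data segments repeat
# an already seen (source, destination, seq range), i.e. likely
# retransmissions. Fields are located by token because the line prefix
# differs between tcpdump versions and with "-i any".
count_retransmits() {
    awk '
        {
            src = ""; dst = ""; seq = ""
            for (i = 2; i < NF; i++) {
                if ($i == ">")   { src = $(i - 1); dst = $(i + 1) }
                if ($i == "seq") { seq = $(i + 1) }
            }
            # Only data segments print a "first:last" seq range.
            if (src == "" || index(seq, ":") == 0) next
            key = src " " dst " " seq
            if (seen[key]++) repeats++
        }
        END { print repeats + 0 }
    '
}
```

Usage: sudo tcpdump -nn -i any -c 200 'host <target-ip> and tcp' | count_retransmits. With -i any, a packet that crosses several local interfaces (bridge, veth) is captured more than once and inflates the count, so capture on a single interface when the number matters.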

Mitigation: fast, safe first moves

During an incident, the least risky intervention is usually one of these (depending on the environment):

1) TCP MSS clamping (temporary relief)

Applying MSS clamping at the tunnel or edge firewall can recover services while PMTUD is broken. But this isn’t a “set and forget” solution; it can mask the root cause.
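
As a sketch of what a clamp looks like, the helper below derives the MSS from a target MTU and prints a matching rule. It assumes a Linux edge running iptables; other firewalls have their own syntax. The function names are hypothetical and the rule is only printed, never applied.

```shell
#!/bin/sh
# mss_for_mtu <mtu> [4|6]
# TCP MSS = MTU - IP header - TCP header (without options):
# 40 bytes for IPv4 (20 + 20), 60 bytes for IPv6 (40 + 20).
mss_for_mtu() {
    case "${2:-4}" in
        4) echo $(( $1 - 40 )) ;;
        6) echo $(( $1 - 60 )) ;;
        *) echo "ip version must be 4 or 6" >&2; return 1 ;;
    esac
}

# Prints (does not apply) an iptables rule that rewrites the MSS option on
# forwarded SYN packets. Review the output before running it as root.
print_mss_clamp_rule() {
    mss=$(mss_for_mtu "$1") || return 1
    echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
}
```

For a path that carries 1400-byte packets, print_mss_clamp_rule 1400 yields a rule with --set-mss 1360. The TCPMSS target also accepts --clamp-mss-to-pmtu in place of a fixed value. Either way the clamp helps only TCP; UDP-based traffic stays exposed to the blackhole.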

2) Set tunnel/overlay MTU correctly

Encapsulating layers like IPsec/GRE add headers to every packet, so the effective MTU inside the tunnel drops by that overhead. Set the tunnel interface MTU to match, and consider it together with endpoint PMTUD behavior.
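
The arithmetic is simple enough to script. The sketch below uses plain GRE over IPv4, whose overhead is 24 bytes (20-byte outer IPv4 header + 4-byte GRE header). IPsec overhead depends on mode and cipher, so take it from measurement or vendor documentation instead of guessing. The names tunnel_mtu and GRE_IPV4_OVERHEAD are hypothetical.

```shell
#!/bin/sh
# Basic GRE over IPv4, no key/checksum/sequence options:
# 20-byte outer IPv4 header + 4-byte GRE header.
GRE_IPV4_OVERHEAD=24

# tunnel_mtu <underlay_mtu> <overhead_bytes>
# Prints the largest inner packet that fits the underlay without fragmentation.
tunnel_mtu() {
    if [ "$2" -ge "$1" ]; then
        echo "overhead must be smaller than the underlay MTU" >&2
        return 1
    fi
    echo $(( $1 - $2 ))
}
```

On Linux the result can then be applied with something like ip link set dev <tunnel-if> mtu "$(tunnel_mtu 1500 "$GRE_IPV4_OVERHEAD")", which sets 1476 for a 1500-byte underlay.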

3) Allowlist the ICMP that PMTUD needs

The goal isn’t to fully open ICMP; it’s to let only the types/codes PMTUD requires (ICMPv4 type 3 code 4 “fragmentation needed”, ICMPv6 type 2 “packet too big”) pass in a controlled way, and to log them.
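
A minimal sketch of such an allowlist, assuming a Linux firewall managed with iptables/ip6tables. The FORWARD chain, the log prefixes and the 10-per-minute log limit are assumptions to adapt. The helper only prints the rules so they can be reviewed first.

```shell
#!/bin/sh
# Prints (does not apply) rules that log, rate-limited, and accept only the
# ICMP messages PMTUD depends on. Pass a chain name to override FORWARD.
print_pmtud_icmp_rules() {
    chain=${1:-FORWARD}
    cat <<EOF
iptables -A $chain -p icmp --icmp-type fragmentation-needed -m limit --limit 10/minute -j LOG --log-prefix "pmtud-v4: "
iptables -A $chain -p icmp --icmp-type fragmentation-needed -j ACCEPT
ip6tables -A $chain -p ipv6-icmp --icmpv6-type packet-too-big -m limit --limit 10/minute -j LOG --log-prefix "pmtud-v6: "
ip6tables -A $chain -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT
EOF
}
```

Because -A appends, these rules take effect only if no earlier rule in the chain already drops ICMP; on an existing ruleset they may need to be inserted at a specific position instead.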

Root-cause closure: lasting countermeasures

1) Add an MTU test to the change checklist

Especially for these changes:

  • new tunnel/overlay
  • new firewall transition
  • moving to a different provider/POP
  • jumbo-frame rollout

Standardize the “do large packets pass through?” test.
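
One way to standardize the test is a table of paths and the MTU each is expected to carry, checked by a script after every change. The sketch below assumes Linux iputils ping; the names df_ping, MTU_PROBE and run_mtu_checklist and the input format are hypothetical.

```shell
#!/bin/sh
# Default probe (Linux iputils): three DF pings with the given payload size.
df_ping() {
    ping -M do -c 3 -W 1 -s "$2" "$1" >/dev/null 2>&1
}
MTU_PROBE=${MTU_PROBE:-df_ping}

# Reads "<label> <target> <expected_mtu>" lines on stdin, skipping blank
# lines and "#" comments. Probes each target with a DF payload of
# expected_mtu - 28 bytes, prints PASS/FAIL per line, and returns 1 if any
# path failed.
run_mtu_checklist() {
    failed=0
    while read -r label target mtu; do
        case "$label" in ''|'#'*) continue ;; esac
        if "$MTU_PROBE" "$target" "$((mtu - 28))" </dev/null; then
            echo "PASS $label $mtu"
        else
            echo "FAIL $label $mtu"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: run_mtu_checklist < paths.txt, where paths.txt is a file you maintain with one path per line. The non-zero exit status makes it easy to wire into a change pipeline.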

2) Observability: catch MTU-driven incidents with metrics

Good signals:

  • A rise in TCP retransmission rate
  • SYN completes but the application-layer handshake (TLS) doesn’t
  • A clear latency + timeout uptick on specific paths/segments
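
For the first signal, a host-level retransmission ratio can be read from the kernel's TCP counters. This is a minimal sketch assuming Linux and the /proc/net/snmp layout (a Tcp: header line followed by a Tcp: value line); tcp_retrans_pct is a hypothetical helper.

```shell
#!/bin/sh
# Reads /proc/net/snmp-formatted text on stdin and prints
# RetransSegs / OutSegs as a percentage with two decimals.
tcp_retrans_pct() {
    awk '
        $1 == "Tcp:" && !have_header {
            # First Tcp: line holds the column names.
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "Tcp:" {
            # Second Tcp: line holds the counter values.
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out > 0) printf "%.2f\n", 100 * re / out
            else print "0.00"
            exit
        }
    '
}
```

Usage: tcp_retrans_pct < /proc/net/snmp. The counters are cumulative since boot, so a monitoring agent would sample them periodically and alert on the change between samples, not on the absolute value.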

3) Documentation: make “MTU facts” visible

In production, MTU is often an “assumption” no one owns. Write down the effective MTU per segment (especially after tunnel/encryption).

Conclusion

An MTU/PMTUD blackhole prolongs incidents by masquerading as an application bug. To diagnose it correctly, tie the symptom to packet size and narrow it down with quick DF tests. Then back the temporary mitigation (MSS clamp) with a permanent fix: correct MTU, a correct ICMP allowlist, and MTU tests in the change process. In operational reality, success isn’t a “very technical narrative”; it’s a repeatable runbook and closure discipline.


Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have worked across systems architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience that holds up in the field.
