MTU and PMTUD Blackhole: An Incident Runbook

One of the hardest classes of incident to diagnose in production is this: the system looks generally up, yet it works for some users and not for others. In particular:

TLS handshakes hang for some clients,
API calls succeed with a small payload but time out with a large one,
the same endpoint behaves well from some networks and poorly from others.

This pattern is often read as “an application bug.” A frequent root cause, though, is an MTU/PMTUD blackhole.

Concept: what does PMTUD solve?

Path MTU Discovery (PMTUD) finds the largest packet size that can cross the whole path between two endpoints without fragmentation, that is, the smallest MTU (maximum transmission unit) of any link on the path. It typically breaks like this:

The endpoint sends a large packet with the DF (Don’t Fragment) bit set
A device along the path can’t forward the packet (a smaller MTU is required)
The device should respond with an ICMP “fragmentation needed” message (ICMP type 3, code 4 on IPv4; ICMPv6 “packet too big” on IPv6)
If that ICMP is blocked, the sender never learns to lower its packet size → large packets keep being silently dropped → blackhole
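
The sequence above can be mimicked with a toy simulation. This is only an illustrative sketch: simulate_pmtud is a hypothetical helper, the link MTUs are made-up values, and no packets are sent. With ICMP allowed the sender converges on the smallest link MTU; with ICMP blocked it never learns and the flow blackholes.

```shell
# simulate_pmtud ICMP_ALLOWED START_SIZE LINK_MTU...
# Toy model only: walks the list of link MTUs the way a DF packet would.
simulate_pmtud() {
  local icmp_allowed="$1" size="$2" mtu restart=1
  shift 2
  while [ "$restart" -eq 1 ]; do
    restart=0
    for mtu in "$@"; do
      if [ "$size" -gt "$mtu" ]; then
        if [ "$icmp_allowed" = yes ]; then
          # "Fragmentation needed" arrives: shrink and resend from the start.
          size=$mtu
          restart=1
          break
        fi
        # ICMP is filtered: the sender keeps retrying the same size forever.
        echo "blackhole"
        return 1
      fi
    done
  done
  echo "$size"
}

# Usage (hypothetical path: LAN, GRE tunnel, narrower provider link):
#   simulate_pmtud yes 1500 1500 1476 1400   # prints 1400
#   simulate_pmtud no  1500 1500 1476 1400   # prints blackhole
```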

The most common triggers

The transitions where this most often shows up in the field:

IPsec/GRE tunnels (overlay header + crypto overhead)
SD-WAN/MPLS edge transitions
Cloud interconnect / transit gateway / firewall zone transitions
Mixing jumbo-frame segments with 1500 MTU segments
“MSS clamp” applied but not covering all endpoints, leaving the fix half-done

Incident triage: a 15-minute fast diagnosis

Goal: quickly answer “is this an application issue, or path MTU/PMTUD?”

1) Tie the symptom to packet size

If small payloads work and large payloads break, an MTU problem becomes more likely.
Protocols like HTTP/2 or gRPC can show a similar pattern; there too, look for the packet-size relationship.

2) Ping with DF (mind the Linux/macOS/Windows differences)

Run a DF test toward the remote endpoint (or a test endpoint near the suspect hop). Example (Linux):

ping -M do -s 1472 -c 3 <target-ip>
ping -M do -s 1400 -c 3 <target-ip>

1472 bytes of payload + 28 bytes of headers (20 IPv4 + 8 ICMP) = a 1500-byte packet, which exactly fills a 1500 MTU
If the large packet fails and the small packet works, you have a “smaller MTU on the path” signal.
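
To go from “some size fails” to an actual number, the DF test can be bisected. The sketch below assumes Linux iputils ping (-M do, -W); the function names and the search bounds are illustrative, not part of any tool. On macOS the DF flag is ping -D -s <size>, and on Windows it is ping -f -l <size>, so probe_df would need adjusting there.

```shell
# probe_df TARGET SIZE
# Succeeds if one echo with SIZE payload bytes and DF set gets a reply.
probe_df() {
  ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1
}

# largest_payload PROBE TARGET LO HI
# Binary search for the largest payload in [LO, HI] that PROBE accepts.
largest_payload() {
  local probe="$1" target="$2" lo="$3" hi="$4" mid
  # If even the lower bound fails, the problem is not (only) packet size.
  "$probe" "$target" "$lo" || return 1
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if "$probe" "$target" "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo "$lo"
}

# Usage:
#   payload=$(largest_payload probe_df <target-ip> 1200 1472) \
#     && echo "path MTU is about $(( payload + 28 ))"
```

A single lost probe can skew the result, so rerun the search before trusting the number.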

3) tracepath / traceroute for “pmtu” hints

tracepath <target-ip> | head -n 20

If you see a pmtu line or messages like “too big,” the case is even stronger.
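
For scripting, the discovered value can be pulled out of the tracepath output. A small sketch, assuming the usual iputils output in which hop lines and the final Resume: line carry a “pmtu <n>” token; last_pmtu is a hypothetical helper name.

```shell
# last_pmtu: read tracepath output on stdin, print the last pmtu value seen.
# Prints nothing if no pmtu token appears in the input.
last_pmtu() {
  grep -o 'pmtu [0-9][0-9]*' | tail -n 1 | cut -d' ' -f2
}

# Usage:
#   tracepath -n <target-ip> | last_pmtu
```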

4) On the application side, TCP signal: retrans + stalls

If you have access, take a short capture on the affected host:

sudo tcpdump -nn -i any -c 200 'host <target-ip> and tcp'

Look for:

The same segment being sent over and over (retransmission)
The TCP handshake (SYN/SYN-ACK/ACK) completes, but the connection stalls as soon as full-size segments are sent (could be MSS/MTU breakage)

Mitigation: fast, safe first moves

During an incident, the “least-risky” intervention is usually one of these (depending on the environment):

1) TCP MSS clamping (temporary relief)

Applying MSS clamping at the tunnel or edge firewall can recover services while PMTUD is broken. But this isn’t a “set and forget” solution; it can mask the root cause.
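
The clamp value follows from the MTU the path can actually carry. A minimal sketch, assuming TCP with no IP options: the MSS is the MTU minus 40 bytes on IPv4 (20 IP + 20 TCP) or minus 60 bytes on IPv6 (40 IP + 20 TCP). mss_for_mtu is a hypothetical helper; the iptables rules in the comments are a common Linux form and would need adapting to the actual firewall.

```shell
# mss_for_mtu MTU [IP_VERSION]
# Prints the largest TCP MSS that fits in MTU (IP_VERSION defaults to 4).
mss_for_mtu() {
  local mtu="$1" ipver="${2:-4}"
  case "$ipver" in
    4) echo $(( mtu - 40 )) ;;
    6) echo $(( mtu - 60 )) ;;
    *) echo "unknown IP version: $ipver" >&2; return 1 ;;
  esac
}

# Example on a Linux router (not run here):
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#     -j TCPMSS --set-mss "$(mss_for_mtu 1400)"
# Or let the kernel derive the value from the outgoing route MTU:
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#     -j TCPMSS --clamp-mss-to-pmtu
```

Clamping only affects TCP; UDP-based traffic still depends on a correct MTU or working PMTUD.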

2) Set tunnel/overlay MTU correctly

Encapsulation such as IPsec or GRE adds headers, so the effective MTU inside the tunnel is lower than the underlay MTU. Set the tunnel interface MTU accordingly, and check it together with endpoint PMTUD behavior.
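
As a worked example of the drop: plain GRE over IPv4 adds 24 bytes (a 20-byte outer IP header plus the 4-byte base GRE header), so a 1500-byte underlay leaves 1476 bytes inside the tunnel. IPsec overhead depends on mode and cipher, so treat any figure for it as something to measure rather than assume. tunnel_mtu below is a hypothetical helper.

```shell
# tunnel_mtu UNDERLAY_MTU OVERHEAD_BYTES
# Prints the MTU left for traffic inside the tunnel.
tunnel_mtu() {
  echo $(( $1 - $2 ))
}

gre_mtu=$(tunnel_mtu 1500 24)
echo "GRE tunnel MTU: $gre_mtu"

# Applying it on Linux (interface name is a placeholder, not run here):
#   ip link set dev <tunnel-if> mtu "$gre_mtu"
```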

3) Allowlist the ICMP that PMTUD needs

The goal isn’t to open ICMP fully; it’s to let only the types/codes required for PMTUD through, in a controlled way, and to log them.
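
On a Linux firewall using iptables, a narrow allowlist might look like the sketch below. The chain (FORWARD), the log prefixes, and the dry-run wrapper pmtud_icmp_rules are assumptions for illustration; the function only prints the rules so they can be reviewed before anything is applied.

```shell
# pmtud_icmp_rules: print (do not apply) rules that log and accept
# only the ICMP messages PMTUD depends on.
pmtud_icmp_rules() {
  cat <<'EOF'
iptables  -A FORWARD -p icmp      --icmp-type fragmentation-needed -j LOG --log-prefix "PMTUD4: "
iptables  -A FORWARD -p icmp      --icmp-type fragmentation-needed -j ACCEPT
ip6tables -A FORWARD -p ipv6-icmp --icmpv6-type packet-too-big     -j LOG --log-prefix "PMTUD6: "
ip6tables -A FORWARD -p ipv6-icmp --icmpv6-type packet-too-big     -j ACCEPT
EOF
}

pmtud_icmp_rules
```

Rule order matters: these entries must sit ahead of any rule that drops ICMP, otherwise they never match.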

Root-cause closure: lasting countermeasures

1) Add an MTU test to the change checklist

Especially for these changes:

new tunnel/overlay
new firewall transition
moving to a different provider/POP
jumbo-frame rollout

Standardize the “do large packets pass through?” test.

2) Observability: catch MTU-driven incidents with metrics

Good signals:

A rise in TCP retransmission rate
TCP connections that establish (SYN completes) but never finish the application-layer handshake (TLS)
A clear latency + timeout uptick on specific paths/segments
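
The first signal is cheap to compute on Linux hosts. A rough sketch, assuming the /proc/net/snmp layout in which a Tcp: header line is followed by a Tcp: value line; tcp_retrans_ratio is a hypothetical helper. The counters are cumulative since boot, so a real alert would compare two samples taken some time apart.

```shell
# tcp_retrans_ratio: read /proc/net/snmp-style text on stdin and print
# RetransSegs / OutSegs with four decimals.
tcp_retrans_ratio() {
  awk '
    # First "Tcp:" line is the header: remember each column position by name.
    $1 == "Tcp:" && !seen_header {
      for (i = 2; i <= NF; i++) col[$i] = i
      seen_header = 1
      next
    }
    # Second "Tcp:" line holds the values.
    $1 == "Tcp:" {
      out = $(col["OutSegs"])
      re = $(col["RetransSegs"])
      if (out > 0) printf "%.4f\n", re / out
    }
  '
}

# Usage:
#   tcp_retrans_ratio < /proc/net/snmp
```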

3) Documentation: make “MTU facts” visible

In production, MTU is often an “assumption” no one owns. Write down the effective MTU per segment (especially after tunnel/encryption).

Conclusion

An MTU/PMTUD blackhole prolongs incidents by masquerading as an application bug. For a correct diagnosis, tie the symptom to packet size, confirm or rule out the hypothesis with quick DF tests, and tie the temporary mitigation (MSS clamp) to a permanent solution (correct MTU + correct ICMP allowlist + change tests). In operational reality, success comes from a repeatable runbook and closure discipline, not from a “very technical narrative.”
