Troubleshooting path MTU problems

Not long ago we started having very unusual issues with our email servers. Mail would be inexplicably held for delivery, bounce back, or fail to send for hours and then go out without issue later. Some users couldn’t fetch email by POP until they restarted their mail client. We investigated the mail software, but weeks of digging turned up nothing.

Around the same time, we also experienced intermittent problems logging in to MSN Messenger, and some users complained of issues accessing certain web pages, including a lot of HTTPS links. I began to suspect these were related.

When the going gets tough, the tough sniff packets. We can sniff at any point in our core routing network, which is by far the most effective way to find a networking problem. And so I found this:

root@router1:~# tshark
  1   0.000000 10.0.2.1 -> 192.168.3.155 ICMP Destination unreachable (Fragmentation needed)

10.0.2.1 is the upstream router from one of our network providers. 192.168.3.155 is one of our servers. So something is definitely wrong upstream of us.

It’s possible to do most troubleshooting with tshark (Wireshark’s command-line counterpart), but I prefer to copy the capture file back to my PC with SCP and investigate it in the GUI. I captured a few more packets and saved them:

tshark -i eth0 -f 'host 10.0.2.1' -w /tmp/upstream.cap -S

This sniffs on eth0 for packets to or from 10.0.2.1 and saves them to /tmp/upstream.cap while showing me a running summary. (Newer tshark releases use -P for the running summary; there, -S sets the line separator instead.) I let this run until a few more ICMP errors were captured, then hit Ctrl-C to stop. I could then either copy the file back to my PC and open it with Wireshark, or open it directly using KDE’s “fish://” KIO handler. Try opening a URL like this in Dolphin:

fish://root@192.168.0.1/tmp/

Just right-click and open the file with Wireshark, and KDE will transparently download the file and start Wireshark for you.

I selected one of the offending packets, an ICMP error sent by 10.0.2.1, and looked at its “MTU of next hop” field in Wireshark’s packet details.

The field shows that the MTU of the link between 10.0.2.1 and the next router is 1496 bytes. To explain further, we need to understand how IP and Ethernet MTU interact.

The MTU, or maximum transmission unit, of a normal Ethernet network is typically 1500 bytes. This means that an Ethernet frame may carry up to 1500 bytes of data, not counting the Ethernet headers. As data moves from one network to the next, as all traffic routed over the Internet does, it must not exceed the MTU of each link it crosses. If a packet is too large, the router fragments it: the packet is split into two or more smaller packets that each fit within the MTU.
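To make the arithmetic concrete, here is a minimal Python sketch of how a 1500-byte IPv4 packet would be split on a link with a 1496-byte MTU. The function name `fragment_sizes` is made up for illustration, and it assumes a plain 20-byte IP header with no options.

```python
def fragment_sizes(total_length, mtu, header_len=20):
    """Return the total length in bytes of each IPv4 fragment.

    Assumes a fixed header of header_len bytes (20 = no IP options).
    """
    if total_length <= mtu:
        return [total_length]  # fits as-is, no fragmentation needed

    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts in 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    remaining = total_length - header_len
    sizes = []
    while remaining > 0:
        chunk = min(remaining, max_payload)
        sizes.append(header_len + chunk)  # each fragment repeats the header
        remaining -= chunk
    return sizes


print(fragment_sizes(1500, 1496))  # [1492, 28]
```

Losing just 4 bytes of MTU turns every full-size packet into two, the second one carrying only 8 bytes of payload behind a 20-byte header.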

IP normally allows for fragmentation. This is an inefficient but effective solution, given that a host on the Internet doesn’t usually know the lowest MTU of the entire path between it and the target it would like to reach. However, IP packets contain a flag in the header, “Do not fragment”. If this bit is set, and a packet needs to be fragmented due to a smaller MTU, it cannot be transmitted onward. The router where this happens will drop the packet and return an ICMP type 3, code 4 message, “Destination unreachable, fragmentation needed but ‘do not fragment’ is set.”
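For reference, the “fragmentation needed” message has a small fixed header, and the router’s next-hop MTU sits in its last two bytes (the layout defined in RFC 1191). A rough Python sketch of reading it follows; `parse_frag_needed` and the sample bytes are my own illustration, not output from the capture.

```python
import struct


def parse_frag_needed(icmp_bytes):
    """Return the next-hop MTU from an ICMP type 3, code 4 message.

    icmp_bytes must start at the ICMP header (after the outer IP header).
    Returns None for any other ICMP message.
    """
    if len(icmp_bytes) < 8:
        raise ValueError("truncated ICMP header")
    # type (1 byte), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", icmp_bytes[:8]
    )
    if (icmp_type, code) != (3, 4):
        return None
    return next_hop_mtu


# Hand-built sample header; the checksum is left at zero for brevity.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1496)
print(parse_frag_needed(sample))  # 1496
```

Very old routers may leave the MTU field at zero, so a real tool would need a fallback for that case.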

Now that I know what to look for, I can find all kinds of MTU issues.

tshark -i eth0 -f 'icmp[icmptype] == 3 and icmp[icmpcode] == 4' -w /tmp/frag.cap -S

Watching this for a while, I found a lot of remote networks with lower than normal MTUs. ICMP error messages contain a copy of the offending packet’s IP header plus at least the first 8 bytes of its payload, which is enough to cover the TCP or UDP port numbers and makes it easy to find what triggered the error. I found several examples of SMTP traffic with “do not fragment” set, trying to enter networks with MTUs as low as 300 bytes. But these are remote networks, and are someone else’s problem to fix. What I shouldn’t have is a low MTU immediately upstream, affecting all my traffic.
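Wireshark decodes the embedded packet for you, but the idea is simple enough to sketch in Python. The helper name `describe_offender` is hypothetical, and the sample packet below is hand-built with documentation addresses rather than taken from my capture.

```python
import socket
import struct


def describe_offender(embedded):
    """Summarise the original packet quoted inside an ICMP error.

    embedded is everything after the 8-byte ICMP header: the offending
    packet's IPv4 header followed by at least 8 bytes of its payload.
    """
    if len(embedded) < 20:
        raise ValueError("truncated IP header")
    header_len = (embedded[0] & 0x0F) * 4  # IHL is in 32-bit words
    total_length, = struct.unpack("!H", embedded[2:4])
    flags_and_offset, = struct.unpack("!H", embedded[6:8])
    protocol = embedded[9]
    info = {
        "src": socket.inet_ntoa(embedded[12:16]),
        "dst": socket.inet_ntoa(embedded[16:20]),
        "length": total_length,
        "dont_fragment": bool(flags_and_offset & 0x4000),  # DF bit
        "protocol": protocol,
    }
    # TCP (6) and UDP (17) both start with source and destination ports.
    if protocol in (6, 17) and len(embedded) >= header_len + 4:
        sport, dport = struct.unpack("!HH", embedded[header_len:header_len + 4])
        info["sport"], info["dport"] = sport, dport
    return info


# Hypothetical 1500-byte SMTP packet with DF set.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 0, 0x4000, 64, 6, 0,
    socket.inet_aton("192.0.2.10"), socket.inet_aton("198.51.100.25"),
)
tcp_start = struct.pack("!HHI", 50000, 25, 0)  # ports plus sequence number
print(describe_offender(ip_header + tcp_start))
```

A destination port of 25 with `dont_fragment` set is the pattern that kept showing up in my SMTP examples.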

Now I know I have an MTU problem. But is this the cause of all my problems? I sniffed again, this time while trying to log in to MSN. The router at 10.0.2.1 again returned a “fragmentation needed” error.

This one told me the cause of the MSN problem. As part of its authentication, MSN Messenger logs in (to 65.54.165.177) via HTTPS (TCP port 443) with a 1500-byte packet that has the “do not fragment” bit set.

Sniffing identified the problem to my satisfaction, but you can test more directly with a traceroute tool that supports path MTU discovery. The process is simple: run a traceroute using 1500-byte IP packets with “do not fragment” set. If a packet is rejected, back off to the “MTU of next hop” value from the ICMP error and repeat until you reach your target. On Linux, use tracepath or traceroute --mtu.
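The back-off loop is easy to see in a small simulation. This Python sketch sends nothing on the wire: `link_mtus` is a made-up list of per-hop MTUs standing in for a real path, and `discover_path_mtu` is a name I chose for illustration.

```python
def discover_path_mtu(link_mtus, start=1500):
    """Simulate path MTU discovery over a list of per-hop link MTUs.

    Returns (path_mtu, events), where events lists each
    (hop_number, next_hop_mtu) "fragmentation needed" reply received.
    """
    size = start
    events = []
    hop = 0
    while hop < len(link_mtus):
        if size > link_mtus[hop]:
            # This router cannot forward the probe and DF is set, so it
            # replies with an ICMP error carrying its next-hop MTU.
            events.append((hop + 1, link_mtus[hop]))
            size = link_mtus[hop]  # back off to the advertised MTU
            hop = 0                # resend the smaller probe from the start
        else:
            hop += 1               # probe passes this link
    return size, events


# Hypothetical path: the second link is 1496 bytes, a later one 1400.
print(discover_path_mtu([1500, 1496, 1500, 1400]))
# (1400, [(2, 1496), (4, 1400)])
```

Each ICMP reply shrinks the probe once, so a path with several narrow links takes several rounds to settle on the final value.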

root@router1:~# tracepath -n 4.2.2.2
 1:  192.168.101.3    0.236ms pmtu 1500
 1:  192.168.101.1    14.080ms
 1:  192.168.101.1    1.985ms
 2:  192.168.1.254    70.445ms
 3:  10.86.24.1       32.215ms
 4:  no reply
 5:  no reply
 6:  212.113.15.19    37.143ms asymm  7
 7:  4.69.139.97      34.641ms
 8:  4.69.132.134     44.419ms
 9:  4.69.141.169     38.904ms
10:  4.69.133.90      41.321ms
11:  4.69.141.149     41.615ms
12:  4.69.132.138     45.783ms
13:  4.69.140.22      46.130ms
14:  4.68.23.195      46.663ms
15:  no reply
16:  4.2.2.2          45.328ms reached
     Resume: pmtu 1500 hops 16 back 241

A path MTU problem will appear as a drop in MTU:

 1:  192.168.101.3    0.281ms pmtu 1500
 1:  192.168.101.1    12.194ms
 2:  192.168.1.254    69.332ms pmtu 1496
 3:  10.86.24.1       35.182ms
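If you want to watch for this automatically, the pmtu annotations are easy to pull out of the output. The Python sketch below assumes the line format shown in this post; the function names are hypothetical, and other tracepath versions may print slightly different lines.

```python
import re

# Hop number, address, then a "pmtu N" annotation somewhere on the line.
PMTU_LINE = re.compile(r"^\s*(\d+)\??:\s+(\S+)\s.*\bpmtu\s+(\d+)")


def pmtu_reports(output):
    """Return (hop, address, pmtu) for every hop line that reports a pmtu."""
    reports = []
    for line in output.splitlines():
        match = PMTU_LINE.match(line)
        if match:
            reports.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return reports


def first_mtu_drop(output):
    """Return the first (hop, address, pmtu) whose pmtu is below the previous one."""
    reports = pmtu_reports(output)
    for previous, current in zip(reports, reports[1:]):
        if current[2] < previous[2]:
            return current
    return None


sample = """\
 1:  192.168.101.3    0.281ms pmtu 1500
 1:  192.168.101.1    12.194ms
 2:  192.168.1.254    69.332ms pmtu 1496
 3:  10.86.24.1       35.182ms
"""
print(first_mtu_drop(sample))  # (2, '192.168.1.254', 1496)
```

The “Resume:” summary line is skipped on purpose, since it does not start with a hop number.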

On Windows, try mturoute. On Mac OS X, try traceroute -F google.com 1500.

I reported the problem to our upstream provider and included the tracepath output and a few packets captured during testing. They scheduled a maintenance window for the following afternoon and corrected the problem with two minutes of downtime. We haven’t had an issue with email, MSN, or web pages since.
