Fragmentation Issue When DF=1: Route-Map to the Rescue

Probably a bad title, but it is the closest I could come up with.

I had a weird issue at work last week: all of the roughly 100 workstations at a remote office could not reach one specific website, yet the same website loaded fine from the main headquarters site. The symptom: a user would type the URL, and the browser would just sit there and time out after a minute or two.

The remote site and HQ are separated by an IPsec VPN device (KG175D), but the method works with any VPN device. A quick look at the Wireshark sniffer installed on a local machine at the remote site revealed that DNS resolution was perfectly fine and the initial TCP handshake (SYN, SYN/ACK, ACK) completed. So the first 4 to 6 packets were good, but when the browser tried to do "GET HTTP.ASP", no reply came back from the destination.

I then put another sniffer at the main HQ site to get another set of eyes on the packets, and noticed that the destination WAS actually sending and resending the packets to the remote site. So why weren't they getting through the VPN? Looking more closely at the missing packets, here is what was wrong with them: MSS=1380; Bytes=1502; DF=1; and the VPN device adds its own headers on top of that. The big offender is the 1502 bytes.

The quickest fix, which has served me very well in the past, is "ip tcp adjust-mss 1300" on the remote site's gateway router (1300 being a safe guesstimate). This normally does the trick: the router lowers the MSS advertised in the TCP handshake, so the two endpoints agree on a smaller maximum segment size and the application sends smaller packets rather than huge 1502-byte ones. Unfortunately, even though I configured it and verified on the sniffer that the MSS being requested was indeed 1300, for reasons unknown to me the destination still insisted on MSS 1380 and kept sending 1502-byte packets (more on this in a follow-up post).

After gathering all this data, I made a diplomatic phone call to the owner of the offending website to get the issue straightened out. I explained that his website was not honoring my MSS 1300 request and that 1502 bytes with DF=1 was too big for my link. To cut the sob story short, the diplomatic approach fell flat, and I was left with only one option.
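To make the size problem concrete, here is a simplified Python model of what a tunnel ingress does with an oversized packet. It is a sketch under stated assumptions: the 60-byte VPN overhead and 1500-byte link MTU are hypothetical values (the post does not give them), headers are assumed to carry no options, and the 1502-byte figure from the capture is plugged in as-is. clamp_mss is a hypothetical helper that mimics what an MSS clamp such as ip tcp adjust-mss does to the MSS option in a SYN.

```python
# Simplified model of the packet-size problem. The overhead and MTU values
# below are illustrative assumptions, not measurements from this network.
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def ip_packet_size(mss: int) -> int:
    """Largest IPv4 packet a sender produces for a given MSS (no options)."""
    return mss + TCP_HEADER + IPV4_HEADER


def clamp_mss(advertised: int, configured: int) -> int:
    """Hypothetical model of an MSS clamp: lower the SYN's MSS, never raise it."""
    return min(advertised, configured)


def forward_decision(packet_len: int, df: bool, tunnel_overhead: int, path_mtu: int) -> str:
    """Decide what a tunnel ingress does with a packet of packet_len bytes."""
    if packet_len + tunnel_overhead <= path_mtu:
        return "forward"
    if df:
        # Not allowed to fragment: the packet is dropped
        # (typically with an ICMP "fragmentation needed" sent back).
        return "drop"
    return "fragment"


HYPOTHETICAL_OVERHEAD = 60  # assumed VPN encapsulation overhead in bytes
LINK_MTU = 1500             # assumed Ethernet MTU in bytes

print(forward_decision(1502, True, HYPOTHETICAL_OVERHEAD, LINK_MTU))   # → drop
print(forward_decision(1502, False, HYPOTHETICAL_OVERHEAD, LINK_MTU))  # → fragment
clamped = ip_packet_size(clamp_mss(1460, 1300))                        # 1340 bytes
print(forward_decision(clamped, True, HYPOTHETICAL_OVERHEAD, LINK_MTU))  # → forward
```

The third case is what an MSS clamp normally buys you: with smaller segments, the packet plus tunnel overhead fits within the MTU, so the DF bit never comes into play.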

The Band-aid Fix:
Just to restate the problem and my thought process on the fix: the diplomatic approach and the MSS negotiation were both dead ends. I knew the issue was that the 1502-byte packets were too big and that the Don't Fragment bit was set (DF=1), which means the packets may not be fragmented. So my only option was to flip the DF bit to 0 (zero) to allow fragmentation. I could have done this on the VPN device, but I decided to do it on the last L3 device at headquarters before the VPN device. That could be a router or an L3 switch; in my case it is a Cisco 6513 switch. Tune the access list carefully so that it only matches traffic from the offending website and leaves all other traffic untouched.

route-map clear-df permit 36
match ip address 136
set ip df 0

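! access list: 2.2.2.2 is the offending website and 3.3.3.0/24 is the remote site subnet.
! You can narrow it further with "eq 80" after the source host, since the replies come from port 80:
! access-list 136 permit tcp host 2.2.2.2 eq 80 3.3.3.0 0.0.0.255
access-list 136 permit tcp host 2.2.2.2 3.3.3.0 0.0.0.255

! apply on the interface where the website's traffic enters the switch
! (policy routing acts on packets received on that interface):
ip policy route-map clear-df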
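The route-map and access list above can be modeled in a few lines of Python to show exactly which packets get their DF bit cleared. This is a simplified sketch, not how IOS implements policy routing: Packet, wildcard_match, acl_136 and route_map_clear_df are hypothetical names, the model ignores the ingress interface and port matching, and 2.2.2.2 and 3.3.3.0/24 are the post's placeholder addresses.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str  # "tcp", "udp", ...
    df: bool       # Don't Fragment bit


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """Cisco-style match: bits set in the wildcard mask are "don't care"."""
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (int(ipaddress.IPv4Address(addr)) & care) == (
        int(ipaddress.IPv4Address(base)) & care
    )


def acl_136(pkt: Packet) -> bool:
    """Models: access-list 136 permit tcp host 2.2.2.2 3.3.3.0 0.0.0.255"""
    return (
        pkt.protocol == "tcp"
        and wildcard_match(pkt.src, "2.2.2.2", "0.0.0.0")  # "host" = wildcard 0.0.0.0
        and wildcard_match(pkt.dst, "3.3.3.0", "0.0.0.255")
    )


def route_map_clear_df(pkt: Packet) -> Packet:
    """Models route-map clear-df permit 36: match ip address 136 -> set ip df 0.
    Non-matching packets are routed normally with their DF bit untouched."""
    if acl_136(pkt):
        return replace(pkt, df=False)
    return pkt


# Usage with the post's placeholder addresses:
print(route_map_clear_df(Packet("2.2.2.2", "3.3.3.10", "tcp", True)).df)  # → False
print(route_map_clear_df(Packet("2.2.2.3", "3.3.3.10", "tcp", True)).df)  # → True
print(route_map_clear_df(Packet("2.2.2.2", "3.3.4.10", "tcp", True)).df)  # → True
```

The wildcard logic is why "host 2.2.2.2" pins the source to one address while "3.3.3.0 0.0.0.255" lets any host on the remote /24 match, which keeps every other flow's DF bit intact.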

After applying the policy to the interface, I verified with both Wireshark sniffers that the DF bit was now 0 (zero), and the webpage now loads. Happy ending.
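In Wireshark the DF bit appears under the IPv4 "Flags" field. To show what clearing it means at the byte level, here is a minimal Python sketch that reads the DF bit from a raw IPv4 header and clears it, recomputing the header checksum as any device rewriting the header must. build_header is a hypothetical test helper, and its field values (ID 0x1234, TTL 64) are arbitrary; this is an illustration, not what the 6513 runs internally.

```python
import struct

DF_MASK = 0x4000  # Don't Fragment: second bit of the 3-bit flags field (bytes 6-7)


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum: ones' complement of the ones' complement sum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def df_bit(header: bytes) -> int:
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    return 1 if flags_frag & DF_MASK else 0


def clear_df(header: bytes) -> bytes:
    """Clear DF and recompute the checksum over the IHL-sized header."""
    hdr = bytearray(header)
    ihl = (hdr[0] & 0x0F) * 4  # header length in bytes
    (flags_frag,) = struct.unpack_from("!H", hdr, 6)
    struct.pack_into("!H", hdr, 6, flags_frag & ~DF_MASK)
    struct.pack_into("!H", hdr, 10, 0)  # checksum field is zero while computing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr[:ihl])))
    return bytes(hdr)


def build_header(src: str, dst: str, total_len: int, df: bool) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header carrying TCP (protocol 6)."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len, 0x1234, DF_MASK if df else 0, 64, 6, 0,
        bytes(map(int, src.split("."))), bytes(map(int, dst.split(".")))))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


original = build_header("2.2.2.2", "3.3.3.10", 1500, df=True)
rewritten = clear_df(original)
print(df_bit(original), df_bit(rewritten))  # → 1 0
print(ipv4_checksum(rewritten) == 0)        # → True (a valid header sums to zero)
```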

Note: Just to be clear before I get ridiculed for allowing fragmentation: fragmentation is really not my desired solution for this issue. In fact, that is why I call this one a band-aid rather than a fix. Modern, well-written applications should never need fragmentation, which is why most of them send with DF=1, but in this instance I really didn't have much option other than to allow fragmentation.