[AusNOG] Upstream PMTUD broken? Packets blackhole

s.s.o.n.i.k+a.u.s.n.o.g at gmail.com
Wed Sep 14 00:34:46 EST 2016


Hi all,

Starting about four weeks ago, many of my customers began reporting weird
problems with their internet connections. Multiple ISPs, multiple upstream
providers, no config changes to routers/firewalls - yet "heavy" websites
stop loading, SSL errors show up continuously, downloads and uploads fail,
SMTP is delayed, email vanishes in transit, timeouts are frequent, VPNs fail,
YouTube and Netflix video quality has dropped through the floor, and
connections just generally feel slow and dodgy.

It feels like an MTU issue, or overzealous filtering of ICMP type 3 code 4
("fragmentation needed") messages.
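
For reference, the message in question is ICMP type 3 (destination unreachable), code 4 (fragmentation needed and DF set). Since RFC 1191 it carries the next-hop MTU, which is what PMTUD on the sending host relies on; if these messages are filtered, the sender never learns to shrink its packets and keeps retrying at a size that cannot get through. A minimal Python sketch of reading that field from a captured message, assuming the raw ICMP bytes are already in hand with the outer IP header stripped (the helper name is made up for illustration):

```python
import struct
from typing import Optional

ICMP_DEST_UNREACH = 3  # type 3: Destination Unreachable
ICMP_FRAG_NEEDED = 4   # code 4: Fragmentation Needed and DF set


def next_hop_mtu(icmp: bytes) -> Optional[int]:
    """Return the Next-Hop MTU carried by an ICMP 3:4 message, or None.

    Assumes `icmp` starts at the ICMP header (outer IP header already stripped).
    Header layout (RFC 792 / RFC 1191): type (1 byte), code (1), checksum (2),
    unused (2), Next-Hop MTU (2), all in network byte order.
    """
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hand-built example: a 3:4 advertising a 1484-byte next hop (checksum left as zero).
sample = bytes([3, 4, 0, 0, 0, 0]) + (1484).to_bytes(2, "big")
print(next_hop_mtu(sample))  # 1484
```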

I've done as much testing and investigation as I can, and every ISP has
pointed to their Tier 1 / Tier 2 providers as the cause of the problem.
Traceroutes from affected customers pass through Vocus or Telstra, and all
seem to pass through IXs in NSW. If I'm lucky the traces complete, but
"* * * Request timed out." hops often show up where they should not.

Results of MTU testing show gaps of varying size between the largest
successfully transmitted packet and the size at which an ICMP 3:4 "frag
needed" response is returned.
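
The measurement itself boils down to a scan: send DF-set probes of increasing size and note where replies stop and where "frag needed" starts coming back. A rough Python sketch of that bookkeeping, assuming some `probe(size)` callable that sends one probe and reports what came back (the probe, the outcome labels and the thresholds in the stand-in probe are all hypothetical, not measured values):

```python
from typing import Callable, Optional, Tuple

# Possible outcomes of a single DF-set probe of a given payload size.
REPLY, FRAG_NEEDED, TIMEOUT = "reply", "frag_needed", "timeout"


def find_gap(probe: Callable[[int], str], lo: int, hi: int) -> Tuple[Optional[int], Optional[int]]:
    """Probe every payload size from lo to hi inclusive.

    Returns (largest size that got an echo reply, smallest size that got an
    ICMP 3:4 back). Sizes strictly between the two that only time out are
    the blackhole; (None, None) means nothing useful came back at all.
    """
    largest_ok: Optional[int] = None
    first_frag: Optional[int] = None
    for size in range(lo, hi + 1):
        result = probe(size)
        if result == REPLY:
            largest_ok = size
        elif result == FRAG_NEEDED and first_frag is None:
            first_frag = size
    return largest_ok, first_frag


# Stand-in probe with made-up thresholds, purely to exercise the scan.
def fake_probe(size: int) -> str:
    if size <= 1456:
        return REPLY
    if size >= 1473:
        return FRAG_NEEDED
    return TIMEOUT


print(find_gap(fake_probe, 1400, 1500))  # (1456, 1473)
```

A linear scan keeps every intermediate result visible, which matters when the interesting part is the run of sizes that silently time out.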

Right now, I'm pinging an affected client's router from New York with three
differently sized packets. All get a stream of "Request timed out." errors.
All have the DF bit set, and each is one byte larger than the largest payload
the device under test should accept.
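
The "one byte larger" sizes fall out of the header overhead: on IPv4 without options, an echo request adds a 20-byte IP header and an 8-byte ICMP header to the payload, so the largest payload that fits a given MTU is MTU - 28. This assumes Windows ping, where `-f` sets DF and `-l` sets the payload size (which matches the "Request timed out." output). A small Python sketch of the arithmetic:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_echo_payload(mtu: int) -> int:
    """Largest echo payload that fits in `mtu` without fragmentation."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for device, mtu in [("client firewall", 1484), ("PPPoE dialer", 1492), ("Ethernet", 1500)]:
    fits = max_echo_payload(mtu)
    print(f"{device}: MTU {mtu}, largest payload {fits}, probe with {fits + 1}")
```

This gives probe sizes of 1457, 1465 and 1473, so for example `ping -f -l 1457 <address>` is the first size the 1484-byte firewall should refuse.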

Client firewall MTU set to 1484: Successful reply at length 1456.
Unsuccessful at 1457: the echo request is seen, and the firewall returns an
ICMP 3:4 unreachable message, but it never arrives at the source.

PPPoE Dialer MTU set to 1492: Echo requests are seen hitting the firewall at
lengths up to 1464 (expected), yet no 3:4 replies make it back to the
source. At 1465, the packets no longer reach the firewall (also expected),
as the PPPoE dialer is now the one sending ICMP 3:4 unreachable responses.
These also fail to make it back to the source device.

Regular Ethernet MTU 1500: Completely unsuccessful right now. This one is
mainly there to test the upper bound of the blackhole.
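
Put another way, each of the three probe sizes should be bounced by exactly one device, the first hop on the inbound path whose MTU it exceeds, and that device's 3:4 is what should be arriving back at the source. A simplified Python model of that expectation, assuming an inbound order of some 1500-byte upstream hop, then the PPPoE dialer, then the firewall (the hop names and ordering are illustrative, not taken from a trace):

```python
from typing import List, Optional, Tuple

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def expected_bounce(path: List[Tuple[str, int]], payload: int) -> Optional[str]:
    """Name the first hop whose MTU the DF-set packet exceeds, i.e. the hop
    that should drop it and send ICMP 3:4 back. None means it should be
    delivered intact."""
    size = payload + IP_ICMP_OVERHEAD
    for hop, mtu in path:
        if size > mtu:
            return hop
    return None


# Hypothetical inbound path, ordered from the internet towards the client.
inbound = [("1500-byte upstream hop", 1500), ("PPPoE dialer", 1492), ("client firewall", 1484)]
for payload in (1456, 1457, 1465, 1473):
    print(payload, expected_bounce(inbound, payload) or "delivered")
```

In the tests above the bounce happens where this model says it should; the problem is that the resulting 3:4 never makes it home.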

As for Telstra connections, they're also experiencing the blackhole issue,
with additional intermittent periods of heavy packet loss. 

Curiously, I'm seeing the odd echo response from Telstra make it all the way
back to the source server:

....
Request timed out.
Request timed out.
Reply from 123.123.123.123: bytes=1469 time=284ms TTL=51
Request timed out.
Request timed out.
....

(Note: 1468-byte packets get a reply every single time from this IP.)
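
When a size only succeeds now and then, as 1469 does here, it is easier to compare sizes by counting outcomes over a long run than by eyeballing the output. A quick Python sketch, assuming output in the Windows ping format shown above (the regex and function name are illustrative, not from any tool):

```python
import re
from collections import Counter

# Matches Windows ping reply lines such as
# "Reply from 123.123.123.123: bytes=1469 time=284ms TTL=51".
REPLY_RE = re.compile(r"^Reply from [\d.]+: bytes=\d+ ")


def tally(ping_output: str) -> Counter:
    """Count echo replies versus timeouts in captured Windows ping output."""
    counts: Counter = Counter()
    for line in ping_output.splitlines():
        line = line.strip()
        if REPLY_RE.match(line):
            counts["reply"] += 1
        elif line == "Request timed out.":
            counts["timeout"] += 1
    return counts


excerpt = """Request timed out.
Request timed out.
Reply from 123.123.123.123: bytes=1469 time=284ms TTL=51
Request timed out.
Request timed out."""
counts = tally(excerpt)
print(counts["reply"], counts["timeout"])  # 1 4
```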

Has anyone else been seeing similar issues recently?

Any help would be greatly appreciated.

-Dave