[AusNOG] Cisco GRE Tunnel weirdness

Tony td_miles at yahoo.com
Fri Jan 3 18:19:54 EST 2014

Seemingly unrelated, but two separate customer issues have come to me today:

1. A 2 Mbit/s point-to-point Telstra service where the usable MTU across the link 
dropped overnight from 1500 to approx. 200 bytes. This caused OSPF to drop on the link and forced traffic onto a 3G backup service (fault logged with the carrier).
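To pin down how small the usable MTU has become, one common approach is to send packets with the DF (don't fragment) bit set and binary-search for the largest size that still gets through. Below is a minimal sketch, assuming a hypothetical `probe(size)` callable (for example, a wrapper around a DF-set ping) that returns True when a packet of that many bytes crosses the link, and assuming success is monotonic in size:

```python
from typing import Callable


def find_path_mtu(probe: Callable[[int], bool], low: int = 68, high: int = 1500) -> int:
    """Return the largest packet size in [low, high] for which probe() succeeds.

    `probe` is a hypothetical callable: it should send a `size`-byte packet with
    DF set and return True if it arrives. 68 bytes is the IPv4 minimum MTU.
    """
    if not probe(low):
        raise ValueError("even the minimum size failed; the link may be down")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits, so the answer is at least mid
        else:
            high = mid - 1  # mid is too big, so the answer is below mid
    return low


# Simulate a link whose usable MTU has collapsed to 200 bytes.
print(find_path_mtu(lambda size: size <= 200))  # prints 200
```

The search needs only about 11 probes to cover 68 to 1500 bytes, which matters when each probe is a real ping with a timeout.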

2. Another customer reported that GRE has stopped working to their firewall, which we provide IP connectivity to. Nothing changed by us, nothing changed by them; they are scheduling a reboot of the firewall.

To top it off, I've been having weird issues on my link at home for the last couple of days (resold Telstra DSL port), and have just checked: for some reason the MTU on the virtual-access interface on our LNS for my service is now 1460 instead of the 1500 it was previously. Setting "adjust-mss" on the LAN interface of my router has fixed the timeouts I was getting when accessing web sites, but leaves me no closer to knowing WHY the MTU has suddenly changed.
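For context on why an MSS clamp fixes the web timeouts: TCP's MSS excludes the IP and TCP headers, so on a path with an MTU of 1460 the largest safe MSS is 1460 - 20 - 20 = 1420, while an Ethernet-attached host will advertise 1460 in its SYN. A clamp rewrites that advertised value downward in passing SYNs. The sketch below only illustrates the arithmetic; it assumes IPv4 and TCP headers without options, and the function names are illustrative, not any vendor's implementation:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_syn_mss(advertised_mss: int, clamp: int) -> int:
    """MSS left in a SYN after clamping: a clamp only ever lowers the value."""
    return min(advertised_mss, clamp)


host_mss = mss_for_mtu(1500)  # 1460, what an Ethernet host advertises
print(clamp_syn_mss(host_mss, mss_for_mtu(1460)))  # prints 1420
```

Without the clamp, full-sized segments only fail if the ICMP "fragmentation needed" messages that PMTUD relies on are lost or filtered somewhere, which matches the symptom of small requests working and larger page loads hanging.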

Coincidence or conspiracy, who knows....

As someone else said, try monitoring CPU and interface utilisation (graphed via SNMP) to see whether either takes a hit during the periods of slowness. Is this a GRE tunnel over the Internet or something else? What speed? 3945s are grunty enough to handle a fair amount of GRE traffic, but not if they are also absorbing/filtering a DDoS attack at the same time.
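If you are graphing interface utilisation yourself rather than with an existing tool, the usual calculation takes two polls of an octet counter such as IF-MIB's `ifHCInOctets` (64-bit) or `ifInOctets` (32-bit) and divides the bits moved by what the interface could carry in that interval (`ifSpeed` is in bits/s; `ifHighSpeed` is in Mbit/s). A minimal sketch, assuming the counter samples have already been fetched by whatever SNMP poller you use and that at most one counter wrap happens between polls:

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Octets counted between two polls, allowing for one counter wrap.

    SNMP Counter32/Counter64 values wrap back to zero at 2**bits, so plain
    subtraction goes negative after a wrap; modular subtraction does not.
    """
    return (curr - prev) % (2 ** bits)


def utilisation_percent(prev_octets: int, curr_octets: int, interval_s: float,
                        if_speed_bps: int, bits: int = 64) -> float:
    """Average utilisation of one direction of an interface over a poll interval."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return 100.0 * delta_bits / (interval_s * if_speed_bps)


# 37.5 MB received over a 5-minute poll on a 2 Mbit/s link.
print(utilisation_percent(0, 37_500_000, 300, 2_000_000))  # prints 50.0
```

Keep in mind that a 5-minute average can hide short bursts that saturate the link, so a shorter poll interval helps when chasing intermittent slowness.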


From: "joe at apcs.com.au" <joe at apcs.com.au>
To: ausnog at lists.ausnog.net 
Sent: Friday, 3 January 2014 4:49 PM
Subject: [AusNOG] Cisco GRE Tunnel weirdness

Hi List,

  I have a GRE tunnel between 2 sites over a link limited to a 1500-byte MTU.

  As such we have the tunnel MTU set to 1440 and the MSS adjusted to 1400 
on both ends. This is probably overly cautious, but it was working.
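For reference, the arithmetic behind those numbers: plain GRE over IPv4 adds a 20-byte outer IP header plus a 4-byte base GRE header (RFC 2784, no checksum, key or sequence fields), so a 1500-byte link leaves at most 1476 bytes for the tunnelled packet and an MSS of 1436. The 1440/1400 settings therefore leave 36 bytes of headroom. A small sketch, assuming no GRE optional fields and no IPsec on top (either would add more overhead):

```python
import struct

OUTER_IPV4_HEADER = 20  # bytes, outer delivery header without options
ETHERTYPE_IPV4 = 0x0800  # GRE protocol type for an IPv4 payload


def gre_header(protocol_type: int = ETHERTYPE_IPV4) -> bytes:
    """Base GRE header: 16 bits of flags/version (all zero) + 16-bit protocol type."""
    return struct.pack("!HH", 0, protocol_type)


def max_tunnel_mtu(link_mtu: int) -> int:
    """Largest inner IPv4 packet that fits in one outer packet without fragmentation."""
    return link_mtu - OUTER_IPV4_HEADER - len(gre_header())


def mss_for_mtu(mtu: int) -> int:
    """TCP MSS for a given IP MTU, assuming 20-byte IPv4 and TCP headers."""
    return mtu - 40


print(len(gre_header()))               # prints 4
print(max_tunnel_mtu(1500))            # prints 1476
print(mss_for_mtu(max_tunnel_mtu(1500)))  # prints 1436
print(mss_for_mtu(1440))               # prints 1400
```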

  Anyway, it had been working fine for some time, but out of nowhere we 
started seeing massive performance issues. Throughput halved and ping 
times skyrocketed (from ~50 ms to ~1000 ms). We tried bringing the 
tunnel down and back up with no luck, and even power cycled each end 
(Cisco 3945s), also with no luck.
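The latency jump on its own is enough to hurt TCP throughput: a single flow can have at most one receive window of unacknowledged data in flight per round trip, so its ceiling is roughly window / RTT. A rough sketch, assuming a 64 KiB window with no window scaling (real hosts usually scale, and many parallel flows change the picture, so treat this as an upper-bound illustration rather than a model of this link):

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP flow's throughput: one window delivered per round trip."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes * 8 / rtt_s


WINDOW = 65535  # bytes, assumed: maximum window without the window-scale option

print(window_limited_throughput_bps(WINDOW, 1.0))  # prints 524280.0
```

At a 50 ms RTT the same window allows about 10.5 Mbit/s per flow; at 1000 ms it drops to about 0.5 Mbit/s, a 20-fold reduction for any window-limited flow.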

  We have confirmed that the configs had not been changed for weeks. 
Neither end had crashed and rebooted. The tunnel itself did not go down 
between 'working' and 'not working'. Performance and ping times via the 
tunnel endpoint addresses are fine, proving (to me) that the networks 
between the 2 sites are not the issue, but the tunnel itself is. No links 
are saturated, and CPU load is quite tame (both before and during 
the issue).

  For now we have gone back to the backup path, but I haven't been able to 
find similar problems online, and my own Cisco tunnel experience leaves 
me empty-handed so far.

  Has anyone experienced a similar issue? A working tunnel suddenly 
having major performance issues?

