[AusNOG] Outage that costs Millions

Lincoln Dale ltd at cisco.com
Thu Jul 1 12:13:26 EST 2010


On 01/07/2010, at 10:56 AM, John Edwards wrote:
> Similar meltdowns have happened in the past due to mac address table limitations. Once upon a time VTP misadventures would have also been a safe bet.

certainly VTP has had its share of issues, though less because of VTP itself than because of operational practice.
the older version(s) of VTP did suffer from the issue of "inserting a new switch with a higher configuration revision number": <http://www.cisco.com/warp/public/473/vtp_flash/>
moral of the story is: don't use VTP[v1,v2] in that manner. :)
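
for anyone who hasn't been bitten by it, a rough sketch of that revision-number behaviour. this is a toy model, not the VTP state machine: the class, field names, VLAN numbers and revision numbers are all made up for illustration, and it assumes both switches are already in the same VTP domain.

```python
from dataclasses import dataclass, field


@dataclass
class VtpSwitch:
    """Toy VTPv1/v2 participant (hypothetical model, not real VTP code)."""
    name: str
    revision: int = 0
    vlans: set = field(default_factory=set)

    def receive_advert(self, sender: "VtpSwitch") -> bool:
        # Assumption: same domain. The higher configuration revision wins,
        # no matter which switch holds the "correct" VLAN database.
        if sender.revision > self.revision:
            self.revision = sender.revision
            self.vlans = set(sender.vlans)
            return True
        return False


# Production switch with a modest revision number and the real VLANs.
production = VtpSwitch("core1", revision=12, vlans={10, 20, 30, 40})
# Ex-lab switch: many edits (high revision), almost no VLANs left.
lab_switch = VtpSwitch("ex-lab", revision=57, vlans={1})

overwritten = production.receive_advert(lab_switch)
```

after the advert, `production.vlans` is just `{1}`: the production VLAN database has been replaced by whatever the newly inserted switch happened to carry.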

VTPv3 addresses that particular issue.


> I have heard that the network in question once had issues with an ISP running DSL/PPPoE backhaul across it, and as a result enforced a limit of mac addresses per customer. Fast forward to 2010, and there are probably more than enough devices on this network to get to the limits of most switching hardware in this country.

if you exceed the MAC table size, the worst case is that you get partial flooding: frames to destinations that could not be learned are flooded as unknown unicast.

MAC table exhaustion has no bearing on STP or its ability to put switch ports into forwarding/blocking/learning/listening state.  BPDUs still "work" in the presence of a full MAC table too, since they are sent to an IEEE reserved multicast address (the bridge group address, 01:80:C2:00:00:00) and don't depend on learned entries.

certainly it's not desirable behaviour for some traffic to be flooding.  but it's not the end of the world either, as the devices with the smaller MAC tables are typically at the 'edge'.
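
a minimal sketch of what "partial flooding" means here, assuming a very simplified learning switch: fixed-size table, no aging, no hash collisions. the class and method names are hypothetical and real hardware behaves differently in the details.

```python
class LearningSwitch:
    """Toy L2 learning switch with a bounded MAC table (illustrative only)."""

    def __init__(self, ports, table_size):
        self.ports = list(ports)
        self.table_size = table_size
        self.mac_table = {}  # MAC address -> port

    def handle_frame(self, src, dst, in_port):
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) the source only if it is already known
        # or there is still room in the table.
        if src in self.mac_table or len(self.mac_table) < self.table_size:
            self.mac_table[src] = in_port

        out_port = self.mac_table.get(dst)
        if out_port is None:
            # Unknown unicast: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment; filter
        return [out_port]


sw = LearningSwitch(ports=[1, 2, 3, 4], table_size=2)
first = sw.handle_frame("A", "B", in_port=1)   # B unknown yet: flooded
second = sw.handle_frame("B", "A", in_port=2)  # A learned: sent to port 1
third = sw.handle_frame("C", "A", in_port=3)   # table full: C is not learned
fourth = sw.handle_frame("A", "C", in_port=1)  # C never learned: flooded
fifth = sw.handle_frame("A", "B", in_port=1)   # B still known: sent to port 2
```

traffic between hosts that made it into the table (A and B) is still switched normally; only traffic towards the host that didn't fit (C) is flooded.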

having said that, it's certainly an area that TRILL addresses too:
 - mac-address is hierarchical (inner/outer addresses).
	- L2 tables no longer need to be synchronised across the entire L2 domain
	- 'edge' switches only need to know their 'locally attached devices' + where conversations are going on
	- 'core' switches never need to know about edge devices at all
  ... so there is much less pressure on things like MAC table,
 - it's possible to do further optimisations on L2 learning to further reduce pressure on MAC table
 - if there was a 'transient loop', a TTL on L2 frames takes care of it
 - if there was flooding going on somewhere, it would be constrained to one of N topologies in the L2 cloud
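
to put rough numbers on the "less pressure on the MAC table" point, here is a back-of-envelope sketch. the switch names, host counts and conversation counts are invented, and the flat case assumes the worst case where every switch ends up learning every host (e.g. via broadcasts).

```python
def flat_l2_entries(hosts_per_edge):
    """Worst case for classic bridging: every switch learns every host MAC."""
    total = sum(hosts_per_edge.values())
    return {switch: total for switch in hosts_per_edge}


def hierarchical_edge_entries(hosts_per_edge, remote_conversations):
    """Edge switch learns local hosts plus remote hosts it is talking to."""
    return {
        switch: local + remote_conversations.get(switch, 0)
        for switch, local in hosts_per_edge.items()
    }


# Hypothetical topology: three edge switches behind a core.
hosts_per_edge = {"edge1": 400, "edge2": 250, "edge3": 350}
# Hypothetical count of remote hosts with active conversations, per edge.
remote_conversations = {"edge1": 30, "edge2": 10, "edge3": 0}

flat = flat_l2_entries(hosts_per_edge)
hier = hierarchical_edge_entries(hosts_per_edge, remote_conversations)

# Core switches forward on the outer (switch) address only, so in this
# model they hold one entry per edge switch and no host MACs at all.
core_host_macs = 0
core_switch_ids = len(hosts_per_edge)
```

with these made-up numbers every switch in the flat case needs 1000 entries, while the hierarchical edges need 430, 260 and 350, and the core needs none for hosts.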

all in <http://www.ausnog.net/files/ausnog-03/presentations/ausnog03-dale-ethernet_evolution.pdf> :)


On 01/07/2010, at 11:36 AM, Andrew Fort wrote:
>>> Only thing not to like is the lack of implementations :-).
>> 
>> .. and then there was one.   http://www.google.com/search?q=fabricpath  :)
> 
> Lincoln; I'm assuming you're involved -

indeed, i am involved and it's taken a long time to get to this point. :)


cheers,

lincoln.

