[AusNOG] Outage that costs Millions

Lincoln Dale ltd at cisco.com
Wed Jun 30 12:16:39 EST 2010


On 30/06/2010, at 11:57 AM, Andrew Fort wrote:

> On Wed, Jun 30, 2010 at 11:46 AM, Lincoln Dale <ltd at cisco.com> wrote:
>> On 30/06/2010, at 10:43 AM, Daniel Hood wrote:
>>> The issue the outage was facing was a Spanning Tree Loop that knocked over all of the
>> 
>> it is the _absence_ of Spanning Tree that means that a network cannot _recover_ from someone causing a loop.
>> common misconception is that Spanning Tree causes loops.  that is incorrect at best.
> 
> Sure.  If it were due to a customer or operator created loop, the
> question for me becomes: was l2 traffic suppression configured, and
> did it work?

certainly it is best practice to make use of the features that are available (e.g. storm control) that help mitigate "bad things" that can happen at L2 (e.g. a host gone mad generating broadcast frames).
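As a rough illustration of what storm control does for the "host gone mad" case, here is a minimal Python sketch of a single port. It assumes a fixed packets-per-second threshold measured over 1-second intervals and two possible actions; `StormControl`, `threshold_pps` and `action` are hypothetical names. Real switches also offer bandwidth-percentage or bps levels and other actions (e.g. sending a trap), so treat this as a model of the idea, not of any vendor's implementation.

```python
# Hypothetical, simplified model of broadcast storm control on one port.
# Assumption: a fixed packets-per-second threshold checked per 1-second
# interval; real switches may use %-of-bandwidth, pps or bps levels and
# other configurable actions.
from dataclasses import dataclass


@dataclass
class StormControl:
    threshold_pps: int            # broadcast frames allowed per interval
    action: str = "drop"          # "drop" excess frames, or "shutdown" the port
    interval_start: float = 0.0   # start time of the current interval (seconds)
    count: int = 0                # broadcast frames seen in the current interval
    port_up: bool = True

    def on_broadcast(self, now: float) -> bool:
        """Return True if the broadcast frame received at time `now` is forwarded."""
        if not self.port_up:
            return False          # a shut-down port forwards nothing
        # Begin a new 1-second measurement interval once the old one expires.
        if now - self.interval_start >= 1.0:
            self.interval_start = now
            self.count = 0
        self.count += 1
        if self.count <= self.threshold_pps:
            return True
        if self.action == "shutdown":
            self.port_up = False  # stays down until an operator intervenes
        return False
```

For example, a host blasting 1000 broadcasts in half a second at a port with `threshold_pps=100` gets only the first 100 forwarded; with `action="shutdown"` the port is also taken down. Note that storm control only limits the damage: it does nothing to break the loop itself.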

but if there is a loop at L2:
 (a) STP's role is to build a topology in a loop free manner.  it does that well enough but perhaps not in an optimal manner.
 (b) 'BPDU Guard' operates on edge ports: they still send out periodic BPDUs in the expectation that none ever come back, and if one does (or any BPDU arrives), the edge port that receives it is errdisabled.
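A toy Python model of (a) and (b), to make the distinction concrete. It is a sketch under loud assumptions: bridges are identified by integer bridge IDs (lowest wins the root election), every link has equal cost, and the bridge graph is connected. Root-port selection is reduced to "one hop closer to the root, lowest neighbour ID wins"; there are no timers, port IDs or port states beyond forwarding/blocked. `spanning_tree` and `EdgePort` are hypothetical names, not any switch's API.

```python
# Hypothetical toy model of (a) STP building a loop-free topology and
# (b) BPDU Guard on an edge port. Assumes integer bridge IDs, equal link
# costs and a connected graph; not a full 802.1D state machine.
from collections import deque


def spanning_tree(links: list[tuple[int, int]]):
    """Return (root, forwarding_links, blocked_links) for an undirected bridge graph."""
    bridges = {b for link in links for b in link}
    root = min(bridges)                    # lowest bridge ID is elected root
    adj = {b: set() for b in bridges}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    # Breadth-first search gives each bridge its hop count (root path cost).
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Each non-root bridge picks its root port toward a neighbour one hop
    # closer to the root, breaking ties by the lowest neighbour bridge ID.
    forwarding = set()
    for b in bridges - {root}:
        upstream = min(n for n in adj[b] if dist[n] == dist[b] - 1)
        forwarding.add(tuple(sorted((b, upstream))))
    # Every other link has a blocked port, so no loop is left forwarding.
    blocked = {tuple(sorted(link)) for link in links} - forwarding
    return root, forwarding, blocked


class EdgePort:
    """Edge port with BPDU Guard: receiving any BPDU err-disables it."""

    def __init__(self, name: str):
        self.name = name
        self.err_disabled = False

    def receive(self, frame_type: str) -> None:
        if frame_type == "bpdu":
            self.err_disabled = True      # nothing on an edge port should send BPDUs
```

With a ring of four bridges, `spanning_tree([(1, 2), (2, 3), (3, 4), (4, 1)])` elects bridge 1 as root and blocks the 3-4 link, leaving a tree. If someone patches two edge ports together, each port's outgoing BPDU lands on the other and `receive("bpdu")` err-disables it, which is exactly the loop STP alone would not have seen on a port treated as edge.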

best practice is most certainly to have (b) enabled too.

no idea what happened in this scenario, but my experience is that L2 loops attributed to "STP" are rarely due to STP bugs or issues but rather operational issues or misconfiguration.

certainly there are aspects of STP that could be 'better'.  incidentally i talked about those at AusNOG last year. :)


cheers,

lincoln.

