[AusNOG] Outage that costs Millions
Narelle
narellec at gmail.com
Thu Jul 1 12:22:33 EST 2010
On Thu, Jul 1, 2010 at 10:20 AM, Matt Carter <matt at iseek.com.au> wrote:
>
> As others have pointed out a "spanning tree issue" doesn't tear down your
> network for 90 minutes, it *prevents* it from being torn down for 90 minutes,
> it could be thought of as a "last resort safety" so to assert a spanning tree
> issue caused this problem, in my mind, is to assert a lack of spanning tree,
> meaning the required last resort safety mechanisms were either not in place,
> or not configured properly. (if they were, how could it be a spanning tree issue??)
I have seen failures of this duration in large spanning tree networks
before. The reasons for the lengthy time to restore are a) it can be
really tricky to find the 'root' of the problem, b) people have
forgotten what life was like before routers were everywhere (and over
the last few years have been relearning all this), and c) in large
carriers people get nervous when restoring traffic, because the
rerouting gets complex: ports need to be identified, records
updated, etc.
It just ain't as simple as it looks...
<disclaimer>
and this in no way is meant to explain or justify any of Telstra's
actions, indeed I know virtually nothing about the specific
configuration of that network...
</disclaimer>
--
Narelle
narellec at gmail.com