[AusNOG] Solar flare

Smith, Mark mark.smith at nn.com.au
Thu Mar 8 17:10:06 EST 2012


High-end switches and routers avoid using ECC memory because it is too slow. But that's OK: if you care about reliable delivery, you'll push your data around in TCP.

________________________________
From: ausnog-bounces at lists.ausnog.net [mailto:ausnog-bounces at lists.ausnog.net] On Behalf Of Scott Howard
Sent: Thursday, 8 March 2012 5:03 PM
To: Peter Childs
Cc: <ausnog at ausnog.net>
Subject: Re: [AusNOG] Solar flare

On Wed, Mar 7, 2012 at 9:47 PM, Peter Childs <PChilds at internode.com.au> wrote:
<quote>
> As with all computer and networking devices, the AS5400 is susceptible to
> the rare occurrence of parity errors in processor memory.  Parity errors may

These excuses may seem far-fetched, and I used to get a lot of disbelieving looks when I gave this explanation as an engineer at Sun - but the simple fact is that it IS true.

And by true, what I mean is that cosmic rays can and do cause bits to flip, especially in the type of memory used in CPU caches.  What may or may not be true is whether any specific occurrence was due to cosmic rays - because obviously there's no way to prove that one way or another!

Sun learnt this the hard way when they released their "UltraSPARC" processors many years ago.  During the design phase they chose to take the cheaper/faster path of using parity for the on-chip cache, which means that errors could be detected, but not corrected. This approach had worked perfectly in previous models, but in the UltraSPARCs the faster, larger and higher-density cache became very susceptible to bit flips as a result of a number of factors - with cosmic rays being a suspected cause of many such errors.
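
To make "detected, but not corrected" concrete, here is a minimal sketch of a single even-parity bit protecting one byte. The function names and the 8-bit word are illustrative assumptions, not a description of the UltraSPARC cache design. A single flipped bit is caught, but parity cannot say which bit flipped, and two flips cancel out and go unnoticed.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2


def check(word, stored_parity):
    """Return True if the word still matches the parity stored with it."""
    return parity_bit(word) == stored_parity


# Hypothetical 8-bit cache word and the parity bit stored alongside it.
data = 0b10110010
stored = parity_bit(data)

# One bit flips (say, from a particle strike): parity no longer matches.
one_flip = data ^ (1 << 5)
single_flip_detected = not check(one_flip, stored)

# Two bits flip: the parity matches again, so the error goes unnoticed.
two_flips = data ^ (1 << 5) ^ (1 << 1)
double_flip_detected = not check(two_flips, stored)

print(single_flip_detected, double_flip_detected)
```

Even in the detected case the hardware only knows that something is wrong, so all it can do is raise a parity error.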

At the end of the day, the fault in these cases is in the vendor's choice of using parity memory rather than ECC memory. The "cosmic rays" defense is really just them admitting that their hardware can't handle normal environmental circumstances.  (Sun moved to mirrored caches and/or ECC to avoid such issues!)
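
For contrast, here is a rough sketch of what an error-correcting code buys you, using the textbook Hamming(7,4) code. This is an illustrative assumption only: real ECC memory typically uses wider SECDED codes, and nothing here is claimed to be what Sun or any other vendor shipped. Three check bits protect four data bits, and the syndrome gives the position of a single flipped bit so it can be flipped back.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d0 p4 d1 d2 d3."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (nibble, error_position).

    error_position is the 1-based position that was repaired, or 0 if
    the codeword was clean.
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome


codeword = hamming74_encode(0b1011)
codeword[4] ^= 1                       # simulate a single bit flip at position 5
recovered, position = hamming74_decode(codeword)
print(bin(recovered), position)
```

The cost is extra bits and extra logic on every access, which is the trade-off behind the "cheaper/faster" parity choice described above.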

Even PC manufacturers learnt the error of their ways with using parity memory many, many years ago. Of course they took the opposite approach and just removed the parity.  You can't get parity errors if you don't have parity - and you can always just blame the resulting crash on Microsoft!!

So if your computer crashes in the next day or two, blame the manufacturer, not Microsoft...  (If you're using a Mac.. well..)

  Scott
