[AusNOG] Telstra Network Down

Mark Delany g2x at juliet.emu.st
Thu Feb 2 15:35:23 EST 2017


> Of course when people say we have 2 core data centers, this should imply no
> data center is allowed to run over 50% capacity. It's odd/strange that 3
> active core data centers should sound so unorthodox, yet this is the only
> way to assure you can run your DCs at 65% and handle a DC going black. Begs
> the question why 4 active core DCs isn't standard architecture for core
> national infrastructure (which would assure high availability under 75%
> load), and 2x efficient in idle infrastructure.

If you have the data-centres then more is better from an idle capacity
perspective.

E.g. if you have 10 DCs and you lose one, you only need to absorb an
increase of ~11% in each of the remaining DCs to take on the lost 10%
load.

In aggregate that means that your infrastructure only needs to be
over-provisioned by ~11% to survive the loss of any one DC. That's
vastly cheaper than the 100% over-provisioning you need with just two
DCs, or the 50% you need with three.
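
For anyone who wants to plug in their own numbers, here is a rough
sketch of the arithmetic above. It assumes load is spread evenly
across all DCs and that a failed DC's load is redistributed evenly
across the survivors. Real traffic is rarely that tidy, and the
function names are just made up for illustration.

```python
def overprovision_fraction(num_dcs: int, failures: int = 1) -> float:
    """Extra capacity each surviving DC needs, as a fraction of its
    normal load, to absorb the load of `failures` lost DCs.

    Assumes evenly spread load and even redistribution (a
    simplification, not a model of any real network).
    """
    survivors = num_dcs - failures
    if failures < 0 or survivors < 1:
        raise ValueError("need at least one surviving DC")
    return failures / survivors


def max_safe_utilisation(num_dcs: int, failures: int = 1) -> float:
    """Highest steady-state utilisation per DC that still leaves room
    to absorb `failures` lost DCs, under the same assumptions."""
    survivors = num_dcs - failures
    if failures < 0 or survivors < 1:
        raise ValueError("need at least one surviving DC")
    return survivors / num_dcs


if __name__ == "__main__":
    for n in (2, 3, 4, 10):
        print(
            f"{n:2d} DCs: over-provision by "
            f"{overprovision_fraction(n):.0%}, "
            f"run each at up to {max_safe_utilisation(n):.0%}"
        )
```

With 10 DCs this gives 1/9, i.e. the ~11% figure, and with two or
three DCs it gives the 100% and 50% figures.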

All good in theory, but you bump up against the messy matter of data
consistency. For loosey-goosey systems like Facebook or Twitter,
eventually-consistent-ish data stores work great. If 1 in a million
updates gets lost or arrives out of order, who cares?

But for intolerant systems like banking, co-ordinating consistent
updates across a federation of systems is a tough problem. This is why
banks will pay for 100% infrastructure redundancy to avoid
complexity and Facebook will pay for 11% redundancy to avoid cost.


tl;dr idle infrastructure may not be your biggest cost.


Mark.

