[AusNOG] Telstra Network Down
Mark Delany
g2x at juliet.emu.st
Thu Feb 2 15:35:23 EST 2017
> Of course when people say we have 2 core data centers, this should imply no
> data center is allowed to run over 50% capacity. It's odd/strange that 3
> active core data centers should sound so unorthodox, yet this is the only
> way to assure you can run your DCs at 65% and handle a DC going black. Begs
> the question why 4 active core DCs isn't standard architecture for core
> national infrastructure (which would assure high availability under 75%
> load), and 2x efficient in idle infrastructure.
If you have the data-centres then more is better from an idle capacity
perspective.
E.g. if you have 10 DCs and you lose one, you only need to absorb an
increase of ~11% in each of the remaining DCs to take on the lost 10%
load.
In aggregate that means that your infrastructure only needs to be
over-provisioned by ~11% to survive the loss of any one DC. That's
vastly cheaper than an over-provisioning of 100% if you have just two
DCs or 50% over-provisioning if you have three DCs.
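The arithmetic above generalises: with N DCs, losing one means the other N-1 share its load, so the over-provisioning needed is 1/(N-1). A rough sketch follows, assuming every DC has equal capacity, carries an equal share of the load, and that a lost DC's load spreads evenly over the survivors. The function names are made up for illustration.

```python
def overprovision_fraction(num_dcs: int) -> float:
    """Extra capacity, as a fraction of steady-state load, needed to
    survive the loss of one DC.

    Assumes equal-sized DCs, evenly spread load, and even
    redistribution of the failed DC's load (illustrative assumptions).
    """
    if num_dcs < 2:
        raise ValueError("need at least two DCs to survive losing one")
    # Each survivor goes from 1/N of the load to 1/(N-1) of the load,
    # a relative increase of N/(N-1) - 1 = 1/(N-1).
    return 1.0 / (num_dcs - 1)


def max_safe_utilisation(num_dcs: int) -> float:
    """Highest steady-state utilisation per DC that still leaves room
    to absorb one failed DC, under the same assumptions."""
    if num_dcs < 2:
        raise ValueError("need at least two DCs to survive losing one")
    return (num_dcs - 1) / num_dcs


for n in (2, 3, 4, 10):
    print(f"{n:>2} DCs: over-provision {overprovision_fraction(n):.0%}, "
          f"run each DC at up to {max_safe_utilisation(n):.0%}")
```

For 2, 3 and 10 DCs this gives the 100%, 50% and ~11% figures above. Real deployments rarely have perfectly even load, so treat these as lower bounds on the headroom needed.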
All good in theory, but you bump up against the messy matter of data
consistency. For loosey-goosey systems like Facebook or Twitter,
eventual consistency-ish data stores work great. If 1 in a million
updates gets lost or out of order, who cares?
But for intolerant systems like banking, co-ordinating consistent
updates across a federation of systems is a tough problem. This is why
banks will pay for a 100% infrastructure redundancy to avoid
complexity and Facebook will pay for 11% redundancy to avoid cost.
tl;dr idle infrastructure may not be your biggest cost.
Mark.