[AusNOG] census issues tonight

Glenn Hocking glenn.hocking at woosaw.com
Wed Aug 10 14:53:11 EST 2016


I think that is exactly what has happened: an expert in only one area was in charge, at the expense 
of any real insight into the big picture.

I hate the term, but a 'full stack developer/administrator' was probably needed. Someone with an 
understanding of network/BGP/peering/DNS/OS/applications/databases/security/people, etc.

They do exist, as do experts in one field that 'know the answer' (or not).

It does look like the failure was many things, which really comes down to the overall design.

As many of us here have alluded to, many things at different points failed, including honesty. Just 
throwing money at one or two areas does not a good robust online census make....

Glenn Hocking | Managing Director
Woosaw Pty Ltd | www.woosaw.com.au
Sydney Office: +612 8090 3441 | Mobile: 0420 942 641
PO Box 391 │ Pyrmont NSW 2009  | Australia

On 10/08/2016 2:28 PM, Mark Delany wrote:
> On 10Aug16, James Braunegg allegedly wrote:
>> No need for Geo Blocking.. that's hard work
>>
>> Just only advertise the route locally within Australia, i.e. to Optus, Telstra and on peering exchanges... Job done..
>
> Nope. Job not done. This sort of single-bullet approach is probably
> why they failed.
>
> If you want scale and resiliency there are many many things you do to
> ensure success. For example how would an AU-only route announcement
> protect against a DDOS initiated here? Australians love their ancient
> Windows boxen so there are plenty of locally available bots for rent.
>
> It's hard to know where to even begin with the census site as they got
> it wrong in so many ways. It's obvious they never even did a mental
> walk thru of what-ifs.
>
> Based on HTTP responses with failure text, we can guess that they
> had a coupled system when a de-coupled one would have been more
> resilient. They relied on physical scaling which is obviously
> impossible to augment in any reasonable time frame. They did not do a
> trial run of anything to try and get a sense of the traffic profile so
> they were completely guessing. Why not get everyone to register a week
> beforehand to get a feel for the traffic and load? Their servers were
> centralized, which is an obvious no-no. Even their DNS setup was such
> that they couldn't swing traffic quickly if they had to.
>
> Their efforts at switching routing during the evening suggests that
> they thought it was some sort of traffic based DOS, but as others
> observed, there is not a lot of evidence that that was actually the
> case. It looks like all they knew was that their service was failing
> and they were scrambling to deal with it. Did they do a practice run
> with an actual DDOS? Their 6h DNS TTL suggests not as that's one of
> the first things you want to be able to change rapidly.
>
> I also saw no evidence of their ability to gracefully degrade. Either
> they were up or they were down. No ability to redistribute the
> traffic, nor to have the browser-based JS reach for an alternative
> site or for the site to do less work when it got too busy, such as
> dump and defer validation.
>
> Their one bullet seems to be to have provisioned twice as much
> front-end server capacity as they thought they'd need. A mere 2x
> margin for a completely new, unknown traffic profile system? That's
> pretty scandalous for such a high-profile site right there.
>
>
> Mark.
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>
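For anyone following the thread, Mark's point above about the site doing less work when it gets too busy ("dump and defer validation") could be sketched roughly as below. This is only an illustration under assumptions: SubmissionHandler, busy_threshold, in_flight and the household_id field are hypothetical names, and nothing here reflects how the actual census site was built.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SubmissionHandler:
    """Accepts form submissions and defers validation when load is high."""
    busy_threshold: int = 100   # hypothetical limit on concurrent requests
    in_flight: int = 0          # current load; a real server would track this itself
    deferred: deque = field(default_factory=deque)

    def validate(self, form: dict) -> bool:
        # Stand-in for expensive validation: just a required-field check.
        return bool(form.get("household_id"))

    def submit(self, form: dict) -> str:
        if self.in_flight >= self.busy_threshold:
            # Degraded mode: keep the raw form and validate it later.
            self.deferred.append(form)
            return "accepted-deferred"
        return "accepted" if self.validate(form) else "rejected"

    def drain(self) -> list:
        """Validate deferred forms once load drops; returns (form, ok) pairs."""
        results = []
        while self.deferred:
            form = self.deferred.popleft()
            results.append((form, self.validate(form)))
        return results
```

The trade-off is that a user in degraded mode only learns about an invalid form after the fact, so a real system would need some way to contact them again. In return, the front end stays up instead of being either fully up or fully down.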

