[AusNOG] Lessons Learned From SA Blackout

Matt Baker matt.baker at colocity.com
Fri Sep 30 17:10:48 EST 2016


Hi John,

We operate two data centres in the Adelaide CBD. One is our original smaller site (DC1), the other medium sized (DC3). In the spirit of learning from a situation that we all plan for but hope never occurs, I'm happy to pass on my initial thoughts now that we have been through the experience. Data centres plan for off-grid events and extended outages, but we found it was the incidental issues around the event that will need further attention.

A more detailed review will be conducted next week, and I am happy to discuss further with you offline if you would like any more details. I am trying to be as candid as possible about what we experienced so that everyone in the industry, both big and small, can maybe get something out of it.

Our two sites are located well away from flooding sources, so at least we did not have to worry about this aspect. I can only imagine how much more difficult it would be to cope with flooding on top of a power outage. While Adelaide does have some very high temperatures at certain times of the year (luckily dry heat), from my memory it hasn't had much in the way of extreme storms or natural disasters.

Overall we remained operational during the power crisis. Lessons were learnt and changes will be made to make things easier operationally for next time. 


> ·         What warnings did you get?
> ·         What preparations did you make?

- The storm was well forecast to come through, and it was widely publicised that it was going to be a big event.
- Building gutters and roof areas were checked to make sure they were clear, and we were confident that we were prepared for the event.
- We had all our generators tested before the storm came through, and we had spare batteries ready to swap in case there was any issue. (From what I heard, some buildings around the CBD with generators had trouble starting them due to dead batteries.)
- As the storm came across the west coast we could see power disturbances in the mains on our monitoring systems: harmonic distortion and the like in the supply. We were expecting a power outage.
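
As an aside for anyone who wants similar alerting on their mains feed, the idea is just a threshold check on periodic voltage and frequency readings. A minimal sketch is below; it is illustrative only, not our actual monitoring stack, and the nominal values and alarm windows are assumptions.

# Minimal sketch (illustrative, not our monitoring system): flag mains
# anomalies from periodic voltage/frequency readings. The alarm windows
# below are assumptions, not real utility or equipment limits.

NOMINAL_VOLTAGE = 230.0   # volts (AU single-phase nominal)
NOMINAL_FREQ = 50.0       # hertz
VOLT_TOLERANCE = 0.06     # +/-6% window (assumed alarm threshold)
FREQ_TOLERANCE = 0.5      # +/-0.5 Hz window (assumed alarm threshold)

def mains_alarms(samples):
    """samples: iterable of (timestamp, volts, hertz) tuples."""
    alarms = []
    for ts, volts, hertz in samples:
        if abs(volts - NOMINAL_VOLTAGE) > NOMINAL_VOLTAGE * VOLT_TOLERANCE:
            alarms.append((ts, f"voltage out of range: {volts:.1f} V"))
        if abs(hertz - NOMINAL_FREQ) > FREQ_TOLERANCE:
            alarms.append((ts, f"frequency out of range: {hertz:.2f} Hz"))
    return alarms

# Example: an increasingly disturbed supply as a storm front crosses
readings = [("17:55", 231.2, 49.98), ("17:58", 214.0, 49.30), ("18:01", 196.5, 48.60)]
for ts, msg in mains_alarms(readings):
    print(ts, msg)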


> ·         What went well?

- Generators fired as expected at both sites and no power outage occurred for us. One chiller at DC1 complained and went into a fault state; systems automatically failed across to the redundant chiller unit. While our chillers are on generator power, they are obviously not on UPS. From an initial look, it appears a frequency issue just before the mains failed caused the chiller fault. A reset of the chiller fixed the issue.
- All monitoring worked very well. We knew instantly that everything was working as expected.
- Clients were fantastic.  We ended up with many more people onsite than I had expected, but everyone was so supportive and offered assistance where they could. This is actually a really great reflection of the IT industry.

> ·         What didn’t go well?

- Getting between sites. Even though our sites are located only 5 minutes apart by car (a 15-minute fast walk across the CBD), getting between them was impossible given all traffic lights were out and the city was gridlocked.
- At one site I had concerns about the office area we use at the front of the building, which has street frontage. We left the lights off in this area so as not to attract attention to the building, which is otherwise very nondescript.
- Mobile coverage reduced significantly. Optus appeared to be out in our part of the CBD, and Telstra appeared to have reduced coverage. No doubt cell towers were failing.

> ·         Lessons learned?

- You can never have enough diesel storage. Our DC3 site has diesel storage for 48 hours of operation; our DC1 site has considerably less. Initially I miscalculated the consumption rates and how long diesel would last at the DC1 location. Having now seen the actual consumption rates from the generators during the outage, I feel we would have had more than enough storage to cover us through the night without refuelling. (A rough runtime calculation is sketched after this list.)

- Fuel deliveries proved to be difficult. Word was that some deliveries were occurring to hospitals through the night, but once the trucks were empty they were unable to refill. I am not sure how accurate this is, and it is something I plan to look into further.

- Even though our DC1 site is located in the emergency services area (next to the fire station, police HQ, etc.), that doesn't help during a state-wide power event. Normally, due to this location, we have very stable power that is not load shed.
 
- Our DC1 site is located a block away from a petrol station. Unfortunately, petrol stations have no means of pumping fuel during a complete state-wide power outage. It seems they do not have generator power.

- We refuelled manually overnight once power had returned to the state (lots of runs to petrol stations with jerry cans) as we prepared for a second storm event forecast for the next day. I didn't want to wait for a fuel delivery in the morning. We met many electricians and others doing the same… City petrol stations had run out of diesel by the next morning.

- We are now planning to double the storage at both sites. I feel it is important not to rely on external refuelling sources being available for any reasonable time during or after an extended outage.

- We are planning to have a better capability to transport diesel between our locations. Our newer site is a much bigger building with space to store more fuel, which can then serve both locations independently of fuel suppliers, provided we can transport diesel between sites ourselves.

- Consideration of our telco suppliers and their capabilities. As we provide both co-location services and transit to many of our clients, we will be looking further into their capabilities and the buildings their core POPs are located in. While none failed during this event and all transit remained up, it is something we want to understand better should a more extended outage occur.

- Consideration for no mobile coverage. I was in London during the bombings many years ago, when all mobile coverage was either turned off or stopped working for an extended period. Communication with family and colleagues gets very difficult in these situations. It doesn't always take a physical natural disaster to knock out mobile coverage, as many cell towers have only battery backup. We are all now very reliant on mobiles, and they form an essential part of communication.
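
On the diesel storage point above, the runtime arithmetic is trivial but easy to get wrong under pressure, which is exactly what happened to me. A minimal sketch is below; the 48-hour sizing for DC3 matches what I mentioned above, while the DC1 tank size and the burn rate are placeholder assumptions, not our real figures.

# Rough back-of-envelope sketch of generator runtime vs. diesel storage.
# The burn rate and DC1 tank size are illustrative assumptions only.

def runtime_hours(tank_litres, burn_l_per_hour):
    """Hours of generator operation from a full tank at a steady burn rate."""
    return tank_litres / burn_l_per_hour

def refuel_deadline_hours(tank_litres, burn_l_per_hour, reserve_fraction=0.25):
    """Hours until the tank drops to the reserve level (default 25%)."""
    usable = tank_litres * (1 - reserve_fraction)
    return usable / burn_l_per_hour

burn_rate = 30.0            # litres per hour at assumed load (placeholder)
dc3_tank = 48 * burn_rate   # DC3 storage sized for roughly 48 h of runtime
dc1_tank = 600.0            # hypothetical smaller tank at DC1 (placeholder)

print(f"DC3: {runtime_hours(dc3_tank, burn_rate):.0f} h of runtime")
print(f"DC1: {runtime_hours(dc1_tank, burn_rate):.0f} h of runtime")
print(f"DC1 refuel needed within ~{refuel_deadline_hours(dc1_tank, burn_rate):.0f} h")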



In the end it's a shame that the state-wide power outage has turned political. I guess there is no surprise there, but hopefully behind the media coverage people are looking at this as a case study for the whole country. Hopefully some lessons have been learnt by all our power utilities. Power across the country is interconnected, and it appears very reliant on capacity in the overall system to cope with generation issues in individual states.

While Adelaide does not suffer from the tropical events that other parts of Australia do, we are very dependent on the interstate interconnector to Victoria for our power capacity. Normally this isn't too much of an issue. During a power issue a few months ago, when the interconnector went down, they needed to load shed to reduce usage to match generation capacity. At that time the CBD was excluded from the load shedding.

I don't think anyone could have predicted this occurring, and I don't believe anyone could have anticipated the number of high-voltage lines that would come down in a single event. I am not surprised there was a frequency issue and it was decided (either automatically or manually) to separate South Australia from the rest of the country to protect against a nation-wide outage. It does take these sorts of events to learn and prepare better for next time. This has shown that this sort of event is possible.

I personally had concerns for my family and their safety, and can only imagine a situation like Superstorm Sandy or some other major disaster where lives are at serious risk. In those sorts of situations things take on a very different perspective. With this in mind, there are many areas of the state where people are still concerned about flooding, and large parts of the northern and western areas of the state are still without power. I hope that people stay safe and everyone can get through this without further incident.



Regards,

Matt Baker
Colocity Pty Ltd
Adelaide Data Centre, Co-location & Networks
matt.baker at colocity.com
Mb: 0423 058526
Ph:  (08) 8232-3250      Fx:  (08) 8227-0315



> From: John Duffin <jduffin at uptimeinstitute.com>
> Date: Friday, 30 September 2016 at 10:21 AM
> To: "ausnog at lists.ausnog.net" <ausnog at lists.ausnog.net>
> Subject: Lessons Learned From SA Blackout
>  
> Hi,
>  
> I’m the MD for Uptime Institute in Australia.  After Superstorm Sandy in the US we published an article on the Lessons Learned by Operators of Critical Data Facilities.
>  
> https://journal.uptimeinstitute.com/disaster-recovery-lessons-learned-superstorm-sandy/
>  
> I’d like to do the same for the SA Blackout.  It’s only when events like this happen that we really understand what’s going on. If anyone has something they would like to share, either in this forum or direct to jduffin at uptimeinstitute.com, I’d be grateful.  In particular:
>  
> ·         What warnings did you get?
> ·         What preparations did you make?
> ·         What went well?
> ·         What didn’t go well?
> ·         Lessons learned?
>  
> Regards,
>  
> John
>  
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog


