[AusNOG] Office Link Needed (Fibre or alike) Sydney

Christopher Pollock cpollock at twitch.tv
Fri Oct 23 04:31:03 EST 2015


I wrote this down so as to not forget it, about my visit to 30 Ross.  I
call it "Reason For Outage: Wolverine"

There are a lot of public datacentres that people use in Australia. Most of
them have an affiliation or ownership in some way with a major telco,
because what better way to connect with customers than to make it easy and
available?

It was a normal day, started like any other. Walked through the office,
grateful that there was no water anywhere. Checked the airconditioning
temperatures, and nothing was blowing up. It looked like the start of a
good day.

The NSW IX was in Sydney, nearly 1000 km from me, but the state itself was
unmanned. Nothing /really/ ever went wrong there so we made do. Until today.

I noticed an alert for a customer being offline in the public colo DC.
Unusual, but not unheard of. Customers reboot routers all the time. When it
didn’t come back, I decided they had disconnected the session for a reason,
and sent them an email ‘hey do you know your peering is offline?’. They
wrote back saying they thought it was us, so I checked the switch they were
connected to, and sure enough the port was down. Hmm. Okay, probably a
cabling fault or dislodged cable. I’d schedule in a time to go down and
check it out, when I had more work lined up. Then another one dropped off.
And a third. By the time we were five peers down we were in full panic
station mode. Was our switch dying? God I hope not.

15 minutes later I was on a train to the airport carrying only what I had
on me, hopped on a plane, and two hours later I was in Sydney. By the time
I got there, another 10 peers had dropped offline and the phones were
running hot. I turned mine off so as to stop getting unhelpful calls and
diverted it all back to reception.

Now, to explain a little about how public datacentres often work, generally
the colo provider would charge you an exorbitant amount to install cabling
between racks or to run patch leads, in the thousands. However, anyone with
a carrier license & cabling license and the right tools could run up their
own in 15 minutes. This happened many times. Thousands of times. I would
not be exaggerating to say that there were at least 5,000
unregulated, unregistered cables in that datacentre floor.

When I finally rocked up to the DC, 15 of our 20 or so peers were offline.
I ran over to our rack and checked the switch. It was fine, no errors. I
ran TDR testing on the ports to check for cable lengths, connectivity,
shorts, any kind of Layer 1 or 2 problem. All the cables registered as an
open pair, meaning they were not connected at the other end. This was
thoroughly confusing. So I checked the actual lengths on these TDR traces
and they were actually showing as only 15m away. What the hell? Most of
these cable runs were 50 - 80m - why did they stop at 15m?

I walked out to about 15m and walked a circumference around the rack. When
I rounded the corner, the blood drained from my face (as it so often does
in these situations). I knew exactly what had happened.

A new tech for the colo provider was not aware of a little thing called the
Telecommunications Act which allows you to run these kinds of cables. So
he’d gone through all the locally-paid patches, which were done in a
specific colour, and figured out that anything not bright yellow must have
been ‘illegal’. He had four floor tiles removed, and was standing over the
cable pits, dual-wielding side-cutters, one in each hand. Cutting anything
the wrong colour, like a boxer pounding away with left-right combos over
and over. Slashing away at our infrastructure like Wolverine berserker
style. There was a pile of cables next to him that, I shit you not, was the
size of a small car.

Yelling and sprinting over, I demanded that he stop what he was doing. I
was about to say ‘..and put them back the way they were’ when I realised he
must have been at it for 6+ hours and reconnecting them all was going to be
impossible. He’d destroyed the infrastructure for god knows how many
businesses. Now, I’m pretty calm most of the time, even in the face of
danger, but this .. this made me lose my shit.

Me: WHAT ARE YOU DOING STOP
DC Tech: I’m removing the inactive and unauthorised patches. I have an
order from management to do it.
Me: ARE YOU A F**KING IDIOT? DO YOU REALISE THAT THESE ARE ACTIVE
TELECOMMUNICATIONS SERVICES AND THAT INTERFERING WITH OR DISCONNECTING THEM
IS A FEDERAL OFFENSE UNDER THE TELECOMMUNICATIONS ACT 1997!? YOU CAN GO TO
FUCKING JAIL FOR 10 YEARS FOR ONE AND YOU’VE DONE LIKE TWO HUNDRED NOW STOP
BEFORE YOU RUIN ANYONE ELSE’S BUSINESS

Finally, it was someone else’s face going pale. He agreed to stop, and I
ran to the nearest supply store, bought a few boxes of cables and supplies
and set to furiously running new cables to all our customers. He helped me
re-run cabling for all of our customers, and within maybe two hours they
were all back online.

We sent the colo provider an invoice for the expenses incurred during
troubleshooting / rectification and they grudgingly agreed to pay for it.

I still can’t get the image of that giant ball of cables out of my head. It
was a horrible hybrid of a giant aborted fetus and an ugly medusa,
thousands of RJ45 heads pointing in all directions.

Heading back to the airport, I sat with my head in my hands, regretting a
lost day’s work, and trying to figure out how I would word this Post
Incident Report.

Reason for outage: AAPT is the worst.

Eventually I handed the PIR job over to someone else, as I’d long since
lost the ability to be civil about it. There was only one thing left to do.

Go to the pub, and cleanse the day with purifying beer.


--
Christopher Pollock  |   *Twitch.tv <http://Twitch.tv>*
<http://www.twitch.tv>  |   Network Development Engineer   |   415-361-3042
  |   Skype–christopherpollock   |   Twitter–chhopsky

On Thu, Oct 22, 2015 at 1:28 AM, Purdon, Bob <bobp at purdon.id.au> wrote:

> On 22 October 2015 at 13:54, Bevan Slattery <bevan at slattery.net.au> wrote:
>
>> OMG.  Only to be topped by 530 Collins.  I remember PIPE taking over the
>> Ausbone rack there.  I remember Steve or Bob saying they couldn't get the
>> floor tiles to sit on the raised structure meaning the tiles were
>> effectively "floating" on the cables :). The cabling in that rack was
>> possibly the worst in living memory.
>>
>
> I know I said that once, and Steve may well have also.  IIRC, the raised
> floor was 300mm above the slab and the cables were pressed up against the
> underside of the floor tiles.  I've probably got a photo somewhere.  Had to
> stand and/or gently jump on one particular tile to get it to sit down on
> the stringers again.
>
> In a move that obviously went against the convention at that site, I did
> in fact remove quite a few cables as part of cleaning that rack up, which
> in turn helped that tile fit just a little bit better.
>
> I also recall running a cable or two in there before I was at PIPE and
> despite there being trays (which I actually used), the general cable
> routing convention seemed to be just go point A to point B, and if that
> means going diagonally across the room then that's what happened.
>
> Wonder how many tonnes of cable there were under that floor? :-)