[AusNOG] AusNOG Digest, Vol 5, Issue 8

carl gough [mobsource] carl at mobsource.com
Tue Jul 3 19:37:14 EST 2012


The problem for most carriers is that they have systemic network
blindness. Forget about MTTR (mean time to resolution); it's MTTI, mean
time to innocence, now! A la "what you are seeing on here".
Ops teams responsible for managing service delivery are inevitably going
to become all-powerful. While IT remains mission critical but massively
complicated, we will see a new "Age of Operations" emerge, and the tools
that help diagnose problems, and prove innocence along the way, are going
to be invaluable.

Highly accurate visibility into both the network and the IT infrastructure
has become critical, especially for reputation and appropriate network
response times.



[carl gough] founder and CEO  +61 425 266 764

mobsource.com  defined by benefits  not by technology

On 3/07/12 10:42 AM, "ausnog-request at lists.ausnog.net"
<ausnog-request at lists.ausnog.net> wrote:

>Today's Topics:
>
>   1. Re: "All your router devices are belong to us" (Chris Hurley)
>   2. AAPT Ethernet outage (Art Cartwright)
>   3. Re: AAPT Ethernet outage (Joshua D'Alton)
>   4. Re: AAPT Ethernet outage (Bevan Slattery)
>   5. Re: AAPT Ethernet outage (Matt Perkins)
>   6. Re: AAPT Ethernet outage (Joshua D'Alton)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 03 Jul 2012 02:02:14 +1000
>From: Chris Hurley <chris at minopher.net.au>
>To: Ben Dale <bdale at comlinx.com.au>,	"ausnog at lists.ausnog.net"
>	<ausnog at lists.ausnog.net>
>Subject: Re: [AusNOG] "All your router devices are belong to us"
>Message-ID: <CC180326.28A12%chris at minopher.net.au>
>Content-Type: text/plain;	charset="US-ASCII"
>
>Yes interesting times lie ahead for all of us in IT.
>
>As providers we need to consider whether we have provided our clients with
>the 'correct' information. What they then do with it is their business. I
>can see the lawyers having a field day, but in this day and age one has to
>cover one's own butt. :-(
>
>I miss the good old days when you built a go-kart and lined up down the
>hill: no brakes, no helmet, wheels held on by bent nails. If you tumbled or
>hit a rock it was your problem, rather than "the street wasn't smooth".
>
>I know of many organisations that have legal requirements in their
>make-up/charters that information is either held locally or at least within
>Australia.
>
>Some even require data to be stored "on site". Yes, "on site" is a grey
>term, but if you're dealing with Koori groups be very aware of their
>binding legal requirements. I only say Koori groups as I have worked with
>a few, so I know their requirements first hand. Remember they have
>required TV movies/shows to post warnings: "May contain pictures of dead
>people". Data = pictures = websites = storage.
>
>Cisco in its great push to the "cloud" is cutting across many
>groups'/companies'/organisations' legal obligations.
>
>We need to inform the end users of what is happening.
>
>
>On 2/07/12 11:05 AM, "Ben Dale" <bdale at comlinx.com.au> wrote:
>
>> There are a few [1] other [2] vendors [3] that have been pushing the
>> cloud-based management of enterprise network devices for some time now.
>> Admittedly none of these guys have the pervasiveness of the big C, but
>> they are popping up around the place, and if Cisco is pursuing this then
>> I imagine customers must be asking for it.
>> 
>> The "all-in-the-cloud" message is pretty compelling for some enterprise
>> shops whose core business is not IT, but for reasons already raised, the
>> off-shore privacy and security implications remain largely unexplored and
>> often simply ignored.
>> 
>> On the technical side though, the biggest issue I see is that the time
>> when most enterprises need access to network management/monitoring the
>> most is when something is down, and that something can just as easily
>> be/is usually "the cloud".  The rest of the time, the blinking boxes in
>> the cupboard just work.
>> 
>> Interesting times ahead.
>> 
>> [1] http://www.meraki.com/
>> [2] http://www.aerohive.com
>> [3] http://www.thecloud.net/
>> 
>> 
>> On 30/06/2012, at 11:56 AM, Heinz N wrote:
>> 
>>> I just saw this on slashdot. Get the tin foil hats out.
>>> 
>>> 
>>> http://tech.slashdot.org/story/12/06/29/1425210/cisco-pushing-cloud-connect-router-firmware-allows-web-history-tracking
>>> 
>>> and
>>> 
>>> http://www.reddit.com/r/technology/comments/vptu9/linksys_just_pushed_and_installed_without_my
>>> 
>>> Seems CISCO is disallowing local admin to their low end home/SOHO
>>> routers. Admin can apparently now only be done through their cloud
>>> (since when does a cloud ever fail!!?)...... Their conditions also state
>>> that they can monitor your traffic as they wish (and the "patriot act"
>>> NSA, FBI etc etc). No telling what the bandwidth implications of this
>>> are: and who will pay for the extra unauthorised traffic?
>>> 
>>> You may want to rethink your equipment for SOHO clients.
>>> 
>>> The whole issue with Telstra tracking HTTP traffic is just the start.
>>> How long before your new "trusted computing" motherboard reflashes
>>> itself and starts reporting all your stuff to Redmond (or China).
>>> 
>>> I am happy to stick with my dumb bridged modem talking to a Linux router
>>> running iptables. Very cheap and with all the functionality of the most
>>> expensive routers and it doesn't report to some mothership cloud.
>>> 
>>> Heinz N.
>>> _______________________________________________
>>> AusNOG mailing list
>>> AusNOG at lists.ausnog.net
>>> http://lists.ausnog.net/mailman/listinfo/ausnog
>>> 
>> 
>
>Regards,
>
>Chris Hurley BE (Elec), MBA
>Director
>
>
>******************************************************
> Minopher Pty Ltd     Phone: 1300 730 531
> 15 Nevana Street     Fax: +61-3-9763 3309
> Scoresby,  3179 Victoria
> Australia        
>******************************************************
>
>
>
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 3 Jul 2012 08:33:19 +1000
>From: Art Cartwright <art.cartwright at aapt.com.au>
>To: ausnog at lists.ausnog.net
>Subject: [AusNOG] AAPT Ethernet outage
>Message-ID: <de72ffe610445a8d8ea82d980bb028d9 at mail.gmail.com>
>Content-Type: text/plain; charset="windows-1252"
>
>Hi, my name is Art and I run network operations at AAPT. I am new to the
>forum and I wanted to give everyone an update on the event that happened
>on Saturday in the AAPT network.
>
>On Saturday between the times of 12h00 and 14h30 AAPT experienced a large
>number of Ethernet switches (in both NSW and VIC) stop passing traffic and
>become unreachable.
>
>We know now that the problem was caused by a vendor's equipment
>incorrectly handling the "Leap Second Insertion" by NTP.
>
>At 15h00 we mobilized our on-call Field Operations staff who needed
>access to various PoPs in the CBD to power cycle the switches. We power
>cycled the first switch at 16h20 and all services on the affected switch
>were restored immediately.
>
>We then mobilised more field operations staff as we knew we had to reboot
>all devices manually.
>
>By 19h22 the majority of customer services were confirmed restored and by
>01h15 99% of customer services were restored except for three sites where
>we had issues with site access.
>
>The vendor was able to simulate the issue in their lab in the early hours
>of Sunday morning and isolated it to the NTP "leap second insertion".
>
>I accept that during the event we did a poor job communicating with
>customers and the broader community as to what was happening. Our updates
>were infrequent and at times incorrect. This is something that we are
>looking at improving.
>
>I would welcome any suggestions as to the communication channels we should
>investigate.
>
>
>
>Thanks
>
>
>
>Art
>
>This communication, including any attachments, is confidential. If you
>are not the intended
>recipient, you should not read it - please contact me immediately,
>destroy it, and do not
>copy or use any part of this communication or disclose anything about it.
>
>
>------------------------------
>
>Message: 3
>Date: Tue, 3 Jul 2012 08:46:36 +1000
>From: "Joshua D'Alton" <joshua at railgun.com.au>
>To: ausnog at lists.ausnog.net
>Subject: Re: [AusNOG] AAPT Ethernet outage
>Message-ID:
>	<CAMtDJDJRjQYmN=a3te2QXeG9QNgA-xmzzikgUNY8+o4L4ML5ag at mail.gmail.com>
>Content-Type: text/plain; charset="windows-1252"
>
>Welcome Art, suggestions:
>
>status.aapt.com.au   preferably on a non-aapt service, ie cloudflare,
>perhaps look how iinet does it (by no means perfect but certainly better
>than bigpond for example).
>twitter @aaptnoc or something
>this list :)
>
>
>------------------------------
>
>Message: 4
>Date: Mon, 2 Jul 2012 23:14:46 +0000
>From: Bevan Slattery <Bevan.Slattery at nextdc.com>
>To: Art Cartwright <art.cartwright at aapt.com.au>,
>	"ausnog at lists.ausnog.net"	<ausnog at lists.ausnog.net>
>Subject: Re: [AusNOG] AAPT Ethernet outage
>Message-ID: <CC186820.3417E%bevan.slattery at nextdc.com>
>Content-Type: text/plain; charset="windows-1252"
>
>Welcome Art.  Thanks for getting on and providing some feedback.  I think
>it's good for AAPT and the Ausnog list that you've joined.  I'm sure
>people will appreciate you coming on-list and take it a bit easy on you
>(at first:))
>
>Cheers
>
>[b]
>
>The information contained in this email and any attachments may be
>confidential. This email and any attachments are also subject to
>copyright. No part of them may be reproduced, adapted or transmitted
>without the written permission of the copyright owner. If you are not the
>intended recipient, any use, interference with, disclosure or copying of
>this information is unauthorised and prohibited. If you have received
>this email in error, please immediately advise the sender by return email
>and delete the message from your system. All email communications to and
>from NEXTDC Limited are recorded for the purposes of archival and storage.
>
>------------------------------
>
>Message: 5
>Date: Tue, 03 Jul 2012 10:32:43 +1000
>From: Matt Perkins <matt at spectrum.com.au>
>To: art.cartwright at aapt.com.au
>Cc: ausnog at lists.ausnog.net
>Subject: Re: [AusNOG] AAPT Ethernet outage
>Message-ID: <4FF23DAB.1060007 at spectrum.com.au>
>Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
>Hi Art,
>  Firstly thanks for showing some balls by using your email address. You
>may live to regret it. Excuse my hard tone; I have just come back from a
>meeting with one of my biggest customers explaining why they should not
>leave us, due to your outage. Here are some suggestions off the bat.
>
>Frontier doesn't work. - When it's up (30 seconds to return a query is
>not up) the details on it are almost useless when there is a parent
>case. It only shows details from your case 90% of the time. Give us
>access to the parent case. What's the use of giving us the parent case
>number if we can only access it by waiting 30 minutes on hold? I have
>given up ringing most times as you end up with someone with poor
>communication skills who has very little technical understanding. We
>are wholesale customers; don't ask us if we have reset the router.
>
>Twitter. - Twitter - Twitter - Twitter - Twitter and in case you missed
>it Twitter.
>Open a Twitter account right now. Put it on your 3G phone and key in
>every bit of info you have during a major outage at minimum 30 second
>intervals. Don't lie. We will know. Just tell us the truth; you will find
>it will be welcomed by your customers. We are wholesale customers. We
>understand problems happen. Remember we have customers screeching at us
>while you just put up a firewall at AAPT. Better bad news than no news.
>Have a look at https://twitter.com/#!/spectrumnet to see how it's done.
>
>Incident reports - An industry sudo standard today. You need a tick box
>in Frontier: "please send me an incident report when this case is
>complete". This needs to include root cause and resolution as well as
>what will be done to stop a re-occurrence. Here's a hint. Don't blame the
>vendor. I can't blame you to my customers; they don't care. All they care
>about is that they were down and how it won't happen again. Take charge
>of it.  Here's a free technical hint for your last outage. You need a
>hard power watchdog on the switch: a device that will hard reboot the
>power on your switch when it can't be seen for 5 minutes. It needs to be
>in the PoP and self contained.  I'm sure you could afford them after all
>the money you saved on those non mainstream switches.
>
>Major outages - Any outage that affects more than 2 customers. How about
>an RVA (recorded voice announcement) while we are waiting on hold? We
>need the following information: the service types that are affected, the
>location that is affected, and the estimated restoration time or the time
>that more information will be forthcoming. This needs to be automated and
>should be the first job during a large scale outage. Yes, even before
>starting to fix it.
>
>Management Systems - Clearly there is a poor line of communication
>between your front line support and your back of house engineering. Well,
>I hope that's all it is. If it's not then your engineering monitoring
>systems are substandard or your front of house are apathetic about
>customer support.  Let's go with a communication problem.  There need
>to be liaison officers in both departments.  Engineers don't like
>communicating when they are under pressure. It's part of the personality
>type. A designated communication officer that works in the team can make
>this happen.
>
>Finally - Wholesale customers are usually knowledgeable; in most cases
>they will know more about the systems than your front line support
>people. Don't assume they have the same skill set as retail customers.
>Telling someone presenting with 10, 20 or 100 AAPT services all off the
>air to "reboot your router" is not helpful.
>
>Matt.
>
>
>
>
>
>-- 
>/* Matt Perkins
>         Direct 1300 137 379     Spectrum Networks Ptd. Ltd.
>         Office 1300 133 299     matt at spectrum.com.au
>         Fax    1300 133 255     Level 6, 350 George Street Sydney 2000
>         SIP 1300137379 at sip.spectrum.com.au
>         PGP/GNUPG Public Key can be found at  http://pgp.mit.edu
>*/
>
>
>------------------------------
>
>Message: 6
>Date: Tue, 3 Jul 2012 10:42:04 +1000
>From: "Joshua D'Alton" <joshua at railgun.com.au>
>To: ausnog at lists.ausnog.net
>Subject: Re: [AusNOG] AAPT Ethernet outage
>Message-ID:
>	<CAMtDJDLdpeHUOmZozuTdNHngCNQTtNTCOwjKi1QbNtQ0f81xWg at mail.gmail.com>
>Content-Type: text/plain; charset="windows-1252"
>
>server$ sudo incident_report generate
>
>is that how it's done? :D  All good points, do you have a blog?
>
>
>------------------------------
>
>_______________________________________________
>AusNOG mailing list
>AusNOG at lists.ausnog.net
>http://lists.ausnog.net/mailman/listinfo/ausnog
>
>
>End of AusNOG Digest, Vol 5, Issue 8
>************************************




