server$ sudo incident_report generate<div><br></div><div>Is that how it's done? :D All good points. Do you have a blog?<br><div><br><div class="gmail_quote">On Tue, Jul 3, 2012 at 10:32 AM, Matt Perkins <span dir="ltr">&lt;<a href="mailto:matt@spectrum.com.au" target="_blank">matt@spectrum.com.au</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Hi Art,<br>
          Firstly, thanks for showing some balls by using your email
          address. You may live to regret it. Excuse my hard tone; I have
          just come back from a meeting with one of my biggest customers,
          explaining why they should not leave us because of your outage.
          Here are some suggestions off the bat. <br>
<br>
          Frontier doesn't work. - When it's up (30 seconds to return a
          query is not up), the details on it are almost useless when there
          is a parent case. It only shows details from your case 90% of the
          time. Give us access to the parent case. What's the use of giving
          us the parent case number if we can only access it by waiting 30
          minutes on hold? I have given up ringing most times, as you end up
          with someone with poor communication skills who has very little
          technical understanding. We are wholesale customers; don't ask us
          if we have reset the router. <br>
<br>
          Twitter. - Twitter - Twitter - Twitter - Twitter and, in case you
          missed it, Twitter. <br>
          Open a Twitter account right now. Put it on your 3G phone and key
          in every bit of info you have during a major outage at minimum
          30-second intervals. Don't lie. We will know. Just tell us the
          truth; you will find it will be welcomed by your customers. We are
          wholesale customers. We understand problems happen. Remember, we
          have customers screeching at us while you just put up a firewall
          at AAPT. Better bad news than no news. Have a look at
          <a href="https://twitter.com/#!/spectrumnet" target="_blank">https://twitter.com/#!/spectrumnet</a> to see how it's done. <br>
<br>
          Incident reports - An industry sudo standard today. You need a tick
          box in Frontier: "please send me an incident report when this case
          is complete". This needs to include root cause and resolution, as
          well as what will be done to stop a recurrence. Here's a hint:
          don't blame the vendor. I can't blame you to my customers; they
          don't care. All they care about is that they were down and how it
          won't happen again. Take charge of it. Here's a free technical hint
          for your last outage. You need a hard power watchdog on the switch:
          a device that will hard reboot the power on your switch when it
          can't be seen for 5 minutes. It needs to be in the PoP and
          self-contained. I'm sure you could afford them after all the money
          you saved on those non-mainstream switches. <br>
<br>
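          The watchdog described above could be sketched roughly as follows.
          This is only an illustration of the logic, not a product: the names
          PowerWatchdog, is_reachable and power_cycle are hypothetical, and a
          real unit would sit in the PoP and switch a relay or PDU outlet
          rather than call a Python function. <br>

```python
import time
from typing import Callable


class PowerWatchdog:
    """Power-cycle a device once it has been unreachable for too long.

    Hypothetical sketch: `is_reachable` and `power_cycle` stand in for
    whatever probe and relay control the real hardware provides.
    """

    def __init__(
        self,
        is_reachable: Callable[[], bool],
        power_cycle: Callable[[], None],
        timeout_s: float = 300.0,  # 5 minutes, as suggested in the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.is_reachable = is_reachable
        self.power_cycle = power_cycle
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def poll(self) -> bool:
        """Run one check; return True if a power cycle was triggered."""
        now = self.clock()
        if self.is_reachable():
            self.last_seen = now
            return False
        if now - self.last_seen >= self.timeout_s:
            self.power_cycle()
            # Restart the timer so the device gets a full window to boot
            # before it can be power-cycled again.
            self.last_seen = now
            return True
        return False
```

          Calling poll() on a fixed schedule from a device that does not
          depend on the switch it is watching is what keeps it self-contained.
          <br>
          <br>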
          Major outages - Any outage that affects more than 2 customers. How
          about an RVA (recorded voice announcement) while we are waiting on
          hold? We need the following information: the service types that
          are affected, the location that is affected, and the estimated
          restoration time or the time that more information will be
          forthcoming. This needs to be automated and should be the first
          job during a large-scale outage. Yes, even before starting to fix
          it. <br>
<br>
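          The information listed above fits in a small record that an
          automated announcement could be generated from. A rough sketch,
          assuming a hypothetical OutageNotice type and made-up announcement
          wording; the recording or IVR side is not shown. <br>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutageNotice:
    """Hypothetical record holding the fields an announcement needs."""

    service_types: List[str]
    locations: List[str]
    estimated_restoration: Optional[str] = None
    next_update: Optional[str] = None

    def script(self) -> str:
        """Build the announcement text, refusing to run with missing fields."""
        if not self.service_types or not self.locations:
            raise ValueError("service types and locations are required")
        if self.estimated_restoration:
            timing = f"Estimated restoration time is {self.estimated_restoration}."
        elif self.next_update:
            timing = f"More information will be available at {self.next_update}."
        else:
            raise ValueError("give a restoration time or a next update time")
        return (
            f"We are experiencing an outage affecting {', '.join(self.service_types)} "
            f"services in {', '.join(self.locations)}. {timing}"
        )
```

          Requiring either a restoration time or a next update time means an
          announcement can go out before the cause is known. <br>
          <br>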
          Management Systems - Clearly there is a poor line of
          communication between your front line support and your back of
          house engineering. Well, I hope that's all it is. If it's not, then
          your engineering monitoring systems are substandard or your front
          of house are apathetic about customer support. Let's go with a
          communication problem. There need to be liaison officers in both
          departments. Engineers don't like communicating when they are
          under pressure. It's part of the personality type. A designated
          communication officer who works in the team can make this happen.
<br>
<br>
          Finally - Wholesale customers are usually knowledgeable; in most
          cases they will know more about the systems than your front line
          support people. Don't assume they have the same skill set as retail
          customers. Telling someone presenting with 10, 20 or 100 AAPT
          services all off the air to "reboot your router" is not helpful.
<br>
<br>
Matt.<div><div class="h5"><br>
<br>
<br>
<br>
<br>
On 3/07/12 8:33 AM, Art Cartwright wrote:<br>
</div></div></div>
<blockquote type="cite"><div><div class="h5">
<div>
<p class="MsoNormal">Hi, my name is Art and I run network
operations at AAPT. I am new to the forum and I wanted to give
everyone an update on the event that happened on Saturday in
the AAPT network.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">On Saturday between the times of 12h00 and
14h30 AAPT experienced a large number of Ethernet switches (in
both NSW and VIC) stop passing traffic and become unreachable.
</p>
<p class="MsoNormal">
</p>
<p class="MsoNormal">We know now that the problem was caused by
            a vendor’s equipment i<strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";font-weight:normal">ncorrect
handling of the “Leap Second Insertion” by </span></strong><strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif"">NTP</span></strong><strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";font-weight:normal">.</span></strong></p>
<p class="MsoNormal"><strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";font-weight:normal"> </span></strong></p>
<p class="MsoNormal"><strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";font-weight:normal">At
15h00 we mobilized our on-call Field Operations staff who
                needed access to various PoPs in the CBD to power cycle the
switches. We power cycled the first switch at 16h20 and </span></strong><span style>all services on the affected switch were
restored immediately.</span></p>
<p class="MsoNormal"><span style> </span></p>
<p class="MsoNormal"><span style>We then mobilised
more field operations staff as we knew we had to reboot all
devices manually.</span></p>
<p class="MsoNormal">
<span style> </span></p>
<p class="MsoNormal"><strong><span style="font-size:12.0pt;font-family:"Calibri","sans-serif";font-weight:normal">By
19h22 the m</span></strong><span style>ajority
of customer services were confirmed restored and by 01h15
99% of customer services were restored except for three
sites where we had issues with site access. </span></p>
<p class="MsoNormal"><span style> </span></p>
<p class="MsoNormal"><span style>The vendor was
able to simulate the issue in their lab in the early hours
of Sunday morning and isolated it to the NTP “leap second
insertion”.</span></p>
<p class="MsoNormal"><span style> </span></p>
<p class="MsoNormal">I accept that during the event we did a
poor job communicating with customers and the broader
              community as to what was happening. Our updates were
infrequent and at times incorrect. This is something that we
are looking at improving.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I would welcome any suggestions as to the
communication channels we should investigate.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Art</p>
<p class="MsoNormal"> </p>
</div>
      </div></div><pre><div><div class="h5">
<fieldset></fieldset>
</div></div><div class="im"><pre>_______________________________________________
AusNOG mailing list
<a href="mailto:AusNOG@lists.ausnog.net" target="_blank">AusNOG@lists.ausnog.net</a>
<a href="http://lists.ausnog.net/mailman/listinfo/ausnog" target="_blank">http://lists.ausnog.net/mailman/listinfo/ausnog</a>
</pre>
</div></pre><span class="HOEnZb"><font color="#888888">
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
<br>
<pre cols="72">--
/* Matt Perkins
Direct 1300 137 379 Spectrum Networks Ptd. Ltd.
Office 1300 133 299 <a href="mailto:matt@spectrum.com.au" target="_blank">matt@spectrum.com.au</a>
Fax 1300 133 255 Level 6, 350 George Street Sydney 2000
SIP <a href="mailto:1300137379@sip.spectrum.com.au" target="_blank">1300137379@sip.spectrum.com.au</a>
PGP/GNUPG Public Key can be found at <a href="http://pgp.mit.edu" target="_blank">http://pgp.mit.edu</a>
*/
</pre>
</font></span></div>
<br></blockquote></div><br></div></div>