[AusNOG] Optus downtime chat + affecting SMS verification to Telstra?
Luke Thompson
luke.t at tncrew.com.au
Fri Nov 17 14:39:36 AEDT 2023
I'd muse they pay enough that there's an agreement made to wear that.
Once it's blown over, it's just another outage blip in the past.
They do happen; no person nor network is infallible.
As Ben highlights though, Optus seems rough.
Luke
On 17/11/2023 10:31 am, Andrew Oakeley wrote:
> And in the Senate inquiry this morning they both blamed Cisco
>
> "The trigger was the Singtel outage, but the root cause was Cisco."
>
> https://www.abc.net.au/news/2023-11-17/asx-markets-business-live-news-optus-outage-senate-inquiry/103115518
>
> -----Original Message-----
> From: AusNOG <ausnog-bounces at lists.ausnog.net> On Behalf Of DaZZa
> Sent: Friday, November 17, 2023 8:15 AM
> To: Luke Thompson <luke.t at tncrew.com.au>
> Cc: michael.bethune at australiaonline.au; ausnog at lists.ausnog.net
> Subject: Re: [AusNOG] Optus downtime chat + affecting SMS verification to Telstra?
>
> And now Singtel have returned serve and are denying it was them.
>
> https://www.zdnet.com/article/singtel-refutes-reports-that-its-system-upgrade-caused-optus-outage/
>
> It's like watching kids trying to blame each other for who broke the window with the cricket ball.
>
> D
>
> On Wed, 15 Nov 2023 at 11:01, Luke Thompson <luke.t at tncrew.com.au> wrote:
>> They've blamed Singtel Internet Exchange (STiX) for the international peering route updates, at least going by anonymous sources cited by SMH.
>>
>> https://www.smh.com.au/technology/identity-of-third-party-who-brought-down-optus-network-revealed-20231114-p5ejy1.html
>>
>> Luke
>>
>> On 14 November 2023 12:37:30 pm Ben Buxton <bb.ausnog at bb.cactii.net> wrote:
>>>
>>> Blaming routing updates from peers is a scapegoat and never is the cause of an outage - public BGP is the wild west and you're always getting broken information - it's your responsibility to filter those updates and (unless it's a zero-day poison packet bug) you only have yourself to blame if you fall over from them.
>>>
>>> If I were an optus business customer, reading that outage page would just make me even more determined to move elsewhere.
>>>
>>> They vaguely categorised the "what" of the outage into a big bucket (software upgrade related), but gave absolutely no useful information and did not explain the "why", which is what would regain my confidence.
>>>
>>> Why did this upgrade trigger an outage?
>>> - Was there a behaviour/feature change they neglected to take into account?
>>> - Did the upgrade require a config change that broke?
>>> - Were they neglectful in following config best practices? (filtering, prefix limits, restarts, etc?)
>>> - Did the new software have an unidentified bug?
>>> - Why did testing not catch this problem (they do test changes...right?)
>>> - How did progressive rollout still lead to this impact? (they do progressive rollouts over N days/weeks...right?)
>>>
>>> Why did mitigation take so long?
>>> - What detection/telemetry measures led them to realise the scope of the outage? (news reports don't count)
>>> - Were they dependent on the downed network for oncall paging & comms?
>>> - Why did their rollback plan fail? (they had a rollback plan...right?)
>>> - Why was remote console/power access not working? (they have both...right?)
>>> - Were they dependent on the downed network for said access?
>>> - Were their playbooks/credential access dependent on the downed network?
>>>
>>> "We have made changes to the network to address this issue so that it cannot occur again." ... this smells like "whoops forgot to set max-prefix (with restart!)".
>>>
>>> Bugs, config stuff-ups, etc. happen, and they will continue to happen - it is a lie to state that outages will never happen again. This is the culmination of monumental failures in the trigger, prevention and mitigation measures, which cannot be fixed in a couple of days; it sounds like much deeper architectural and organisational issues need addressing.
>>>
>>> Many of the above failures are things that a young network will experience and learn from, but for Optus these should all be well planned for already.
>>>
>>> I suspect any government investigation will simply add more bureaucracy and boxes to tick rather than effect meaningful change, but one can always be hopeful...
>>>
>>> BB
>>>
>>> On Tue, 14 Nov 2023 at 13:02, Michael Bethune <mike at ozonline.com.au> wrote:
>>>> "Optus network received changes to routing information from an
>>>> international peering network following a software upgrade"
>>>>
>>>> I note they are very careful to avoid nominating whose software upgrade.
>>>>
>>>> I also note that they say they received routing updates. Don't
>>>> they limit the number of prefixes accepted by their BGP from any
>>>> given peer?
>>>>
>>>> Sounds like a carefully crafted statement to enable them to point
>>>> fingers elsewhere, not unexpected.
>>>>
>>>> - Michael.
>>>>
>>>> Quoting francisfides at mailup.net:
>>>>
>>>>> Looks like it was a software upgrade:
>>>>> https://www.abc.net.au/news/2023-11-13/optus-identifies-cause-of-nationwide-outage-software-upgrade/103099902
>>>>>
>>>>> Nothing in their media centre, just appears as a new box on their
>>>>> outage response page:
>>>>> https://www.optus.com.au/notices/outage-response
>>>>>
>>>>> Cheers
>>>>>
>>>>> ----
>>>>> Text:
>>>>>
>>>>> "We have been working to understand what caused the outage on
>>>>> Wednesday, and we now know what the cause was and have taken steps
>>>>> to ensure it will not happen again. We apologise sincerely for
>>>>> letting our customers down and the inconvenience it caused.
>>>>>
>>>>> At around 4.05am Wednesday morning, the Optus network received
>>>>> changes to routing information from an international peering
>>>>> network following a software upgrade. These routing information
>>>>> changes propagated through multiple layers in our network and
>>>>> exceeded preset safety levels on key routers. This resulted in
>>>>> those routers disconnecting from the Optus IP Core network to protect themselves.
>>>>>
>>>>> The restoration required a large-scale effort of the team and in
>>>>> some cases required Optus to reconnect or reboot routers
>>>>> physically, requiring the dispatch of people across a number of
>>>>> sites in Australia. This is why restoration was progressive over the afternoon.
>>>>>
>>>>> Given the widespread impact of the outage, our investigations into
>>>>> the issue took longer than we would have liked as we examined
>>>>> several different paths to restoration. The restoration of the
>>>>> network was at all times our priority and we subsequently
>>>>> established the cause working together with our partners. We have
>>>>> made changes to the network to address this issue so that it
>>>>> cannot occur again.
>>>>>
>>>>> We are committed to learning from what has occurred and continuing
>>>>> to work with our international vendors and partners to increase
>>>>> the resilience of our network. We will also support and fully
>>>>> cooperate with the reviews being undertaken by the Government and the Senate.
>>>>>
>>>>> We continue to invest heavily to improve the resiliency of our
>>>>> network and services."
>>>>>
>>>>> --
>>>>>
>>>>> francisfides at mailup.net
>>>>>
>>>>> On Thu, Nov 9, 2023, at 07:15, DaZZa wrote:
>>>>>> I have all three you're asking about.
>>>>>>
>>>>>> But I'm very small potatoes compared to most of the members of
>>>>>> this list, and my required remote footprint is correspondingly
>>>>>> small, so it's easy to maintain.
>>>>>>
>>>>>> D
>>>>>>
>>>>>> On Thu, 9 Nov 2023 at 06:18, Phillip Grasso
>>>>>> <phillip.grasso at gmail.com> wrote:
>>>>>>>> I mean come on, it's nearly 2024 and a [major] telco does not
>>>>>>>> have remote console access?
>>>>>>>
>>>>>>> If we send a poll out to this community, how many would be able
>>>>>>> to genuinely honestly answer:
>>>>>>>
>>>>>>> Do you have a console or appropriate control plane access into
>>>>>>> all your critical infrastructure?
>>>>>>> Do you have independent out of band that does not share any
>>>>>>> infrastructure with your current system(s) - with an exception for
>>>>>>> physical location and power?
>>>>>>> Do you have the ability to remote power control your devices?
>>>>>>>
>>>>>>> We know from the Facebook outage in 2021 that they probably
>>>>>>> didn't have the above, so it's not entirely uncommon for folks
>>>>>>> to lack *proper independent* console and remote access.
>>>>>>>
>>>>>>>
>>>>>>> I empathize with the Optus team and their customers who have
>>>>>>> been negatively impacted by this incident. I sincerely hope that
>>>>>>> some positive outcomes can emerge from this situation, including:
>>>>>>>
>>>>>>> - Attention to critical infrastructure resilience
>>>>>>> - BGP clue increases
>>>>>>> - Incident management improves
>>>>>>> (I'm sure there's more).
>>>>>>>
>>>>>>> Network is a black box to most people and I think a large chunk
>>>>>>> of Australia now knows what it feels like to not have it.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 8 Nov 2023 at 11:06, Ben Buxton <bb.ausnog at bb.cactii.net> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 8 Nov 2023 at 10:14, DaZZa <dazzagibbs at gmail.com> wrote:
>>>>>>>>> Yeah, I'd be willing to bet that it's a change which wasn't
>>>>>>>>> thoroughly tested before being rolled out, and which had an
>>>>>>>>> inadequate backout plan.
>>>>>>>>
>>>>>>>> Also, "Our on-site technician is actively prioritising
>>>>>>>> establishing a console connection.".
>>>>>>>>
>>>>>>>> I mean come on, it's nearly 2024 and a [major] telco does not
>>>>>>>> have remote console access? Whilst I'm looking forward to
>>>>>>>> enthusiastically reading the PM, I'll have to book a physio
>>>>>>>> appointment in advance due to neck strain from all the head
>>>>>>>> shaking it'll likely induce.
>>>>>>>>
>>>>>>>> BB
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Interestingly, my Optus mobile actually had a valid connection
>>>>>>>>> for a short time - wasn't able to actually DO anything, but
>>>>>>>>> was connected to the Optus network - but it's now gone to "SOS" mode.
>>>>>>>>>
>>>>>>>>> D
>>>>>>>>>
>>>>>>>>> On Wed, 8 Nov 2023 at 10:01, John Edwards <jaedwards at gmail.com> wrote:
>>>>>>>>>> The 4am Wednesday morning outage start looks suspiciously
>>>>>>>>>> like a firmware upgrade window.
>>>>>>>>>> I note that Optus devices where I am are showing "SoS", which
>>>>>>>>>> indicates the tower is unable to reach the location register,
>>>>>>>>>> which presumably is on a private network and indicative of a
>>>>>>>>>> pretty major fault rather than just IP.
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, 8 Nov 2023 at 09:10, DaZZa <dazzagibbs at gmail.com> wrote:
>>>>>>>>>>> The Optus hamster finally died of old age.
>>>>>>>>>>>
>>>>>>>>>>> I would suggest your SMS issues would be caused by whoever
>>>>>>>>>>> is issuing the SMS using Optus - not so much by the Telstra end receiving it.
>>>>>>>>>>>
>>>>>>>>>>> Anecdotally, Optus enterprise/wholesale appears to be still
>>>>>>>>>>> functional - at least my link appears to be working fine - and my BGP
>>>>>>>>>>> advertisements are still being seen overseas - seems to be
>>>>>>>>>>> only NBN and mobile based services which are busted
>>>>>>>>>>>
>>>>>>>>>>> D
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 8 Nov 2023 at 09:27, <francisfides at mailup.net> wrote:
>>>>>>>>>>>> Morning all,
>>>>>>>>>>>> Hope the chaos isn't too hard on your work/family.
>>>>>>>>>>>> I have had trouble with a couple of SMS verifications
>>>>>>>>>>>> coming through to me on my Telstra number. Is this related?
>>>>>>>>>>>> Any general banter around the downtime would be fine too -
>>>>>>>>>>>> looks like it all began at 4.07am AEDT?
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> francisfides at mailup.net
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> AusNOG mailing list
>>>>>>>>>>>> AusNOG at lists.ausnog.net
>>>>>>>>>>>> https://lists.ausnog.net/mailman/listinfo/ausnog
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> veg·e·tar·i·an:
>>>>>>>>>>> Ancient tribal slang for the village idiot who can't hunt,
>>>>>>>>>>> fish or ride
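For anyone following along who hasn't run into it, the "max-prefix (with restart!)" idea Ben mentions in the thread can be sketched roughly as below. This is a toy model only: the class, field names and numbers are made up for illustration and do not correspond to any vendor's implementation or to what Optus actually runs. It shows the difference between a session that trips its prefix limit and stays down until someone intervenes, and one that retries by itself after a timer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerSession:
    """Toy model of a BGP session with a prefix limit (all names hypothetical)."""
    max_prefix: int                # limit on prefixes accepted from this peer
    restart_after: Optional[int]   # seconds before auto re-establish; None = manual reset only
    prefixes: int = 0
    up: bool = True
    down_since: Optional[int] = None

    def receive(self, count: int, now: int) -> None:
        """Accept `count` new prefixes; tear the session down if the limit is exceeded."""
        if not self.up:
            return
        self.prefixes += count
        if self.prefixes > self.max_prefix:
            self.up = False
            self.prefixes = 0      # routes learned from the peer are dropped
            self.down_since = now

    def tick(self, now: int) -> None:
        """Bring the session back up once the restart timer (if any) has expired."""
        if self.up or self.restart_after is None or self.down_since is None:
            return
        if now - self.down_since >= self.restart_after:
            self.up = True
            self.down_since = None


# Session with a restart timer recovers by itself.
auto = PeerSession(max_prefix=100, restart_after=300)
auto.receive(150, now=0)
auto.tick(now=300)

# Session without one stays down until somebody resets it by hand.
manual = PeerSession(max_prefix=100, restart_after=None)
manual.receive(150, now=0)
manual.tick(now=86400)

print(auto.up, manual.up)
```

In the second case somebody has to reach the box to clear the session, which is where the remote console and out-of-band questions raised in the thread come in. A restart timer is not a cure on its own: if the peer is still sending too many prefixes, the session will simply trip again.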