[AusNOG] Optus downtime chat + affecting SMS verification to Telstra?
DaZZa
dazzagibbs at gmail.com
Fri Nov 17 12:37:39 AEDT 2023
What a load of crap.
The root cause was they're morons, and configured the routers incorrectly.
Cisco had nothing to do with it. I'll bet the routers behaved exactly
as they were intended to behave.
Post the config snippets you claim caused it, Optus, and let people
who know what they're talking about prove you right or wrong.
D
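
For anyone following along, here is a rough sketch of why the "max-prefix (with restart!)" guess quoted further down matters. It is a toy model only, not Optus's config and not any vendor's actual implementation; the Session class, its field names and the minute-based clock are all made up for illustration. It just shows the difference between a session that stays down until someone resets it by hand and one that retries on a timer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Toy model of one BGP session guarded by a maximum-prefix limit.

    Hypothetical names throughout; real routers expose this as a
    per-neighbor knob with vendor-specific syntax and semantics.
    """
    max_prefixes: int
    restart_after: Optional[int] = None  # minutes; None = wait for manual reset
    up: bool = True
    down_since: Optional[int] = None

    def receive(self, now: int, prefix_count: int) -> None:
        """Handle the peer's current prefix count at minute `now`."""
        if not self.up:
            if self.restart_after is None:
                return  # no restart timer: stays down until someone intervenes
            if now - self.down_since < self.restart_after:
                return  # restart timer has not expired yet
            # Timer expired: bring the session back up and re-evaluate.
            self.up = True
            self.down_since = None
        if prefix_count > self.max_prefixes:
            # Limit exceeded: tear the session down to protect the router.
            self.up = False
            self.down_since = now
```

With Session(max_prefixes=500), one burst of 1000 prefixes leaves the session down no matter what arrives later. With Session(max_prefixes=500, restart_after=5), it comes back by itself once the peer is sane again and five minutes have passed, and drops again if the peer is still flooding.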
On Fri, 17 Nov 2023 at 11:31, Andrew Oakeley <andrew at oakeley.com.au> wrote:
>
> And in the Senate inquiry this morning they both blamed Cisco
>
> "The trigger was the Singtel outage, but the root cause was Cisco."
>
> https://www.abc.net.au/news/2023-11-17/asx-markets-business-live-news-optus-outage-senate-inquiry/103115518
>
> -----Original Message-----
> From: AusNOG <ausnog-bounces at lists.ausnog.net> On Behalf Of DaZZa
> Sent: Friday, November 17, 2023 8:15 AM
> To: Luke Thompson <luke.t at tncrew.com.au>
> Cc: michael.bethune at australiaonline.au; ausnog at lists.ausnog.net
> Subject: Re: [AusNOG] Optus downtime chat + affecting SMS verification to Telstra?
>
> And now Singtel have returned serve and are denying it was them.
>
> https://www.zdnet.com/article/singtel-refutes-reports-that-its-system-upgrade-caused-optus-outage/
>
> It's like watching kids trying to blame each other for who broke the window with the cricket ball.
>
> D
>
> On Wed, 15 Nov 2023 at 11:01, Luke Thompson <luke.t at tncrew.com.au> wrote:
> >
> > They've blamed Singtel Internet Exchange (STiX) for the international peering route updates, at least going by anonymous sources cited by SMH.
> >
> > https://www.smh.com.au/technology/identity-of-third-party-who-brought-down-optus-network-revealed-20231114-p5ejy1.html
> >
> > Luke
> >
> > On 14 November 2023 12:37:30 pm Ben Buxton <bb.ausnog at bb.cactii.net> wrote:
> >>
> >>
> >> Blaming routing updates from peers is scapegoating; they are never the cause of an outage - public BGP is the wild west and you're always getting broken information - it's your responsibility to filter those updates and (unless it's a zero-day poison packet bug) you only have yourself to blame if you fall over from them.
> >>
> >> If I were an optus business customer, reading that outage page would just make me even more determined to move elsewhere.
> >>
> >> They vaguely categorised the "what" of the outage into a big bucket (software upgrade related), but gave absolutely no useful information and didn't explain the "why", which is what would regain my confidence.
> >>
> >> Why did this upgrade trigger an outage?
> >> - Was there a behaviour/feature change they neglected to take into account?
> >> - Did the upgrade require a config change that broke?
> >> - Were they neglectful in following config best practices? (filtering, prefix limits, restarts, etc?)
> >> - Did the new software have an unidentified bug?
> >> - Why did testing not catch this problem? (they do test changes...right?)
> >> - How did progressive rollout still lead to this impact? (they do
> >> progressive rollouts over N days/weeks...right?)
> >>
> >> Why did mitigation take so long?
> >> - What detection/telemetry measures led them to realise the scope of the outage? (news reports don't count)
> >> - Were they dependent on the downed network for oncall paging & comms?
> >> - Why did their rollback plan fail? (they had a rollback plan...right?)
> >> - Why was remote console/power access not working? (they have both...right?)
> >> - Were they dependent on the downed network for said access?
> >> - Were their playbooks/credential access dependent on the downed network?
> >>
> >> "We have made changes to the network to address this issue so that it cannot occur again." ... this smells like "whoops forgot to set max-prefix (with restart!)".
> >>
> >> Bugs, config stuff-ups, etc happen, and they will continue to happen - it is a lie to state that outages will never happen again. This is the culmination of monumental failures in the trigger, prevention and mitigation measures, which cannot be fixed in a couple of days; it sounds like much deeper architectural and organisational issues need addressing.
> >>
> >> Many of the above failures are things that a young network will experience and learn from, but for Optus these should all be well planned for already.
> >>
> >> I suspect any government investigation will simply add more bureaucracy and boxes to tick rather than effect meaningful change, but one can always be hopeful...
> >>
> >> BB
> >>
> >> On Tue, 14 Nov 2023 at 13:02, Michael Bethune <mike at ozonline.com.au> wrote:
> >>>
> >>> "Optus network received changes to routing information from an
> >>> international peering network following a software upgrade"
> >>>
> >>> I note they are very careful to avoid nominating whose software upgrade.
> >>>
> >>> I also note they say they received routing updates. Don't they
> >>> limit the number of prefixes accepted by their BGP from any
> >>> given peer?
> >>>
> >>> Sounds like a carefully crafted statement to enable them to point
> >>> fingers elsewhere, not unexpected.
> >>>
> >>> - Michael.
> >>>
> >>> Quoting francisfides at mailup.net:
> >>>
> >>> > Looks like it was a software upgrade:
> >>> > https://www.abc.net.au/news/2023-11-13/optus-identifies-cause-of-nationwide-outage-software-upgrade/103099902
> >>> >
> >>> > Nothing in their media centre, just appears as a new box on their
> >>> > outage response page:
> >>> > https://www.optus.com.au/notices/outage-response
> >>> >
> >>> > Cheers
> >>> >
> >>> > ----
> >>> > Text:
> >>> >
> >>> > "We have been working to understand what caused the outage on
> >>> > Wednesday, and we now know what the cause was and have taken steps
> >>> > to ensure it will not happen again. We apologise sincerely for
> >>> > letting our customers down and the inconvenience it caused.
> >>> >
> >>> > At around 4.05am Wednesday morning, the Optus network received
> >>> > changes to routing information from an international peering
> >>> > network following a software upgrade. These routing information
> >>> > changes propagated through multiple layers in our network and
> >>> > exceeded preset safety levels on key routers. This resulted in
> >>> > those routers disconnecting from the Optus IP Core network to protect themselves.
> >>> >
> >>> > The restoration required a large-scale effort of the team and in
> >>> > some cases required Optus to reconnect or reboot routers
> >>> > physically, requiring the dispatch of people across a number of
> >>> > sites in Australia. This is why restoration was progressive over the afternoon.
> >>> >
> >>> > Given the widespread impact of the outage, our investigations into
> >>> > the issue took longer than we would have liked as we examined
> >>> > several different paths to restoration. The restoration of the
> >>> > network was at all times our priority and we subsequently
> >>> > established the cause working together with our partners. We have
> >>> > made changes to the network to address this issue so that it
> >>> > cannot occur again.
> >>> >
> >>> > We are committed to learning from what has occurred and continuing
> >>> > to work with our international vendors and partners to increase
> >>> > the resilience of our network. We will also support and fully
> >>> > cooperate with the reviews being undertaken by the Government and the Senate.
> >>> >
> >>> > We continue to invest heavily to improve the resiliency of our
> >>> > network and services."
> >>> >
> >>> > --
> >>> >
> >>> > francisfides at mailup.net
> >>> >
> >>> > On Thu, Nov 9, 2023, at 07:15, DaZZa wrote:
> >>> >> I have all three you're asking about.
> >>> >>
> >>> >> But I'm very small potatoes compared to most of the members of
> >>> >> this list, and my required remote footprint is correspondingly
> >>> >> small, so it's easy to maintain.
> >>> >>
> >>> >> D
> >>> >>
> >>> >> On Thu, 9 Nov 2023 at 06:18, Phillip Grasso
> >>> >> <phillip.grasso at gmail.com> wrote:
> >>> >>>>
> >>> >>>> I mean come on, it's nearly 2024 and a [major] telco does not
> >>> >>>> have remote console access?
> >>> >>>
> >>> >>>
> >>> >>> If we send a poll out to this community, how many would be able
> >>> >>> to genuinely honestly answer:
> >>> >>>
> >>> >>> Do you have a console or appropriate control plane access into
> >>> >>> all your critical infrastructure?
> >>> >>> Do you have independent out-of-band access that does not share any
> >>> >>> infrastructure with your current system(s), with an exception for
> >>> >>> physical location and power?
> >>> >>> Do you have the ability to remote power control your devices?
> >>> >>>
> >>> >>> We know from the Facebook outage in 2021 that they probably
> >>> >>> didn't have the above, so it's not entirely uncommon for folks
> >>> >>> to lack *proper independent* console and remote access.
> >>> >>>
> >>> >>>
> >>> >>> I empathize with the Optus team and their customers who have
> >>> >>> been negatively impacted by this incident. I sincerely hope that
> >>> >>> some positive outcomes can emerge from this situation, including:
> >>> >>>
> >>> >>> - Attention to critical infrastructure resilience
> >>> >>> - BGP clue increases
> >>> >>> - Incident management improves
> >>> >>> (I'm sure there's more).
> >>> >>>
> >>> >>> Network is a black box to most people and I think a large chunk
> >>> >>> of Australia now knows what it feels like to not have it.
> >>> >>>
> >>> >>>
> >>> >>> On Wed, 8 Nov 2023 at 11:06, Ben Buxton <bb.ausnog at bb.cactii.net> wrote:
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> On Wed, 8 Nov 2023 at 10:14, DaZZa <dazzagibbs at gmail.com> wrote:
> >>> >>>>>
> >>> >>>>> Yeah, I'd be willing to bet that it's a change which wasn't
> >>> >>>>> thoroughly tested before being rolled out, and which had an
> >>> >>>>> inadequate backout plan.
> >>> >>>>
> >>> >>>>
> >>> >>>> Also, "Our on-site technician is actively prioritising
> >>> >>>> establishing a console connection.".
> >>> >>>>
> >>> >>>> I mean come on, it's nearly 2024 and a [major] telco does not
> >>> >>>> have remote console access? Whilst I'm looking forward to
> >>> >>>> enthusiastically reading the PM, I'll have to book a physio
> >>> >>>> appointment in advance due to neck strain from all the head
> >>> >>>> shaking it'll likely induce.
> >>> >>>>
> >>> >>>> BB
> >>> >>>>
> >>> >>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Interestingly, my Optus mobile actually had a valid connection
> >>> >>>>> for a short time - wasn't able to actually DO anything, but
> >>> >>>>> was connected to the Optus network - but it's now gone to "SOS" mode.
> >>> >>>>>
> >>> >>>>> D
> >>> >>>>>
> >>> >>>>> On Wed, 8 Nov 2023 at 10:01, John Edwards <jaedwards at gmail.com> wrote:
> >>> >>>>> >
> >>> >>>>> > The 4am Wednesday morning outage start looks suspiciously
> >>> >>>>> > like a firmware upgrade window.
> >>> >>>>> >
> >>> >>>>> > I note that Optus devices where I am are showing "SoS" which
> >>> >>>>> > indicates the tower is unable to reach the location register,
> >>> >>>>> > which presumably is on a private network and indicative of a
> >>> >>>>> > pretty major fault rather than just IP.
> >>> >>>>> >
> >>> >>>>> > John
> >>> >>>>> >
> >>> >>>>> >
> >>> >>>>> > On Wed, 8 Nov 2023 at 09:10, DaZZa <dazzagibbs at gmail.com> wrote:
> >>> >>>>> >>
> >>> >>>>> >> The Optus hamster finally died of old age.
> >>> >>>>> >>
> >>> >>>>> >> I would suggest your SMS issues would be caused by whoever
> >>> >>>>> >> is issuing the SMS using Optus - not so much by the Telstra end receiving it.
> >>> >>>>> >>
> >>> >>>>> >> Anecdotally, Optus enterprise/wholesale appears to be still
> >>> >>>>> >> functional - at least my link appears to be working fine - and my BGP
> >>> >>>>> >> advertisements are still being seen overseas - seems to be
> >>> >>>>> >> only NBN and mobile based services which are busted.
> >>> >>>>> >> D
> >>> >>>>> >>
> >>> >>>>> >> On Wed, 8 Nov 2023 at 09:27, <francisfides at mailup.net> wrote:
> >>> >>>>> >> >
> >>> >>>>> >> > Morning all,
> >>> >>>>> >> > Hope the chaos isn't too hard on your work/family.
> >>> >>>>> >> > I have had trouble with a couple of SMS verifications
> >>> >>>>> >> > coming through to my Telstra number. Is this related?
> >>> >>>>> >> >
> >>> >>>>> >> > Any general banter around the downtime would be fine too -
> >>> >>>>> >> > looks like it all began at 4.07am AEDT?
> >>> >>>>> >> >
> >>> >>>>> >> > Cheers
> >>> >>>>> >> >
> >>> >>>>> >> > --
> >>> >>>>> >> >
> >>> >>>>> >> > francisfides at mailup.net
> >>> >>>>> >> > _______________________________________________
> >>> >>>>> >> > AusNOG mailing list
> >>> >>>>> >> > AusNOG at lists.ausnog.net
> >>> >>>>> >> > https://lists.ausnog.net/mailman/listinfo/ausnog
> >>> >>>>> >>
> >>> >>>>> >>
> >>> >>>>> >>
> >>> >>>>> >> --
> >>> >>>>> >> veg·e·tar·i·an:
> >>> >>>>> >> Ancient tribal slang for the village idiot who can't hunt,
> >>> >>>>> >> fish or ride
--
veg·e·tar·i·an:
Ancient tribal slang for the village idiot who can't hunt, fish or ride