[AusNOG] Best practices on speeding up BGP convergence times
Alex Samad
alex at samad.com.au
Thu Mar 1 08:44:08 EST 2018
> - My transit provider in Sydney uses localpref on their side to
> designate one session as “primary” and I am not able to change that. But I
> can and do send traffic out on both links as equal cost.
That's interesting; I haven't had a vendor do that. I typically use MED to
prefer one path over another with the same vendor.
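
For anyone following along, here is a rough Python sketch of why localpref set on the provider's side overrides whatever is signalled with MED. It is a simplification, not any vendor's implementation: the Path class, link names and values are made up for illustration, the origin step and the later tie-breakers are left out, and MED is assumed to be compared only between paths from the same neighbouring AS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    """Hypothetical, minimal view of one BGP path for a prefix."""
    name: str
    local_pref: int            # higher wins; set by the receiving network
    as_path: Tuple[int, ...]   # shorter wins
    med: int                   # lower wins; a hint sent by the neighbour


def best_path(paths: List[Path]) -> Path:
    # Simplified ordering: local_pref, then AS path length, then MED.
    # Real implementations have more steps (origin, eBGP vs iBGP, IGP metric, ...).
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path), p.med))


# AS 64500 is from the documentation range; both links carry the same prefix.
link_a = Path("link-a", local_pref=100, as_path=(64500,), med=10)
link_b = Path("link-b", local_pref=100, as_path=(64500,), med=20)
print(best_path([link_a, link_b]).name)        # link-a (MED decides)

# If the provider raises local_pref on link-b, MED no longer matters.
link_b_pref = Path("link-b", local_pref=200, as_path=(64500,), med=20)
print(best_path([link_a, link_b_pref]).name)   # link-b (local_pref decides)
```

So when the other side sets localpref, MED from the customer only matters if the localpref values happen to be equal.
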
> In terms of time it takes to learn a new outbound path, I don’t see this as
> an issue given the options I have to announce multiple paths over iBGP and
> use of BFD – this should be possible to make quick by tuning my internal
> peer configs.
I guess this comes down to the hardware. I was testing with MikroTik routers
and found that inserting or deleting routes could take a long time.
A
On 27 February 2018 at 13:12, Rhys Hanrahan <rhys at nexusone.com.au> wrote:
> Hi Guys,
>
>
>
> Thanks David for confirming BFD is the way to go here. Luckily, I have
> been able to enable BFD on all my transit links so far, so the time to
> detect peer failure has been quick.
>
>
>
> And thanks Geoff for your detailed reply. From some off-list discussions,
> I think that I first need to apply some of the configs (like Add-Path) that
> I mentioned originally and see how I go from there, and also need to
> pinpoint with more certainty where the issue is occurring.
>
>
>
> I know that I’ve mentioned primary/secondary transit links, but I actually
> *am* announcing all prefixes on all transit links, and I’m only using
> AS path prepending to try and optimise routing for prefixes that are in VIC
> vs NSW. So it’s not a case of conditionally advertising routes here.
> I did also try advertising more specific prefixes (e.g. a /22 in NSW
> and a /24 in VIC), but I found anecdotally that AS path prepending was faster
> for the inbound traffic to converge during failover.
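
The two approaches in the paragraph above act at different stages, which a small sketch may make clearer: a more-specific prefix wins at forwarding time by longest-prefix match regardless of BGP attributes, whereas prepending only lengthens the AS path that remote networks compare. The prefixes, site labels and AS number below are made up for illustration.

```python
import ipaddress

# Hypothetical announcements: covering /22 via NSW, more-specific /24 via VIC.
routes = {
    ipaddress.ip_network("10.0.0.0/22"): "NSW",
    ipaddress.ip_network("10.0.1.0/24"): "VIC",
}


def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return routes[max(matches, key=lambda net: net.prefixlen)]


def prepend(as_path, own_as, times):
    """AS path prepending: repeat our own AS at the front of the path."""
    return (own_as,) * times + tuple(as_path)


print(lookup("10.0.1.5"))   # VIC (covered by the /24)
print(lookup("10.0.2.5"))   # NSW (only the /22 covers it)

nsw_path = prepend((64500,), 64500, 0)   # announced as-is
vic_path = prepend((64500,), 64500, 2)   # two extra copies of our AS
print(len(nsw_path), len(vic_path))      # 1 3
```

If the /24 is withdrawn, traffic falls back to the /22 only once the withdrawal has propagated, whereas with prepending both paths for the same prefix are already known to remote networks.
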
>
>
>
> So in a sense, I *am* talking about MRAI timers, which I totally
> understand is just not a valid discussion to be having in the context of
> the general internet, and it’s likely that yes, the outage window I’m seeing
> when a prefix is announced over a new transit path is totally reasonable.
> BUT where I run into a problem is that the outcome is still the same
> when I have multiple links with a single transit provider. For example:
>
>
>
> - I have cross-connect directly between one of my transit edge routers
> and one of their routers.
> - I have another cross-connect directly between another of my transit
> edge routers and another of their routers (and this is not to mean that I
> intend this to be a backup path – I send out traffic active/active).
> - Both links are to the same transit provider, in the same POP.
> - I am advertising the same prefixes over both links, no AS path
> prepending, so the announcements are basically identical.
> - My transit provider in Sydney uses localpref on their side to
> designate one session as “primary” and I am not able to change that. But I
> can and do send traffic out on both links as equal cost.
> - As far as the rest of the internet is concerned my prefixes are
> still being announced from the same transit provider, so there shouldn’t be
> a need to propagate routing changes beyond my directly adjacent peer and
> their internal network. This is primarily why I am expecting not to see any
> impact in this scenario.
> - Given that I have adjusted my MRAI timer down to 0 with my adjacent
> transit peers, and have BFD enabled, they should be able to switch over to
> the alternate link fairly quickly.
> - And yet, I see a 20 second outage window even in this scenario when
> I ping from an external connection into one of my prefixes announced over
> this transit.
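
One way to put a number on the outage window described in the last bullet is to log timestamped ping results from the external host and compute the longest gap between successful replies. The sketch below assumes a hypothetical list of (timestamp, reply_received) samples; the sample data is synthetic, not a real measurement, and the result is only as precise as the probe interval.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Longest gap in seconds between the last good reply before a run of
    losses and the first good reply after it."""
    longest = 0.0
    last_ok = None
    in_outage = False
    for ts, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest


# Synthetic one-per-second probes: replies are lost from t=5 up to t=24.
samples = [(float(t), not (5 <= t < 25)) for t in range(30)]
print(longest_outage(samples))   # 21.0
```

Running the same measurement from more than one external vantage point would help show whether the loss is inside the transit provider or further out.
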
>
>
>
> That scenario above is mainly what I am concerned about as I didn’t expect
> much/any service impact in the above scenario, since I would have thought
> the path over the internet in general would remain unchanged up till my
> transit provider’s internal network.
>
>
>
> Regarding what you listed as problem b), I totally understand this, and I
> would expect some kind of delay when re-announcing via another transit
> since as you say, this has to propagate through countless upstreams
> throughout the internet - naturally this will take time. It’s good to hear
> you say 20-30 seconds is a good number in terms of getting everyone to
> re-learn routes. That’s really helpful.
>
>
>
> In terms of time it takes to learn a new outbound path, I don’t see this
> as an issue given the options I have to announce multiple paths over iBGP
> and use of BFD – this should be possible to make quick by tuning my
> internal peer configs.
>
>
>
> Thanks everyone for your experiences and insights. Based on some of the
> replies I got, it seems like it is reasonable to expect that in the
> scenario described in the bullet points above, it’s possible to see very
> little if any forwarding loss. And only once I am forced to advertise via a
> new transit would I expect to see the 20-30 second window as everyone on
> the internet learns a new path. I do need to improve my iBGP convergence
> and actually implement some of the methods I mentioned originally, and
> re-evaluate so as to rule out my iBGP convergence time as the issue I’m
> currently seeing for the scenario in the bullet points above.
>
>
>
> Thanks everyone for your help.
>
>
> Rhys Hanrahan
> Chief Information Officer
> Nexus One Pty Ltd
>
> E: support at nexusone.com.au
> P: +61 2 9191 0606
> W: http://www.nexusone.com.au/
> M: PO Box 127, Royal Exchange NSW 1225
> A: Level 10 307 Pitt St, Sydney NSW 2000
>
> *From: *AusNOG <ausnog-bounces at lists.ausnog.net> on behalf of David
> Hughes <david at hughes.com.au>
> *Date: *Tuesday, 27 February 2018 at 9:39 am
> *To: *Geoff Huston <gih at apnic.net>
> *Cc: *"ausnog at lists.ausnog.net" <ausnog at lists.ausnog.net>
> *Subject: *Re: [AusNOG] Best practices on speeding up BGP convergence
> times
>
>
>
>
>
> On 26 Feb 2018, at 9:52 pm, Geoff Huston <gih at apnic.net> wrote:
>
>
>
>
> a) detecting link down quickly
>
> You can adjust your BGP session keepalive timers to smaller values and
> make the session more sensitive to outages as a result. I also thought that
> these days you can get the interface status to directly map to the session
> state, but it’s been a while since I’ve done this in anger and frankly I
> have NFC how to do that, even if I used to know! Maybe you are already
> doing that anyway.
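
As a rough illustration of how the session timers bound detection time: per RFC 4271 the session uses the smaller of the two proposed hold times, values of 1 or 2 seconds are rejected, and a keepalive interval of one third of the hold time is the suggested maximum. The function names and example values below are my own, not any vendor's CLI or defaults.

```python
def negotiated_hold_time(local: int, peer: int) -> int:
    """Smaller of the two proposed hold times, in seconds (0 disables it)."""
    hold = min(local, peer)
    if hold in (1, 2):
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def keepalive_interval(hold: int) -> float:
    """Suggested keepalive interval: one third of the hold time."""
    return hold / 3


# RFC 4271 suggested value on both sides: up to 90 s before the session drops.
hold = negotiated_hold_time(90, 90)
print(hold, keepalive_interval(hold))    # 90 30.0

# One side proposing 9 s is enough to bring the negotiated value down.
hold = negotiated_hold_time(9, 90)
print(hold, keepalive_interval(hold))    # 9 3.0
```

Even at the 3 second floor this is still seconds rather than sub-second detection.
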
>
>
>
>
>
> This is the scenario I was talking about (references below). You can
> easily have link on a northbound interface even if the peer isn’t there
> (you hit a layer-2 agg switch on the way for example). If the peer fails
> but you still have link on the interface you’ll be blindly forwarding
> packets to it, even though it’s not there anymore, until the BGP timers
> expire. That was the point of the lightning talk I gave way back then.
> Default timers aren’t helpful in this situation.
>
>
>
> Fast forward to this decade and you have routing protocols that are
> “BFD-aware” so you have sub-second link failure detection. That allows the
> control plane to pull down the peer session and remove paths to that peer
> from the FIB. You can only run BFD if your upstream runs it as well, so you know
> they will dump the prefixes from that peer session as quickly as you will.
> It makes failing over to a secondary link within the same upstream provider
> pretty seamless.
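
For a sense of scale, BFD asynchronous mode (RFC 5880) computes the local detection time as the remote Detect Mult multiplied by the agreed transmit interval of the remote system, which is the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. A minimal sketch, with made-up interval values and a function name of my own:

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Local detection time for BFD asynchronous mode, in milliseconds."""
    agreed_tx_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_tx_ms


# Both ends at 300 ms with a multiplier of 3: peer declared down after 900 ms.
print(bfd_detection_time_ms(3, 300, 300))   # 900

# The slower of the two intervals wins, so one conservative side sets the pace.
print(bfd_detection_time_ms(3, 300, 50))    # 900
```

This covers detection only; the time to withdraw routes and reprogram the FIB comes on top of it.
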
>
>
>
>
>
> Ref :
>
> http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf
>
> http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html
>
>
>
>
>
> David
>
> ...
>
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>
>