[AusNOG] Best practices on speeding up BGP convergence times
Geoff Huston
gih at apnic.net
Mon Feb 26 22:52:21 EST 2018
I’m not sure that we have a clear idea of what “convergence” means in this context. Let’s try to walk through this. If you are referring to the amount of time it takes for a distance vector protocol like BGP to get to a point where there are no further updates to a routed prefix, then the convergence time of the system is actually related to the diameter of the network and the behaviour of the various MRAI timers. In theory one could turn down the current MRAI timer values (Cisco use 27-30 seconds, randomly varied within that range). This would mean that updates might propagate faster across the network, but at the cost of a dramatic increase in the number of updates. It’s hard to tell whether convergence times would actually change by much if you altered the default MRAI timer values in all these BGP speaking routers. However, this is at best a theoretical conversation, as changing the MRAI timer values across the Internet is just dreaming!
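To put rough numbers on the diameter and MRAI relationship, here is a toy sketch in Python. It assumes each BGP speaker on a path holds a changed announcement for a uniformly random share of its MRAI timer before passing it on. The function names and the 5-hop path are made up for illustration. The model ignores path exploration and the extra update load, which is exactly why the real-world effect of smaller timers is hard to predict.

```python
import random


def propagation_delay(hops, mrai_seconds, rng):
    """Toy estimate of the time for one update to cross `hops` BGP speakers.

    Assumption: every speaker delays the update by a uniformly random
    amount between 0 and its MRAI value. Real implementations differ.
    """
    return sum(rng.uniform(0, mrai_seconds) for _ in range(hops))


def mean_delay(hops, mrai_seconds, trials=10_000, seed=1):
    """Average the toy delay over many trials with a fixed seed."""
    rng = random.Random(seed)
    total = sum(propagation_delay(hops, mrai_seconds, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical 5-hop path, compared across three MRAI settings.
    for mrai in (30, 5, 0):
        print(f"MRAI {mrai:>2}s over 5 hops: mean delay ~{mean_delay(5, mrai):.1f}s")
```

In this model the delay scales linearly with both the hop count and the timer value, so it only bounds the hold-down along a single path. It says nothing about how many intermediate updates are generated along the way.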
But it seems to me that this is not exactly what you are talking about.
I am guessing a bit here, but you appear to be saying that you have an eBGP session A, and over this session you announce all your IGP routes and a second eBGP session B over which you announce nothing. When the link for session A goes down you want to start announcing your routes through session B and get incoming traffic quickly. Currently this takes 20 - 30 seconds and you are unhappy about that time.
Two problems here:
a) detecting link down quickly
You can adjust your BGP session keepalive timers to smaller values and make the session more sensitive to outages as a result. I also thought that these days you can get the interface status to directly map to the session state, but it’s been a while since I’ve done this in anger and frankly I have NFC how to do that, even if I used to know! Maybe you are already doing that anyway.
b) getting everyone else to react to the withdrawal of routes via session A and learn session B.
Now this is back to MRAI timers and convergence - not yours, but everyone else’s. As the new path details propagate outward you are affected by other people’s MRAI timers and that is not under your control. The average convergence time in routing in the V4 internet is 50 seconds these days. If you are seeing a break of 20 - 30 seconds for traffic to flow then frankly that’s a good number!
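For problem (a), a rough Python sketch of how the detection time depends on the timers. It assumes a silent peer is only declared down when the BGP hold timer expires, and that BFD (mentioned further down the thread) declares a failure after a fixed number of consecutive missed packets. The timer values are illustrative, not anyone's actual configuration.

```python
def keepalive_for(hold_time_s):
    """Common convention: send keepalives at one third of the hold time."""
    return hold_time_s / 3


def hold_timer_detection(hold_time_s):
    """Worst case for a silent peer: nothing happens until the hold timer expires."""
    return hold_time_s


def bfd_detection(tx_interval_ms, multiplier):
    """BFD detection time in seconds: interval times the detect multiplier."""
    return tx_interval_ms * multiplier / 1000.0


if __name__ == "__main__":
    # Illustrative hold times in seconds (assumed values).
    for hold in (180, 90, 9):
        print(f"hold {hold}s, keepalive {keepalive_for(hold):.0f}s: "
              f"detect within {hold_timer_detection(hold)}s")
    # Illustrative BFD setting: 300 ms interval, multiplier 3.
    print(f"BFD 300ms x3: detect within {bfd_detection(300, 3)}s")
```

Either way, this only shortens the local detection step. The time covered by (b) is unchanged.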
At this point it is possible to talk about announcing backup routes and essentially pre-provisioning the backup path, but BGP won't let you do that unless the remote BGP speakers support BGP AddPath (which is unlikely I’d guess). The entire issue with BGP best path is that the backup path is not propagated. At this point you might be tempted to announce more specifics via session A and aggregates on session B. This means that as the more specifics are withdrawn the aggregate backup takes over. Still not instant failover, but it might shave off a few seconds. Of course you are adding to the overall routing noise, but, well, you know, many folk put their own requirements above the more general issues of routing cleanliness, and I hear them justify it by saying: what's a few more routes added to the 700,000 we have already.
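The more specifics and aggregate trick relies on longest-prefix match in the remote routers. A minimal Python sketch, using placeholder private prefixes and made-up session labels rather than anyone's real announcements:

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, session) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, via) for net, via in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# A remote router's view: more specifics learned via session A,
# the covering aggregate learned via session B (placeholder prefixes).
table = {
    ipaddress.ip_network("10.0.0.0/23"): "session A",
    ipaddress.ip_network("10.0.2.0/23"): "session A",
    ipaddress.ip_network("10.0.0.0/22"): "session B",
}

before = best_route(table, "10.0.2.10")

# Session A fails and the more specifics are withdrawn.
after_table = {net: via for net, via in table.items() if via != "session A"}
after = best_route(after_table, "10.0.2.10")

if __name__ == "__main__":
    print("before failure:", before[0], "via", before[1])
    print("after failure: ", after[0], "via", after[1])
```

The point is that the aggregate is already installed before the failure, so only the withdrawal of the more specifics has to propagate. No new announcement needs to be learned.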
So again 20 - 30 seconds is a good number in BGP terms for what I _think_ you are doing here.
Geoff
> On 26 Feb 2018, at 7:00 pm, David Hughes <david at hughes.com.au> wrote:
>
>
> Hi
>
> It was at APRICOT but only dealt with how bad the default timers were in dealing with silent peer failure. If you’re running BFD then you aren’t waiting for timers to expire before tearing down the session so those details are largely irrelevant. For those that aren’t running BFD upstream it may be worth a read.
>
> I just googled to find those slides Chris. They were from 2006. You need to get out more :-)
>
>
>
> Thanks
>
> David
> ...
>
>
>> On 26 Feb 2018, at 4:27 pm, Chris Chaundy <chris.chaundy at gmail.com> wrote:
>>
>> Hello Rhys,
>>
>> David Hughes presented a paper on BGP tuning some years ago (at AusNOG?) which may be worth digging up. While there have been few changes to BGP itself, of course there have been bells and whistles added to the routers (such as BFD, etc.) which will help so I guess the paper may be (over) due for an update (David? :-).
>>
>> Cheers, Chris Chaundy
>> (Retired Network Engineer)
>>
>> On Mon, Feb 26, 2018 at 12:23 PM, Rhys Hanrahan <rhys at nexusone.com.au> wrote:
>> Hi Everyone,
>>
>>
>>
>> I’ve been looking at improving our BGP configuration lately, and I would just like to see if I’m missing anything obvious in terms of speeding up BGP convergence (particularly inbound convergence) with our transit providers during failover. I understand that BGP convergence on the internet is not going to be perfect, but I am trying to ensure I tune things as best I can. We are using Cisco ASR1001-Xs for reference, though I’m more wondering about general best practices that other ISPs use, that I can then adapt to our network.
>>
>>
>>
>> I’m also curious if my expectations of trying to minimise convergence times with transit peers are realistic or not.
>>
>>
>>
>> Right now, I am seeing 20-30 second outage windows when failing over my announced prefixes from one transit provider to another. I can understand this when transitioning between transits, but I see this even when failing over between a primary/secondary peering session with a single AS / transit provider, which is disappointing. My hope was to have almost no interruption where we have multiple links with a given transit provider, and a small convergence window (maybe 2-5s?) when transitioning prefixes from one transit to another.
>>
>>
>>
>> With iBGP it seems like there are lots of options and it would be possible to achieve sub-second convergence fairly easily. But eBGP is where it becomes more limited and difficult to improve the situation.
>>
>>
>>
>> For iBGP I can do:
>>
>>
>>
>> • BFD
>> • BGP Multipath – I haven’t tested, but I assume having multiple paths in the FIB will speed up failover convergence.
>> • BGP Best External
>> • Add Path
>>
>>
>> For eBGP I can do:
>>
>>
>>
>> • BFD (If supported by the upstream – I have this on all peers)
>> • Advertisement Interval – I have set this to 0 (doesn’t make a major difference, but helps a little)
>> • BGP Multipath (if supported by the upstream – unfortunately my upstream requires the primary/secondary paths are enforced on their side via localpref so I can’t leverage this).
>> • AS Path prepending of the same prefixes instead of announcing less/more specific prefixes at different sites seems to help.
>>
>>
>> I haven’t found any other commonly accepted methods of announcing a backup path to eBGP peers.
>>
>>
>>
>> We are using Equinix Connect transit in Sydney as our main transit, where we have primary and secondary links between us and Equinix. And Vocus as our main transit in Melbourne, with the intention of failing over all our announced prefixes between sites as required, by leveraging AS Path prepending.
>>
>>
>>
>> Are there any other techniques or best practices I am missing to help try and reduce downtime in the event of a router or BGP session failure event?
>>
>>
>>
>> Appreciate any insights you can offer, and hope this proves to be a useful and interesting discussion for others.
>>
>>
>>
>> Thanks!
>>
>>
>> Rhys Hanrahan
>> Chief Information Officer
>> Nexus One Pty Ltd
>>
>> E: support at nexusone.com.au
>> P: +61 2 9191 0606
>> W: http://www.nexusone.com.au/
>> M: PO Box 127, Royal Exchange NSW 1225
>> A: Level 10 307 Pitt St, Sydney NSW 2000
>>
>>
>>
>> _______________________________________________
>> AusNOG mailing list
>> AusNOG at lists.ausnog.net
>> http://lists.ausnog.net/mailman/listinfo/ausnog