[AusNOG] Best practices on speeding up BGP convergence times

Tue Feb 27 09:37:03 EST 2018

On 26 Feb 2018, at 9:52 pm, Geoff Huston <gih at apnic.net> wrote:

a) detecting link down quickly

You can adjust your BGP session keepalive timers to smaller values and make the session more sensitive to outages as a result. I also thought that these days you can get the interface status to directly map to the session state, but it’s been a while since I’ve done this in anger and frankly I have NFC how to do that, even if I used to know! Maybe you are already doing that anyway.

This is the scenario I was talking about (references below). You can easily have link on a northbound interface even if the peer isn’t there (for example, you hit a layer-2 aggregation switch on the way). If the peer fails but you still have link on the interface, you’ll be blindly forwarding packets to it, even though it’s not there anymore, until the BGP hold timer expires. That was the point of the lightning talk I gave way back then. Default timers aren’t helpful in this situation.
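
To put rough numbers on that window, here is a minimal Python sketch. It assumes the only traffic on the session is periodic keepalives and ignores timer jitter; the function name and the 90 s / 30 s and 180 s / 60 s values are illustrative (90/30 are the values suggested in RFC 4271, and 180/60 is a common vendor default, so check your own platform).

```python
# Rough sketch: how long traffic can be blackholed when the peer dies but
# the local interface stays up, so only the BGP hold timer notices.
# Assumes keepalives are the only messages on the session and no jitter.

def blackhole_window(hold_time_s, keepalive_s):
    """Return (best_case, worst_case) seconds from peer failure to hold-timer expiry.

    The hold timer restarts whenever a KEEPALIVE (or UPDATE) is received, so
    the exposure depends on where in the keepalive cycle the peer died.
    """
    best = hold_time_s - keepalive_s   # peer died just before its next keepalive was due
    worst = hold_time_s                # peer died right after sending a keepalive
    return best, worst


if __name__ == "__main__":
    for hold, keepalive in [(90, 30), (180, 60)]:   # illustrative timer pairs
        best, worst = blackhole_window(hold, keepalive)
        print(f"hold={hold}s keepalive={keepalive}s -> blackholing for {best}-{worst}s")
```

Even the best case here is a minute or more of forwarding into a dead peer, which is why lowering the timers (or using something faster than the timers) matters.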

Fast forward to this decade and you have routing protocols that are “BFD-aware”, so you get sub-second failure detection. That allows the control plane to pull down the peer session and remove paths to that peer from the FIB. You can only run BFD if your upstream runs it as well, which means you know they will dump the prefixes from that peer session as quickly as you will. It makes failing over to a secondary link within the same upstream provider pretty seamless.
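
For comparison with the hold-timer case, a small Python sketch of how the BFD detection time falls out of the negotiated parameters in asynchronous mode, following the rule in RFC 5880 (the remote detect multiplier times the greater of the local required minimum RX interval and the remote desired minimum TX interval). The function name and the 300 ms x 3 figures are illustrative assumptions, not a recommendation for any particular platform.

```python
# Sketch of the BFD asynchronous-mode detection time as seen by the local system.
# Intervals are in milliseconds; the example values below are assumptions.

def bfd_detection_time_ms(local_required_min_rx_ms, remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Detection time = remote Detect Mult x agreed remote transmit interval."""
    # The remote end transmits no faster than the slower of what it wants to
    # send and what we are willing to receive.
    agreed_tx_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_tx_interval


if __name__ == "__main__":
    # Both sides at 300 ms with a multiplier of 3: session declared down in 900 ms.
    print(bfd_detection_time_ms(300, 300, 3), "ms")
```

With those example values the session drops in under a second, against tens of seconds or minutes when relying on the BGP hold timer alone.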

Ref :
http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf
http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html

David
...