[AusNOG] BGP hold timer values

Ben ben at meh.net.nz
Tue Jan 27 20:31:49 EST 2015


On Tue, Jan 27, 2015 at 08:47:27AM +0000, Alex Samad - Yieldbroker wrote:
> Hi
> 
> I'm wonder what is considered "best practice" or good/responsible hold timer values for BGP.
> 
> Currently I'm set at 3m, but I am considering lowering this to 30s and keep alive down to 20s, potentially even lower. Or if possible to use BFD & BGP, what's the uptake on BFD ?

It depends on so many things.  If you're using BGP to a single provider over a single link, then a high hold time is completely fine.  If you have slow convergence
times, then a high hold time can be good.  If you have fast routers at both ends, and partial route tables with redundant links, then low hold times are good.

It's basically about how quickly the session should "give up" on a peer that has gone quiet.  But if an upstream provider loses all of their routes because one of their own sessions
dropped, then a low hold time isn't going to make anything better for you.  Also, some BGP implementations have a quick resume feature where the peer holds onto your route table as
it was and can send it to you quicker, whilst keeping track of the updates.

In modern networks you often can't rely on "link state" to tell whether a peer is up.  BFD can help in that situation, but it can also lead to false positives.

If you were to ask for one number to use everywhere, I'd probably say 30/90 for keepalive/holdtime.  But 20/60 isn't bad either.  Most people don't run BGP over a single link.  But if you
have a single BGP session for transit, and some private peers, then you could stick with 3 minutes for transit and set the peers to a lower holdtime.

I wouldn't advise going as low as 30s without good reason; I'd just stick with 20/60 in that instance.  And hold time is meant to be a minimum of 3x keepalive afaik.  If you really
want to fail over as quickly as 30 seconds you should implement it at the fibre level rather than through BGP, since it can take a while for changes to propagate upstream whenever a BGP
session bounces.
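
To make the 3x rule concrete, here is a rough Python sketch that sanity-checks a keepalive/holdtime pair.  The function names are made up for illustration and aren't from any router or library.  It assumes the usual BGP behaviour that each side proposes a hold time and the lower of the two is used, and that a hold time of 0 means keepalives are turned off.

```python
def negotiated_hold_time(local_hold, peer_hold):
    """Both sides propose a hold time when the session opens; the lower one wins."""
    return min(local_hold, peer_hold)


def check_timers(keepalive, hold):
    """Return a list of complaints about a keepalive/holdtime pair (seconds).

    Illustrative helper only: it checks the 3x rule of thumb and the
    3 second floor on a non-zero hold time.
    """
    problems = []
    if hold == 0:
        # Hold time 0 disables keepalives, so there is nothing to compare.
        return problems
    if hold < 3:
        problems.append("hold time must be 0 or at least 3 seconds")
    if keepalive * 3 > hold:
        problems.append("hold time is less than 3x keepalive")
    return problems


if __name__ == "__main__":
    # 60/180, 30/90 and 20/60 pass; 20/30 (as floated in the question) does not.
    for keepalive, hold in [(60, 180), (30, 90), (20, 60), (20, 30)]:
        print(keepalive, hold, check_timers(keepalive, hold) or "ok")
```

Note that because the lower hold time wins, dropping it on your side alone is enough to shorten it for the session, whatever the other end has configured.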

One thing some people do about slow convergence times is keep a default route through one of their transit providers, so at least you get some connectivity before the full tables
load.
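
As a toy illustration of why the default route helps, the Python sketch below does a longest-prefix-match lookup against a small table.  The provider names and prefixes are placeholders (documentation address ranges), and a real router obviously doesn't do a linear scan, but the behaviour is the same: traffic follows the default until a more specific route turns up.

```python
import ipaddress


def lookup(table, dest):
    """Return the next hop for dest using longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    # Session has just come up: only the static default is in the table.
    table = {ipaddress.ip_network("0.0.0.0/0"): "transit-a"}
    print(lookup(table, "198.51.100.7"))

    # Later the full table loads and a more specific route appears.
    table[ipaddress.ip_network("198.51.100.0/24")] = "transit-b"
    print(lookup(table, "198.51.100.7"))
    print(lookup(table, "203.0.113.9"))
```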

Ben.

