<div dir="ltr">
<ul type="disc" style="margin-bottom:0cm;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:16px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;margin-top:0cm"><li class="MsoNormal" style="margin:0px 0px 0px 15px"><span lang="EN-GB">My transit provider in Sydney uses localpref on their side to designate one session as “primary” and I am not able to change that. But I can and do send traffic out on both links as equal cost.</span></li></ul>
<br><div>That's interesting, I haven't had a vendor do that. I typically use MED to prefer one path over another with the same vendor.</div><div><br></div><div><br></div><div>
<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:16px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">In terms of time it takes to learn a new outbound path, I don’t see this as an issue given the options I have to announce multiple paths over iBGP and use of BFD – this should be possible to make quick by tuning my internal peer configs.</span>
<br></div><div><br></div><div><br></div><div>I guess this comes down to the hardware. I was testing with MikroTik routers and found that inserting/deleting routes could take a long time.</div><div><br></div><div>A</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 27 February 2018 at 13:12, Rhys Hanrahan <span dir="ltr"><<a href="mailto:rhys@nexusone.com.au" target="_blank">rhys@nexusone.com.au</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-AU" link="blue" vlink="purple">
<div class="m_1237781065351343678WordSection1">
<p class="MsoNormal"><span lang="EN-GB">Hi Guys,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Thanks David for confirming BFD is the way to go here. Luckily, I have been able to enable BFD on all my transit links so far, so the time to detect peer failure has been quick.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">And thanks Geoff for your detailed reply. From some off-list discussions, I think that I first need to apply some of the configs (like Add-Path) that I mentioned originally and see how
I go from there, and also need to pinpoint with more certainty where the issue is occurring.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">I know that I’ve mentioned primary/secondary transit links, but I actually _<i>am</i>_ announcing all prefixes on all transit links, and I’m only using AS Path prepending to try and
optimise routing for prefixes that are in VIC vs NSW. So it’s not a case of conditionally advertising routes in this case. I did also try advertising more specific prefixes (e.g. /22 at NSW and /24 in VIC) but I found anecdotally that AS path prepending was
faster for the inbound traffic to converge during failover.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">So in a sense, I _<i>am</i>_ talking about MRAI timers, which I totally understand is just not a valid discussion to be having in the context of the general internet and it’s likely
that yes, the outage window I’m seeing when a prefix is announced over a new transit path is totally reasonable. BUT where I start to run into a problem is that the outcome is still the same when I have multiple links with a single transit provider. For example:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoNormal"><span lang="EN-GB">I have a cross-connect directly between one of my transit edge routers and one of their routers.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">I have another cross-connect directly between another of my transit edge routers and another of their routers (which is not to say that I intend
this to be a backup path – I send out traffic active/active). <u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">Both links are to the same transit provider, in the same POP.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">I am advertising the same prefixes over both links, no AS path prepending, so the announcements are basically identical.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">My transit provider in Sydney uses localpref on their side to designate one session as “primary” and I am not able to change that. But I can and do send
traffic out on both links as equal cost.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">As far as the rest of the internet is concerned my prefixes are still being announced from the same transit provider, so there shouldn’t be a need to
propagate routing changes beyond my directly adjacent peer and their internal network. This is primarily why I am expecting not to see any impact in this scenario.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">Given that I have adjusted my MRAI timer down to 0 with my adjacent transit peers, and have BFD enabled, they should be able to switch over to the alternate
link fairly quickly.<u></u><u></u></span></li><li class="MsoNormal"><span lang="EN-GB">And yet, I see a 20-second outage window even in this scenario when I ping from an external connection into one of my prefixes announced over this transit.<u></u><u></u></span></li></ul>
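<p class="MsoNormal"><span lang="EN-GB">One way to put a number on an outage window like the one in the last bullet is to log each ping with a timestamp and look for the longest run of lost replies. The following is only a minimal sketch under assumptions: the function name longest_outage, the (send time, got reply) sample format and the sample data are all hypothetical and are not taken from the measurements described in this thread.</span></p>
<pre>
```python
from typing import List, Optional, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Longest gap, in seconds, between the last reply before a run of
    lost pings and the first reply after it.

    `samples` is a hypothetical list of (send_time_seconds, got_reply)
    pairs in time order. A loss run that never recovers is ignored.
    """
    longest = 0.0
    last_ok: Optional[float] = None
    in_loss = False
    for sent_at, got_reply in samples:
        if got_reply:
            # A reply after a loss run closes the outage window.
            if in_loss and last_ok is not None:
                longest = max(longest, sent_at - last_ok)
            last_ok = sent_at
            in_loss = False
        else:
            in_loss = True
    return longest


# Illustrative data only: one ping per second, replies lost from t=5 to t=24.
samples = [(float(t), t not in range(5, 25)) for t in range(30)]
print(longest_outage(samples))
```
</pre>
<p class="MsoNormal"><span lang="EN-GB">The gap is measured between successful replies, so the result is an upper bound on the real loss and its precision is limited by the ping interval. A shorter interval gives a tighter figure when comparing failover tests.</span></p>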
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">That scenario is mainly what I am concerned about, as I didn’t expect much, if any, service impact there: I would have thought the path over the internet in general
would remain unchanged up to my transit provider’s internal network.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Regarding what you listed as problem b), I totally understand this, and I would expect some kind of delay when re-announcing via another transit since, as you say, this has to propagate
through countless upstreams throughout the internet - naturally this will take time. It’s good to hear you say 20-30 seconds is a good number in terms of getting everyone to re-learn routes. That’s really helpful.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">In terms of time it takes to learn a new outbound path, I don’t see this as an issue given the options I have to announce multiple paths over iBGP and use of BFD – this should be possible
to make quick by tuning my internal peer configs.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Thanks everyone for your experiences and insights. Based on some of the replies I got, it seems like it is reasonable to expect that in the scenario described in the bullet points above, it’s possible to see very little
if any forwarding loss. And only once I am forced to advertise via a new transit would I expect to see the 20-30 second window as everyone on the internet learns a new path. I do need to improve my iBGP convergence and actually implement some of the methods
I mentioned originally, and re-evaluate so as to rule out my iBGP convergence time as the issue I’m currently seeing for the scenario in the bullet points above.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB">Thanks everyone for your help.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB"><br>
Rhys Hanrahan<br>
Chief Information Officer<br>
Nexus One Pty Ltd<br>
<br>
E: <a href="mailto:support@nexusone.com.au" target="_blank"><span style="color:#0563c1">support@nexusone.com.au</span></a><br>
P: <a href="tel:(02)%209191%200606" value="+61291910606" target="_blank">+61 2 9191 0606</a><br>
W: <a href="http://www.nexusone.com.au/" target="_blank">http://www.nexusone.com.au/</a><br>
M: PO Box 127, Royal Exchange NSW 1225<br>
A: Level 10 307 Pitt St, Sydney NSW 2000<br>
<br>
</span><span lang="EN-GB"><u></u><u></u></span></p>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">AusNOG <<a href="mailto:ausnog-bounces@lists.ausnog.net" target="_blank">ausnog-bounces@lists.ausnog.<wbr>net</a>> on behalf of David Hughes <<a href="mailto:david@hughes.com.au" target="_blank">david@hughes.com.au</a>><br>
<b>Date: </b>Tuesday, 27 February 2018 at 9:39 am<br>
<b>To: </b>Geoff Huston <<a href="mailto:gih@apnic.net" target="_blank">gih@apnic.net</a>><br>
<b>Cc: </b>"<a href="mailto:ausnog@lists.ausnog.net" target="_blank">ausnog@lists.ausnog.net</a>" <<a href="mailto:ausnog@lists.ausnog.net" target="_blank">ausnog@lists.ausnog.net</a>><br>
<b>Subject: </b>Re: [AusNOG] Best practices on speeding up BGP convergence times<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal"><a name="m_1237781065351343678__MailOriginalBody"><u></u> <u></u></a></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span>On 26 Feb 2018, at 9:52 pm, Geoff Huston <</span><a href="mailto:gih@apnic.net" target="_blank"><span>gih@apnic.net</span><span></span></a><span>>
wrote:</span></p>
</div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
<div>
<div>
<p class="MsoNormal"><span><br>
a) detecting link down quickly<br>
<br>
You can adjust your BGP session keepalive timers to smaller values and make the session more sensitive to outages as a result. I also thought that these days you can get the interface status to directly map to the session state, but it’s been a while since
I’ve done this in anger and frankly I have NFC how to do that, even if I used to know! Maybe you are already doing that anyway.</span></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span>This is the scenario I was talking about (references below). You can easily have link on a northbound interface even if the peer isn’t there (you hit a layer-2 agg switch on the way for example).
If the peer fails but you still have link on the interface, you’ll be blindly forwarding packets to it, even though it’s not there anymore, until the BGP timers expire. That was the point of the lightning talk I gave way back then. Default timers aren’t
helpful in this situation.</span></p>
</div>
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span>Fast forward to this decade and you have routing protocols that are “BFD-aware” so you have sub-second link failure detection. That allows the control plane to pull down the peer session and
remove paths to that peer from the FIB. You can only run BFD if your upstream runs it as well, so you know they will dump the prefixes from that peer session as quickly as you will. It makes failing over to a secondary link within the same upstream provider pretty
seamless.</span></p>
</div>
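<div>
<p class="MsoNormal"><span>To make the difference between the two detection mechanisms concrete, here is a rough sketch of the arithmetic. It is not taken from the talk referenced below, and the function names and example values (a 90 second hold time, BFD at 300 ms with a multiplier of 3) are assumptions chosen for illustration. It compares how long BGP alone can take to notice a silent peer with the BFD detection time, which is the packet interval multiplied by the detect multiplier.</span></p>
</div>
<pre>
```python
from typing import Optional, Tuple


def bfd_detection_time_s(interval_ms: float, detect_multiplier: int) -> float:
    """BFD declares the peer down after `detect_multiplier` consecutive
    intervals pass with no control packet received."""
    return interval_ms * detect_multiplier / 1000.0


def hold_timer_detection_window_s(
    hold_time_s: float, keepalive_s: Optional[float] = None
) -> Tuple[float, float]:
    """(best, worst) seconds for BGP alone to notice a silent peer.

    The hold timer restarts on each KEEPALIVE or UPDATE, so the wait
    depends on how recently the last one arrived. Keepalive defaults to
    one third of the hold time, a common convention.
    """
    if keepalive_s is None:
        keepalive_s = hold_time_s / 3.0
    return (hold_time_s - keepalive_s, hold_time_s)


# Assumed example values, not taken from the thread:
# 90 s hold time, and BFD at 300 ms intervals with a multiplier of 3.
print(hold_timer_detection_window_s(90))
print(bfd_detection_time_s(300, 3))
```
</pre>
<div>
<p class="MsoNormal"><span>With those assumed values the hold timer alone leaves a window of a minute or more, while BFD stays under a second. Actual intervals are negotiated between the two ends and limited by what each platform supports.</span></p>
</div>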
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span>Ref :</span></p>
</div>
<div>
<p class="MsoNormal"><span></span><a href="http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf" target="_blank"><span>http://archive.apnic.net/<wbr>meetings/21/docs/sigs/routing/<wbr>routing-pres-hughes-bgp.pdf</span><span></span></a><span></span></p>
</div>
<div>
<p class="MsoNormal"><span></span><a href="http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html" target="_blank"><span>http://lists.ausnog.net/<wbr>pipermail/ausnog/2015-January/<wbr>029486.html</span><span></span></a><span></span></p>
</div>
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span>David</span></p>
</div>
<div>
<p class="MsoNormal"><span>...</span></p>
</div>
</div>
</div>
<br>______________________________<wbr>_________________<br>
AusNOG mailing list<br>
<a href="mailto:AusNOG@lists.ausnog.net">AusNOG@lists.ausnog.net</a><br>
<a href="http://lists.ausnog.net/mailman/listinfo/ausnog" rel="noreferrer" target="_blank">http://lists.ausnog.net/<wbr>mailman/listinfo/ausnog</a><br>
<br></blockquote></div><br></div>