<div dir="ltr"><div><div>There are a lot of ways you can meddle with socket options and timers, enabling TCP keepalive for example.<br><br></div>But I was thinking more about the longest you can take a line down for without any sessions failing. The worst case, as far as I can make out, is where there's a parallel path available: you drop one socket, another session taking a different path has its TCP sequence number wrap, and then the former socket's packet matches the sequence and either resets the socket or corrupts the data.<br><br></div>Paul Wilkins<br></div><div class="gmail_extra"><br><div class="gmail_quote">On 1 July 2015 at 19:02, Mark Smith <span dir="ltr"><<a href="mailto:markzzzsmith@gmail.com" target="_blank">markzzzsmith@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 1 July 2015 at 18:31, Paul Wilkins <<a href="mailto:paulwilkins369@gmail.com">paulwilkins369@gmail.com</a>> wrote:<br>

> The maximum time a particular application can withstand a dropout without<br>

> the session getting torn down (which is implementation dependent), and the<br>

> maximum dropout you can experience without _any_ applications being<br>

> affected, are different things.<br>

<br>

</span>Are you describing a scenario where a specific application has changed<br>

TCP's default operating parameters e.g., timeouts?<br>

<br>

If you are, and the application's TCP parameters have been set so low<br>

that the application will not tolerate a 3 to 5 second period of<br>

transient packet loss, then you wouldn't be able to do what I've<br>

suggested you do, would you? Of course, you're making assurances that<br>

no possible transient event in your network that could impact that<br>

particular application's traffic will take any longer than what you've<br>

lowered the TCP timeout parameters to, and you would know you've made<br>

those assurances.<br>

<span class=""><br>

> If a TCP session is closing, and you pull<br>

> the plug, the reset may be left wandering the network.<br>

<br>

</span>Not endlessly. Either its TTL/Hop Count will reach zero, or it will be<br>

dropped because the destination is unreachable.<br>

<span class=""><br>

> If the network<br>

> returns later than TIME_WAIT, there may be issues.<br>

><br>

<br>

</span>The reset will be gone by then.<br>

<div class="HOEnZb"><div class="h5"><br>

> Paul Wilkins<br>

><br>

> On 1 July 2015 at 17:38, Mark Smith <<a href="mailto:markzzzsmith@gmail.com">markzzzsmith@gmail.com</a>> wrote:<br>

>><br>

>> On 1 July 2015 at 16:40, Paul Wilkins <<a href="mailto:paulwilkins369@gmail.com">paulwilkins369@gmail.com</a>> wrote:<br>

>> > Mark,<br>

>> > It's implementation specific (depends what options you pass to<br>

>> > setsockopt/sockstream).<br>

>> ><br>

>> > There are problems resulting from having TIME_WAIT too long, with<br>

>> > wandering<br>

>> > duplicates. Supposedly RFC 1337 says TIME_WAIT should be at least 2<br>

>> > minutes,<br>

>> > but on my Linux box, I just timed a dropped socket, and it timed out<br>

>> > after<br>

>> > one minute.<br>

>> ><br>

>><br>

>> So I just checked Stevens Volume 1, which is where I read about this<br>

>> (back in 1998 or earlier IIRC). The timer that triggers retransmission<br>

>> is the Retransmission Timeout or RTO, which is measured and updated for<br>

>> the TCP connection (as, for example, the topology of the network could<br>

>> change while the TCP connection is active). Once the RTO times out,<br>

>> the retransmission intervals I mentioned occur.<br>

>><br>

>> The TIME_WAIT timer you're describing is the one used after the TCP<br>

>> connection has closed, and it is there to ensure any TCP segments that<br>

>> belong to the closed TCP connection that might still be floating<br>

>> around the network expire.<br>

>><br>

>><br>

>><br>

>> > Paul Wilkins<br>

>> ><br>

>> > On 1 July 2015 at 15:17, Mark Smith <<a href="mailto:markzzzsmith@gmail.com">markzzzsmith@gmail.com</a>> wrote:<br>

>> >><br>

>> >> On 1 July 2015 at 15:11, Mark Smith <<a href="mailto:markzzzsmith@gmail.com">markzzzsmith@gmail.com</a>> wrote:<br>

>> >> > On 1 July 2015 at 14:56, Mark Smith <<a href="mailto:markzzzsmith@gmail.com">markzzzsmith@gmail.com</a>> wrote:<br>

>> >> >> On 1 July 2015 at 12:33, Ross Wheeler <<a href="mailto:ausnog@rossw.net">ausnog@rossw.net</a>> wrote:<br>

>> >> >>><br>

>> >> >>><br>

>> >> >>> I had several links go down at 10:00 (give or take a few seconds)<br>

>> >> >>> -<br>

>> >> >>> well,<br>

>> >> >>> not mine so much as my upstream - and it's been blamed on this<br>

>> >> >>> issue.<br>

>> >> >>><br>

>> >> >><br>

>> >> >> So from a little bit of Human Computer Interaction (HCI) I studied<br>

>> >> >> many years ago, I remember that humans will wait for some sort of<br>

>> >> >> response for between 3 and 5 seconds. So if the period of your packet<br>

>> >> >> loss and the retransmission to recover from it is short enough, the<br>

>> >> >> humans affected may notice a slight delay, but they won't take any<br>

>> >> >> remedial actions themselves (i.e., they won't push the submit button<br>

>> >> >> again, and won't complain about it.)<br>

>> >> >><br>

>> >> ><br>

>> >> > This can also be particularly useful to know when cutting a set of<br>

>> >> > links over from an old piece of equipment to a new one. If 3 to 5<br>

>> >> > seconds<br>

>> >> > is a bit tight to move the link, you can push people's response<br>

>> >> > expectations out in the outage notice (e.g., "between 7 and 8 am, we<br>

>> >> > will be conducting network maintenance. During this period, you may<br>

>> >> > encounter system delays of up to 5 to 10 seconds."). I think asking<br>

>> >> > people to wait any longer than 10 seconds means this is a service<br>

>> >> > impacting outage and should be scheduled out of normal operating<br>

>> >> > hours.<br>

>> >> ><br>

>> >> > Also make sure that anything/any protocols that may cause the new<br>

>> >> > equipment to take longer than 3 to 5 seconds to bring up the link<br>

>> >> > is<br>

>> >> > temporarily or permanently switched off. Traditional STP would be a<br>

>> >> > prime example (make sure there isn't a loop in the network topology<br>

>> >> > at<br>

>> >> > all, or at least during the cut-over window if you're going to switch<br>

>> >> > STP back on later). Bear in mind that your window from<br>

>> >> > "working-to-working" is the 5 to 10 seconds (or 3 to 5 normally), so<br>

>> >> > e.g., BGP sessions might come up within a few seconds, but if<br>

>> >> > downloading the full route table, resolving the routes and putting<br>

>> >> > them into the FIB is going to take more than 10 seconds, you'll have<br>

>> >> > to do a proper service impacting outage at an appropriate time.<br>

>> >> ><br>

>> >> > Finally, remember that UDP and DCCP don't do recovery from packet<br>

>> >> > loss, so if your apps are using them, they'll either have to be<br>

>> >> > tolerant of packet loss of up to 10 (or 3 to 5) seconds, do recovery<br>

>> >> > themselves, or should be rewritten to use TCP or SCTP.<br>

>> >> ><br>

>> >> > <snip><br>

>> >><br>

>> >> One last thing, you also need to know the characteristics of your<br>

>> >> reliable protocols and how persistently they attempt to recover from<br>

>> >> packet loss. If your reliable protocol gives up within the 3 to 5 or 5<br>

>> >> to 10 second window, your customers/users will suffer an outage. TCP,<br>

>> >> for example, doesn't give up easily. If I recall correctly, it will<br>

>> >> try for up to around 9 minutes, and retries at doubling intervals up<br>

>> >> to 64 seconds and then every 64 seconds, i.e., attempts at 1, 2, 4,<br>

>> >> 8, 16, 32, 64, 64, 64, ... seconds.<br>
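<br>
A rough sketch of the retry schedule described above, for anyone who wants to check the arithmetic. It assumes the simplified model in the text (intervals double from 1 second up to a 64 second cap, and the sender gives up after about 9 minutes). The function name and its parameters are made up for illustration; real stacks derive the first interval from the measured RTO and may use different limits.<br>
<pre>
```python
def backoff_schedule(initial=1, cap=64, give_up_after=540):
    """Return the retry intervals (in seconds) that fit inside the budget.

    Simplified model (an assumption, not any particular TCP stack): each
    interval doubles until it reaches `cap`, then repeats at `cap` until
    the next wait would pass `give_up_after`.
    """
    intervals = []
    elapsed = 0
    interval = initial
    while elapsed + interval <= give_up_after:
        intervals.append(interval)
        elapsed += interval
        # Double the wait, but never beyond the cap.
        interval = min(interval * 2, cap)
    return intervals


if __name__ == "__main__":
    schedule = backoff_schedule()
    print(schedule)
    print(len(schedule), "attempts over", sum(schedule), "seconds")
```
</pre>
With the default arguments this yields 13 attempts spanning 511 seconds, or about 8.5 minutes, which is in line with the "around 9 minutes" figure.<br>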

>> >> _______________________________________________<br>

>> >> AusNOG mailing list<br>

>> >> <a href="mailto:AusNOG@lists.ausnog.net">AusNOG@lists.ausnog.net</a><br>

>> >> <a href="http://lists.ausnog.net/mailman/listinfo/ausnog" rel="noreferrer" target="_blank">http://lists.ausnog.net/mailman/listinfo/ausnog</a><br>


</div></div></blockquote></div><br></div>