[AusNOG] Endlessly resetting modems after DNS outage

Fri May 27 15:10:49 EST 2016

Hi Narelle,

Yes, I have seen something like this recently while working with an IoT
developer.

There is a tendency to set short DNS TTLs for load balancing purposes in
modern web apps. If the development team is optimising for "cloud"
deployments with a view to having AWS available to handle additional load,
then this flexibility is probably rational.

A problem with short TTLs in embedded devices is that they then need to do
a DNS lookup nearly every time they send a heartbeat - instead of just
checking the resolver cache from the last lookup. This works fine in the
lab, but in the real world it means that a single lost packet in a UDP DNS
lookup can cause a TCP heartbeat process to fail.
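
One way a device could avoid this is to keep the last good answer and fall
back to it when a fresh lookup fails. A rough sketch only, not what any
particular vendor does: the class name, the fixed 30 second TTL and the
injectable lookup/clock are all assumptions for illustration (Python's
socket.getaddrinfo does not expose the real record TTL).

```python
import socket
import time


class StaleOkResolver:
    """Cache the last good address and reuse it if a fresh lookup fails."""

    def __init__(self, lookup=None, ttl=30.0, clock=time.monotonic):
        # lookup: callable taking a hostname and returning an IP string.
        # Defaults to the system resolver; injectable so it can be faked.
        self._lookup = lookup or self._system_lookup
        self._ttl = ttl          # assumed fixed TTL in seconds
        self._clock = clock
        self._cache = {}         # hostname -> (address, expiry time)

    @staticmethod
    def _system_lookup(hostname):
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return infos[0][4][0]

    def resolve(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached and now < cached[1]:
            return cached[0]     # still fresh, no lookup needed
        try:
            address = self._lookup(hostname)
        except OSError:          # socket.gaierror is a subclass of OSError
            if cached:
                return cached[0]  # lookup failed: serve the stale answer
            raise                 # nothing cached, so the failure is real
        self._cache[hostname] = (address, now + self._ttl)
        return address
```

With something like this, a lost DNS packet only breaks the heartbeat if the
device has never resolved the name at all.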

Failure scenario => Congestion => More lost packets => More failures (a
feedback loop).

If this is the case, a time-series graph of such failures may show spikes
with an interval equivalent to the DNS record TTL.
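
A quick way to check for that pattern in a failure log, as a sketch only: the
function name, the 10 second bucket and the spike threshold of 3 are
arbitrary assumptions, and the input is assumed to be failure timestamps in
seconds.

```python
from collections import Counter


def spike_intervals(failure_times, bucket_seconds=10, min_count=3):
    """Return the gaps, in seconds, between successive failure spikes.

    A spike is any time bucket holding at least min_count failures.
    """
    counts = Counter(int(t // bucket_seconds) for t in failure_times)
    spikes = sorted(b for b, n in counts.items() if n >= min_count)
    return [(b2 - b1) * bucket_seconds for b1, b2 in zip(spikes, spikes[1:])]
```

If most of the returned gaps sit close to the record TTL, that points at the
lookup rather than the heartbeat itself.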

John

On 27 May 2016 at 14:06, Narelle <narellec at gmail.com> wrote:

>
> At the risk of triggering a whirlpool like response - has anyone else seen
> a fault like this in other implementations?
>
>
> http://exchange.telstra.com.au/2016/05/27/nbn-and-adsl-disruption-what-happened/
>
> Any other thoughts or comments - feel free to send off list.
>
> --
> Narelle
> narellec at gmail.com