[AusNOG] Detecting "hung" ssh sessions.

Mon Feb 22 14:14:23 EST 2016

Twenty years ago, in the old, old days of dial-in on BSDi servers, the
solution to this problem on hung serial ports was "stty -f /dev/xxx flush".

Not sure if this is applicable to ssh interfaces, but it looks like stty
still has a man page :)

John

On 22 February 2016 at 13:37, Ross Wheeler <ausnog at rossw.net> wrote:

>
> Hi Noggers.
>
> Looking for a "bright idea" or a point in the right direction.
>
> I have a bunch of remote devices that live behind NAT and firewalls, in
> uncontrolled environments. It's not always (or even frequently) possible to
> get those in charge of said NAT boxes to do PAT to my devices, so instead I
> have each device ssh to one of my hosts and create a reverse tunnel. (The
> tunnels are bound only to the loopback interface on my server, so the end
> devices are not significantly exposed to the outside world).
>
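A minimal sketch of the kind of reverse tunnel described above, assuming OpenSSH on the remote box. The helper name, host, user and port are hypothetical placeholders, not Ross's actual setup. `-R 127.0.0.1:PORT:localhost:22` asks the server to listen on its loopback interface and forward connections to the device's own sshd (sshd binds remote forwards to loopback by default anyway; the explicit address just makes the intent visible). The post says keepalives are enabled but not which kind. `ServerAliveInterval`/`ServerAliveCountMax` are the ssh-level keepalives that make the client itself exit when the server stops answering, and `ExitOnForwardFailure=yes` makes it exit if the server can't open the listener; both are shown here as assumptions about a reasonable configuration.

```python
from typing import List


def tunnel_command(server: str, remote_port: int, local_port: int = 22,
                   alive_interval_s: int = 30, alive_count_max: int = 3) -> List[str]:
    """Build the argv for a reverse tunnel (hypothetical helper, not Ross's script)."""
    return [
        "ssh",
        "-N",  # no remote command: the session exists only to carry the forward
        # ssh-level keepalive: after alive_interval_s seconds without data from
        # the server, send a request through the encrypted channel...
        "-o", f"ServerAliveInterval={alive_interval_s}",
        # ...and disconnect once alive_count_max of those go unanswered.
        "-o", f"ServerAliveCountMax={alive_count_max}",
        # Exit (so the cron check can restart it) if the server can't set up the listener.
        "-o", "ExitOnForwardFailure=yes",
        # Server listens on its loopback only and forwards to this box's sshd.
        "-R", f"127.0.0.1:{remote_port}:localhost:{local_port}",
        server,
    ]
```

For example, `shlex.join(tunnel_command("tunnel@hub.example.net", 2201))` gives `ssh -N -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -R 127.0.0.1:2201:localhost:22 tunnel@hub.example.net`. With these values a silent server should be noticed after roughly 90 seconds, at which point the ssh process exits rather than hanging.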
> As and when I need to access a remote box, I ssh to the terminating host,
> then ssh to the appropriate port there and have immediate shell access on
> the remote box.
>
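The two-hop login above can be expressed as a single command: ssh to the terminating host with `-t` (force a pseudo-terminal so the inner ssh is interactive), and have it run ssh to the tunnel's port on its loopback. This is a sketch only; the helper name, hub address, port and device user are hypothetical.

```python
from typing import List


def jump_command(hub: str, tunnel_port: int, device_user: str) -> List[str]:
    """Hypothetical helper: argv for 'ssh to the hub, then ssh to the tunnel port'."""
    return [
        "ssh", "-t", hub,  # -t forces a tty so the second ssh can prompt/interact
        # Runs on the hub: connect to the reverse-tunnel listener on its loopback.
        "ssh", "-p", str(tunnel_port), f"{device_user}@localhost",
    ]
```

Passing the result to `subprocess.run(...)` from a terminal, or typing the equivalent `ssh -t admin@hub.example.net ssh -p 2201 pi@localhost`, lands directly on the remote box's shell.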
> Each remote box also runs a periodic cron job that checks (with a simple ps)
> that the ssh session is still running, and restarts it if not.
>
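A sketch of that kind of ps-based check, assuming a `ps` that accepts the BSD-style `axww -o command` arguments (FreeBSD's ps and Linux procps both do). The function names and the marker string (a unique piece of the tunnel's command line, such as its `-R` spec) are hypothetical. The parsing is kept separate from the ps call so it can be checked on its own.

```python
import subprocess


def tunnel_in_ps(ps_output: str, marker: str) -> bool:
    """True if any ssh command line in the ps output contains marker."""
    for line in ps_output.splitlines():
        words = line.split()
        # Match the ssh client itself (any path), not sshd or an editor
        # that happens to have the marker in its arguments.
        if words and words[0].rsplit("/", 1)[-1] == "ssh" and marker in line:
            return True
    return False


def tunnel_running(marker: str) -> bool:
    """Run ps (full-width command lines for all processes) and look for the tunnel."""
    out = subprocess.run(["ps", "axww", "-o", "command"],
                         capture_output=True, text=True, check=False).stdout
    return tunnel_in_ps(out, marker)
```

A cron job would call something like `tunnel_running("127.0.0.1:2201:")` and start the tunnel if it returns False. As the rest of the post shows, this proves only that the process exists, not that the tunnel is passing traffic.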
> This generally works well.
>
> Alas, this morning, my provider had a brief oopsie (no explanation
> forthcoming) where 100% of my external connectivity dropped for a few
> minutes.
>
> This broke every last one of these tunnels, and in such a way that none of
> them restarted. The terminating host shows
> no connections from any of the remote devices, yet all of the remote
> devices still have their ssh session running. They simply are not passing
> any traffic. Yes, I have keepalives enabled.
>
> Does anyone know of a simple, effective, reliable way to detect (from the
> client end) the loss of end-to-end function of a tunnel like this without
> going completely overboard? Installing replacement versions of ssh isn't
> going to work for me, and neither is running autossh.
>
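One hedged way to test from the device side without extra software on the device: open a second, short-lived ssh connection to the terminating host and check that the tunnel's loopback port is still listening there. If the check fails or times out, kill the old ssh process and let the cron job restart it. This sketch assumes `nc` with `-z` exists on the server (true of the BSD/OpenBSD netcat variants, not every build); the helper names, host and port are hypothetical. It shows only that the listener still exists on the server, which is the failure described above; it does not prove data flows all the way back to the device.

```python
import subprocess
from typing import List


def probe_ok(cmd: List[str], timeout_s: float = 30.0) -> bool:
    """Run cmd; True only if it exits 0 within timeout_s seconds."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # subprocess.run kills the child before re-raising
    except OSError:
        return False  # e.g. the command could not be started at all
    return result.returncode == 0


def listener_probe(server: str, remote_port: int) -> List[str]:
    """Hypothetical check: ask the server whether the tunnel's listener is still up."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password; fail instead
        "-o", "ConnectTimeout=10",  # give up quickly if the server is unreachable
        server,
        # Runs on the server: exit 0 only if something accepts on the port.
        "nc", "-z", "127.0.0.1", str(remote_port),
    ]
```

Usage might look like `if not probe_ok(listener_probe("tunnel@hub.example.net", 2201)): ...`, followed by killing the stale tunnel (for example with `pkill -f` on its `-R` spec). Note that a failed probe can also mean the server is simply unreachable, in which case restarting the tunnel is harmless anyway.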
> Things I've looked at but had no luck with include adding a static route to
> my server and looking for either byte counters or last-used timers with
> netstat, looking for per-process traffic or TCP counters, and a number of
> other failed avenues.
>
> I could add ipfw and pass traffic through a rule to see whether it's passing
> traffic, but that has lots of other negative impacts, especially on
> a few machines that are already balls-to-the-wall on their network
> interfaces.
>
> I considered tcpdump to see when a packet was last received, but it too
> has lots of other overheads.
>
> I'm sure I'm not the only person to have ever faced this, and lots of people
> will have overcome it, but I can't find any information on it. (There's any
> amount of help for unresponsive/stuck *interactive* sessions, but that
> doesn't help me!)
>
> Fingers crossed someone here can throw me a line....
>
> RossW
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>