[AusNOG] Detecting "hung" ssh sessions.
Ross Wheeler
ausnog at rossw.net
Mon Feb 22 14:07:41 EST 2016
Hi Noggers.
Looking for a "bright idea" or a point in the right direction.
I have a bunch of remote devices that live behind NAT and firewalls, in
uncontrolled environments. It's not always (or even frequently) possible
to get those in charge of said NAT boxes to do PAT to my devices, so
instead I have each device ssh to one of my hosts and create a reverse
tunnel. (The tunnels are bound only to the loopback interface on my
server, so the end devices are not significantly exposed to the outside
world).
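For illustration, the reverse-tunnel setup described above could be sketched as follows. This is a minimal sketch, not the poster's actual configuration: the host name `server.example.net`, the user `tunnel`, and port 2201 are hypothetical placeholders. It only builds the OpenSSH command line, using `-N` (no remote command) and `-R` with an explicit `127.0.0.1` bind address so the forwarded port listens on the server's loopback only.

```python
import shlex


def build_tunnel_command(remote_port, server, local_port=22, user="tunnel"):
    """Return the ssh argv for a reverse tunnel bound to the server's loopback.

    All names and ports here are hypothetical placeholders for illustration.
    """
    # -R [bind_address:]port:host:hostport
    # Connections to 127.0.0.1:remote_port on the server are forwarded
    # back to localhost:local_port on the remote device.
    forward = f"127.0.0.1:{remote_port}:localhost:{local_port}"
    return [
        "ssh",
        "-N",            # no remote command, forwarding only
        "-R", forward,   # reverse (remote) port forward
        f"{user}@{server}",
    ]


if __name__ == "__main__":
    # Print the command a remote device would run to open its tunnel.
    print(shlex.join(build_tunnel_command(2201, "server.example.net")))
```

With a tunnel like this in place, access from the terminating host would be an ssh to the chosen loopback port (2201 in this hypothetical).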
As and when I need to access remote boxes, I ssh to the terminating host,
ssh to the appropriate port and have immediate shell access on the remote
box.
Each remote box also periodically (cron) checks that the ssh session is
(still) running (simple ps) and (re)starts it if not.
This generally works well.
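The cron check described above amounts to roughly the following. This is a hedged sketch under assumptions: the marker string and function names are hypothetical, and the real check is only described as a "simple ps". It lists processes with `ps`, looks for the tunnel's ssh command line, and starts a new session if none is found.

```python
import subprocess

# Hypothetical marker: the -R forward spec that identifies the tunnel process.
TUNNEL_MARKER = "127.0.0.1:2201:localhost:22"


def tunnel_process_present(ps_output, marker=TUNNEL_MARKER):
    """Return True if any line of ps output looks like the tunnel's ssh process."""
    for line in ps_output.splitlines():
        if "ssh" in line and marker in line:
            return True
    return False


def check_and_restart(start_command, marker=TUNNEL_MARKER):
    """Start the tunnel if no matching process exists; return True if started."""
    # "ww" asks ps not to truncate long command lines.
    ps_output = subprocess.run(
        ["ps", "axww", "-o", "command"],
        capture_output=True, text=True, check=True,
    ).stdout
    if tunnel_process_present(ps_output, marker):
        return False
    subprocess.Popen(start_command)  # launch a new ssh session in the background
    return True
```

A check of this kind only confirms that an ssh process exists. It says nothing about whether the session is still passing traffic, which is exactly the gap described next.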
Alas, this morning, my provider had a brief oopsie (no explanation
forthcoming) where 100% of my external connectivity dropped for a few
minutes.
This resulted in every last one of these tunnels breaking, but they've
broken in such a way that they didn't restart. The terminating host shows
no connections from any of the remote devices, yet all of the remote
devices still have their ssh session running. They simply are not passing
any traffic. Yes, I have keepalives enabled.
Does anyone know of a simple, effective, reliable way to detect (from the
client end) the loss of end-to-end function of a tunnel like this without
going completely overboard? Installing replacement versions of ssh isn't
going to work for me, and neither is running autossh.
Things I've looked at but had no luck with include adding a static route to
my server and looking for either byte counters or last-used timers with
netstat, looking for per-process traffic or TCP counters, and a number of
other failed avenues.
I could add ipfw and pass traffic through a rule to observe whether it's passing
traffic or not, but that has lots of other negative impacts, especially on
a few machines that are already balls-to-the-wall on their network
interfaces.
I considered tcpdump to see when a packet was last received, but it too
has lots of other overheads.
I'm sure I'm not the only person to have ever faced this, and lots of people
will have overcome it, but I can't find any information on it. (There is any
amount of help for unresponsive/stuck *interactive* sessions, but that doesn't
help me!)
Fingers crossed someone here can throw me a line....
RossW