[AusNOG] Detecting "hung" ssh sessions.

Mon Feb 22 14:12:08 EST 2016

Hi Ross,

Perhaps something like mosh would work for this? https://mosh.mit.edu/

I don't use it for this situation specifically, but I run it on all of 
my servers. It is great for connecting from a laptop: you can put the 
laptop in standby without closing your sessions, and when you start back 
up again you are still connected. The same applies if you change IPs 
(e.g. move from wifi to a tethered connection): none of your sessions 
will drop.
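
If mosh is not an option, one possible client-side check that needs only 
stock OpenSSH is to loop a probe through the tunnel itself. This is a 
minimal sketch, not something tested on the setup described below. It 
assumes the device's existing ssh command gains a local forward (-L) 
pointing at the server-side loopback port that its own reverse tunnel 
(-R) listens on, and that the reverse tunnel ends at the device's sshd. 
A connection to the local forward then travels device -> server -> 
device, and a working path should answer with an SSH identification 
string beginning with "SSH-". The port 2222 and the name tunnel_alive 
are hypothetical.

```python
import socket


def tunnel_alive(host="127.0.0.1", port=2222, timeout=10.0):
    """Return True if an SSH banner arrives from host:port within timeout.

    host/port are assumed to be the local end of an "ssh -L" forward that
    loops back through the reverse tunnel (hypothetical values).
    """
    banner = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # Read until we have enough bytes to recognise the "SSH-" prefix.
            while len(banner) < 4:
                chunk = sock.recv(64)
                if not chunk:  # peer closed before sending a banner
                    break
                banner += chunk
    except OSError:
        # Covers refused connections, resets, and timeouts.
        return False
    return banner.startswith(b"SSH-")
```

A cron job could call tunnel_alive() and, when it returns False, kill 
the stale ssh process so that the existing restart logic brings the 
tunnel back up.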

On 22/02/2016 11:07 AM, Ross Wheeler wrote:
>
> Hi Noggers.
>
> Looking for a "bright idea" or a point in the right direction.
>
> I have a bunch of remote devices that live behind NAT and firewalls, 
> in uncontrolled environments. It's not always (or even frequently) 
> possible to get those in charge of said NAT boxes to do PAT to my 
> devices, so instead I have each device ssh to one of my hosts and 
> create a reverse tunnel. (The tunnels are bound only to the loopback 
> interface on my server, so the end devices are not significantly 
> exposed to the outside world).
>
> As and when I need to access remote boxes, I ssh to the terminating 
> host, ssh to the appropriate port and have immediate shell access on 
> the remote box.
>
> Each remote box also periodically (cron) checks that the ssh session 
> is (still) running (simple ps) and (re)starts it if not.
>
> This generally works well.
>
> Alas, this morning, my provider had a brief oopsee (no explanation 
> forthcoming) where 100% of my external connectivity dropped for a few 
> minutes.
>
> This resulted in every last one of these tunnels breaking, but they've 
> broken in such a way that they didn't restart. The terminating host 
> shows no connections from any of the remote devices, yet all of the 
> remote devices still have their ssh session running. They simply are 
> not passing any traffic. Yes, I have keepalives enabled.
>
> Does anyone know of a simple, effective, reliable way to detect (from 
> the client end) the loss of end-to-end function of a tunnel like this 
> without going completely overboard? Installing replacement versions 
> of ssh isn't going to work for me, and neither is running autossh.
>
> Things I've looked at but lucked out with include adding a static 
> route to my server and looking for either byte counters or last-used 
> timers with netstat, looking for per-process traffic or TCP counters 
> and a number of other failed avenues.
>
> I could add ipfw and pass traffic through a rule to observe if it's 
> passing traffic or not, but that has lots of other negative impacts, 
> especially on a few machines that are already balls-to-the-wall on 
> their network interfaces.
>
> I considered tcpdump to see when a packet was last received, but it 
> too has lots of other overheads.
>
> I'm sure I'm not the only person to have ever faced this, lots of 
> people will have overcome it, but I can't find any information on it. 
> (Any amount of help for unresponsive/stuck *interactive* sessions, but 
> that doesn't help me!).
>
> Fingers crossed someone here can throw me a line....
>
> RossW
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog