From 8e6d005dcd8f62745d28d1820c4b2a5025634f5f Mon Sep 17 00:00:00 2001
From: Alexey Lebedeff
Date: Mon, 14 Dec 2015 15:15:52 +0300
Subject: Prevent down nodes going undetected in epmd

In the following (rare) case, a down node stays registered in epmd
forever:

- the client connects to epmd and sends an ALIVE2 request
- epmd reads the request and starts to process it
- meanwhile the client socket closes in such a way that the subsequent
  write(2) in epmd fails
- at this point the node is registered in the database, but because the
  connection struct has no 'keep' flag set, do_read() closes the
  connection and removes it from the select fdset
- so there is no way for this node to be cleaned up later.

We've seen several epmd instances in this state on our production
systems. While I'm not sure of the exact sequence of events that leads
to the failed write(2), the issue can easily be reproduced using the
SO_LINGER socket option.
---
 erts/epmd/src/epmd_srv.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'erts/epmd/src')

diff --git a/erts/epmd/src/epmd_srv.c b/erts/epmd/src/epmd_srv.c
index 8c8d7304f2..c9d49e73d0 100644
--- a/erts/epmd/src/epmd_srv.c
+++ b/erts/epmd/src/epmd_srv.c
@@ -705,6 +705,7 @@ static void do_request(g, fd, s, buf, bsize)
 
 	if (reply(g, fd, wbuf, 4) != 4)
 	  {
+	    node_unreg(g, name);
 	    dbg_tty_printf(g,1,"** failed to send ALIVE2_RESP for \"%s\"",
 			   name);
 	    return;
-- 
cgit v1.2.3
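
The SO_LINGER reproduction mentioned in the commit message could look roughly like the sketch below. It is not part of the patch and is not the author's reproducer. The helper names (put16, build_alive2_req, register_and_reset) are hypothetical, and the request layout, the default port 4369, node type 77, and distribution version 5 are assumptions taken from the documented EPMD protocol. The idea is to send an ALIVE2 request and then close the socket with a zero linger timeout, so the kernel resets the connection instead of closing it gracefully and epmd's reply may fail in write(2).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define EPMD_PORT       4369  /* assumed: epmd's default listening port */
#define EPMD_ALIVE2_REQ 120   /* request tag 'x' */

/* Hypothetical helper: store a 16-bit value in network byte order. */
static void put16(unsigned char *p, unsigned v)
{
    p[0] = (unsigned char)((v >> 8) & 0xff);
    p[1] = (unsigned char)(v & 0xff);
}

/* Build a length-prefixed ALIVE2 request for node `name` listening on
 * `port`. Returns the number of bytes written, or 0 if it does not fit. */
size_t build_alive2_req(unsigned char *buf, size_t cap,
                        const char *name, unsigned port)
{
    size_t nlen, body;

    assert(buf != NULL && name != NULL);
    nlen = strlen(name);
    body = 13 + nlen;             /* tag .. name length, name, extra length */
    if (nlen > 255 || cap < body + 2)
        return 0;

    put16(buf, (unsigned)body);   /* 2-byte length prefix */
    buf[2] = EPMD_ALIVE2_REQ;
    put16(buf + 3, port);         /* distribution port of the node */
    buf[5] = 77;                  /* node type: normal Erlang node */
    buf[6] = 0;                   /* protocol: TCP/IPv4 */
    put16(buf + 7, 5);            /* highest version (assumed) */
    put16(buf + 9, 5);            /* lowest version (assumed) */
    put16(buf + 11, (unsigned)nlen);
    memcpy(buf + 13, name, nlen);
    put16(buf + 13 + nlen, 0);    /* no extra data */
    return body + 2;
}

/* Send the request to a local epmd, then close with l_linger = 0 so the
 * connection is reset rather than shut down gracefully.
 * Returns 0 on success, -1 on any socket error. */
int register_and_reset(const char *name, unsigned port)
{
    unsigned char req[300];
    struct sockaddr_in addr;
    struct linger lg;
    size_t len = build_alive2_req(req, sizeof req, name, port);
    int fd;

    if (len == 0)
        return -1;
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(EPMD_PORT);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    lg.l_onoff = 1;               /* enable lingering ... */
    lg.l_linger = 0;              /* ... with a zero timeout: reset on close */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) != 0 ||
        write(fd, req, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);             /* do not wait for ALIVE2_RESP */
}
```

Calling register_and_reset("ghost", 5555) against a running epmd and then listing the registered names (for example with `epmd -names`) would show whether the stale entry remains. Whether the failing write(2) is actually hit depends on timing, so several attempts may be needed.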