From 9237801d22a38d2643ffe94ab626c4d2815012dd Mon Sep 17 00:00:00 2001
From: Dan Gudmundsson
Date: Thu, 10 Oct 2013 10:55:46 +0200
Subject: mnesia: Synchronize lock cleanup after mnesia down

Bad timing could lead to hanging transactions after a mnesia down
from a node with sticky locks.

Excellent bug report from janchochol.

Situation:
* node A and node B have copies of table T
* node A owns the sticky lock on table T
* node A goes down (e.g. crash)
* node B tries to perform a transactional operation on table T
  (e.g. mnesia:select)

In this situation there is a possibility that the first (and maybe
subsequent) transaction on node B will hang indefinitely.

This is caused by a race condition: the transaction process sends a
lock request to node A and waits for a reply. Since node A is down it
will never send a reply, so the process on node B is stuck forever.

The reason is that the message sent to the mnesia_locker gen_server
from mnesia_locker:mnesia_down can be received after the mnesia_locker
gen_server has already replied to transaction processes with
{switch, N, Req}, where node N is down.

Monitoring the remote process when sending a request to another node
should be a safe solution.
---
 lib/mnesia/src/mnesia_tm.erl | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
index e54e5c4e88..17af0cad44 100644
--- a/lib/mnesia/src/mnesia_tm.erl
+++ b/lib/mnesia/src/mnesia_tm.erl
@@ -181,7 +181,7 @@ mnesia_down(Node) ->
     %% mnesia_monitor takes care of the sync
     case whereis(?MODULE) of
         undefined ->
-            mnesia_monitor:mnesia_down(?MODULE, {Node, []});
+            mnesia_monitor:mnesia_down(?MODULE, Node);
         Pid ->
             Pid ! {mnesia_down, Node}
     end.
@@ -403,7 +403,9 @@ doit_loop(#state{coordinators=Coordinators,participants=Participants,supervisor=
             Tids = gb_trees:keys(Participants),
             reconfigure_participants(N, gb_trees:values(Participants)),
             NewState = clear_fixtable(N, State),
-            mnesia_monitor:mnesia_down(?MODULE, {N, Tids}),
+
+            mnesia_locker:mnesia_down(N, Tids),
+            mnesia_monitor:mnesia_down(?MODULE, N),
             doit_loop(NewState);
 
         {From, {unblock_me, Tab}} ->
-- 
cgit v1.2.3
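
Note on the monitoring idea mentioned in the commit message: the patch
itself does not add a monitor; it has mnesia_tm call
mnesia_locker:mnesia_down/2 directly before notifying mnesia_monitor.
For illustration only, a minimal sketch of what a monitored request to a
registered process on another node could look like. The module name
monitored_request, the functions request/3 and start_echo/1, and the
{From, Ref, Req} / {Ref, Reply} message shapes are all hypothetical and
are not the mnesia_locker protocol.

```erlang
-module(monitored_request).
-export([request/3, start_echo/1]).

%% Hypothetical helper, NOT part of mnesia: send Req to the process
%% registered as Name on Node and wait for its reply. The monitor makes
%% sure the caller receives a 'DOWN' message if the process or the node
%% is gone (or goes away later), so the receive cannot block forever.
request(Name, Node, Req) ->
    Ref = erlang:monitor(process, {Name, Node}),
    {Name, Node} ! {self(), Ref, Req},
    receive
        {Ref, Reply} ->
            %% Got an answer: drop the monitor and flush a possible 'DOWN'.
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, _Object, Reason} ->
            %% Reason is e.g. noproc or noconnection.
            {error, {down, Node, Reason}}
    end.

%% Hypothetical echo server used only to exercise request/3.
start_echo(Name) ->
    Pid = spawn(fun echo_loop/0),
    true = register(Name, Pid),
    Pid.

echo_loop() ->
    receive
        {From, Ref, Req} ->
            From ! {Ref, Req},
            echo_loop()
    end.
```

With this shape, a request to a node that has crashed returns an error
tuple instead of hanging, which is the property the bug report asks for.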