|
Mnesia currently notifies the user if it detects a partitioned
network, but the options for resolving the situation are limited.
In practice, the only safe options are:
- set master_nodes and restart one of the affected 'islands'
- restart the entire system from backup
This patch introduces a way to resolve the situation without
restarting any nodes. The key to doing this safely is to
lock affected tables and run the merge function inside the same
transaction that merges the schema. Otherwise, one transaction
will merge the schema, after which writes to the database are
replicated across the (potentially) inconsistent copies, and
the transaction triggered by the asynchronous inconsistency event
has to race other transactions to be the first to access the tables.
The normal call to merge the schema is done from mnesia_controller.
Previously, this was mnesia_schema:merge_schema().
The new function is merge_schema(UserFun), with the
following behaviour:
    merge_schema(UserFun) ->
        schema_transaction(
          fun() ->
              UserFun(fun(Arg) -> do_merge_schema(Arg) end)
          end).
Here, do_merge_schema(LockTabs) executes the schema merge
as before, but also locks every table in the list LockTabs that
has copies on the affected nodes (that is, on every node where the
schema table is locked).
The effect of this is to allow a wrapper function that calls the
merge and, if successful, continues to resolve the inconsistency
on the tables, knowing that they have now been locked on all
affected nodes.
The function that is actually called by the deconflict function
is mnesia_controller:connect_nodes(Nodes, UserFun), as in:
    Tables = tables_to_deconflict(Node),
    mnesia_controller:connect_nodes(
      [Node], fun(MergeF) ->
                  case MergeF(Tables) of
                      {merged,_,_} ->
                          deconflict(Tables, Node);
                      Other ->
                          Other
                  end
              end).
In the case where the merge fails, it is probably wise to
restart from a backup...
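As a rough illustration of how the call above might be driven by the
inconsistency event, here is a minimal sketch. It is not part of this
patch. It assumes a hypothetical module name (deconflict_sketch) and
placeholder implementations of tables_to_deconflict/1 and deconflict/2;
only mnesia:subscribe(system), the inconsistent_database system event,
and the connect_nodes/2 function introduced here are taken as given.

```erlang
-module(deconflict_sketch).
-export([start/0]).

%% Hypothetical helper process: listens for Mnesia system events and
%% tries to heal a partition without restarting any node.
start() ->
    spawn(fun() ->
                  %% Standard Mnesia call; system events arrive as
                  %% {mnesia_system_event, Event} messages.
                  {ok, _} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, _Context, Node}} ->
            Tables = tables_to_deconflict(Node),
            %% connect_nodes/2 is the function added by this patch.
            Res = mnesia_controller:connect_nodes(
                    [Node],
                    fun(MergeF) ->
                            case MergeF(Tables) of
                                {merged, _, _} ->
                                    %% Tables are now locked on all
                                    %% affected nodes.
                                    deconflict(Tables, Node);
                                Other ->
                                    Other
                            end
                    end),
            error_logger:info_msg("deconflict ~p: ~p~n", [Node, Res]),
            loop();
        _Other ->
            loop()
    end.

%% Placeholder (assumption): choose which tables need resolving.
tables_to_deconflict(_Node) ->
    [].

%% Placeholder (assumption): application-specific conflict resolution,
%% run inside the schema transaction.
deconflict(_Tables, _Node) ->
    ok.
```

The point of the sketch is only that the resolution logic runs inside
the fun passed to connect_nodes/2, so it never has to race ordinary
transactions for the tables.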
I have not run the mnesia test suite, as it is not available.
I have not updated documentation, as these functions are not
documented in the first place.
|