Copyright 1997-2013 Ericsson AB. All Rights Reserved.
The contents of this file are subject to the Erlang Public License,
Version 1.1, (the "License"); you may not use this file except in
compliance with the License. You should have received a copy of the
Erlang Public License along with this software. If not, it can be
retrieved online at http://www.erlang.org/.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
the License for the specific language governing rights and limitations
under the License.
Mnesia System Information
Claes Wikström, Hans Nilsson and Håkan Mattsson
Mnesia_chap7.xml
Database Configuration Data
The following two functions can be used to retrieve system
information. They are described in detail in the reference manual.
mnesia:table_info(Tab, Key) -> Info | exit({aborted, Reason}).
Returns information about one table, such as the
current size of the table and the nodes on which it resides.
mnesia:system_info(Key) -> Info | exit({aborted, Reason}).
Returns information about the Mnesia system, for example transaction
statistics, db_nodes, and configuration parameters.
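As a rough illustration, the two functions could be combined into a small status report. This is only a sketch: the module name mnesia_info_sketch and the function summary/1 are hypothetical, and the item keys used here (size, memory, where_to_write, db_nodes, running_db_nodes, directory) are a small selection. The reference manual lists the full set.

```erlang
-module(mnesia_info_sketch).
-export([summary/1]).

%% Collects a few facts about table Tab and about the local Mnesia system.
%% Mnesia must be running and Tab must exist, otherwise the calls exit
%% with {aborted, Reason}.
summary(Tab) ->
    [{size,      mnesia:table_info(Tab, size)},            % number of records
     {memory,    mnesia:table_info(Tab, memory)},          % memory used by the table
     {replicas,  mnesia:table_info(Tab, where_to_write)},  % nodes holding active replicas
     {db_nodes,  mnesia:system_info(db_nodes)},            % all nodes in the schema
     {running,   mnesia:system_info(running_db_nodes)},    % nodes where Mnesia is up
     {directory, mnesia:system_info(directory)}].          % the Mnesia directory
```

Calling mnesia_info_sketch:summary(foo) in the shell returns a list of {Key, Value} pairs for the table foo.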
Core Dumps
If Mnesia malfunctions, system information is dumped to a file
named MnesiaCore.Node.When. The type of system
information contained in this file can also be generated with
the function mnesia_lib:coredump(). If a Mnesia system
behaves strangely, it is recommended that a Mnesia core dump
file be included in the bug report.
Dumping Tables
Tables of type ram_copies are by definition stored in
memory only. It is possible, however, to dump these tables to
disc, either at regular intervals or before the system is
shut down. The function mnesia:dump_tables(TabList) dumps
all replicas of a set of RAM tables to disc. The tables can be
accessed while being dumped to disc. To dump the tables to
disc all replicas must have the storage type ram_copies.
The table content is placed in a .DCD file on the
disc. When the Mnesia system is started, the RAM table will
initially be loaded with data from its .DCD file.
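Dumping at regular intervals could be arranged with a small helper process. The following is a sketch only: the module ram_dump_sketch, its functions and the stop message are hypothetical, and the choice of interval is left to the caller.

```erlang
-module(ram_dump_sketch).
-export([start/2]).

%% Starts a process that dumps the ram_copies tables in Tabs to disc
%% every IntervalMs milliseconds until it receives the message stop.
start(Tabs, IntervalMs) ->
    spawn(fun() -> loop(Tabs, IntervalMs) end).

loop(Tabs, IntervalMs) ->
    receive
        stop ->
            ok
    after IntervalMs ->
        case mnesia:dump_tables(Tabs) of
            {atomic, ok} ->
                ok;
            {aborted, Reason} ->
                %% For example when some replica is not of type ram_copies.
                error_logger:error_msg("Dump of ~p failed: ~p~n",
                                       [Tabs, Reason])
        end,
        loop(Tabs, IntervalMs)
    end.
```

For example, ram_dump_sketch:start([foo], 60000) would dump the table foo once a minute.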
Checkpoints
A checkpoint is a transaction consistent state that spans over
one or more tables. When a checkpoint is activated, the system
will remember the current content of the set of tables. The
checkpoint retains a transaction consistent state of the tables,
allowing the tables to be read and updated while the checkpoint
is active. Checkpoints are typically used to
back up tables to external media, but they are also used
internally in Mnesia for other purposes. Each checkpoint is
independent and a table may be involved in several checkpoints
simultaneously.
Each table retains its old contents in a checkpoint retainer,
and for performance critical applications it may be important
to be aware of the processing overhead associated with checkpoints.
In a worst case scenario, the checkpoint retainer will consume
even more memory than the table itself. Each update will also be
slightly slower on those nodes where checkpoint
retainers are attached to the tables.
For each table it is possible to choose whether a checkpoint
retainer should be attached to every replica of the table, or
whether it is enough to have one checkpoint retainer attached to a
single replica. With a single checkpoint retainer per table, the
checkpoint will consume less memory, but it will be vulnerable
to node crashes. With several redundant checkpoint retainers the
checkpoint will survive as long as there is at least one active
checkpoint retainer attached to each table.
Checkpoints may be explicitly deactivated with the function
mnesia:deactivate_checkpoint(Name), where Name is
the name of an active checkpoint. This function returns
ok if successful, or {error, Reason} in the case
of an error. All tables in a checkpoint must be attached to at
least one checkpoint retainer. The checkpoint is automatically
de-activated by Mnesia, when any table lacks a checkpoint
retainer. This may happen when a node goes down or when a
replica is deleted. Use the min and
max arguments described below, to control the degree of
checkpoint retainer redundancy.
Checkpoints are activated with the function mnesia:activate_checkpoint(Args),
where Args is a list of the following tuples:
{name,Name}. Name specifies a temporary name
of the checkpoint. The name may be re-used when the checkpoint
has been de-activated. If no name is specified, a name is
generated automatically.
{max,MaxTabs}. MaxTabs is a list of tables
which will be included in the checkpoint. The default is
[] (an empty list). For these tables, the redundancy
will be maximized. The old contents of the table will be
retained in the checkpoint retainer when the main table is
updated by the applications. The checkpoint becomes more fault
tolerant if the tables have several replicas. When new
replicas are added by means of the schema manipulation
function mnesia:add_table_copy/3, a local checkpoint
retainer is also attached.
{min,MinTabs}. MinTabs is a list of tables
that should be included in the checkpoint. The default is
[]. For these tables, the redundancy will be minimized,
and there will be a single checkpoint retainer per table,
preferably at the local node.
{allow_remote,Bool}. false means that all
checkpoint retainers must be local. If a table does not reside
locally, the checkpoint cannot be activated. true
allows checkpoint retainers to be allocated on any node. The
default is true.
{ram_overrides_dump,Bool}. This argument only
applies to tables of type ram_copies. Bool
specifies if the table state in RAM should override the table
state on disc. true means that the latest committed
records in RAM are included in the checkpoint retainer. These
are the records that the application accesses. false
means that the records on the disc .DAT file are
included in the checkpoint retainer. These are the records
that will be loaded on start-up. Default is false.
The function mnesia:activate_checkpoint(Args) returns one of the
following values:
{ok, Name, Nodes}
{error, Reason}
Name is the name of the checkpoint, and Nodes are
the nodes where the checkpoint is known.
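The activation and deactivation calls can be paired so that a checkpoint never outlives the work that needs it. The following is a minimal sketch, not the only way to do it: the module checkpoint_sketch and the function with_checkpoint/2 are hypothetical, no {name, Name} is given so Mnesia generates one, and ram_overrides_dump is set to true on the assumption that the RAM contents are what should be retained.

```erlang
-module(checkpoint_sketch).
-export([with_checkpoint/2]).

%% Activates a checkpoint over Tabs with maximum retainer redundancy,
%% runs Fun(Name) while the checkpoint is active, and always
%% deactivates the checkpoint afterwards.
with_checkpoint(Tabs, Fun) ->
    Args = [{max, Tabs}, {ram_overrides_dump, true}],
    case mnesia:activate_checkpoint(Args) of
        {ok, Name, _Nodes} ->
            try
                Fun(Name)
            after
                mnesia:deactivate_checkpoint(Name)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

The result of with_checkpoint/2 is whatever Fun returns, or {error, Reason} if the checkpoint could not be activated.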
A list of active checkpoints can be obtained with the following
functions:
mnesia:system_info(checkpoints). This function
returns all active checkpoints on the current node.
mnesia:table_info(Tab, checkpoints). This function
returns active checkpoints on a specific table.
Files
This section describes the internal files which are created and maintained by the Mnesia system.
In particular, the workings of the Mnesia log are described.
Start-Up Files
In Chapter 3 we detailed the following prerequisites for
starting Mnesia (refer to Chapter 3: Starting Mnesia):
We must start an Erlang session and specify a Mnesia
directory for our database.
We must initiate a database schema, using the function
mnesia:create_schema/1.
The following example shows how these tasks are performed:
% erl -sname klacke -mnesia dir '"/ldisc/scratch/klacke"'
Erlang (BEAM) emulator version 4.9
Eshell V4.9 (abort with ^G)
(klacke@gin)1> mnesia:create_schema([node()]).
ok
(klacke@gin)2>
^Z
Suspended
We can inspect the Mnesia directory to see what files have been created. Enter the following command:
% ls -l /ldisc/scratch/klacke
-rw-rw-r-- 1 klacke staff 247 Aug 12 15:06 FALLBACK.BUP
The response shows that the file FALLBACK.BUP has been created. This is called a backup file, and it contains an initial schema. If we had specified more than one node in the mnesia:create_schema/1 function, identical backup files would have been created on all nodes.
Continue by starting Mnesia:
(klacke@gin)3> mnesia:start().
ok
We can now see the following listing in the Mnesia directory:
-rw-rw-r-- 1 klacke staff 86 May 26 19:03 LATEST.LOG
-rw-rw-r-- 1 klacke staff 34507 May 26 19:03 schema.DAT
The schema in the backup file FALLBACK.BUP has been used to generate the file schema.DAT. Since we have no other disc resident tables than the schema, no other data files were created. The file FALLBACK.BUP was removed after the successful "restoration". We also see a number of files that are for internal use by Mnesia.
If we now create a disc resident table named foo, we can see the following listing in the Mnesia directory:
% ls -l /ldisc/scratch/klacke
-rw-rw-r-- 1 klacke staff 86 May 26 19:07 LATEST.LOG
-rw-rw-r-- 1 klacke staff 94 May 26 19:07 foo.DCD
-rw-rw-r-- 1 klacke staff 6679 May 26 19:07 schema.DAT
A file named foo.DCD has been created. This file will eventually store
all data that is written into the foo table.
The Log File
When starting Mnesia, a .LOG file called LATEST.LOG
was created and placed in the database directory. This file is
used by Mnesia to log disc based transactions. This includes all
transactions that write at least one record in a table which is
of storage type disc_copies, or
disc_only_copies. It also includes all operations which
manipulate the schema itself, such as creating new tables. The
format of the log can vary with different implementations of
Mnesia. The Mnesia log is currently implemented with the
standard library module disk_log.
The log file will grow continuously and must be dumped at
regular intervals. "Dumping the log file" means that Mnesia will
perform all the operations listed in the log and place the
records in the corresponding .DAT, .DCD and .DCL data files. For
example, if the operation "write record {foo, 4, elvis, 6}"
is listed in the log, Mnesia inserts the operation into the
file foo.DCL. Later, when Mnesia considers that the .DCL file has
become too large, the data is moved to the .DCD file.
The dumping operation can be time consuming
if the log is very large. However, it is important to realize
that the Mnesia system continues to operate during log dumps.
By default Mnesia either dumps the log whenever 100 records have
been written in the log or when 3 minutes have passed.
This is controlled by the two application parameters
-mnesia dump_log_write_threshold WriteOperations and
-mnesia dump_log_time_threshold MilliSecs.
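The same thresholds can also be given in a system configuration file instead of on the command line. The fragment below is a sketch in which the values are arbitrary examples, not recommendations. As in the parameter descriptions above, dump_log_write_threshold counts write operations and dump_log_time_threshold is given in milliseconds.

```erlang
%% sys.config (illustrative values only)
[{mnesia,
  [{dump_log_write_threshold, 50000},  % dump after 50000 logged write operations
   {dump_log_time_threshold, 300000}   % or after 5 minutes (300000 ms)
  ]}].
```

Such a file is passed to the emulator with erl -config sys.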
Before the log is dumped, the file LATEST.LOG is
renamed to PREVIOUS.LOG, and a new LATEST.LOG file
is created. Once the log has been successfully dumped, the file
PREVIOUS.LOG is deleted.
The log is also dumped at start-up and whenever a schema
operation is performed.
The Data Files
The directory listing also contains one .DAT file, schema.DAT,
which contains the schema itself. The DAT files are indexed files,
and it is efficient to
insert and search for records in these files with a specific
key. The .DAT files are used for the schema and for disc_only_copies
tables. The Mnesia data files are currently implemented with the
standard library module dets, and all operations which
can be performed on dets files can also be performed on
the Mnesia data files. For example, dets contains a
function dets:traverse/2 which can be used to view the
contents of a Mnesia DAT file. However, this can only be done
when Mnesia is not running, so Mnesia must be stopped before
the schema file is viewed in this way.
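A minimal sketch of such an inspection follows. It assumes that Mnesia is stopped and that the key position of the file is 2, which is where Mnesia keeps the key of a record. The module dat_view_sketch and the helper read_dat/1 are hypothetical names. The helper collects the stored objects with dets:traverse/2 instead of printing them.

```erlang
-module(dat_view_sketch).
-export([read_dat/1]).

%% Returns all objects stored in the .DAT file File as a list.
%% Only call this while Mnesia is not running.
read_dat(File) ->
    Opts = [{file, File},
            {repair, false},   % never let dets repair a Mnesia file
            {keypos, 2},       % the key is the second element of each record
            {access, read}],   % open the file read-only
    case dets:open_file(dat_view_sketch_tab, Opts) of
        {ok, Name} ->
            try
                %% {continue, Obj} accumulates every object in the result list.
                dets:traverse(Name, fun(Obj) -> {continue, Obj} end)
            after
                dets:close(Name)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

For example, dat_view_sketch:read_dat("schema.DAT") would return the schema records, in no particular order.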
Refer to the Reference Manual, stdlib, for information about dets.
The DAT files must always be opened with the {repair, false}
option. This ensures that these files are not
automatically repaired. Without this option, the database may
become inconsistent, because Mnesia may
believe that the files were properly closed. Refer to the reference
manual for information about the configuration parameter
auto_repair.
It is recommended that the data files are not tampered with while Mnesia is
running. Doing so is not prohibited, but the behavior of Mnesia is then unpredictable.
The disc_copies tables are stored on disk with .DCL and .DCD files,
which are standard disk_log files.
Loading of Tables at Start-up
At start-up Mnesia loads tables in order to make them accessible
for its applications. Sometimes Mnesia decides to load all tables
that reside locally, and sometimes the tables may not be
accessible until Mnesia brings a copy of the table
from another node.
To understand the behavior of Mnesia at start-up it is
essential to understand how Mnesia reacts when it loses contact
with Mnesia on another node. At this stage, Mnesia cannot distinguish
between a communication failure and a "normal" node down.
When this happens, Mnesia will assume that the other node is no longer running,
whereas in reality the communication between the nodes may merely have failed.
To handle this situation, Mnesia simply tries to restart the ongoing transactions that are
accessing tables on the failing node, and writes a mnesia_down entry to a log file.
At start-up, it must be noted that all tables residing on nodes
without a mnesia_down entry may have fresher replicas.
Their replicas may have been updated after the termination
of Mnesia on the current node. In order to catch up with the latest
updates, Mnesia transfers a copy of the table from one of these other
"fresh" nodes. In unfortunate cases the other nodes may be down,
and Mnesia must then wait for the table to be
loaded on one of these nodes before receiving a fresh copy of
the table.
Before an application makes its first access to a table,
mnesia:wait_for_tables(TabList, Timeout) ought to be executed
to ensure that the table is accessible from the local node. If
the function times out the application may choose to force a
load of the local replica with
mnesia:force_load_table(Tab) and deliberately lose all
updates that may have been performed on the other nodes while
the local node was down. If
Mnesia already has loaded the table on another node or intends
to do so, the table is copied from that node in order to
avoid unnecessary inconsistency.
Keep in mind that only
one table is loaded by mnesia:force_load_table(Tab),
and since committed transactions may have caused updates in
several tables, the tables may now become inconsistent due to
the forced load.
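The waiting and the optional forced load could be combined as in the following sketch. The module table_wait_sketch and the function ensure_loaded/2 are hypothetical, and forcing the load on a timeout is a policy assumed here for illustration only, since it deliberately discards updates made on other nodes.

```erlang
-module(table_wait_sketch).
-export([ensure_loaded/2]).

%% Waits up to Timeout milliseconds for Tabs to become accessible.
%% Tables that are still missing after the timeout are force loaded
%% from the local replica, one table at a time.
ensure_loaded(Tabs, Timeout) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Missing} ->
            %% force_load_table/1 returns yes on success.
            [{Tab, mnesia:force_load_table(Tab)} || Tab <- Missing];
        {error, Reason} ->
            {error, Reason}
    end.
```

An application would typically call ensure_loaded/2 once, before its first access to the tables.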
The allowed AccessMode of a table may be defined to
be either read_only or read_write, and it may be
toggled at runtime with the function mnesia:change_table_access_mode(Tab, AccessMode). read_only tables and
local_content tables will always be loaded locally, since
there is no need to copy the table from other nodes. Other
tables will primarily be loaded remotely from active replicas on
other nodes if the table already has been loaded there, or if
the running Mnesia already has decided to load the table there.
At start-up, Mnesia will assume that its local replica is the
most recent version and load the table from disc if either of the
following situations is detected:
mnesia_down is returned from all other nodes that hold a disc
resident replica of the table; or,
all replicas are ram_copies.
This is normally a wise decision, but it may turn out to
be disastrous if the nodes have been disconnected due to a
communication failure, since Mnesia's normal table load
mechanism does not cope with communication failures.
When Mnesia is loading many tables, the default load
order is used. However, it is possible to
affect the load order by explicitly changing the
load_order property for the tables, with the function
mnesia:change_table_load_order(Tab, LoadOrder). The
LoadOrder is by default 0 for all tables, but it
can be set to any integer. The table with the highest
load_order will be loaded first. Changing load order is
especially useful for applications that need to ensure early
availability of fundamental tables. Large peripheral
tables should have a low load order value, perhaps set
below 0.
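As a sketch of how the load order might be arranged, the helper below raises the load order of fundamental tables and lowers it for large peripheral ones. The module load_order_sketch, the function prioritise/2 and the values 10 and -10 are hypothetical choices. Only their relative order matters.

```erlang
-module(load_order_sketch).
-export([prioritise/2]).

%% Gives the tables in CoreTabs a high load order and the tables in
%% PeripheralTabs a load order below the default of 0.
%% Returns a list of {Tab, Result} pairs.
prioritise(CoreTabs, PeripheralTabs) ->
    [{Tab, mnesia:change_table_load_order(Tab, 10)} || Tab <- CoreTabs] ++
    [{Tab, mnesia:change_table_load_order(Tab, -10)} || Tab <- PeripheralTabs].
```

Each Result is {atomic, ok} on success or {aborted, Reason} otherwise.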
Recovery from Communication Failure
There are several occasions when Mnesia may detect that the
network has been partitioned due to a communication failure.
One is when Mnesia already is up and running and the Erlang
nodes gain contact again. Then Mnesia will try to contact Mnesia
on the other node to see if it also thinks that the network has
been partitioned for a while. If Mnesia on both nodes has logged
mnesia_down entries from each other, Mnesia generates a
system event, called {inconsistent_database, running_partitioned_network, Node} which is sent to Mnesia's
event handler and other possible subscribers. The default event
handler reports an error to the error logger.
Another occasion when Mnesia may detect that the network has
been partitioned due to a communication failure, is at start-up.
If Mnesia detects that both the local node and another node received
mnesia_down from each other it generates a
{inconsistent_database, starting_partitioned_network, Node} system event and acts as described above.
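An application can receive these events itself by subscribing to Mnesia's system events. The following is a sketch under stated assumptions: the module partition_watch_sketch, the function start/1 and the stop message are hypothetical, and what the handler does with the event is left to the application.

```erlang
-module(partition_watch_sketch).
-export([start/1]).

%% Starts a process that subscribes to Mnesia system events and calls
%% Handler(Context, Node) for every inconsistent_database event.
%% Mnesia must be running when the process subscribes.
start(Handler) when is_function(Handler, 2) ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop(Handler)
          end).

loop(Handler) ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            %% Context is, for example, running_partitioned_network.
            Handler(Context, Node),
            loop(Handler);
        {mnesia_system_event, _Other} ->
            loop(Handler);
        stop ->
            ok
    end.
```

A handler could, for instance, log the event and alert an operator.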
If the application detects that there has been a communication
failure which may have caused an inconsistent database, it may
use the function mnesia:set_master_nodes(Tab, Nodes) to
pinpoint from which nodes each table may be loaded.
At start-up Mnesia's normal table load algorithm will be
bypassed and the table will be loaded from one of the master
nodes defined for the table, regardless of potential
mnesia_down entries in the log. The Nodes may only
contain nodes where the table has a replica and if it is empty,
the master node recovery mechanism for the particular table will
be reset and the normal load mechanism will be used when next
restarting.
The function mnesia:set_master_nodes(Nodes) sets master
nodes for all tables. For each table it will determine its
replica nodes and invoke mnesia:set_master_nodes(Tab, TabNodes) with those replica nodes that are included in the
Nodes list (i.e. TabNodes is the intersection of
Nodes and the replica nodes of the table). If the
intersection is empty the master node recovery mechanism for the
particular table will be reset and the normal load mechanism
will be used at next restart.
The functions mnesia:system_info(master_node_tables) and
mnesia:table_info(Tab, master_nodes) may be used to
obtain information about the potential master nodes.
Determining which data to keep after communication failure is outside
the scope of Mnesia. One approach would be to determine which "island"
contains a majority of the nodes. Using the {majority,true} option
for critical tables can be a way of ensuring that nodes that are not part
of a "majority island" are not able to update those tables. Note that this
constitutes a reduction in service on the minority nodes. This would be
a tradeoff in favour of higher consistency guarantees.
The function mnesia:force_load_table(Tab) may be used to
force load the table regardless of which table load mechanism
is activated.
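One possible policy, sketched below, is for a node outside the majority "island" to name the majority nodes as master nodes and restart Mnesia so that its tables are reloaded from them. This is an illustration of the approach mentioned above, not a built-in mechanism. The module master_nodes_sketch and its functions are hypothetical, and the caller is assumed to know which nodes it can reach.

```erlang
-module(master_nodes_sketch).
-export([has_majority/2, trust/1]).

%% True when Reachable contains more than half of the nodes in All.
has_majority(All, Reachable) ->
    2 * length(Reachable) > length(All).

%% Names Masters as master nodes for all tables and restarts the local
%% Mnesia, so that the tables are loaded from those nodes at start-up.
%% Local updates made during the partition are lost.
trust(Masters) ->
    ok = mnesia:set_master_nodes(Masters),
    stopped = mnesia:stop(),
    mnesia:start().
```

For example, with three db_nodes of which two can reach each other, has_majority/2 is true on those two nodes and false on the isolated node, which could then call trust/1 with the other two.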
Recovery of Transactions
A Mnesia table may reside on one or more nodes. When a table is
updated, Mnesia will ensure that the updates will be replicated
to all nodes where the table resides. If a replica happens to be
inaccessible for some reason (e.g. due to a temporary node down),
Mnesia will then perform the replication later.
On the node where the application is started, there will be a
transaction coordinator process. If the transaction is
distributed, there will also be a transaction participant process on
all the other nodes where commit work needs to be performed.
Internally Mnesia uses several commit protocols. The selected
protocol depends on which tables have been updated in
the transaction. If all the involved tables are symmetrically
replicated, (i.e. they all have the same ram_nodes,
disc_nodes and disc_only_nodes currently
accessible from the coordinator node), a lightweight transaction
commit protocol is used.
The number of messages that the
transaction coordinator and its participants need to exchange
is small, since Mnesia's table load mechanism takes care of the
transaction recovery if the commit protocol gets
interrupted. Since all involved tables are replicated
symmetrically the transaction will automatically be recovered by
loading the involved tables from the same node at start-up of a
failing node. We do not really care if the transaction was
aborted or committed as long as we can ensure the ACID
properties. The lightweight commit protocol is non-blocking,
i.e. the surviving participants and their coordinator will
finish the transaction regardless of whether some node crashes
in the middle of the commit protocol.
If a node goes down in the middle of a dirty operation the
table load mechanism will ensure that the update will be
performed on all replicas or none. Both asynchronous dirty
updates and synchronous dirty updates use the same recovery
principle as lightweight transactions.
If a transaction involves updates of asymmetrically replicated
tables or updates of the schema table, a heavyweight commit
protocol will be used. The heavyweight commit protocol is able
to finish the transaction regardless of how the tables are
replicated. The typical usage of a heavyweight transaction is
when we want to move a replica from one node to another. Then we
must ensure that the replica either is entirely moved or left as
it was. We must never end up in a situation with replicas on both
nodes or no node at all. Even if a node crashes in the middle of
the commit protocol, the transaction must be guaranteed to be
atomic. The heavyweight commit protocol involves more messages
between the transaction coordinator and its participants than
a lightweight protocol and it will perform recovery work at
start-up in order to finish the abort or commit work.
The heavyweight commit protocol is also non-blocking,
which allows the surviving participants and their coordinator to
finish the transaction regardless (even if a node crashes in the
middle of the commit protocol). When a failed node starts up again,
Mnesia will determine the outcome of the transaction and
recover it. Lightweight protocols, heavyweight protocols and dirty updates
all depend on other nodes being up and running in order to make the
correct heavyweight transaction recovery decision.
If Mnesia has not started on some of the nodes that are involved in the
transaction AND neither the local node nor any of the already
running nodes know the outcome of the transaction, Mnesia will
by default wait for one that does. In the worst case scenario all other
involved nodes must start before Mnesia can make the correct decision
about the transaction and finish its start-up.
This means that Mnesia (on one node) may hang if a double fault occurs, i.e. when two nodes crash simultaneously
and one attempts to start while the other refuses to
start, e.g. due to a hardware error.
It is possible to specify the maximum time that Mnesia
will wait for other nodes to respond with a transaction
recovery decision. The configuration parameter
max_wait_for_decision defaults to infinity (which may
cause the indefinite hanging as mentioned above) but if it is
set to a definite time period (e.g. three minutes), Mnesia will then enforce a
transaction recovery decision if needed, in order to allow
Mnesia to continue with its start-up procedure.
The downside of an enforced transaction recovery decision, is that the decision may be
incorrect, due to insufficient information regarding the other nodes'
recovery decisions. This may result in an
inconsistent database where Mnesia has committed the transaction
on some nodes but aborted it on others.
In fortunate cases the inconsistency will only appear in tables belonging to a specific
application, but if a schema transaction has been inconsistently
recovered due to the enforced transaction recovery decision, the
effects of the inconsistency can be fatal.
However, if the higher priority is availability rather than
consistency, then it may be worth the risk.
If Mnesia
encounters an inconsistent transaction decision, a
{inconsistent_database, bad_decision, Node} system event
will be generated in order to give the application a chance to
install a fallback or take other appropriate measures to resolve the inconsistency. The default
behavior of the Mnesia event handler is the same as if the
database became inconsistent as a result of partitioned network (see
above).
Backup, Fallback, and Disaster Recovery
The following functions are used to back up data, to install a
backup as fallback, and for disaster recovery.
mnesia:backup_checkpoint(Name, Opaque, [Mod]). This
function performs a backup of the tables included in the
checkpoint.
mnesia:backup(Opaque, [Mod]). This function
activates a new checkpoint which covers all Mnesia tables and
performs a backup. It is performed with maximum degree of
redundancy (also refer to the function mnesia:activate_checkpoint(Args),
{max, MaxTabs} and {min, MinTabs}).
mnesia:traverse_backup(Source, [SourceMod,] Target, [TargetMod,] Fun, Acc). This function can be used
to read an existing backup, create a new backup from an
existing one, or to copy a backup from one type of media to
another.
mnesia:uninstall_fallback(). This function removes
previously installed fallback files.
mnesia:restore(Opaque, Args). This function
restores a set of tables from a previous backup.
mnesia:install_fallback(Opaque, [Mod]). This
function can be configured to restart Mnesia and reload data
tables, and possibly schema tables, from an existing
backup. This function is typically used for disaster recovery
purposes, when data or schema tables are corrupted.
These functions are explained in the following
sub-sections. Also refer to the section Checkpoints in this chapter, which
describes the two functions used to activate and de-activate
checkpoints.
Backup
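Backup operations are performed with the following functions:
mnesia:backup_checkpoint(Name, Opaque, [Mod]).
mnesia:backup(Opaque, [Mod]).
mnesia:traverse_backup(Source, [SourceMod,] Target, [TargetMod,] Fun, Acc).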
By default, the actual access to the backup media is
performed via the mnesia_backup module for both read
and write. Currently mnesia_backup is implemented with
the standard library module disk_log, but it is possible to write
your own module with the same interface as
mnesia_backup and configure Mnesia so the alternate
module performs the actual accesses to the backup media. This
means that the user may put the backup on media that Mnesia
does not know about, possibly on hosts where Erlang is not
running. Use the configuration parameter -mnesia backup_module Module for this purpose.
The source
for a backup is an activated checkpoint. The backup function
most commonly used is mnesia:backup_checkpoint(Name, Opaque,[Mod]). This function returns either ok, or
{error,Reason}. It has the following arguments:
Name is the name of an activated
checkpoint. Refer to the section Checkpoints in this chapter, the
function mnesia:activate_checkpoint(ArgList) for
details on how to include table names in checkpoints.
Opaque. Mnesia does not interpret this argument,
but it is forwarded to the backup module. The Mnesia default
backup module, mnesia_backup interprets this argument
as a local file name.
Mod. The name of an alternate backup module.
The function mnesia:backup(Opaque[, Mod]) activates a
new checkpoint which covers all Mnesia tables with maximum
degree of redundancy and performs a backup. Maximum
redundancy means that each table replica has a checkpoint
retainer. Tables with the local_content property are
backed up as they
look on the current node.
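A backup of a chosen set of tables, as opposed to all tables, could be taken through a temporary checkpoint as in the sketch below. The module backup_sketch and the function backup_tables/2 are hypothetical, the default backup module is assumed so that the Opaque argument is a local file name, and ram_overrides_dump is set to true on the assumption that ram_copies tables should be backed up as the application currently sees them.

```erlang
-module(backup_sketch).
-export([backup_tables/2]).

%% Backs up the tables in Tabs to the local file File through a
%% temporary checkpoint, which is deactivated when the backup is done.
backup_tables(Tabs, File) ->
    Args = [{max, Tabs}, {ram_overrides_dump, true}],
    case mnesia:activate_checkpoint(Args) of
        {ok, Name, _Nodes} ->
            try
                mnesia:backup_checkpoint(Name, File)  % ok | {error, Reason}
            after
                mnesia:deactivate_checkpoint(Name)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

The tables remain readable and writable while the backup is in progress, since the backup reads from the checkpoint.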
It is possible to iterate over a backup, either for the
purpose of transforming it into a new backup, or just reading
it. The function mnesia:traverse_backup(Source, [SourceMod,] Target, [TargetMod,] Fun, Acc), which normally returns {ok, LastAcc}, is used for both of these purposes.
Before the traversal starts, the source backup media is
opened with SourceMod:open_read(Source), and the target
backup media is opened with
TargetMod:open_write(Target). The arguments are:
SourceMod and TargetMod are module names.
Source and Target are opaque data used
exclusively by the modules SourceMod and
TargetMod for the purpose of initializing the backup
media.
Acc is an initial accumulator value.
Fun(BackupItems, Acc) is applied to each item in
the backup. The Fun must return a tuple {ValidBackupItems, NewAcc}, where ValidBackupItems is a list of valid
backup items, and NewAcc is a new accumulator value.
The ValidBackupItems are written to the target backup
with the function TargetMod:write/2.
LastAcc is the last accumulator value, i.e.
the last NewAcc value that was returned by Fun.
It is also possible to perform a read-only traversal of the
source backup without updating a target backup. If
TargetMod==read_only, then no target backup is accessed
at all.
By setting SourceMod and TargetMod to different
modules it is possible to copy a backup from one kind of backup
media to another.
Valid BackupItems are the following tuples:
{schema, Tab} specifies a table to be deleted.
{schema, Tab, CreateList} specifies a table to be
created. See mnesia:create_table/2 for more
information about CreateList.
{Tab, Key} specifies the full identity of a record
to be deleted.
{Record} specifies a record to be inserted. It
can be a tuple with Tab as first field. Note that the
record name is set to the table name regardless of what
record_name is set to.
The backup data is divided into two sections. The first
section contains information related to the schema. All schema
related items are tuples where the first field equals the atom
schema. The second section is the record section. It is not
possible to mix schema records with other records and all schema
records must be located first in the backup.
The schema itself is a table and will possibly be included in
the backup. All nodes where the schema table resides are
regarded as a db_node.
The following example illustrates how
mnesia:traverse_backup can be used to rename a db_node in
a backup file:
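A sketch of such a renaming function is given below. The module name rename_node_sketch and the helper converter/2 are hypothetical, and the default backup module mnesia_backup is assumed for both the source and the target. Every backup item is passed through unchanged except the db_nodes entry and the replica lists in each table definition, where the node From is replaced by To.

```erlang
-module(rename_node_sketch).
-export([change_node_name/4, converter/2]).

%% Copies the backup Source to Target, renaming the node From to To.
change_node_name(From, To, Source, Target) ->
    mnesia:traverse_backup(Source, mnesia_backup, Target, mnesia_backup,
                           converter(From, To), switched).

%% Returns the fun that is applied to each backup item.
converter(From, To) ->
    Switch = fun(Node) when Node =:= From -> To;
                (Node) when Node =:= To -> throw({error, already_exists});
                (Node) -> Node
             end,
    ReplicaKeys = [ram_copies, disc_copies, disc_only_copies],
    OptSwitch = fun({Key, Val}) ->
                        case lists:member(Key, ReplicaKeys) of
                            true  -> {Key, lists:map(Switch, Val)};
                            false -> {Key, Val}
                        end;
                   (Other) ->
                        Other
                end,
    fun({schema, db_nodes, Nodes}, Acc) ->
            {[{schema, db_nodes, lists:map(Switch, Nodes)}], Acc};
       ({schema, version, Version}, Acc) ->
            {[{schema, version, Version}], Acc};
       ({schema, cookie, Cookie}, Acc) ->
            {[{schema, cookie, Cookie}], Acc};
       ({schema, Tab, CreateList}, Acc) ->
            {[{schema, Tab, lists:map(OptSwitch, CreateList)}], Acc};
       (Other, Acc) ->
            %% Ordinary records and deletions are copied as they are.
            {[Other], Acc}
    end.
```

The accumulator is not used for anything here. The atom switched is simply returned as LastAcc when the traversal succeeds.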
Restore
Tables can be restored on-line from a backup without
restarting Mnesia. A restore is performed with the function
mnesia:restore(Opaque,Args), where Args can
contain the following tuples:
{module,Mod}. The backup module Mod is
used to access the backup media. If omitted, the default
backup module will be used.
{skip_tables, TableList}, where TableList
is a list of tables which should not be read from the backup.
{clear_tables, TableList}, where TableList
is a list of tables which should be cleared before the
records from the backup are inserted, i.e. all records in
the tables are deleted before the tables are restored.
Schema information about the tables is not cleared or read
from backup.
{keep_tables, TableList}, where TableList
is a list of tables which should not be cleared before
the records from the backup are inserted, i.e. the records
in the backup will be added to the records in the table.
Schema information about the tables is not cleared or read
from backup.
{recreate_tables, TableList}, where TableList
is a list of tables which should be re-created before the
records from the backup are inserted. The tables are first
deleted and then created with the schema information from the
backup. All the nodes in the backup need to be up and running.
{default_op, Operation}, where Operation is
one of the following operations: skip_tables,
clear_tables, keep_tables or
recreate_tables. The default operation specifies
which operation should be used on tables from the backup
which are not specified in any of the lists above.
If omitted, the operation clear_tables will be used.
The argument Opaque is forwarded to the backup module.
The function returns {atomic, TabList} if successful, or the
tuple {aborted, Reason} in the case of an error.
TabList is a list of the restored tables. Tables which
are restored are write locked for the duration of the restore
operation. However, regardless of any lock conflict caused by
this, applications can continue to do their work during the
restore operation.
The restoration is performed as a single transaction. If the
database is very large, it may not be possible to restore it
online. In such a case the old database must be restored by
installing a fallback and then restarting.
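As an illustration of the restore arguments, the sketch below restores a chosen set of tables and leaves every other table in the backup untouched. The module restore_sketch and the function restore_only/2 are hypothetical, and the default backup module is assumed, so File is a local file name.

```erlang
-module(restore_sketch).
-export([restore_only/2]).

%% Restores only the tables in Tabs from the backup in File.
%% Their current records are cleared first. All other tables in the
%% backup are skipped because of the default operation.
restore_only(File, Tabs) ->
    Args = [{clear_tables, Tabs}, {default_op, skip_tables}],
    case mnesia:restore(File, Args) of
        {atomic, Restored} -> {ok, Restored};
        {aborted, Reason}  -> {error, Reason}
    end.
```

Replacing clear_tables with keep_tables would instead add the records in the backup to the records already in the tables.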
Fallbacks
The function mnesia:install_fallback(Opaque, [Mod]) is
used to install a backup as fallback. It uses the backup module
Mod, or the default backup module, to access the backup
media. This function returns ok if successful, or
{error, Reason} in the case of an error.
Installing a fallback is a distributed operation that is
either performed on all db_nodes, or none. The fallback
is used to restore the database the next time the system is
started. If a Mnesia node with a fallback installed detects that
Mnesia on another node has died for some reason, it will
unconditionally terminate itself.
A fallback is typically used when a system upgrade is
performed. A system upgrade typically involves the installation of new
software versions, and Mnesia tables are often transformed into
new layouts. If the system crashes during an upgrade, it is
highly probable that re-installation of the old
applications and restoration of the database
to its previous state will be required. This can be done if a backup is performed and
installed as a fallback before the system upgrade begins.
If the system upgrade fails, Mnesia must be restarted on all
db_nodes in order to restore the old database. The
fallback will be automatically de-installed after a successful
start-up. The function mnesia:uninstall_fallback() may
also be used to de-install the fallback after a
successful system upgrade. Again, this is a distributed
operation that is either performed on all db_nodes, or
none. Both the installation and de-installation of fallbacks
require Erlang to be up and running on all db_nodes, but
it does not matter if Mnesia is running or not.
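The upgrade procedure described above could be wrapped as in the following sketch. The module fallback_sketch and its two functions are hypothetical, and the default backup module is assumed, so File is a local file name.

```erlang
-module(fallback_sketch).
-export([prepare_upgrade/1, finish_upgrade/0]).

%% Before the upgrade: back up all tables to File and install that
%% backup as fallback on all db_nodes.
prepare_upgrade(File) ->
    case mnesia:backup(File) of
        ok ->
            mnesia:install_fallback(File);  % ok | {error, Reason}
        {error, Reason} ->
            {error, Reason}
    end.

%% After a successful upgrade: remove the fallback so that the old
%% database is not restored at the next start-up.
finish_upgrade() ->
    mnesia:uninstall_fallback().
```

If the upgrade fails, finish_upgrade/0 is not called and Mnesia is instead restarted on all db_nodes, which restores the database from the fallback.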
Disaster Recovery
The system may become inconsistent as a result of a power
failure. The UNIX fsck feature can possibly repair the
file system, but there is no guarantee that the file contents
will be consistent.
If Mnesia detects that a file has not been properly closed,
possibly as a result of a power failure, it will attempt to
repair the bad file in a similar manner. Data may be lost, but
Mnesia can be restarted even if the data is inconsistent. The
configuration parameter -mnesia auto_repair Bool can be
used to control the behavior of Mnesia at start-up. If
Bool has the value true, Mnesia will attempt to
repair the file; if Bool has the value false,
Mnesia will not restart if it detects a suspect file. This
configuration parameter affects the repair behavior of log
files, DAT files, and the default backup media.
The configuration parameter -mnesia dump_log_update_in_place Bool controls the safety level of
the mnesia:dump_log() function. By default, Mnesia will
dump the transaction log directly into the DAT files. If a power
failure happens during the dump, this may cause the randomly
accessed DAT files to become corrupt. If the parameter is set to
false, Mnesia will copy the DAT files and target the dump
to the new temporary files. If the dump is successful, the
temporary files will be renamed to their normal DAT
suffixes. The possibility for unrecoverable inconsistencies in
the data files will be much smaller with this strategy. On the
other hand, the actual dumping of the transaction log will be
considerably slower. The system designer must decide whether
speed or safety is the higher priority.
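A system that favours safety over dump speed might be configured as in the fragment below. This is a sketch: it assumes that the two parameters discussed in this section are auto_repair and dump_log_update_in_place, as they are named in the Mnesia reference manual, and that they are set in a system configuration file and not on the command line.

```erlang
%% sys.config (illustrative)
[{mnesia,
  [{auto_repair, true},                % attempt to repair suspect files at start-up
   {dump_log_update_in_place, false}   % dump via temporary copies of the DAT files
  ]}].
```

With dump_log_update_in_place set to false, log dumps take longer but a power failure during a dump is less likely to leave corrupt DAT files.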
Replicas of type disc_only_copies will only be
affected by this parameter during the initial dump of the log
file at start-up. When designing applications which have
very high requirements on data safety, it may be appropriate not to
use disc_only_copies tables at all. The reason for this
is the random access nature of normal operating system files. If
a node goes down for a reason such as a power
failure, these files may be corrupted because they are not
properly closed. The DAT files for disc_only_copies are
updated on a per transaction basis.
If a disaster occurs and the Mnesia database has been
corrupted, it can be reconstructed from a backup. This should be
regarded as a last resort, since the backup contains old data. The
data is hopefully consistent, but data will definitely be lost
when an old backup is used to restore the database.