The earlier chapters of this User Guide described how to get started with Mnesia, and how to build a Mnesia database. In this chapter, we will describe the more advanced features available when building a distributed, fault tolerant Mnesia database. This chapter contains the following sections:
Data retrieval and matching can be performed very efficiently if we know the key for the record. Conversely, if the key is not known, all records in a table must be searched. The larger the table, the more time-consuming the search becomes. To remedy this problem, Mnesia's indexing capabilities can be used to improve data retrieval and matching of records.
The following two functions manipulate indexes on existing tables:
These functions create or delete a table index on field
defined by
The indexing capabilities of Mnesia are utilized with the following three functions, which retrieve and match records on the basis of index entries in the database.
These functions are further described and exemplified in
Chapter 4:
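As a minimal sketch of how an index might be used, the module below adds an index with mnesia:add_table_index/2, queries it with mnesia:index_read/3, and removes it again with mnesia:del_table_index/2. The person record, its fields, and the module name are hypothetical and chosen only for illustration.

```erlang
-module(index_sketch).
-export([demo/0]).

%% Hypothetical record used only for this illustration.
-record(person, {name, age, city}).

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(person,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, person)}]),
    %% Index the non-key field 'city' so it can be searched efficiently.
    {atomic, ok} = mnesia:add_table_index(person, city),
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#person{name = anna, age = 31, city = lund}),
        mnesia:write(#person{name = bo, age = 45, city = oslo})
    end),
    %% Look up records through the index instead of the primary key.
    {atomic, Hits} = mnesia:transaction(fun() ->
        mnesia:index_read(person, lund, #person.city)
    end),
    %% Drop the index again; the table itself is unaffected.
    {atomic, ok} = mnesia:del_table_index(person, city),
    Hits.
```

Calling index_sketch:demo() on a fresh node returns the records whose city field is lund without scanning the whole table.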
Mnesia is a distributed, fault tolerant DBMS. It is possible to replicate tables on different Erlang nodes in a variety of ways. The Mnesia programmer does not have to state where the different tables reside; only the names of the tables are specified in the program code. This is known as "location transparency", and it is an important concept. In particular:
We have previously seen that each table has a number of
system attributes, such as
Table attributes are specified when the table is created. For example, the following function will create a new table with two RAM replicas:
mnesia:create_table(foo, [{ram_copies, [N1, N2]}, {attributes, record_info(fields, foo)}]).
Tables can also have the following properties, where each attribute has a list of Erlang nodes as its value.
It is also possible to set and change table properties on
existing tables. Refer to Chapter 3:
There are basically two reasons for using more than one table replica: fault tolerance, or speed. It is worthwhile to note that table replication provides a solution to both of these system requirements.
If we have two active table replicas, all information is still available if one of the replicas fails. This can be a very important property in many applications. Furthermore, if a table replica exists at two specific nodes, applications which execute at either of these nodes can read data from the table without accessing the network. Network operations are considerably slower and consume more resources than local operations.
It can be advantageous to create table replicas for a distributed application which reads data often, but writes data seldom, in order to achieve fast read operations on the local node. The major disadvantage with replication is the increased time to write data. If a table has two replicas, every write operation must access both table replicas. Since one of these write operations must be a network operation, it is considerably more expensive to perform a write operation to a replicated table than to a non-replicated table.
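The trade-off above can be explored with a few helper functions. This is only a sketch: the module and function names are hypothetical, while mnesia:add_table_copy/3 and the table_info items ram_copies, disc_copies, disc_only_copies and where_to_read are part of the Mnesia API.

```erlang
-module(replica_sketch).
-export([add_ram_replica/2, replica_nodes/1, reads_locally/1]).

%% Add a RAM replica of Tab on Node. Node is assumed to be a
%% running db node that does not yet hold a replica of Tab.
add_ram_replica(Tab, Node) ->
    mnesia:add_table_copy(Tab, Node, ram_copies).

%% All nodes holding a replica of Tab, regardless of storage type.
%% Every write must reach each of these nodes.
replica_nodes(Tab) ->
    lists:append([mnesia:table_info(Tab, Type)
                  || Type <- [ram_copies, disc_copies, disc_only_copies]]).

%% True if reads of Tab are served by the local node, that is,
%% without a network operation.
reads_locally(Tab) ->
    mnesia:table_info(Tab, where_to_read) =:= node().
```

The length of replica_nodes(Tab) gives a rough idea of the write cost, while reads_locally(Tab) shows whether the local node benefits from fast reads.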
A concept of table fragmentation has been introduced in
order to cope with very large tables. The idea is to split a
table into several more manageable fragments. Each fragment
is implemented as a first class Mnesia table and may be
replicated, have indices etc. as any other table. But the
tables may neither have
In order to be able to access a record in a fragmented
table, Mnesia must determine to which fragment the
actual record belongs. This is done by the
At each record access
The following piece of code illustrates how an existing Mnesia table is converted to be a fragmented table and how more fragments are added later on.
(a@sam)1> mnesia:start().
ok
(a@sam)2> mnesia:system_info(running_db_nodes).
[b@sam,c@sam,a@sam]
(a@sam)3> Tab = dictionary.
dictionary
(a@sam)4> mnesia:create_table(Tab, [{ram_copies, [a@sam, b@sam]}]).
{atomic,ok}
(a@sam)5> Write = fun(Keys) -> [mnesia:write({Tab,K,-K}) || K <- Keys], ok end.
#Fun
(a@sam)6> mnesia:activity(sync_dirty, Write, [lists:seq(1, 256)], mnesia_frag).
ok
(a@sam)7> mnesia:change_table_frag(Tab, {activate, []}).
{atomic,ok}
(a@sam)8> mnesia:table_info(Tab, frag_properties).
[{base_table,dictionary},
{foreign_key,undefined},
{n_doubles,0},
{n_fragments,1},
{next_n_to_split,1},
{node_pool,[a@sam,b@sam,c@sam]}]
(a@sam)9> Info = fun(Item) -> mnesia:table_info(Tab, Item) end.
#Fun
(a@sam)10> Dist = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{c@sam,0},{a@sam,1},{b@sam,1}]
(a@sam)11> mnesia:change_table_frag(Tab, {add_frag, Dist}).
{atomic,ok}
(a@sam)12> Dist2 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{b@sam,1},{c@sam,1},{a@sam,2}]
(a@sam)13> mnesia:change_table_frag(Tab, {add_frag, Dist2}).
{atomic,ok}
(a@sam)14> Dist3 = mnesia:activity(sync_dirty, Info, [frag_dist], mnesia_frag).
[{a@sam,2},{b@sam,2},{c@sam,2}]
(a@sam)15> mnesia:change_table_frag(Tab, {add_frag, Dist3}).
{atomic,ok}
(a@sam)16> Read = fun(Key) -> mnesia:read({Tab, Key}) end.
#Fun
(a@sam)17> mnesia:activity(transaction, Read, [12], mnesia_frag).
[{dictionary,12,-12}]
(a@sam)18> mnesia:activity(sync_dirty, Info, [frag_size], mnesia_frag).
[{dictionary,64},
{dictionary_frag2,64},
{dictionary_frag3,64},
{dictionary_frag4,64}]
(a@sam)19>
There is a table property called
The node pool contains a list of nodes and may
explicitly be set at table creation and later be changed
with
Regulates how many
Regulates how many
Regulates how many
Enables definition of an alternate hashing scheme.
The module must implement the
Older tables that were created before the concept of
user defined hash modules was introduced use
the
Enables a table specific parameterization
of a generic hash module. This property may explicitly
be set at table creation.
The default is
(a@sam)1> mnesia:start().
ok
(a@sam)2> PrimProps = [{n_fragments, 7}, {node_pool, [node()]}].
[{n_fragments,7},{node_pool,[a@sam]}]
(a@sam)3> mnesia:create_table(prim_dict,
[{frag_properties, PrimProps},
{attributes,[prim_key,prim_val]}]).
{atomic,ok}
(a@sam)4> SecProps = [{foreign_key, {prim_dict, sec_val}}].
[{foreign_key,{prim_dict,sec_val}}]
(a@sam)5> mnesia:create_table(sec_dict,
                              [{frag_properties, SecProps},
                               {attributes, [sec_key, sec_val]}]).
{atomic,ok}
(a@sam)6> Write = fun(Rec) -> mnesia:write(Rec) end.
#Fun
(a@sam)7> PrimKey = 11.
11
(a@sam)8> SecKey = 42.
42
(a@sam)9> mnesia:activity(sync_dirty, Write,
                          [{prim_dict, PrimKey, -11}], mnesia_frag).
ok
(a@sam)10> mnesia:activity(sync_dirty, Write,
                           [{sec_dict, SecKey, PrimKey}], mnesia_frag).
ok
(a@sam)11> mnesia:change_table_frag(prim_dict, {add_frag, [node()]}).
{atomic,ok}
(a@sam)12> SecRead = fun(PrimKey, SecKey) ->
                   mnesia:read({sec_dict, PrimKey}, SecKey, read) end.
#Fun
(a@sam)13> mnesia:activity(transaction, SecRead,
                           [PrimKey, SecKey], mnesia_frag).
[{sec_dict,42,11}]
(a@sam)14> Info = fun(Tab, Item) -> mnesia:table_info(Tab, Item) end.
#Fun
(a@sam)15> mnesia:activity(sync_dirty, Info,
                           [prim_dict, frag_size], mnesia_frag).
[{prim_dict,0},
{prim_dict_frag2,0},
{prim_dict_frag3,0},
{prim_dict_frag4,1},
{prim_dict_frag5,0},
{prim_dict_frag6,0},
{prim_dict_frag7,0},
{prim_dict_frag8,0}]
(a@sam)16> mnesia:activity(sync_dirty, Info,
                           [sec_dict, frag_size], mnesia_frag).
[{sec_dict,0},
{sec_dict_frag2,0},
{sec_dict_frag3,0},
{sec_dict_frag4,1},
{sec_dict_frag5,0},
{sec_dict_frag6,0},
{sec_dict_frag7,0},
{sec_dict_frag8,0}]
(a@sam)17>
The function
Activates the fragmentation properties of an
existing table.
Deactivates the fragmentation properties of a
table. The number of fragments must be
Adds one new fragment to a fragmented table. All records in one of the old fragments will be rehashed and about half of them will be moved to the new (last) fragment. All other fragmented tables that refer to this table in their foreign key will automatically get a new fragment, and their records will also be dynamically rehashed in the same manner as for the main table.
The
Deletes one fragment from a fragmented table. All records in the last fragment will be moved to one of the other fragments. All other fragmented tables that refer to this table in their foreign key will automatically lose their last fragment, and their records will also be dynamically rehashed in the same manner as for the main table.
Adds a new node to the
Deletes a node from the
The function
The function
The function
If the function
the name of the fragmented table
the actual number of fragments
the pool of nodes
the number of replicas with storage type
the foreign key.
all other tables that refer to this table in their foreign key.
the names of all fragments.
a sorted list of
a list of
a list of
total size of all fragments
the total memory of all fragments
There are several algorithms for distributing records in a fragmented table evenly over a pool of nodes. No single algorithm is best; the choice depends on the needs of the application. Here are some examples of situations that may need attention:
Use
Replicated tables have the same content on all nodes where they are replicated. However, it is sometimes advantageous to have a table with different content on different nodes.
If we specify the attribute
Furthermore, when the table is initialized at start-up, the table will only be initialized locally, and the table content will not be copied from another node.
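A table with node-local content might be created as in the sketch below. The local_content table option is part of mnesia:create_table/2; the table name node_stats, its attributes, and the helper functions are hypothetical.

```erlang
-module(local_content_sketch).
-export([create/1, store/2, lookup/1]).

%% Create a table that exists on every node in Nodes but whose
%% content is private to each node.
create(Nodes) ->
    mnesia:create_table(node_stats,
        [{ram_copies, Nodes},
         {local_content, true},
         {attributes, [key, value]}]).

%% Writes affect only the copy on the node where they are executed.
store(Key, Value) ->
    mnesia:dirty_write({node_stats, Key, Value}).

%% Reads see only what was written on the local node.
lookup(Key) ->
    mnesia:dirty_read(node_stats, Key).
```

With such a table, local_content_sketch:store(load, 3) executed on one node is not visible to local_content_sketch:lookup(load) executed on another node.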
It is possible to run Mnesia on nodes that do not have a
disc. It is of course not possible to have replicas
of neither
The schema table may, as other tables, reside on one or
more nodes. The storage type of the schema table may either
be
Hence, when a disc-less node needs to find the schema
definitions from a remote node on the network, we need to supply
this information through the application parameter
The application parameter schema_location controls where Mnesia will search for its schema. The parameter may be one of the following atoms:
Mandatory disc. The schema is assumed to be located in the Mnesia directory. If the schema cannot be found, Mnesia refuses to start.
Mandatory ram. The schema resides in ram
only. At start-up a tiny new schema is generated. This
default schema contains just the definition of the schema
table and only resides on the local node. Since no other
nodes are found in the default schema, the configuration
parameter
Optional disc. The schema may reside on either disc
or ram. If the schema is found on disc, Mnesia starts as a
disc-full node (the storage type of the schema table is
disc_copies). If no schema is found on disc, Mnesia starts
as a disc-less node (the storage type of the schema table is
ram_copies). The default value for the application parameter
is
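The effect of the schema_location parameter can be inspected from within Erlang. The following is a sketch only: it sets the parameter to ram before Mnesia is started and then reports what Mnesia says about its schema. The module and function names are hypothetical; application:set_env/3, mnesia:system_info/1 and mnesia:table_info/2 are existing calls.

```erlang
-module(schema_location_sketch).
-export([start_discless/0]).

%% Start Mnesia as a disc-less node. The parameter must be set
%% before mnesia:start/0 is called; setting it later has no effect
%% on an already running Mnesia.
start_discless() ->
    _ = application:load(mnesia),
    ok = application:set_env(mnesia, schema_location, ram),
    ok = mnesia:start(),
    %% Report the configured location and the resulting storage
    %% type of the schema table.
    {mnesia:system_info(schema_location),
     mnesia:table_info(schema, storage_type)}.
```

On a node started this way, the schema table is expected to have the storage type ram_copies, as described for the mandatory ram alternative above.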
When the
1> mnesia:start().
ok
2> mnesia:change_table_copy_type(schema, node(), disc_copies).
{atomic, ok}
Assuming that the call to
It is possible to add and remove nodes from a Mnesia system. This can be done by adding a copy of the schema to those nodes.
The functions
The function call
If the storage type of the schema is ram_copies, i.e. we
have a disc-less node, Mnesia
will not use the disc on that particular node. The disc
usage is enabled by changing the storage type of the table
New schemas are
created explicitly with
At start-up Mnesia connects different nodes to each other; they then exchange table definitions, and the table definitions are merged. During the merge procedure Mnesia performs a sanity test to ensure that the table definitions are compatible with each other. If a table exists on several nodes, the cookie must be the same; otherwise Mnesia will shut down one of the nodes. This unfortunate situation will occur if a table has been created on two nodes independently of each other while they were disconnected. To solve the problem, one of the tables must be deleted (as the cookies differ, we regard them as two different tables even if they happen to have the same name).
Merging different versions of the schema table does not always require the cookies to be the same. If the storage type of the schema table is disc_copies, the cookie is immutable, and all other db_nodes must have the same cookie. When the schema is stored as type ram_copies, its cookie can be replaced with a cookie from another node (ram_copies or disc_copies). The cookie replacement (during merge of the schema table definition) is performed each time a RAM node connects to another node.
Transactions which update the definition of a table require that Mnesia is started on all nodes where the storage type of the schema is disc_copies. All replicas of the table on these nodes must also be loaded. There are a few exceptions to these availability rules. Tables may be created and new replicas may be added without starting all of the disc-full nodes. New replicas may be added before all other replicas of the table have been loaded; it suffices that one other replica is active.
System events and table events are the two categories of events that Mnesia will generate in various situations.
It is possible for a user process to subscribe to the events generated by Mnesia. We have the following two functions:
Ensures that a copy of all events of type
All system events are subscribed to by Mnesia's
gen_event handler. The default gen_event handler is
The system events are detailed below:
Mnesia has been started on a node. Node is the name of the node. By default this event is ignored.
Mnesia has been stopped on a node. Node is the name of the node. By default this event is ignored.
a checkpoint with the name
A checkpoint with the name
Mnesia on the current node is overloaded and the subscriber should take action.
A typical overload situation occurs when the
applications are performing more updates on disc
resident tables than Mnesia is able to handle. Ignoring
this kind of overload may lead to a situation where
the disc space is exhausted (regardless of the size of
the tables stored on disc).
Each update is appended to
the transaction log and occasionally (depending on how it
is configured) dumped to the table files. The
table file storage is more compact than the transaction
log storage, especially if the same record is updated
over and over again. If the thresholds for dumping the
transaction log are reached before the previous
dump has finished, an overload event is triggered.
Another typical overload situation is when the transaction manager cannot commit transactions at the same pace as the applications are performing updates of disc resident tables. When this happens the message queue of the transaction manager will continue to grow until the memory is exhausted or the load decreases.
The same problem may occur for dirty updates. The overload is detected locally on the current node, but its cause may be on another node. Application processes may cause heavy loads if any tables reside on other nodes (replicated or not). By default this event is reported to the error_logger.
Mnesia regards the database as
potentially inconsistent and gives its applications a chance
to recover from the inconsistency, e.g. by installing a
consistent backup as fallback and then restart the system
or pick a
Mnesia has encountered a fatal error
and will (in a short period of time) be terminated. The reason for
the fatal error is explained in Format and Args which may
be given as input to
Mnesia has detected something that
may be of interest when debugging the system. This is explained
in
Mnesia has encountered an error. The
reason for the error is explained in
An application has invoked the
function
Another category of events is table events, which are events related to table updates. There are two types of table events: simple and detailed.
The simple table events are tuples looking like this:
a new record has been written. NewRecord contains the new value of the record.
a record has possibly been deleted
with
one or more records have possibly
been deleted. All records with the key Key in the table
The detailed table events are tuples looking like
this:
a new record has been written. NewRecord contains the new value of the record and OldRecords contains the records before the operation is performed. Note that the new content is dependent on the type of the table.
records have possibly been deleted
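Subscribing to simple table events could look roughly as follows. The sketch uses mnesia:subscribe/1 and mnesia:unsubscribe/1 with the event category {table, Tab, simple}; the table name event_demo, the module name, and the one second timeout are assumptions made for this illustration.

```erlang
-module(event_sketch).
-export([demo/0]).

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(event_demo,
        [{ram_copies, [node()]}, {attributes, [key, val]}]),
    %% From now on the calling process receives a copy of every
    %% simple table event for event_demo.
    {ok, _Node} = mnesia:subscribe({table, event_demo, simple}),
    ok = mnesia:dirty_write({event_demo, 1, hello}),
    Event = receive
                {mnesia_table_event, {write, NewRecord, _ActivityId}} ->
                    {written, NewRecord}
            after 1000 ->
                timeout
            end,
    %% Stop the event flow again.
    {ok, _} = mnesia:unsubscribe({table, event_demo, simple}),
    Event.
```

A long-lived subscriber would normally handle these messages in its own receive loop or in a gen_server handle_info callback instead of a single receive.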
Debugging a Mnesia application can be difficult for a number of reasons, primarily related to difficulties in understanding how the transaction and table load mechanisms work. Another source of confusion may be the semantics of nested transactions.
We may set the debug level of Mnesia by calling:
Where the parameter
no trace outputs at all. This is the default.
activates tracing of important debug events. These
debug events will generate
activates all events at the verbose level plus
traces of all debug events. These debug events will generate
activates all events at the debug level. On this debug level Mnesia's event handler subscribes to updates of all Mnesia tables. This level is only intended for debugging small toy systems, since many large events may be generated.
is an alias for none.
is an alias for debug.
The debug level of Mnesia itself is also an application parameter, which makes it possible to turn on Mnesia debugging during the initial start-up phase by starting the Erlang system as follows:
% erl -mnesia debug verbose
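The debug level can also be raised temporarily at run time. The helper below is a sketch with hypothetical module and function names; it relies on mnesia:set_debug_level/1 returning the previous level, so that the old level can be restored afterwards.

```erlang
-module(debug_sketch).
-export([with_debug_level/2]).

%% Run Fun with the Mnesia debug level set to Level, then restore
%% the level that was in effect before the call.
with_debug_level(Level, Fun) ->
    Old = mnesia:set_debug_level(Level),
    try
        Fun()
    after
        mnesia:set_debug_level(Old)
    end.
```

For example, debug_sketch:with_debug_level(verbose, fun() -> mnesia:info() end) limits the extra output to a single call.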
Programming concurrent Erlang systems is the subject of a separate book. However, it is worthwhile to draw attention to the following features, which permit concurrent processes to exist in a Mnesia system.
A group of functions or processes can be called within a transaction. A transaction may include statements that read, write or delete data from the DBMS. A large number of such transactions can run concurrently, and the programmer does not have to explicitly synchronize the processes which manipulate the data. All programs accessing the database through the transaction system may be written as if they had sole access to the data. This is a very desirable property since all synchronization is taken care of by the transaction handler. If a program reads or writes data, the system ensures that no other program tries to manipulate the same data at the same time.
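As an illustration of this property, the sketch below increments a counter inside a transaction. The table name counter, its attributes, and the module name are hypothetical. No explicit synchronization is written by the programmer, since the write lock taken by mnesia:read/3 serializes concurrent callers.

```erlang
-module(txn_sketch).
-export([setup/0, bump/1]).

setup() ->
    ok = mnesia:start(),
    mnesia:create_table(counter,
        [{ram_copies, [node()]}, {attributes, [name, value]}]).

%% Atomically increment the counter Name and return its new value.
%% Any number of processes may call bump/1 concurrently.
bump(Name) ->
    mnesia:transaction(fun() ->
        New = case mnesia:read(counter, Name, write) of
                  [{counter, Name, Old}] -> Old + 1;
                  [] -> 1
              end,
        ok = mnesia:write({counter, Name, New}),
        New
    end).
```

The function is written as if it had sole access to the data; if two processes call bump(hits) at the same time, the transaction handler makes one of them wait for the other.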
It is possible to move tables, delete tables or reconfigure
the layout of a table in various ways. An important aspect of
the actual implementation of these functions is that it is
possible for user programs to continue to use a table while it
is being reconfigured. For example, it is possible to
simultaneously move a table and perform write operations to the
table. This is important for many applications that
require continuously available services. Refer to Chapter 4:
If and when we decide that we would like to start and manipulate Mnesia, it is often easier to write the definitions and data into an ordinary text file. Initially, no tables and no data exist, and it may not yet be clear which tables are required. In the initial stages of prototyping it is prudent to write all data into one file, process that file, and have the data in the file inserted into the database. It is possible to initialize Mnesia with data read from a text file. We have the following two functions to work with text files.
These functions are of course much slower than the ordinary store and load functions of Mnesia. However, they are mainly intended for minor experiments and initial prototyping. The major advantage of these functions is that they are very easy to use.
The format of the text file is:
{tables, [{Typename, [Options]},
{Typename2 ......}]}.
{Typename, Attribute1, Attribute2 ....}.
{Typename, Attribute1, Attribute2 ....}.
For example, if we want to start playing with a small
database for healthy foods, we enter the following data into
the file
The following session with the Erlang shell then shows how to load the fruits database.
1> mnesia:load_textfile("FRUITS").
New table fruit
New table vegetable
{atomic,ok}
2> mnesia:info().
---> Processes holding locks <---
---> Processes waiting for locks <---
---> Pending (remote) transactions <---
---> Active (local) transactions <---
---> Uncertain transactions <---
---> Active tables <---
vegetable : with 2 records occuping 299 words of mem
fruit : with 2 records occuping 291 words of mem
schema : with 3 records occuping 401 words of mem
===> System info in version "1.1", debug level = none <===
opt_disc. Directory "/var/tmp/Mnesia.nonode@nohost" is used.
use fallback at restart = false
running db nodes = [nonode@nohost]
stopped db nodes = []
remote = []
ram_copies = [fruit,vegetable]
disc_copies = [schema]
disc_only_copies = []
[{nonode@nohost,disc_copies}] = [schema]
[{nonode@nohost,ram_copies}] = [fruit,vegetable]
3 transactions committed, 0 aborted, 0 restarted, 2 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote
0 transactions waits for other nodes: []
ok
3>
Here we can see that the DBMS was initialized from a regular text file.
The Company database introduced in Chapter 2 has three tables which store records (employee, dept, project), and three tables which store relationships (manager, at_dep, in_proj). This is a normalized data model, which has some advantages over a non-normalized data model.
It is more efficient to do a generalized search in a normalized database. Some operations are also easier to perform on a normalized data model. For example, we can easily remove one project, as the following example illustrates:
In reality, data models are seldom fully normalized. A realistic alternative to a normalized database model would be a data model which is not even in first normal form. Mnesia is very suitable for applications such as telecommunications, because it is easy to organize data in a very flexible manner. A Mnesia database is always organized as a set of tables. Each table is filled with rows/objects/records. What sets Mnesia apart is that individual fields in a record can contain any type of compound data structures. An individual field in a record can contain lists, tuples, functions, and even record code.
Many telecommunications applications have unique requirements on lookup times for certain types of records. If our Company database had been a part of a telecommunications system, then it could be that the lookup time of an employee together with a list of the projects the employee is working on, should be minimized. If this was the case, we might choose a drastically different data model which has no direct relationships. We would only have the records themselves, and different records could contain either direct references to other records, or they could contain other records which are not part of the Mnesia schema.
We could create the following record definitions:
A record which describes an employee might look like this:
Me = #employee{emp_no = 104732,
               name = klacke,
               salary = 7,
               sex = male,
               phone = 99586,
               room_no = {221, 015},
               dept = 'B/SFR',
               projects = [erlang, mnesia, otp],
               manager = 114872},
This model only has three different tables, and the employee records contain references to other records. We have the following references in the record.
We could also use the Mnesia record identifiers (
With this data model, some operations execute considerably faster than they do with the normalized data model in our Company database. On the other hand, some other operations become much more complicated. In particular, it becomes more difficult to ensure that records do not contain dangling pointers to other non-existent, or deleted, records.
The following code exemplifies a search with a non-normalized
data model. To find all employees at department
This code is not only easier to write and to understand, but it also executes much faster.
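As a rough illustration of such a search (a sketch, not the guide's own listing), all employees at a given department can be found with a single mnesia:match_object/1 call on the employee table. The record definition mirrors the fields of the employee record shown earlier; the module and function names are hypothetical.

```erlang
-module(denorm_sketch).
-export([create_table/0, employees_at/1]).

%% Fields taken from the example employee record in the text.
-record(employee, {emp_no, name, salary, sex, phone, room_no,
                   dept, projects, manager}).

create_table() ->
    mnesia:create_table(employee,
        [{attributes, record_info(fields, employee)}]).

%% Return all employee records whose dept field equals Dept.
%% No join against a separate relationship table is needed,
%% because the department is stored in the employee record itself.
employees_at(Dept) ->
    Pattern = #employee{dept = Dept, _ = '_'},
    mnesia:transaction(fun() -> mnesia:match_object(Pattern) end).
```

Since dept is not the key of the table, this search still scans the table unless an index has been added on that field.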
It is easy to show examples of code which executes faster if
we use a non-normalized data model, instead of a normalized
model. The main reason for this is that fewer tables are
required. For this reason, we can more easily combine data from
different tables in join operations. In the above example, the