1997-2009 Ericsson AB. All Rights Reserved.
The contents of this file are subject to the Erlang Public License,
Version 1.1, (the "License"); you may not use this file except in
compliance with the License. You should have received a copy of the
Erlang Public License along with this software. If not, it can be
retrieved online at http://www.erlang.org/.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
the License for the specific language governing rights and limitations
under the License.
Building A Mnesia Database
This chapter details the basic steps involved when designing
a Mnesia database and the programming constructs which make different
solutions available to the programmer. The chapter includes the following
sections:
defining a schema
the data model
starting Mnesia
creating new tables.
Defining a Schema
The configuration of a Mnesia system is described in the
schema. The schema is a special table which contains information
such as the table names and each table's
storage type (that is, whether a table is stored in RAM,
on disc, or possibly on both) as well as its location.
Unlike data tables, information contained in schema tables can only be
accessed and modified by using the schema related functions
described in this section.
Mnesia has various functions for defining the
database schema. It is possible to move tables, delete tables,
or reconfigure the layout of tables.
An important aspect of these functions is that the system can access a
table while it is being reconfigured. For example, it is possible to move a
table and simultaneously perform write operations to the same
table. This feature is essential for applications that require
continuous service.
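The following section describes the functions available for schema management.
Most of them return a tuple:
{atomic, ok} if successful; or,
{aborted, Reason} if unsuccessful.
The exceptions are mnesia:create_schema/1 and mnesia:delete_schema/1, which return ok or {error, Reason}.
Schema Functions
mnesia:create_schema(NodeList). This function is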
used to initialize a new, empty schema. This is a mandatory
requirement before Mnesia can be started. Mnesia is a truly
distributed DBMS and the schema is a system table that is
replicated on all nodes in a Mnesia system.
The function will fail if a schema is already present on any of
the nodes in NodeList. This function requires Mnesia
to be stopped on all
db_nodes contained in the parameter NodeList.
Applications call this function only once,
since it is usually a one-time activity to initialize a new
database.
mnesia:delete_schema(DiscNodeList). This function
erases any old schemas on the nodes in
DiscNodeList. It also removes all old tables together
with all data. This function requires Mnesia to be stopped
on all db_nodes.
mnesia:delete_table(Tab). This function
permanently deletes all replicas of table Tab.
mnesia:clear_table(Tab). This function
permanently deletes all entries in table Tab.
mnesia:move_table_copy(Tab, From, To). This
function moves the copy of table Tab from node
From to node To. The storage type of the table
is preserved, so if a RAM table is moved from
one node to another node, it remains a RAM table on the new
node. Other transactions can still perform
read and write operations on the table while it is being
moved.
mnesia:add_table_copy(Tab, Node, Type). This
function creates a replica of the table Tab at node
Node. The Type argument must be one of the
atoms ram_copies, disc_copies, or
disc_only_copies. If we add a copy of the system
table schema to a node, this means that we want the
Mnesia schema to reside there as well. This action then
extends the set of nodes that comprise this particular
Mnesia system.
mnesia:del_table_copy(Tab, Node). This function
deletes the replica of table Tab at node Node.
When the last replica of a table is removed, the table is
deleted.
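The return convention of the replica functions can be handled with a small wrapper. The following is only a sketch: the module name replica_admin and the function add_replica/3 are hypothetical helpers, not part of Mnesia, and the table is assumed to exist already.

```erlang
-module(replica_admin).
-export([add_replica/3]).

%% Hypothetical helper: add a replica of Tab on Node with storage type
%% Type (ram_copies, disc_copies or disc_only_copies) and map the
%% Mnesia result tuple to ok | {error, Reason}.
add_replica(Tab, Node, Type) ->
    case mnesia:add_table_copy(Tab, Node, Type) of
        {atomic, ok} ->
            ok;
        {aborted, Reason} ->
            %% For example, the node may already hold a replica.
            {error, Reason}
    end.
```

A call such as replica_admin:add_replica(employee, b@skeppet, ram_copies) would then add a RAM replica on the second node, assuming both nodes are part of the same Mnesia system.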
mnesia:transform_table(Tab, Fun, NewAttributeList, NewRecordName). This
function changes the format on all records in table
Tab. It applies the argument Fun to all
records in the table. Fun must be a function which
takes a record of the old type, and returns the record of the new
type. The table key may not be changed.
The Fun argument can also be the atom
ignore, which indicates that only the metadata about the table will
be updated. Usage of ignore is not recommended (since it creates
inconsistencies between the metadata and the actual data) but is included
as a possibility for users who want to do their own (off-line) transform.
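As an illustration of mnesia:transform_table/4, the sketch below adds one attribute to every record of a table. The table name person, its attributes and the module name transform_demo are assumptions made for this example; it runs on a single node with a RAM-only table.

```erlang
-module(transform_demo).
-export([run/0]).

%% Hypothetical layouts, for illustration only:
%%   old record: {person, Name, Age}
%%   new record: {person, Name, Age, Email}
run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(person, [{attributes, [name, age]}]),
    ok = mnesia:dirty_write({person, "joe", 42}),
    %% The fun takes a record of the old type and returns the new type.
    %% The key (the name) is left unchanged.
    Upgrade = fun({person, Name, Age}) -> {person, Name, Age, undefined} end,
    {atomic, ok} = mnesia:transform_table(person, Upgrade,
                                          [name, age, email], person),
    mnesia:dirty_read(person, "joe").
```

The last argument keeps the record name person; only the attribute list changes.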
mnesia:change_table_copy_type(Tab, Node, ToType). This
function changes the storage type of a table. For example, a
RAM table is changed to a disc_copies table at the node specified
as Node.
The Data Model
The data model employed by Mnesia is an extended
relational data model. Data is organized as a set of
tables and relations between different data records can
be modeled as additional tables describing the actual
relationships.
Each table contains instances of Erlang records
and records are represented as Erlang tuples.
Object identifiers, also known as oids, are made up of a table name and a key.
For example, consider an employee record represented by the tuple
{employee, 104732, klacke, 7, male, 98108, {221, 015}}.
This record has an object id (Oid), which is the tuple
{employee, 104732}.
Thus, each table is made up of records, where the first element
is a record name and the second element of the record is a key
which identifies the particular record in that table. The
combination of the table name and a key is an arity two tuple
{Tab, Key} called the Oid. See Chapter 4: Record Names Versus Table Names, for more information
regarding the relationship between the record name and the table
name.
What makes the Mnesia data model an extended relational model
is the ability to store arbitrary Erlang terms in the attribute
fields. One attribute value could for example be a whole tree of
oids leading to other terms in other tables. This
type of record is hard to model in traditional relational
DBMSs.
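The relationship between a record, its tuple representation and its oid can be sketched in plain Erlang. The field names of the employee record and the helper oid/1 are assumptions made for this example, and the table name is assumed to equal the record name.

```erlang
-module(oid_demo).
-export([oid/1, example/0]).

%% Field names are assumed for illustration.
-record(employee, {emp_no, name, salary, sex, phone, room_no}).

%% Hypothetical helper: build the {Tab, Key} oid of a record tuple,
%% assuming the table is named after the record.
oid(Record) when is_tuple(Record), tuple_size(Record) >= 2 ->
    {element(1, Record), element(2, Record)}.

example() ->
    E = #employee{emp_no = 104732, name = klacke, salary = 7,
                  sex = male, phone = 98108, room_no = {221, 15}},
    %% A record is a tagged tuple, so this match succeeds.
    E = {employee, 104732, klacke, 7, male, 98108, {221, 15}},
    oid(E).
```

Calling oid_demo:example() returns {employee, 104732}.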
Starting Mnesia
Before we can start Mnesia, we must initialize an empty schema
on all the participating nodes.
The Erlang system must be started.
Nodes with disc database schema must be defined and
implemented with the function create_schema(NodeList).
When running a distributed system with two or more
participating nodes, the mnesia:start() function
must be executed on each participating node. Typically this would
be part of the boot script in an embedded environment.
In a test environment or an interactive environment,
mnesia:start() can also be used either from the
Erlang shell, or another program.
Initializing a Schema and Starting Mnesia
To use a known example, we illustrate how to run the
Company database described in Chapter 2 on two separate nodes,
which we call a@gin and b@skeppet. Each of these
nodes must have a Mnesia directory as well as an
initialized schema before Mnesia can be started. There are two
ways to specify the Mnesia directory to be used:
Specify the Mnesia directory by providing an application
parameter either when starting the Erlang shell or in the
application script. Previously the following example was used
to create the directory for our Company database:
%erl -mnesia dir '"/ldisc/scratch/Mnesia.Company"'
If no command line flag is entered, then the Mnesia
directory will be the current working directory on the node
where the Erlang shell is started.
To start our Company database and get it running on the two
specified nodes, we enter the following commands:
On the node called gin:
gin %erl -sname a -mnesia dir '"/ldisc/scratch/Mnesia.company"'
On the node called skeppet:
skeppet %erl -sname b -mnesia dir '"/ldisc/scratch/Mnesia.company"'
The function mnesia:start() is called on both
nodes.
To initialize the database, execute the following
code on one of the two nodes.
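The sequence can be sketched as follows. This is not the Company example code itself: the module name company_setup, the employee field names and the choice of disc_copies are assumptions made for illustration. As stated earlier in this chapter, the schema must be created while Mnesia is stopped, so the first function is called before mnesia:start().

```erlang
-module(company_setup).
-export([create_schema/1, create_tables/1]).

%% Field names are assumed for illustration.
-record(employee, {emp_no, name, salary, sex, phone, room_no}).

%% Step 1: run once, on one node, while Mnesia is stopped on all Nodes,
%% for example company_setup:create_schema([a@gin, b@skeppet]).
create_schema(Nodes) ->
    mnesia:create_schema(Nodes).

%% Step 2: run on one node after mnesia:start() has returned ok on
%% every node in Nodes.
create_tables(Nodes) ->
    mnesia:create_table(employee,
                        [{disc_copies, Nodes},
                         {attributes, record_info(fields, employee)}]).
```

A real application would create one table per record type in the second step.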
As illustrated above, the two directories reside on different nodes, because the
/ldisc/scratch (the "local" disc) exists on the two different
nodes.
By executing these commands we have configured two Erlang
nodes to run the Company database and initialized the
database. This is required only once when setting up. The next time the
system is started, mnesia:start() is called
on both nodes to initialize the system from disc.
In a system of Mnesia nodes, every node is aware of the
current location of all tables. In this example, data is
replicated on both nodes and functions which manipulate the
data in our tables can be executed on either of the two nodes.
Code which manipulates Mnesia data behaves identically
regardless of where the data resides.
The function mnesia:stop() stops Mnesia on the node
where the function is executed. Both the start/0 and
the stop/0 functions work on the "local" Mnesia system,
and there are no functions which start or stop a set of nodes.
The Start-Up Procedure
Mnesia is started by calling the following function:
mnesia:start().
This function initiates the DBMS locally.
The choice of configuration will alter the location and load
order of the tables. The alternatives are listed below:
Tables that are stored locally only, are initialized
from the local Mnesia directory.
Replicated tables that reside locally
as well as somewhere else are either initiated from disc or
by copying the entire table from the other node depending on
which of the different replicas is the most recent. Mnesia
determines which of the tables is the most recent.
Tables that reside on remote nodes are available to other nodes as soon
as they are loaded.
Table initialization is asynchronous. The function
call mnesia:start() returns the atom ok and
then starts to initialize the different tables. Depending on
the size of the database, this may take some time, and the
application programmer must wait for the tables that the
application needs before they can be used. This is achieved by using
the function:
mnesia:wait_for_tables(TabList, Timeout)
This function suspends the caller until all tables
specified in TabList are properly initiated.
A problem can arise if a replicated table on one node is
initiated, but Mnesia deduces that another (remote)
replica is more recent than the replica existing on
the local node. In that case the initialization procedure will not proceed.
In this situation, a call to
mnesia:wait_for_tables/2 suspends the caller until the
remote node has initiated the table from its local disc and
the node has copied the table over the network to the local node.
This procedure can be time consuming. The following function
loads a table from the local disc without waiting for the remote replica:
mnesia:force_load_table(Tab). This function forces
the table to be loaded from disc regardless of the network
situation.
Thus, if an application
wishes to use tables a and b, then the
application must perform some action similar to the code below before it can use the tables.
case mnesia:wait_for_tables([a, b], 20000) of
    {timeout, RemainingTabs} ->
        %% Some tables were not loaded within 20 seconds;
        %% panic/1 stands for application-specific error handling.
        panic(RemainingTabs);
    ok ->
        synced
end.
When tables are forcefully loaded from the local disc,
all operations that were performed on the replicated table
while the local node was down, and the remote replica was
alive, are lost. This can cause the database to become
inconsistent.
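Waiting and forced loading can be combined in one helper. The sketch below is one possible policy, not a recommendation: the module name table_loader and the function ensure_loaded/2 are hypothetical, and forcing a load carries the risk of lost updates described above.

```erlang
-module(table_loader).
-export([ensure_loaded/2]).

%% Hypothetical helper: wait up to Timeout milliseconds for Tabs and,
%% as a last resort, force the remaining tables to load from local disc.
ensure_loaded(Tabs, Timeout) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Remaining} ->
            %% Risky: updates made on remote replicas while this node
            %% was down are lost for the tables forced here.
            {forced, [{Tab, mnesia:force_load_table(Tab)} || Tab <- Remaining]};
        {error, Reason} ->
            {error, Reason}
    end.
```

An application that cannot tolerate lost updates would report the timeout instead of forcing the load.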
If the start-up procedure fails, the
mnesia:start() function returns the cryptic tuple
{error,{shutdown, {mnesia_sup,start,[normal,[]]}}}.
Pass the command line argument -boot start_sasl to
the erl script in order to get more information
about the start failure.
Creating New Tables
Mnesia provides one function to create new tables. This
function is: mnesia:create_table(Name, ArgList).
When executing this function, it returns one of the following
responses:
{atomic, ok} if the function executes
successfully
{aborted, Reason} if the function fails.
The function arguments are:
Name is the atomic name of the table. It is
usually the same name as the name of the records that
constitute the table. (See record_name for more
details.)
ArgList is a list of {Key,Value} tuples.
The following arguments are valid:
{type, Type} where Type must be one of the
atoms set, ordered_set or bag.
The default value is
set. Note: currently 'ordered_set'
is not supported for 'disc_only_copies' tables.
A table of type set or ordered_set has either zero or
one record per key, whereas a table of type bag can
have an arbitrary number of records per key. The key for
each record is always the first attribute of the record.
The following example illustrates the difference between
type set and bag. Consider a transaction that writes the records
{foo,1,2} and {foo,1,3}, in that order, and then reads the records with key 1.
This transaction will return the list [{foo,1,3}] if
the foo table is of type set. However, the list
[{foo,1,2}, {foo,1,3}] will be returned if the table is
of type bag.
Mnesia tables can never contain
duplicates of the same record in the same table. Duplicate
records have attributes with the same contents and key.
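The difference between set and bag can be tried out on a single node. The following is a minimal sketch: the module name set_bag_demo is hypothetical, and the table foo is created with the default attributes [key, val] as a RAM table and deleted again afterwards.

```erlang
-module(set_bag_demo).
-export([run/1]).

%% Type is set or bag. Returns the result of the transaction.
run(Type) ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(foo, [{type, Type}]),
    F = fun() ->
                mnesia:write({foo, 1, 2}),
                mnesia:write({foo, 1, 3}),
                mnesia:read({foo, 1})
        end,
    Result = mnesia:transaction(F),
    %% Clean up so that the function can be called again.
    {atomic, ok} = mnesia:delete_table(foo),
    Result.
```

Note that mnesia:transaction/1 wraps the list read by the fun in an {atomic, List} tuple.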
{disc_copies, NodeList}, where NodeList is a
list of the nodes where this table will reside on disc.
Write operations to a table replica of type
disc_copies will write data to the disc copy as well
as to the RAM copy of the table.
The default value for NodeList is [].
It is possible to have a
replicated table of type disc_copies on one node, and
the same table stored as a different type on another node.
This arrangement is
desirable if the following operational
characteristics are required:
read operations must be very fast and performed in RAM
all write operations must be written to persistent
storage.
A write operation on a disc_copies table
replica will be performed in two steps. First the write
operation is appended to a log file, then the actual
operation is performed in RAM.
{ram_copies, NodeList}, where NodeList is a
list of the nodes where this table is stored in RAM. The
default value for NodeList is [node()]. If the
default value is used to create a new table, it will be
located on the local node only.
Table replicas of type
ram_copies can be dumped to disc with the function
mnesia:dump_tables(TabList).
{disc_only_copies, NodeList}. These table
replicas are stored on disc only and are therefore slower to
access. However, a disc only replica consumes less memory than
a table replica of the other two storage types.
{index, AttributeNameList}, where
AttributeNameList is a list of atoms specifying the
names of the attributes for which Mnesia shall build and maintain an
index. An
index table will exist for every element in the list. The
first field of a Mnesia record is the key and thus needs no
extra index.
The first field of a record is the second element of the
tuple, which is the representation of the record.
{snmp, SnmpStruct}. SnmpStruct is
described in the SNMP User Guide. Basically, if this attribute
is present in ArgList of mnesia:create_table/2,
the table is immediately accessible by means of the Simple
Network Management Protocol (SNMP).
It is easy to design applications which use SNMP to
manipulate and control the system. Mnesia provides a direct
mapping between the logical tables that make up an SNMP
control application and the physical data which make up a
Mnesia table. The default value of SnmpStruct
is [].
{local_content, true}. When an application needs a
table whose contents should be locally unique on each
node,
local_content tables may be used. The name of the
table is known to all Mnesia nodes, but its contents are
unique for each node. Access to this type of table must be
done locally.
{attributes, AtomList} is a list of the attribute
names for the records that are supposed to populate the
table. The default value is the list [key, val]. The
table must have at least one extra attribute besides the
key. When accessing single attributes in a record, it is not
recommended to hard code the attribute names as atoms. Use
the construct record_info(fields,record_name)
instead. The expression
record_info(fields,record_name) is processed by the
Erlang macro pre-processor and returns a list of the
record's field names. With the record definition
-record(foo, {x,y,z}). the expression
record_info(fields,foo) is expanded to the list
[x,y,z]. Accordingly, it is possible to provide the
attribute names yourself, or to use the record_info/2
notation.
It is recommended that
the record_info/2 notation be used as it is easier to
maintain the program and it will be more robust with regards
to future record changes.
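The record_info/2 notation can be sketched as follows, using the record foo from the text. The module name attrs_demo is hypothetical and the table is created as a RAM table on the local node.

```erlang
-module(attrs_demo).
-export([create/0]).

-record(foo, {x, y, z}).

%% Create table foo with its attribute list taken from the record
%% definition, so the two cannot drift apart when fields are added.
create() ->
    ok = mnesia:start(),
    mnesia:create_table(foo, [{attributes, record_info(fields, foo)}]).
```

After a successful call, mnesia:table_info(foo, attributes) reports the attribute names taken from the record definition.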
{record_name, Atom} specifies the common name of
all records stored in the table. All records stored in
the table must have this name as their first element.
The record_name defaults to the name of the
table. For more information see Chapter 4: Record Names Versus Table Names.
As an example, assume we have the record definition:
-record(funky, {x, y}).
For this record, a single call to mnesia:create_table/2 can create a table which is replicated on two
nodes, has an additional index on the y attribute, and is
of type bag.
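Such a call could look like the sketch below. The module name funky_demo is hypothetical, the node list is passed in as an argument, and ram_copies is an assumed storage type; disc_copies would work the same way on nodes that have a disc schema.

```erlang
-module(funky_demo).
-export([create/1]).

-record(funky, {x, y}).

%% Nodes is the list of nodes that hold a replica, for example
%% [a@gin, b@skeppet] for a table replicated on two nodes.
create(Nodes) ->
    mnesia:create_table(funky,
                        [{ram_copies, Nodes},
                         {index, [y]},
                         {type, bag},
                         {attributes, record_info(fields, funky)}]).
```

The key x needs no entry in the index list, because the first field of a record is always indexed as the key.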