aboutsummaryrefslogtreecommitdiffstats
path: root/system/doc/efficiency_guide/tablesDatabases.xml
diff options
context:
space:
mode:
Diffstat (limited to 'system/doc/efficiency_guide/tablesDatabases.xml')
-rw-r--r--system/doc/efficiency_guide/tablesDatabases.xml379
1 files changed, 379 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/tablesDatabases.xml b/system/doc/efficiency_guide/tablesDatabases.xml
new file mode 100644
index 0000000000..4b53348c4c
--- /dev/null
+++ b/system/doc/efficiency_guide/tablesDatabases.xml
@@ -0,0 +1,379 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+ <header>
+ <copyright>
+ <year>2001</year><year>2009</year>
+ <holder>Ericsson AB. All Rights Reserved.</holder>
+ </copyright>
+ <legalnotice>
+ The contents of this file are subject to the Erlang Public License,
+ Version 1.1, (the "License"); you may not use this file except in
+ compliance with the License. You should have received a copy of the
+ Erlang Public License along with this software. If not, it can be
+ retrieved online at http://www.erlang.org/.
+
+ Software distributed under the License is distributed on an "AS IS"
+ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ the License for the specific language governing rights and limitations
+ under the License.
+
+ </legalnotice>
+
+ <title>Tables and databases</title>
+ <prepared>Ingela Anderton</prepared>
+ <docno></docno>
+ <date>2001-08-07</date>
+ <rev></rev>
+ <file>tablesDatabases.xml</file>
+ </header>
+
+ <section>
+ <title>Ets, Dets and Mnesia</title>
+ <p>Every example using Ets has a corresponding example in
+ Mnesia. In general all Ets examples also apply to Dets tables.</p>
+
+ <section>
+ <title>Select/Match operations</title>
+ <p>Select/Match operations on Ets and Mnesia tables can become
+ very expensive operations. They usually need to scan the complete
+ table. You should try to structure your
+ data so that you minimize the need for select/match
+ operations. However, if you really need a select/match operation,
+ it will still be more efficient than using <c>tab2list</c>.
+ Examples of this and also of ways to avoid select/match will be provided in
+ some of the following sections. The functions
+ <c>ets:select/2</c> and <c>mnesia:select/3</c> should be preferred over
+ <c>ets:match/2</c>,<c>ets:match_object/2</c>, and <c>mnesia:match_object/3</c>.</p>
+ <note>
+ <p>There are exceptions when the complete table is not
+ scanned, for instance if part of the key is bound when searching an
+ <c>ordered_set</c> table, or if it is a Mnesia
+ table and there is a secondary index on the field that is
+ selected/matched. If the key is fully bound there will, of course, be
+ no point in doing a select/match, unless you have a bag table and
+ you are only interested in a sub-set of the elements with
+ the specific key.</p>
+ </note>
+ <p>When creating a record to be used in a select/match operation you
+ want most of the fields to have the value '_'. The easiest and fastest way
+ to do that is as follows:</p>
+ <pre>
+#person{age = 42, _ = '_'}. </pre>
+ </section>
+
+ <section>
+ <title>Deleting an element</title>
+ <p>The delete operation is considered
+ successful if the element was not present in the table. Hence
+ all attempts to check that the element is present in the
+ Ets/Mnesia table before deletion are unnecessary. Here follows
+ an example for Ets tables.</p>
+ <p><em>DO</em></p>
+ <pre>
+...
+ets:delete(Tab, Key),
+...</pre>
+ <p><em>DO NOT</em></p>
+ <pre>
+...
+case ets:lookup(Tab, Key) of
+ [] ->
+ ok;
+ [_|_] ->
+ ets:delete(Tab, Key)
+end,
+...</pre>
+ </section>
+
+ <section>
+ <title>Data fetching</title>
+ <p>Do not fetch data that you already have! Consider that you
+ have a module that handles the abstract data type Person. You
+ export the interface function <c>print_person/1</c> that uses the internal functions
+ <c>print_name/1</c>, <c>print_age/1</c>, <c>print_occupation/1</c>.</p>
+ <note>
+ <p>If the functions <c>print_name/1</c> and so on, had been interface
+ functions the matter comes in to a whole new light, as you
+ do not want the user of the interface to know about the
+ internal data representation. </p>
+ </note>
+ <p><em>DO</em></p>
+ <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+ %% Look up the person in the named table person,
+ case ets:lookup(person, PersonId) of
+ [Person] ->
+ print_name(Person),
+ print_age(Person),
+ print_occupation(Person);
+ [] ->
+ io:format("No person with ID = ~p~n", [PersonID])
+ end.
+
+%%% Internal functions
+print_name(Person) ->
+ io:format("No person ~p~n", [Person#person.name]).
+
+print_age(Person) ->
+ io:format("No person ~p~n", [Person#person.age]).
+
+print_occupation(Person) ->
+ io:format("No person ~p~n", [Person#person.occupation]).</code>
+ <p><em>DO NOT</em></p>
+ <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+ %% Look up the person in the named table person,
+ case ets:lookup(person, PersonId) of
+ [Person] ->
+ print_name(PersonID),
+ print_age(PersonID),
+ print_occupation(PersonID);
+ [] ->
+ io:format("No person with ID = ~p~n", [PersonID])
+ end.
+
+%%% Internal functionss
+print_name(PersonID) ->
+ [Person] = ets:lookup(person, PersonId),
+ io:format("No person ~p~n", [Person#person.name]).
+
+print_age(PersonID) ->
+ [Person] = ets:lookup(person, PersonId),
+ io:format("No person ~p~n", [Person#person.age]).
+
+print_occupation(PersonID) ->
+ [Person] = ets:lookup(person, PersonId),
+ io:format("No person ~p~n", [Person#person.occupation]).</code>
+ </section>
+
+ <section>
+ <title>Non-persistent data storage </title>
+ <p>For non-persistent database storage, prefer Ets tables over
+ Mnesia local_content tables. Even the Mnesia <c>dirty_write</c>
+ operations carry a fixed overhead compared to Ets writes.
+ Mnesia must check if the table is replicated or has indices,
+ this involves at least one Ets lookup for each
+ <c>dirty_write</c>. Thus, Ets writes will always be faster than
+ Mnesia writes.</p>
+ </section>
+
+ <section>
+ <title>tab2list</title>
+ <p>Assume we have an Ets-table, which uses <c>idno</c> as key,
+ and contains:</p>
+ <pre>
+[#person{idno = 1, name = "Adam", age = 31, occupation = "mailman"},
+ #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
+ #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
+ #person{idno = 4, name = "Carl", age = 25, occupation = "mailman"}]</pre>
+ <p>If we <em>must</em> return all data stored in the Ets-table we
+ can use <c>ets:tab2list/1</c>. However, usually we are only
+ interested in a subset of the information in which case
+ <c>ets:tab2list/1</c> is expensive. If we only want to extract
+ one field from each record, e.g., the age of every person, we
+ should use:</p>
+ <p><em>DO</em></p>
+ <pre>
+...
+ets:select(Tab,[{ #person{idno='_',
+ name='_',
+ age='$1',
+ occupation = '_'},
+ [],
+ ['$1']}]),
+...</pre>
+ <p><em>DO NOT</em></p>
+ <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:map(fun(X) -> X#person.age end, TabList),
+...</pre>
+ <p>If we are only interested in the age of all persons named
+ Bryan, we should:</p>
+ <p><em>DO</em></p>
+ <pre>
+...
+ets:select(Tab,[{ #person{idno='_',
+ name="Bryan",
+ age='$1',
+ occupation = '_'},
+ [],
+ ['$1']}]),
+...</pre>
+ <p><em>DO NOT</em></p>
+ <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:foldl(fun(X, Acc) -> case X#person.name of
+ "Bryan" ->
+ [X#person.age|Acc];
+ _ ->
+ Acc
+ end
+ end, [], TabList),
+...</pre>
+ <p><em>REALLY DO NOT</em></p>
+ <pre>
+...
+TabList = ets:tab2list(Tab),
+BryanList = lists:filter(fun(X) -> X#person.name == "Bryan" end,
+ TabList),
+lists:map(fun(X) -> X#person.age end, BryanList),
+...</pre>
+ <p>If we need all information stored in the Ets table about
+ persons named Bryan we should:</p>
+ <p><em>DO</em></p>
+ <pre>
+...
+ets:select(Tab, [{#person{idno='_',
+ name="Bryan",
+ age='_',
+ occupation = '_'}, [], ['$_']}]),
+...</pre>
+ <p><em>DO NOT</em></p>
+ <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList),
+...</pre>
+ </section>
+
+ <section>
+ <title>Ordered_set tables</title>
+ <p>If the data in the table should be accessed so that the order
+ of the keys in the table is significant, the table type
+ <c>ordered_set</c> could be used instead of the more usual
+ <c>set</c> table type. An <c>ordered_set</c> is always
+ traversed in Erlang term order with regard to the key field
+ so that return values from functions such as <c>select</c>,
+ <c>match_object</c>, and <c>foldl</c> are ordered by the key
+ values. Traversing an <c>ordered_set</c> with the <c>first</c> and
+ <c>next</c> operations also returns the keys ordered.</p>
+ <note>
+ <p>An <c>ordered_set</c> only guarantees that
+ objects are processed in <em>key</em> order. Results from functions as
+ <c>ets:select/2</c> appear in the <em>key</em> order even if
+ the key is not included in the result.</p>
+ </note>
+ </section>
+ </section>
+
+ <section>
+ <title>Ets specific</title>
+
+ <section>
+ <title>Utilizing the keys of the Ets table</title>
+ <p>An Ets table is a single key table (either a hash table or a
+ tree ordered by the key) and should be used as one. In other
+ words, use the key to look up things whenever possible. A
+ lookup by a known key in a set Ets table is constant and for a
+ ordered_set Ets table it is O(logN). A key lookup is always
+ preferable to a call where the whole table has to be
+ scanned. In the examples above, the field <c>idno</c> is the
+ key of the table and all lookups where only the name is known
+ will result in a complete scan of the (possibly large) table
+ for a matching result.</p>
+ <p>A simple solution would be to use the <c>name</c> field as
+ the key instead of the <c>idno</c> field, but that would cause
+ problems if the names were not unique. A more general solution
+ would be create a second table with name as key and idno as
+ data, i.e. to index (invert) the table with regards to the
+ <c>name</c> field. The second table would of course have to be
+ kept consistent with the master table. Mnesia could do this
+ for you, but a home brew index table could be very efficient
+ compared to the overhead involved in using Mnesia.</p>
+ <p>An index table for the table in the previous examples would
+ have to be a bag (as keys would appear more than once) and could
+ have the following contents:</p>
+ <pre>
+
+[#index_entry{name="Adam", idno=1},
+ #index_entry{name="Bryan", idno=2},
+ #index_entry{name="Bryan", idno=3},
+ #index_entry{name="Carl", idno=4}]</pre>
+ <p>Given this index table a lookup of the <c>age</c> fields for
+ all persons named "Bryan" could be done like this:</p>
+ <pre>
+...
+MatchingIDs = ets:lookup(IndexTable,"Bryan"),
+lists:map(fun(#index_entry{idno = ID}) ->
+ [#person{age = Age}] = ets:lookup(PersonTable, ID),
+ Age
+ end,
+ MatchingIDs),
+...</pre>
+ <p>Note that the code above never uses <c>ets:match/2</c> but
+ instead utilizes the <c>ets:lookup/2</c> call. The
+ <c>lists:map/2</c> call is only used to traverse the <c>idno</c>s
+ matching the name "Bryan" in the table; therefore the number of lookups
+ in the master table is minimized.</p>
+ <p>Keeping an index table introduces some overhead when
+ inserting records in the table, therefore the number of operations
+ gained from the table has to be weighted against the number of
+ operations inserting objects in the table. However, note that the gain when
+ the key can be used to lookup elements is significant.</p>
+ </section>
+ </section>
+
+ <section>
+ <title>Mnesia specific</title>
+
+ <section>
+ <title>Secondary index</title>
+ <p>If you frequently do a lookup on a field that is not the
+ key of the table, you will lose performance using
+ "mnesia:select/match_object" as this function will traverse the
+ whole table. You may create a secondary index instead and
+ use "mnesia:index_read" to get faster access, however this
+ will require more memory. Example:</p>
+ <pre>
+-record(person, {idno, name, age, occupation}).
+ ...
+{atomic, ok} =
+mnesia:create_table(person, [{index,[#person.age]},
+ {attributes,
+ record_info(fields, person)}]),
+{atomic, ok} = mnesia:add_table_index(person, age),
+...
+
+PersonsAge42 =
+ mnesia:dirty_index_read(person, 42, #person.age),
+...</pre>
+ </section>
+
+ <section>
+ <title>Transactions </title>
+ <p>Transactions is a way to guarantee that the distributed
+ Mnesia database remains consistent, even when many different
+ processes update it in parallel. However if you have
+ real time requirements it is recommended to use dirty
+ operations instead of transactions. When using the dirty
+ operations you lose the consistency guarantee, this is usually
+ solved by only letting one process update the table. Other
+ processes have to send update requests to that process.</p>
+ <pre>
+...
+% Using transaction
+
+Fun = fun() ->
+ [mnesia:read({Table, Key}),
+ mnesia:read({Table2, Key2})]
+ end,
+
+{atomic, [Result1, Result2]} = mnesia:transaction(Fun),
+...
+
+% Same thing using dirty operations
+...
+
+Result1 = mnesia:dirty_read({Table, Key}),
+Result2 = mnesia:dirty_read({Table2, Key2}),
+...</pre>
+ </section>
+ </section>
+</chapter>
+