Diffstat (limited to 'system/doc/efficiency_guide/tablesDatabases.xml')
-rw-r--r-- | system/doc/efficiency_guide/tablesDatabases.xml | 379 |
1 files changed, 379 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/tablesDatabases.xml b/system/doc/efficiency_guide/tablesDatabases.xml
new file mode 100644
index 0000000000..4b53348c4c
--- /dev/null
+++ b/system/doc/efficiency_guide/tablesDatabases.xml
@@ -0,0 +1,379 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2001</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+
+    </legalnotice>
+
+    <title>Tables and databases</title>
+    <prepared>Ingela Anderton</prepared>
+    <docno></docno>
+    <date>2001-08-07</date>
+    <rev></rev>
+    <file>tablesDatabases.xml</file>
+  </header>
+
+  <section>
+    <title>Ets, Dets and Mnesia</title>
+    <p>Every example using Ets has a corresponding example in
+      Mnesia. In general, all Ets examples also apply to Dets tables.</p>
+
+    <section>
+      <title>Select/Match operations</title>
+      <p>Select/Match operations on Ets and Mnesia tables can become
+        very expensive. They usually need to scan the complete
+        table. You should try to structure your
+        data so that you minimize the need for select/match
+        operations. However, if you really need a select/match operation,
+        it will still be more efficient than using <c>tab2list</c>.
+        Examples of this, and of ways to avoid select/match, are given in
+        some of the following sections.
+        The functions
+        <c>ets:select/2</c> and <c>mnesia:select/3</c> should be preferred over
+        <c>ets:match/2</c>, <c>ets:match_object/2</c>, and <c>mnesia:match_object/3</c>.</p>
+      <note>
+        <p>There are exceptions when the complete table is not
+          scanned, for instance if part of the key is bound when searching an
+          <c>ordered_set</c> table, or if it is a Mnesia
+          table and there is a secondary index on the field that is
+          selected/matched. If the key is fully bound, there is of course
+          no point in doing a select/match, unless you have a bag table and
+          you are only interested in a subset of the elements with
+          the specific key.</p>
+      </note>
+      <p>When creating a record to be used in a select/match operation, you
+        want most of the fields to have the value <c>'_'</c>. The easiest and fastest way
+        to do that is as follows:</p>
+      <pre>
+#person{age = 42, _ = '_'}.</pre>
+    </section>
+
+    <section>
+      <title>Deleting an element</title>
+      <p>The delete operation is considered
+        successful even if the element was not present in the table. Hence,
+        checking that the element is present in the
+        Ets/Mnesia table before deleting it is unnecessary. The following is
+        an example for Ets tables.</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:delete(Tab, Key),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+case ets:lookup(Tab, Key) of
+    [] ->
+        ok;
+    [_|_] ->
+        ets:delete(Tab, Key)
+end,
+...</pre>
+    </section>
+
+    <section>
+      <title>Data fetching</title>
+      <p>Do not fetch data that you already have! Suppose that you
+        have a module that handles the abstract data type Person. It
+        exports the interface function <c>print_person/1</c>, which uses the internal functions
+        <c>print_name/1</c>, <c>print_age/1</c>, and <c>print_occupation/1</c>.</p>
+      <note>
+        <p>If the functions <c>print_name/1</c> and so on had been interface
+          functions, the situation would be different, as you
+          do not want the user of the interface to know about the
+          internal data representation.
+        </p>
+      </note>
+      <p><em>DO</em></p>
+      <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+    %% Look up the person in the named table person.
+    case ets:lookup(person, PersonId) of
+        [Person] ->
+            print_name(Person),
+            print_age(Person),
+            print_occupation(Person);
+        [] ->
+            io:format("No person with ID = ~p~n", [PersonId])
+    end.
+
+%%% Internal functions
+print_name(Person) ->
+    io:format("Name: ~p~n", [Person#person.name]).
+
+print_age(Person) ->
+    io:format("Age: ~p~n", [Person#person.age]).
+
+print_occupation(Person) ->
+    io:format("Occupation: ~p~n", [Person#person.occupation]).</code>
+      <p><em>DO NOT</em></p>
+      <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+    %% Look up the person in the named table person.
+    case ets:lookup(person, PersonId) of
+        [_Person] ->
+            %% The record is found here but discarded; only the ID is passed on.
+            print_name(PersonId),
+            print_age(PersonId),
+            print_occupation(PersonId);
+        [] ->
+            io:format("No person with ID = ~p~n", [PersonId])
+    end.
+
+%%% Internal functions
+print_name(PersonId) ->
+    [Person] = ets:lookup(person, PersonId),
+    io:format("Name: ~p~n", [Person#person.name]).
+
+print_age(PersonId) ->
+    [Person] = ets:lookup(person, PersonId),
+    io:format("Age: ~p~n", [Person#person.age]).
+
+print_occupation(PersonId) ->
+    [Person] = ets:lookup(person, PersonId),
+    io:format("Occupation: ~p~n", [Person#person.occupation]).</code>
+    </section>
+
+    <section>
+      <title>Non-persistent data storage</title>
+      <p>For non-persistent database storage, prefer Ets tables over
+        Mnesia <c>local_content</c> tables. Even the Mnesia <c>dirty_write</c>
+        operations carry a fixed overhead compared to Ets writes.
+        Mnesia must check whether the table is replicated or has indices,
+        which involves at least one Ets lookup for each
+        <c>dirty_write</c>.
+        Thus, Ets writes will always be faster than
+        Mnesia writes.</p>
+    </section>
+
+    <section>
+      <title>tab2list</title>
+      <p>Assume we have an Ets table that uses <c>idno</c> as key
+        (that is, it was created with the option <c>{keypos, #person.idno}</c>)
+        and contains:</p>
+      <pre>
+[#person{idno = 1, name = "Adam", age = 31, occupation = "mailman"},
+ #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
+ #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
+ #person{idno = 4, name = "Carl", age = 25, occupation = "mailman"}]</pre>
+      <p>If we <em>must</em> return all data stored in the Ets table, we
+        can use <c>ets:tab2list/1</c>. However, usually we are only
+        interested in a subset of the information, in which case
+        <c>ets:tab2list/1</c> is expensive. If we only want to extract
+        one field from each record, for example the age of every person, we
+        should use:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab, [{#person{idno = '_',
+                          name = '_',
+                          age = '$1',
+                          occupation = '_'},
+                  [],
+                  ['$1']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:map(fun(X) -> X#person.age end, TabList),
+...</pre>
+      <p>If we are only interested in the age of all persons named
+        Bryan, we should use:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab, [{#person{idno = '_',
+                          name = "Bryan",
+                          age = '$1',
+                          occupation = '_'},
+                  [],
+                  ['$1']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:foldl(fun(X, Acc) -> case X#person.name of
+                               "Bryan" ->
+                                   [X#person.age|Acc];
+                               _ ->
+                                   Acc
+                           end
+            end, [], TabList),
+...</pre>
+      <p><em>REALLY DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+BryanList = lists:filter(fun(X) -> X#person.name == "Bryan" end,
+                         TabList),
+lists:map(fun(X) -> X#person.age end, BryanList),
+...</pre>
+      <p>If we need all information stored in the Ets table about
+        persons named Bryan, we should use:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab, [{#person{idno = '_',
+                          name = "Bryan",
+                          age = '_',
+                          occupation = '_'}, [], ['$_']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList),
+...</pre>
+    </section>
+
+    <section>
+      <title>Ordered_set tables</title>
+      <p>If the data in the table should be accessed so that the order
+        of the keys in the table is significant, the table type
+        <c>ordered_set</c> could be used instead of the more usual
+        <c>set</c> table type. An <c>ordered_set</c> is always
+        traversed in Erlang term order with regard to the key field,
+        so return values from functions such as <c>select</c>,
+        <c>match_object</c>, and <c>foldl</c> are ordered by the key
+        values. Traversing an <c>ordered_set</c> with the <c>first</c> and
+        <c>next</c> operations also returns the keys in order.</p>
+      <note>
+        <p>An <c>ordered_set</c> only guarantees that
+          objects are processed in <em>key</em> order. Results from functions such as
+          <c>ets:select/2</c> appear in <em>key</em> order even if
+          the key is not included in the result.</p>
+      </note>
+    </section>
+  </section>
+
+  <section>
+    <title>Ets specific</title>
+
+    <section>
+      <title>Utilizing the keys of the Ets table</title>
+      <p>An Ets table is a single-key table (either a hash table or a
+        tree ordered by the key) and should be used as one. In other
+        words, use the key to look up things whenever possible. A
+        lookup by a known key takes constant time in a <c>set</c> Ets table
+        and O(log N) time in an <c>ordered_set</c> Ets table. A key lookup is always
+        preferable to a call where the whole table has to be
+        scanned.
+        In the examples above, the field <c>idno</c> is the
+        key of the table, and all lookups where only the name is known
+        will result in a complete scan of the (possibly large) table
+        for a matching result.</p>
+      <p>A simple solution would be to use the <c>name</c> field as
+        the key instead of the <c>idno</c> field, but that would cause
+        problems if the names were not unique. A more general solution
+        would be to create a second table with name as key and idno as
+        data, that is, to index (invert) the table with regard to the
+        <c>name</c> field. The second table would of course have to be
+        kept consistent with the master table. Mnesia could do this
+        for you, but a home-brewed index table could be very efficient
+        compared to the overhead involved in using Mnesia.</p>
+      <p>An index table for the table in the previous examples would
+        have to be a bag (as keys would appear more than once), with
+        <c>name</c> as its key position, and could have the following contents:</p>
+      <pre>
+[#index_entry{name = "Adam", idno = 1},
+ #index_entry{name = "Bryan", idno = 2},
+ #index_entry{name = "Bryan", idno = 3},
+ #index_entry{name = "Carl", idno = 4}]</pre>
+      <p>Given this index table, a lookup of the <c>age</c> fields for
+        all persons named "Bryan" could be done like this:</p>
+      <pre>
+...
+MatchingIDs = ets:lookup(IndexTable, "Bryan"),
+lists:map(fun(#index_entry{idno = ID}) ->
+                  [#person{age = Age}] = ets:lookup(PersonTable, ID),
+                  Age
+          end,
+          MatchingIDs),
+...</pre>
+      <p>Note that the code above never uses <c>ets:match/2</c> but
+        instead uses the <c>ets:lookup/2</c> call. The
+        <c>lists:map/2</c> call only traverses the <c>idno</c>s
+        matching the name "Bryan" in the table; therefore, the number of lookups
+        in the master table is minimized.</p>
+      <p>Keeping an index table introduces some overhead when
+        inserting records in the table. Therefore, the lookups gained
+        from the index table have to be weighed against the extra
+        work done on every insert into the table.
+        However, note that the gain when
+        the key can be used to look up elements is significant.</p>
+    </section>
+  </section>
+
+  <section>
+    <title>Mnesia specific</title>
+
+    <section>
+      <title>Secondary index</title>
+      <p>If you frequently do a lookup on a field that is not the
+        key of the table, you will lose performance using
+        <c>mnesia:select</c> or <c>mnesia:match_object</c>, as these functions traverse the
+        whole table. Instead, you can create a secondary index and
+        use <c>mnesia:index_read/3</c> to get faster access; however, this
+        requires more memory. Example:</p>
+      <pre>
+-record(person, {idno, name, age, occupation}).
+...
+%% Either create the table with an index on the age field...
+{atomic, ok} =
+    mnesia:create_table(person, [{index, [#person.age]},
+                                 {attributes,
+                                  record_info(fields, person)}]),
+...
+%% ...or add the index to an existing table (not both: adding an
+%% index that already exists aborts).
+{atomic, ok} = mnesia:add_table_index(person, age),
+...
+
+PersonsAge42 =
+    mnesia:dirty_index_read(person, 42, #person.age),
+...</pre>
+    </section>
+
+    <section>
+      <title>Transactions</title>
+      <p>Transactions are a way to guarantee that the distributed
+        Mnesia database remains consistent, even when many different
+        processes update it in parallel. However, if you have
+        real-time requirements, it is recommended to use dirty
+        operations instead of transactions. When using dirty
+        operations, you lose the consistency guarantee; this is usually
+        solved by only letting one process update the table. Other
+        processes have to send update requests to that process.</p>
+      <pre>
+...
+%% Using a transaction
+
+Fun = fun() ->
+          [mnesia:read({Table, Key}),
+           mnesia:read({Table2, Key2})]
+      end,
+
+{atomic, [Result1, Result2]} = mnesia:transaction(Fun),
+...
+
+%% Same thing using dirty operations
+...
+
+Result1 = mnesia:dirty_read({Table, Key}),
+Result2 = mnesia:dirty_read({Table2, Key2}),
+...</pre>
+    </section>
+  </section>
+</chapter>
+
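The chapter presents its Ets patterns only as fragments. The following is a minimal sketch, not part of the committed file, that combines two of them in one runnable module. The first is `ets:select/2` with a record match pattern built using `_ = '_'`. The second is a hand-maintained bag index keyed on `name`, kept consistent with the master table on every insert and delete. The module name `person_index`, its functions, and the table names are hypothetical. The `person` and `index_entry` records mirror the ones used in the examples above. Both tables need an explicit `keypos`, because the default key position 1 holds the record tag.

```erlang
%% person_index.erl: hypothetical helper module, not part of OTP.
-module(person_index).
-export([new/0, add/2, remove/2, ages_select/2, ages_index/2, demo/0]).

-record(person, {idno, name, age, occupation}).
-record(index_entry, {name, idno}).

%% Master table keyed on idno (a set) plus a bag index keyed on name.
new() ->
    Persons = ets:new(person, [set, {keypos, #person.idno}]),
    Index = ets:new(person_name_index, [bag, {keypos, #index_entry.name}]),
    {Persons, Index}.

%% Insert (or replace) a person and keep the index consistent with it.
add({Persons, Index} = Tabs, #person{idno = Id, name = Name} = Person) ->
    ok = remove(Tabs, Id),                  % drop any stale index entry
    true = ets:insert(Persons, Person),
    true = ets:insert(Index, #index_entry{name = Name, idno = Id}),
    ok.

%% Remove a person and its index entry. The lookup is only needed to find
%% the name for the index entry; ets:delete/2 succeeds even if Id is absent.
remove({Persons, Index}, Id) ->
    case ets:lookup(Persons, Id) of
        [#person{name = Name}] ->
            true = ets:delete_object(Index, #index_entry{name = Name, idno = Id});
        [] ->
            true
    end,
    true = ets:delete(Persons, Id),
    ok.

%% Ages of everyone called Name via a match specification.
%% This scans the whole master table.
ages_select({Persons, _Index}, Name) ->
    ets:select(Persons, [{#person{name = Name, age = '$1', _ = '_'}, [], ['$1']}]).

%% Same question answered through the index: one key lookup in the index,
%% then one key lookup in the master table per matching idno. No scan.
ages_index({Persons, Index}, Name) ->
    [Age || #index_entry{idno = Id} <- ets:lookup(Index, Name),
            #person{age = Age} <- ets:lookup(Persons, Id)].

%% Loads the four example persons and compares both approaches.
demo() ->
    Tabs = new(),
    People = [#person{idno = 1, name = "Adam", age = 31, occupation = "mailman"},
              #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
              #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
              #person{idno = 4, name = "Carl", age = 25, occupation = "mailman"}],
    lists:foreach(fun(P) -> ok = add(Tabs, P) end, People),
    BySelect = lists:sort(ages_select(Tabs, "Bryan")),
    ByIndex = lists:sort(ages_index(Tabs, "Bryan")),
    ok = remove(Tabs, 2),
    AfterRemove = ages_index(Tabs, "Bryan"),
    {BySelect, ByIndex, AfterRemove}.
```

Calling `person_index:demo()` should return `{[31,35],[31,35],[35]}`. Both approaches find the same ages, and removing person 2 also removes that person's index entry. The results are sorted because neither a `set` table nor a bag lookup promises any particular order.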