The R13B03 release.OTP_R13B03

author: Erlang/OTP <[email protected]> 2009-11-20 14:54:40 +0000
committer: Erlang/OTP <[email protected]> 2009-11-20 14:54:40 +0000
commit: 84adefa331c4159d432d22840663c38f155cd4c1 (patch)
tree: bff9a9c66adda4df2106dfd0e5c053ab182a12bd /system/doc/efficiency_guide/tablesDatabases.xml
download: otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.gz
otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.bz2
otp-84adefa331c4159d432d22840663c38f155cd4c1.zip
1 files changed, 379 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/tablesDatabases.xml b/system/doc/efficiency_guide/tablesDatabases.xml
new file mode 100644
index 0000000000..4b53348c4c
--- /dev/null
+++ b/system/doc/efficiency_guide/tablesDatabases.xml
@@ -0,0 +1,379 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2001</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+    
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+    
+    </legalnotice>
+
+    <title>Tables and databases</title>
+    <prepared>Ingela Anderton</prepared>
+    <docno></docno>
+    <date>2001-08-07</date>
+    <rev></rev>
+    <file>tablesDatabases.xml</file>
+  </header>
+
+  <section>
+    <title>Ets, Dets and Mnesia</title>
+    <p>Every example using Ets has a corresponding example in
+      Mnesia. In general all Ets examples also apply to Dets tables.</p>
+
+    <section>
+      <title>Select/Match operations</title>
+      <p>Select/Match operations on Ets and Mnesia tables can become
+        very expensive operations. They usually need to scan the complete
+        table. You should try to structure your
+        data so that you minimize the need for select/match
+        operations. However, if you really need a select/match operation,
+	it will still be more efficient than using <c>tab2list</c>.
+	Examples of this and also of ways to avoid select/match will be provided in
+        some of the following sections. The functions
+        <c>ets:select/2</c> and <c>mnesia:select/3</c> should be preferred over
+        <c>ets:match/2</c>,<c>ets:match_object/2</c>, and <c>mnesia:match_object/3</c>.</p>
+      <note>
+        <p>There are exceptions when the complete table is not
+          scanned, for instance if part of the key is bound when searching an
+	  <c>ordered_set</c> table, or if it is a Mnesia
+          table and there is a secondary index on the field that is
+          selected/matched. If the key is fully bound there will, of course, be
+          no point in doing a select/match, unless you have a bag table and
+          you are only interested in a sub-set of the elements with
+          the specific key.</p>
+      </note>
+      <p>When creating a record to be used in a select/match operation you
+        want most of the fields to have the value '_'. The easiest and fastest way
+	to do that is as follows:</p>
+      <pre>
+#person{age = 42, _ = '_'}. </pre>
+    </section>
+
+    <section>
+      <title>Deleting an element</title>
+      <p>The delete operation is considered
+        successful if the element was not present in the table. Hence
+        all attempts to check that the element is present in the
+        Ets/Mnesia table before deletion are unnecessary. Here follows
+        an example for Ets tables.</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:delete(Tab, Key),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+case ets:lookup(Tab, Key) of
+    [] ->
+        ok;
+    [_|_] ->
+        ets:delete(Tab, Key)
+end,
+...</pre>
+    </section>
+
+    <section>
+      <title>Data fetching</title>
+      <p>Do not fetch data that you already have! Consider that you
+        have a module that handles the abstract data type Person. You
+        export the interface function <c>print_person/1</c> that uses the internal functions
+        <c>print_name/1</c>, <c>print_age/1</c>, <c>print_occupation/1</c>.</p>
+      <note>
+        <p>If the functions <c>print_name/1</c> and so on, had been interface
+          functions the matter comes in to a whole new light, as you
+          do not want the user of the interface to know about the
+          internal data representation. </p>
+      </note>
+      <p><em>DO</em></p>
+      <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+    %% Look up the person in the named table person,
+    case ets:lookup(person, PersonId) of
+        [Person] ->
+            print_name(Person),
+            print_age(Person),
+            print_occupation(Person);
+        [] ->
+            io:format("No person with ID = ~p~n", [PersonID])
+    end.
+
+%%% Internal functions
+print_name(Person) -> 
+    io:format("No person ~p~n", [Person#person.name]).
+                      
+print_age(Person) -> 
+    io:format("No person ~p~n", [Person#person.age]).
+
+print_occupation(Person) -> 
+    io:format("No person ~p~n", [Person#person.occupation]).</code>
+      <p><em>DO NOT</em></p>
+      <code type="erl">
+%%% Interface function
+print_person(PersonId) ->
+    %% Look up the person in the named table person,
+    case ets:lookup(person, PersonId) of
+        [Person] ->
+            print_name(PersonID),
+            print_age(PersonID),
+            print_occupation(PersonID);
+        [] ->
+            io:format("No person with ID = ~p~n", [PersonID])
+    end.
+
+%%% Internal functionss
+print_name(PersonID) -> 
+    [Person] = ets:lookup(person, PersonId),
+    io:format("No person ~p~n", [Person#person.name]).
+
+print_age(PersonID) -> 
+    [Person] = ets:lookup(person, PersonId),
+    io:format("No person ~p~n", [Person#person.age]).
+
+print_occupation(PersonID) -> 
+    [Person] = ets:lookup(person, PersonId),
+    io:format("No person ~p~n", [Person#person.occupation]).</code>
+    </section>
+
+    <section>
+      <title>Non-persistent data storage </title>
+      <p>For non-persistent database storage, prefer Ets tables over
+        Mnesia local_content tables. Even the Mnesia <c>dirty_write</c>
+        operations carry a fixed overhead compared to Ets writes.
+        Mnesia must check if the table is replicated or has indices,
+        this involves at least one Ets lookup for each
+        <c>dirty_write</c>. Thus, Ets writes will always be faster than
+        Mnesia writes.</p>
+    </section>
+
+    <section>
+      <title>tab2list</title>
+      <p>Assume we have an Ets-table, which uses <c>idno</c> as key,
+        and contains:</p>
+      <pre>
+[#person{idno = 1, name = "Adam",  age = 31, occupation = "mailman"},
+ #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
+ #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
+ #person{idno = 4, name = "Carl",  age = 25, occupation = "mailman"}]</pre>
+      <p>If we <em>must</em> return all data stored in the Ets-table we
+        can use <c>ets:tab2list/1</c>.  However, usually we are only
+        interested in a subset of the information in which case
+        <c>ets:tab2list/1</c> is expensive. If we only want to extract
+        one field from each record, e.g., the age of every person, we
+        should use:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab,[{ #person{idno='_', 
+                          name='_', 
+                          age='$1', 
+                          occupation = '_'},
+                [],
+                ['$1']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:map(fun(X) -> X#person.age end, TabList),
+...</pre>
+      <p>If we are only interested in the age of all persons named
+        Bryan, we should:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab,[{ #person{idno='_', 
+                          name="Bryan", 
+                          age='$1', 
+                          occupation = '_'},
+                [],
+                ['$1']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:foldl(fun(X, Acc) -> case X#person.name of
+                                "Bryan" ->
+                                    [X#person.age|Acc];
+                                 _ ->
+                                     Acc
+                           end
+             end, [], TabList),
+...</pre>
+      <p><em>REALLY DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+BryanList = lists:filter(fun(X) -> X#person.name == "Bryan" end,
+                         TabList),
+lists:map(fun(X) -> X#person.age end, BryanList),
+...</pre>
+      <p>If we need all information stored in the Ets table about
+        persons named Bryan we should:</p>
+      <p><em>DO</em></p>
+      <pre>
+...
+ets:select(Tab, [{#person{idno='_', 
+                          name="Bryan", 
+                          age='_', 
+                          occupation = '_'}, [], ['$_']}]),
+...</pre>
+      <p><em>DO NOT</em></p>
+      <pre>
+...
+TabList = ets:tab2list(Tab),
+lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList),
+...</pre>
+    </section>
+
+    <section>
+      <title>Ordered_set tables</title>
+      <p>If the data in the table should be accessed so that the order
+        of the keys in the table is significant, the table type
+        <c>ordered_set</c> could be used instead of the more usual
+        <c>set</c> table type. An <c>ordered_set</c> is always
+        traversed in Erlang term order with regard to the key field
+        so that return values from functions such as <c>select</c>,
+        <c>match_object</c>, and <c>foldl</c> are ordered by the key
+        values. Traversing an <c>ordered_set</c> with the <c>first</c> and
+        <c>next</c> operations also returns the keys ordered.</p>
+      <note>
+        <p>An <c>ordered_set</c> only guarantees that
+          objects are processed in <em>key</em> order. Results from functions as 
+          <c>ets:select/2</c> appear in the <em>key</em> order even if
+          the key is not included in the result.</p>
+      </note>
+    </section>
+  </section>
+
+  <section>
+    <title>Ets specific</title>
+
+    <section>
+      <title>Utilizing the keys of the Ets table</title>
+      <p>An Ets table is a single key table (either a hash table or a
+        tree ordered by the key) and should be used as one. In other
+        words, use the key to look up things whenever possible. A
+        lookup by a known key in a set Ets table is constant and for a
+        ordered_set Ets table it is O(logN). A key lookup is always
+        preferable to a call where the whole table has to be
+        scanned. In the examples above, the field <c>idno</c> is the
+        key of the table and all lookups where only the name is known
+        will result in a complete scan of the (possibly large) table
+        for a matching result.</p>
+      <p>A simple solution would be to use the <c>name</c> field as
+        the key instead of the <c>idno</c> field, but that would cause
+        problems if the names were not unique. A more general solution
+        would be create a second table with name as key and idno as
+        data, i.e. to index (invert) the table with regards to the
+        <c>name</c> field. The second table would of course have to be
+        kept consistent with the master table. Mnesia could do this
+        for you, but a home brew index table could be very efficient
+        compared to the overhead involved in using Mnesia.</p>
+      <p>An index table for the table in the previous examples would
+        have to be a bag (as keys would appear more than once) and could
+        have the following contents:</p>
+      <pre>
+ 
+[#index_entry{name="Adam", idno=1},
+ #index_entry{name="Bryan", idno=2},
+ #index_entry{name="Bryan", idno=3},
+ #index_entry{name="Carl", idno=4}]</pre>
+      <p>Given this index table a lookup of the <c>age</c> fields for
+        all persons named "Bryan" could be done like this:</p>
+      <pre>
+...
+MatchingIDs = ets:lookup(IndexTable,"Bryan"),
+lists:map(fun(#index_entry{idno = ID}) ->
+                 [#person{age = Age}] = ets:lookup(PersonTable, ID),
+                 Age
+          end,
+          MatchingIDs),
+...</pre>
+      <p>Note that the code above never uses <c>ets:match/2</c> but
+        instead utilizes the <c>ets:lookup/2</c> call. The
+        <c>lists:map/2</c> call is only used to traverse the <c>idno</c>s
+        matching the name "Bryan" in the table; therefore the number of lookups
+        in the master table is minimized.</p>
+      <p>Keeping an index table introduces some overhead when
+        inserting records in the table, therefore the number of operations
+        gained from the table has to be weighted against the number of
+        operations inserting objects in the table. However, note that the gain when
+        the key can be used to lookup elements is significant.</p>
+    </section>
+  </section>
+
+  <section>
+    <title>Mnesia specific</title>
+
+    <section>
+      <title>Secondary index</title>
+      <p>If you frequently do a lookup on a field that is not the
+        key of the table, you will lose performance using
+        "mnesia:select/match_object" as this function will traverse the
+        whole table. You may create a secondary index instead and
+        use "mnesia:index_read" to get faster access, however this
+        will require more memory. Example:</p>
+      <pre>
+-record(person, {idno, name, age, occupation}).
+        ...
+{atomic, ok} = 
+mnesia:create_table(person, [{index,[#person.age]},
+                              {attributes,
+                                    record_info(fields, person)}]),
+{atomic, ok} = mnesia:add_table_index(person, age), 
+...
+
+PersonsAge42 =
+     mnesia:dirty_index_read(person, 42, #person.age),
+...</pre>
+    </section>
+
+    <section>
+      <title>Transactions </title>
+      <p>Transactions is a way to guarantee that the distributed
+        Mnesia database remains consistent, even when many different
+        processes update it in parallel. However if you have
+        real time requirements it is recommended to use dirty
+        operations instead of transactions. When using the dirty
+        operations you lose the consistency guarantee, this is usually
+        solved by only letting one process update the table. Other
+        processes have to send update requests to that process.</p>
+      <pre>
+...
+% Using transaction
+
+Fun = fun() ->
+          [mnesia:read({Table, Key}),
+           mnesia:read({Table2, Key2})]
+      end, 
+
+{atomic, [Result1, Result2]}  = mnesia:transaction(Fun),
+...
+
+% Same thing using dirty operations
+...
+
+Result1 = mnesia:dirty_read({Table, Key}),
+Result2 = mnesia:dirty_read({Table2, Key2}),
+...</pre>
+    </section>
+  </section>
+</chapter>
+
author	Erlang/OTP <[email protected]>	2009-11-20 14:54:40 +0000
committer	Erlang/OTP <[email protected]>	2009-11-20 14:54:40 +0000
commit	84adefa331c4159d432d22840663c38f155cd4c1 (patch)
tree	bff9a9c66adda4df2106dfd0e5c053ab182a12bd /system/doc/efficiency_guide/tablesDatabases.xml
download	otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.gz otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.bz2 otp-84adefa331c4159d432d22840663c38f155cd4c1.zip