diff options
author | Björn Gustavsson <[email protected]> | 2015-03-12 15:35:13 +0100 |
---|---|---|
committer | Björn Gustavsson <[email protected]> | 2015-03-12 15:38:25 +0100 |
commit | 6513fc5eb55b306e2b1088123498e6c50b9e7273 (patch) | |
tree | 986a133cb88ddeaeb0292f99af67e4d1015d1f62 | |
parent | 42a0387e886ddbf60b0e2cb977758e2ca74954ae (diff) | |
download | otp-6513fc5eb55b306e2b1088123498e6c50b9e7273.tar.gz otp-6513fc5eb55b306e2b1088123498e6c50b9e7273.tar.bz2 otp-6513fc5eb55b306e2b1088123498e6c50b9e7273.zip |
Update Efficiency Guide
Language cleaned up by the technical writers xsipewe and tmanevik
from Combitech. Proofreading and corrections by Björn Gustavsson.
-rw-r--r-- | system/doc/efficiency_guide/advanced.xml | 296 | ||||
-rw-r--r-- | system/doc/efficiency_guide/binaryhandling.xml | 359 | ||||
-rw-r--r-- | system/doc/efficiency_guide/commoncaveats.xml | 135 | ||||
-rw-r--r-- | system/doc/efficiency_guide/drivers.xml | 115 | ||||
-rw-r--r-- | system/doc/efficiency_guide/functions.xml | 120 | ||||
-rw-r--r-- | system/doc/efficiency_guide/introduction.xml | 28 | ||||
-rw-r--r-- | system/doc/efficiency_guide/listhandling.xml | 120 | ||||
-rw-r--r-- | system/doc/efficiency_guide/myths.xml | 128 | ||||
-rw-r--r-- | system/doc/efficiency_guide/processes.xml | 159 | ||||
-rw-r--r-- | system/doc/efficiency_guide/profiling.xml | 319 | ||||
-rw-r--r-- | system/doc/efficiency_guide/tablesDatabases.xml | 194 |
11 files changed, 1027 insertions, 946 deletions
diff --git a/system/doc/efficiency_guide/advanced.xml b/system/doc/efficiency_guide/advanced.xml index 51f1b2612c..3513a91e34 100644 --- a/system/doc/efficiency_guide/advanced.xml +++ b/system/doc/efficiency_guide/advanced.xml @@ -18,7 +18,6 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> <title>Advanced</title> @@ -31,175 +30,216 @@ <section> <title>Memory</title> - <p>A good start when programming efficiently is to have knowledge about + <p>A good start when programming efficiently is to know how much memory different data types and operations require. It is implementation-dependent how much memory the Erlang data types and - other items consume, but here are some figures for the - erts-5.2 system (OTP release R9B). (There have been no significant - changes in R13.)</p> + other items consume, but the following table shows some figures for + the <c>erts-5.2</c> system in R9B. There have been no significant + changes in R13.</p> - <p>The unit of measurement is memory words. There exists both a 32-bit - and a 64-bit implementation, and a word is therefore, 4 bytes or + <p>The unit of measurement is memory words. There exists both a + 32-bit and a 64-bit implementation. A word is therefore 4 bytes or 8 bytes, respectively.</p> <table> <row> - <cell align="center" valign="middle">Data type</cell> - <cell align="center" valign="middle">Memory size</cell> + <cell><em>Data Type</em></cell> + <cell><em>Memory Size</em></cell> </row> <row> - <cell align="left" valign="middle">Small integer</cell> - <cell align="left" valign="middle">1 word<br></br> -On 32-bit architectures: -134217729 < i < 134217728 (28 bits)<br></br> -On 64-bit architectures: -576460752303423489 < i < 576460752303423488 (60 bits)</cell> + <cell>Small integer</cell> + <cell>1 word.<br></br> + On 32-bit architectures: -134217729 < i < 134217728 + (28 bits).<br></br> + On 64-bit architectures: -576460752303423489 < i < + 576460752303423488 (60 bits).</cell> </row> <row> - <cell align="left" valign="middle">Big integer</cell> - <cell align="left" valign="middle">3..N words</cell> + <cell>Large integer</cell> + <cell>3..N words.</cell> </row> <row> - <cell align="left" valign="middle">Atom</cell> - <cell align="left" valign="middle">1 word. Note: an atom refers into - an atom table which also consumes memory. + <cell>Atom</cell> + <cell>1 word.<br></br> + An atom refers into an atom table, which also consumes memory. The atom text is stored once for each unique atom in this table. The atom table is <em>not</em> garbage-collected.</cell> </row> <row> - <cell align="left" valign="middle">Float</cell> - <cell align="left" valign="middle">On 32-bit architectures: 4 words <br></br> -On 64-bit architectures: 3 words</cell> + <cell>Float</cell> + <cell>On 32-bit architectures: 4 words.<br></br> + On 64-bit architectures: 3 words.</cell> </row> <row> - <cell align="left" valign="middle">Binary</cell> - <cell align="left" valign="middle">3..6 + data (can be shared)</cell> + <cell>Binary</cell> + <cell>3..6 words + data (can be shared).</cell> </row> <row> - <cell align="left" valign="middle">List</cell> - <cell align="left" valign="middle">1 word + 1 word per element + the size of each element</cell> + <cell>List</cell> + <cell>1 word + 1 word per element + the size of each element.</cell> </row> <row> - <cell align="left" valign="middle">String (is the same as a list of integers)</cell> - <cell align="left" valign="middle">1 word + 2 words per character</cell> + <cell>String (is the same as a list of integers)</cell> + <cell>1 word + 2 words per character.</cell> </row> <row> - <cell align="left" valign="middle">Tuple</cell> - <cell align="left" valign="middle">2 words + the size of each element</cell> + <cell>Tuple</cell> + <cell>2 words + the size of each element.</cell> </row> <row> - <cell align="left" valign="middle">Pid</cell> - <cell align="left" valign="middle">1 word for a process identifier from the current local node, and 5 words for a process identifier from another node. Note: a process identifier refers into a process table and a node table which also consumes memory.</cell> + <cell>Pid</cell> + <cell>1 word for a process identifier from the current local node + + 5 words for a process identifier from another node.<br></br> + A process identifier refers into a process table and a node table, + which also consumes memory.</cell> </row> <row> - <cell align="left" valign="middle">Port</cell> - <cell align="left" valign="middle">1 word for a port identifier from the current local node, and 5 words for a port identifier from another node. Note: a port identifier refers into a port table and a node table which also consumes memory.</cell> + <cell>Port</cell> + <cell>1 word for a port identifier from the current local node + + 5 words for a port identifier from another node.<br></br> + A port identifier refers into a port table and a node table, + which also consumes memory.</cell> </row> <row> - <cell align="left" valign="middle">Reference</cell> - <cell align="left" valign="middle">On 32-bit architectures: 5 words for a reference from the current local node, and 7 words for a reference from another node. <br></br> -On 64-bit architectures: 4 words for a reference from the current local node, and 6 words for a reference from another node. Note: a reference refers into a node table which also consumes memory.</cell> + <cell>Reference</cell> + <cell>On 32-bit architectures: 5 words for a reference from the + current local node + 7 words for a reference from another + node.<br></br> + On 64-bit architectures: 4 words for a reference from the current + local node + 6 words for a reference from another node.<br></br> + A reference refers into a node table, which also consumes + memory.</cell> </row> <row> - <cell align="left" valign="middle">Fun</cell> - <cell align="left" valign="middle">9..13 words + size of environment. Note: a fun refers into a fun table which also consumes memory.</cell> + <cell>Fun</cell> + <cell>9..13 words + the size of environment.<br></br> + A fun refers into a fun table, which also consumes memory.</cell> </row> <row> - <cell align="left" valign="middle">Ets table</cell> - <cell align="left" valign="middle">Initially 768 words + the size of each element (6 words + size of Erlang data). The table will grow when necessary.</cell> + <cell>Ets table</cell> + <cell>Initially 768 words + the size of each element (6 words + + the size of Erlang data). The table grows when necessary.</cell> </row> <row> - <cell align="left" valign="middle">Erlang process</cell> - <cell align="left" valign="middle">327 words when spawned including a heap of 233 words.</cell> + <cell>Erlang process</cell> + <cell>327 words when spawned, including a heap of 233 words.</cell> </row> - <tcaption>Memory size of different data types</tcaption> + <tcaption>Memory Size of Different Data Types</tcaption> </table> </section> <section> - <title>System limits</title> - <p>The Erlang language specification puts no limits on number of processes, - length of atoms etc., but for performance and memory saving reasons, - there will always be limits in a practical implementation of the Erlang - language and execution environment.</p> - <taglist> - <tag><em>Processes</em></tag> - <item> - <p>The maximum number of simultaneously alive Erlang processes is - by default 32768. This limit can be configured at startup, - for more information see the - <seealso marker="erts:erl#max_processes"><c>+P</c></seealso> - command line flag of - <seealso marker="erts:erl"><c>erl(1)</c></seealso>.</p> - </item> - <tag><em>Distributed nodes</em></tag> - <item> - <taglist> - <tag>Known nodes</tag> - <item> - <p>A remote node Y has to be known to node X if there exist - any pids, ports, references, or funs (Erlang data types) from Y - on X, or if X and Y are connected. The maximum number of remote - nodes simultaneously/ever known to a node is limited by the - <seealso marker="#atoms">maximum number of atoms</seealso> - available for node names. All data concerning remote nodes, - except for the node name atom, are garbage-collected.</p> - </item> - <tag>Connected nodes</tag> - <item>The maximum number of simultaneously connected nodes is limited by - either the maximum number of simultaneously known remote nodes, - <seealso marker="#ports">the maximum number of (Erlang) ports</seealso> - available, or - <seealso marker="#files_sockets">the maximum number of sockets</seealso> - available.</item> - </taglist> - </item> - <tag><em>Characters in an atom</em></tag> - <item>255</item> - <tag><em>Atoms </em></tag> - <item> <marker id="atoms"></marker> - By default, the maximum number of atoms is 1048576. - This limit can be raised or lowered using the <c>+t</c> option.</item> - <tag><em>Ets-tables</em></tag> - <item>The default is 1400, can be changed with the environment variable <c>ERL_MAX_ETS_TABLES</c>.</item> - <tag><em>Elements in a tuple</em></tag> - <item>The maximum number of elements in a tuple is 67108863 (26 bit unsigned integer). Other factors - such as the available memory can of course make it hard to create a tuple of that size. </item> - <tag><em>Size of binary</em></tag> - <item>In the 32-bit implementation of Erlang, 536870911 bytes is the - largest binary that can be constructed or matched using the bit syntax. - (In the 64-bit implementation, the maximum size is 2305843009213693951 bytes.) - If the limit is exceeded, bit syntax construction will fail with a - <c>system_limit</c> exception, while any attempt to match a binary that is - too large will fail. - This limit is enforced starting with the R11B-4 release; in earlier releases, - operations on too large binaries would in general either fail or give incorrect - results. - In future releases of Erlang/OTP, other operations that create binaries (such as - <c>list_to_binary/1</c>) will probably also enforce the same limit.</item> - <tag><em>Total amount of data allocated by an Erlang node</em></tag> - <item>The Erlang runtime system can use the complete 32 (or 64) bit address space, - but the operating system often limits a single process to use less than that.</item> - <tag><em>Length of a node name</em></tag> - <item>An Erlang node name has the form host@shortname or host@longname. The node name is - used as an atom within the system so the maximum size of 255 holds for the node name too.</item> - <tag><em>Open ports</em></tag> - <item> - <marker id="ports"></marker> - <p>The maximum number of simultaneously open Erlang ports is - often by default 16384. This limit can be configured at startup, - for more information see the - <seealso marker="erts:erl#max_ports"><c>+Q</c></seealso> - command line flag of - <seealso marker="erts:erl"><c>erl(1)</c></seealso>.</p> - </item> - <tag><em>Open files, and sockets</em></tag> - <item> <marker id="files_sockets"></marker> + <title>System Limits</title> + <p>The Erlang language specification puts no limits on the number of + processes, length of atoms, and so on. However, for performance and + memory saving reasons, there will always be limits in a practical + implementation of the Erlang language and execution environment.</p> - The maximum number of simultaneously open files and sockets - depend on - <seealso marker="#ports">the maximum number of Erlang ports</seealso> - available, and operating system specific settings and limits.</item> - <tag><em>Number of arguments to a function or fun</em></tag> - <item>255</item> - </taglist> + <table> + <row> + <cell>Processes</cell> + <cell>The maximum number of simultaneously alive Erlang processes + is by default 32,768. This limit can be configured at startup. + For more information, see the + <seealso marker="erts:erl#max_processes"><c>+P</c></seealso> + command-line flag in the + <seealso marker="erts:erl"><c>erl(1)</c></seealso> + manual page in <c>erts</c>.</cell> + </row> + <row> + <cell>Known nodes</cell> + <cell>A remote node Y must be known to node X if there exists + any pids, ports, references, or funs (Erlang data types) from Y + on X, or if X and Y are connected. The maximum number of remote + nodes simultaneously/ever known to a node is limited by the + <seealso marker="#atoms">maximum number of atoms</seealso> + available for node names. All data concerning remote nodes, + except for the node name atom, are garbage-collected.</cell> + </row> + <row> + <cell>Connected nodes</cell> + <cell>The maximum number of simultaneously connected nodes is + limited by either the maximum number of simultaneously known + remote nodes, + <seealso marker="#ports">the maximum number of (Erlang) ports</seealso> + available, or + <seealso marker="#files_sockets">the maximum number of sockets</seealso> + available.</cell> + </row> + <row> + <cell>Characters in an atom</cell> + <cell>255.</cell> + </row> + <row> + <cell><marker id="atoms"></marker>Atoms</cell> + <cell>By default, the maximum number of atoms is 1,048,576. This + limit can be raised or lowered using the <c>+t</c> option.</cell> + </row> + <row> + <cell>Ets tables</cell> + <cell>Default is 1400. It can be changed with the environment + variable <c>ERL_MAX_ETS_TABLES</c>.</cell> + </row> + <row> + <cell>Elements in a tuple</cell> + <cell>The maximum number of elements in a tuple is 67,108,863 + (26-bit unsigned integer). Clearly, other factors such as the + available memory can make it difficult to create a tuple of + that size.</cell> + </row> + <row> + <cell>Size of binary</cell> + <cell>In the 32-bit implementation of Erlang, 536,870,911 + bytes is the largest binary that can be constructed or matched + using the bit syntax. In the 64-bit implementation, the maximum + size is 2,305,843,009,213,693,951 bytes. If the limit is + exceeded, bit syntax construction fails with a + <c>system_limit</c> exception, while any attempt to match a + binary that is too large fails. This limit is enforced starting + in R11B-4.<br></br> + In earlier Erlang/OTP releases, operations on too large + binaries in general either fail or give incorrect results. In + future releases, other operations that create binaries (such as + <c>list_to_binary/1</c>) will probably also enforce the same + limit.</cell> + </row> + <row> + <cell>Total amount of data allocated by an Erlang node</cell> + <cell>The Erlang runtime system can use the complete 32-bit + (or 64-bit) address space, but the operating system often + limits a single process to use less than that.</cell> + </row> + <row> + <cell>Length of a node name</cell> + <cell>An Erlang node name has the form host@shortname or + host@longname. The node name is used as an atom within + the system, so the maximum size of 255 holds also for the + node name.</cell> + </row> + <row> + <cell><marker id="ports"></marker>Open ports</cell> + <cell>The maximum number of simultaneously open Erlang ports is + often by default 16,384. This limit can be configured at startup. + For more information, see the + <seealso marker="erts:erl#max_ports"><c>+Q</c></seealso> + command-line flag in the + <seealso marker="erts:erl"><c>erl(1)</c></seealso> manual page + in <c>erts</c>.</cell> + </row> + <row> + <cell><marker id="files_sockets"></marker>Open files and + sockets</cell> + <cell>The maximum number of simultaneously open files and + sockets depends on + <seealso marker="#ports">the maximum number of Erlang ports</seealso> + available, as well as on operating system-specific settings + and limits.</cell> + </row> + <row> + <cell>Number of arguments to a function or fun</cell> + <cell>255</cell> + </row> + <tcaption>System Limits</tcaption> + </table> </section> </chapter> diff --git a/system/doc/efficiency_guide/binaryhandling.xml b/system/doc/efficiency_guide/binaryhandling.xml index 4ba1378059..0ac1a7ee32 100644 --- a/system/doc/efficiency_guide/binaryhandling.xml +++ b/system/doc/efficiency_guide/binaryhandling.xml @@ -23,7 +23,7 @@ The Initial Developer of the Original Code is Ericsson AB. </legalnotice> - <title>Constructing and matching binaries</title> + <title>Constructing and Matching Binaries</title> <prepared>Bjorn Gustavsson</prepared> <docno></docno> <date>2007-10-12</date> @@ -31,10 +31,10 @@ <file>binaryhandling.xml</file> </header> - <p>In R12B, the most natural way to write binary construction and matching is now + <p>In R12B, the most natural way to construct and match binaries is significantly faster than in earlier releases.</p> - <p>To construct at binary, you can simply write</p> + <p>To construct a binary, you can simply write as follows:</p> <p><em>DO</em> (in R12B) / <em>REALLY DO NOT</em> (in earlier releases)</p> <code type="erl"><![CDATA[ @@ -46,13 +46,12 @@ my_list_to_binary([H|T], Acc) -> my_list_to_binary([], Acc) -> Acc.]]></code> - <p>In releases before R12B, <c>Acc</c> would be copied in every iteration. - In R12B, <c>Acc</c> will be copied only in the first iteration and extra - space will be allocated at the end of the copied binary. In the next iteration, - <c>H</c> will be written in to the extra space. When the extra space runs out, - the binary will be reallocated with more extra space.</p> - - <p>The extra space allocated (or reallocated) will be twice the size of the + <p>In releases before R12B, <c>Acc</c> is copied in every iteration. + In R12B, <c>Acc</c> is copied only in the first iteration and extra + space is allocated at the end of the copied binary. In the next iteration, + <c>H</c> is written into the extra space. When the extra space runs out, + the binary is reallocated with more extra space. The extra space allocated + (or reallocated) is twice the size of the existing binary data, or 256, whichever is larger.</p> <p>The most natural way to match binaries is now the fastest:</p> @@ -64,55 +63,79 @@ my_binary_to_list(<<H,T/binary>>) -> my_binary_to_list(<<>>) -> [].]]></code> <section> - <title>How binaries are implemented</title> + <title>How Binaries are Implemented</title> <p>Internally, binaries and bitstrings are implemented in the same way. - In this section, we will call them <em>binaries</em> since that is what + In this section, they are called <em>binaries</em> because that is what they are called in the emulator source code.</p> - <p>There are four types of binary objects internally. Two of them are - containers for binary data and two of them are merely references to - a part of a binary.</p> - - <p>The binary containers are called <em>refc binaries</em> - (short for <em>reference-counted binaries</em>) and <em>heap binaries</em>.</p> + <p>Four types of binary objects are available internally:</p> + <list type="bulleted"> + <item><p>Two are containers for binary data and are called:</p> + <list type="bulleted"> + <item><em>Refc binaries</em> (short for + <em>reference-counted binaries</em>)</item> + <item><em>Heap binaries</em></item> + </list></item> + <item><p>Two are merely references to a part of a binary and + are called:</p> + <list type="bulleted"> + <item><em>sub binaries</em></item> + <item><em>match contexts</em></item> + </list></item> + </list> - <p><marker id="refc_binary"></marker><em>Refc binaries</em> - consist of two parts: an object stored on - the process heap, called a <em>ProcBin</em>, and the binary object itself - stored outside all process heaps.</p> + <section> + <marker id="refc_binary"></marker> + <title>Refc Binaries</title> + <p>Refc binaries consist of two parts:</p> + <list type="bulleted"> + <item>An object stored on the process heap, called a + <em>ProcBin</em></item> + <item>The binary object itself, stored outside all process + heaps</item> + </list> <p>The binary object can be referenced by any number of ProcBins from any - number of processes; the object contains a reference counter to keep track + number of processes. The object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.</p> <p>All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.</p> + </section> - <p><marker id="heap_binary"></marker><em>Heap binaries</em> are small binaries, - up to 64 bytes, that are stored directly on the process heap. - They will be copied when the process - is garbage collected and when they are sent as a message. They don't + <section> + <marker id="heap_binary"></marker> + <title>Heap Binaries</title> + <p>Heap binaries are small binaries, up to 64 bytes, and are stored + directly on the process heap. They are copied when the process is + garbage-collected and when they are sent as a message. They do not require any special handling by the garbage collector.</p> + </section> - <p>There are two types of reference objects that can reference part of - a refc binary or heap binary. They are called <em>sub binaries</em> and - <em>match contexts</em>.</p> + <section> + <title>Sub Binaries</title> + <p>The reference objects <em>sub binaries</em> and + <em>match contexts</em> can reference part of + a refc binary or heap binary.</p> <p><marker id="sub_binary"></marker>A <em>sub binary</em> is created by <c>split_binary/2</c> and when a binary is matched out in a binary pattern. A sub binary is a reference - into a part of another binary (refc or heap binary, never into a another + into a part of another binary (refc or heap binary, but never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.</p> + </section> - <p><marker id="match_context"></marker>A <em>match context</em> is - similar to a sub binary, but is optimized - for binary matching; for instance, it contains a direct pointer to the binary - data. For each field that is matched out of a binary, the position in the - match context will be incremented.</p> + <section> + <title>Match Context</title> + <marker id="match_context"></marker> + <p>A <em>match context</em> is similar to a sub binary, but is + optimized for binary matching. For example, it contains a direct + pointer to the binary data. For each field that is matched out of + a binary, the position in the match context is incremented.</p> <p>In R11B, a match context was only used during a binary matching operation.</p> @@ -122,27 +145,28 @@ my_binary_to_list(<<>>) -> [].]]></code> context and discard the sub binary. Instead of creating a sub binary, the match context is kept.</p> - <p>The compiler can only do this optimization if it can know for sure + <p>The compiler can only do this optimization if it knows that the match context will not be shared. If it would be shared, the functional properties (also called referential transparency) of Erlang would break.</p> + </section> </section> <section> - <title>Constructing binaries</title> - - <p>In R12B, appending to a binary or bitstring</p> + <title>Constructing Binaries</title> + <p>In R12B, appending to a binary or bitstring + is specially optimized by the <em>runtime system</em>:</p> <code type="erl"><![CDATA[ <<Binary/binary, ...>> <<Binary/bitstring, ...>>]]></code> - <p>is specially optimized by the <em>run-time system</em>. - Because the run-time system handles the optimization (instead of + <p>As the runtime system handles the optimization (instead of the compiler), there are very few circumstances in which the optimization - will not work.</p> + does not work.</p> - <p>To explain how it works, we will go through this code</p> + <p>To explain how it works, let us examine the following code line + by line:</p> <code type="erl"><![CDATA[ Bin0 = <<0>>, %% 1 @@ -152,81 +176,81 @@ Bin3 = <<Bin2/binary,7,8,9>>, %% 4 Bin4 = <<Bin1/binary,17>>, %% 5 !!! {Bin4,Bin3} %% 6]]></code> - <p>line by line.</p> - - <p>The first line (marked with the <c>%% 1</c> comment), assigns + <list type="bulleted"> + <item>Line 1 (marked with the <c>%% 1</c> comment), assigns a <seealso marker="#heap_binary">heap binary</seealso> to - the variable <c>Bin0</c>.</p> + the <c>Bin0</c> variable.</item> - <p>The second line is an append operation. Since <c>Bin0</c> + <item>Line 2 is an append operation. As <c>Bin0</c> has not been involved in an append operation, a new <seealso marker="#refc_binary">refc binary</seealso> - will be created and the contents of <c>Bin0</c> will be copied - into it. The <em>ProcBin</em> part of the refc binary will have + is created and the contents of <c>Bin0</c> is copied + into it. The <em>ProcBin</em> part of the refc binary has its size set to the size of the data stored in the binary, while - the binary object will have extra space allocated. - The size of the binary object will be either twice the + the binary object has extra space allocated. + The size of the binary object is either twice the size of <c>Bin0</c> or 256, whichever is larger. In this case - it will be 256.</p> + it is 256.</item> - <p>It gets more interesting in the third line. + <item>Line 3 is more interesting. <c>Bin1</c> <em>has</em> been used in an append operation, - and it has 255 bytes of unused storage at the end, so the three new bytes - will be stored there.</p> + and it has 255 bytes of unused storage at the end, so the 3 new + bytes are stored there.</item> - <p>Same thing in the fourth line. There are 252 bytes left, - so there is no problem storing another three bytes.</p> + <item>Line 4. The same applies here. There are 252 bytes left, + so there is no problem storing another 3 bytes.</item> - <p>But in the fifth line something <em>interesting</em> happens. - Note that we don't append to the previous result in <c>Bin3</c>, - but to <c>Bin1</c>. We expect that <c>Bin4</c> will be assigned - the value <c><<0,1,2,3,17>></c>. We also expect that + <item>Line 5. Here, something <em>interesting</em> happens. Notice + that the result is not appended to the previous result in <c>Bin3</c>, + but to <c>Bin1</c>. It is expected that <c>Bin4</c> will be assigned + the value <c><<0,1,2,3,17>></c>. It is also expected that <c>Bin3</c> will retain its value (<c><<0,1,2,3,4,5,6,7,8,9>></c>). - Clearly, the run-time system cannot write the byte <c>17</c> into the binary, + Clearly, the runtime system cannot write byte <c>17</c> into the binary, because that would change the value of <c>Bin3</c> to - <c><<0,1,2,3,4,17,6,7,8,9>></c>.</p> - - <p>What will happen?</p> + <c><<0,1,2,3,4,17,6,7,8,9>></c>.</item> + </list> - <p>The run-time system will see that <c>Bin1</c> is the result + <p>The runtime system sees that <c>Bin1</c> is the result from a previous append operation (not from the latest append operation), - so it will <em>copy</em> the contents of <c>Bin1</c> to a new binary - and reserve extra storage and so on. (We will not explain here how the - run-time system can know that it is not allowed to write into <c>Bin1</c>; + so it <em>copies</em> the contents of <c>Bin1</c> to a new binary, + reserve extra storage, and so on. (Here is not explained how the + runtime system can know that it is not allowed to write into <c>Bin1</c>; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily <c>erl_bits.c</c>.)</p> <section> - <title>Circumstances that force copying</title> + <title>Circumstances That Force Copying</title> <p>The optimization of the binary append operation requires that there is a <em>single</em> ProcBin and a <em>single reference</em> to the ProcBin for the binary. The reason is that the binary object can be - moved (reallocated) during an append operation, and when that happens + moved (reallocated) during an append operation, and when that happens, the pointer in the ProcBin must be updated. If there would be more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.</p> - <p>Therefore, certain operations on a binary will mark it so that + <p>Therefore, certain operations on a binary mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.</p> - <p>When appending to a binary</p> + <p>When appending to a binary as follows, only the binary returned + from the latest append operation will support further cheap append + operations:</p> <code type="erl"><![CDATA[ Bin = <<Bin0,...>>]]></code> - <p>only the binary returned from the latest append operation will - support further cheap append operations. In the code fragment above, + <p>In the code fragment in the beginning of this section, appending to <c>Bin</c> will be cheap, while appending to <c>Bin0</c> will force the creation of a new binary and copying of the contents of <c>Bin0</c>.</p> <p>If a binary is sent as a message to a process or port, the binary will be shrunk and any further append operation will copy the binary - data into a new binary. For instance, in the following code fragment</p> + data into a new binary. For example, in the following code fragment + <c>Bin1</c> will be copied in the third line:</p> <code type="erl"><![CDATA[ Bin1 = <<Bin0,...>>, @@ -234,12 +258,12 @@ PortOrPid ! Bin1, Bin = <<Bin1,...>> %% Bin1 will be COPIED ]]></code> - <p><c>Bin1</c> will be copied in the third line.</p> - - <p>The same thing happens if you insert a binary into an <em>ets</em> - table or send it to a port using <c>erlang:port_command/2</c> or pass it to + <p>The same happens if you insert a binary into an Ets + table, send it to a port using <c>erlang:port_command/2</c>, or + pass it to <seealso marker="erts:erl_nif#enif_inspect_binary">enif_inspect_binary</seealso> in a NIF.</p> + <p>Matching a binary will also cause it to shrink and the next append operation will copy the binary data:</p> @@ -249,22 +273,23 @@ Bin1 = <<Bin0,...>>, Bin = <<Bin1,...>> %% Bin1 will be COPIED ]]></code> - <p>The reason is that a <seealso marker="#match_context">match context</seealso> + <p>The reason is that a + <seealso marker="#match_context">match context</seealso> contains a direct pointer to the binary data.</p> - <p>If a process simply keeps binaries (either in "loop data" or in the process - dictionary), the garbage collector may eventually shrink the binaries. - If only one such binary is kept, it will not be shrunk. If the process later - appends to a binary that has been shrunk, the binary object will be reallocated - to make place for the data to be appended.</p> + <p>If a process simply keeps binaries (either in "loop data" or in the + process + dictionary), the garbage collector can eventually shrink the binaries. + If only one such binary is kept, it will not be shrunk. If the process + later appends to a binary that has been shrunk, the binary object will + be reallocated to make place for the data to be appended.</p> </section> - </section> <section> - <title>Matching binaries</title> + <title>Matching Binaries</title> - <p>We will revisit the example shown earlier</p> + <p>Let us revisit the example in the beginning of the previous section:</p> <p><em>DO</em> (in R12B)</p> <code type="erl"><![CDATA[ @@ -272,36 +297,35 @@ my_binary_to_list(<<H,T/binary>>) -> [H|my_binary_to_list(T)]; my_binary_to_list(<<>>) -> [].]]></code> - <p>too see what is happening under the hood.</p> - - <p>The very first time <c>my_binary_to_list/1</c> is called, + <p>The first time <c>my_binary_to_list/1</c> is called, a <seealso marker="#match_context">match context</seealso> - will be created. The match context will point to the first - byte of the binary. One byte will be matched out and the match context - will be updated to point to the second byte in the binary.</p> + is created. The match context points to the first + byte of the binary. 1 byte is matched out and the match context + is updated to point to the second byte in the binary.</p> - <p>In R11B, at this point a <seealso marker="#sub_binary">sub binary</seealso> + <p>In R11B, at this point a + <seealso marker="#sub_binary">sub binary</seealso> would be created. In R12B, the compiler sees that there is no point in creating a sub binary, because there will soon be a call to a function (in this case, - to <c>my_binary_to_list/1</c> itself) that will immediately + to <c>my_binary_to_list/1</c> itself) that immediately will create a new match context and discard the sub binary.</p> - <p>Therefore, in R12B, <c>my_binary_to_list/1</c> will call itself + <p>Therefore, in R12B, <c>my_binary_to_list/1</c> calls itself with the match context instead of with a sub binary. The instruction - that initializes the matching operation will basically do nothing + that initializes the matching operation basically does nothing when it sees that it was passed a match context instead of a binary.</p> <p>When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next - garbage collection, since there is no longer any reference to it).</p> + garbage collection, as there is no longer any reference to it).</p> <p>To summarize, <c>my_binary_to_list/1</c> in R12B only needs to create <em>one</em> match context and no sub binaries. In R11B, if the binary contains <em>N</em> bytes, <em>N+1</em> match contexts and <em>N</em> - sub binaries will be created.</p> + sub binaries are created.</p> - <p>In R11B, the fastest way to match binaries is:</p> + <p>In R11B, the fastest way to match binaries is as follows:</p> <p><em>DO NOT</em> (in R12B)</p> <code type="erl"><![CDATA[ @@ -317,13 +341,14 @@ my_complicated_binary_to_list(Bin, Skip) -> end.]]></code> <p>This function cleverly avoids building sub binaries, but it cannot - avoid building a match context in each recursion step. Therefore, in both R11B and R12B, + avoid building a match context in each recursion step. + Therefore, in both R11B and R12B, <c>my_complicated_binary_to_list/1</c> builds <em>N+1</em> match - contexts. (In a future release, the compiler might be able to generate code - that reuses the match context, but don't hold your breath.)</p> + contexts. (In a future Erlang/OTP release, the compiler might be able + to generate code that reuses the match context.)</p> - <p>Returning to <c>my_binary_to_list/1</c>, note that the match context was - discarded when the entire binary had been traversed. What happens if + <p>Returning to <c>my_binary_to_list/1</c>, notice that the match context + was discarded when the entire binary had been traversed. What happens if the iteration stops before it has reached the end of the binary? Will the optimization still work?</p> @@ -336,29 +361,23 @@ after_zero(<<>>) -> <<>>. ]]></code> - <p>Yes, it will. The compiler will remove the building of the sub binary in the - second clause</p> + <p>Yes, it will. The compiler will remove the building of the sub binary in + the second clause:</p> <code type="erl"><![CDATA[ -. -. -. +... after_zero(<<_,T/binary>>) -> after_zero(T); -. -. -.]]></code> +...]]></code> - <p>but will generate code that builds a sub binary in the first clause</p> + <p>But it will generate code that builds a sub binary in the first clause:</p> <code type="erl"><![CDATA[ after_zero(<<0,T/binary>>) -> T; -. -. -.]]></code> +...]]></code> - <p>Therefore, <c>after_zero/1</c> will build one match context and one sub binary + <p>Therefore, <c>after_zero/1</c> builds one match context and one sub binary (assuming it is passed a binary that contains a zero byte).</p> <p>Code like the following will also be optimized:</p> @@ -371,12 +390,14 @@ all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) -> all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) -> all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).]]></code> - <p>The compiler will remove building of sub binaries in the second and third clauses, - and it will add an instruction to the first clause that will convert <c>Buffer</c> - from a match context to a sub binary (or do nothing if <c>Buffer</c> already is a binary).</p> + <p>The compiler removes building of sub binaries in the second and third + clauses, and it adds an instruction to the first clause that converts + <c>Buffer</c> from a match context to a sub binary (or do nothing if + <c>Buffer</c> is a binary already).</p> - <p>Before you begin to think that the compiler can optimize any binary patterns, - here is a function that the compiler (currently, at least) is not able to optimize:</p> + <p>Before you begin to think that the compiler can optimize any binary + patterns, the following function cannot be optimized by the compiler + (currently, at least):</p> <code type="erl"><![CDATA[ non_opt_eq([H|T1], <<H,T2/binary>>) -> @@ -386,43 +407,43 @@ non_opt_eq([_|_], <<_,_/binary>>) -> non_opt_eq([], <<>>) -> true.]]></code> - <p>It was briefly mentioned earlier that the compiler can only delay creation of - sub binaries if it can be sure that the binary will not be shared. In this case, - the compiler cannot be sure.</p> + <p>It was mentioned earlier that the compiler can only delay creation of + sub binaries if it knows that the binary will not be shared. In this case, + the compiler cannot know.</p> - <p>We will soon show how to rewrite <c>non_opt_eq/2</c> so that the delayed sub binary - optimization can be applied, and more importantly, we will show how you can find out - whether your code can be optimized.</p> + <p>Soon it is shown how to rewrite <c>non_opt_eq/2</c> so that the delayed + sub binary optimization can be applied, and more importantly, it is shown + how you can find out whether your code can be optimized.</p> <section> - <title>The bin_opt_info option</title> + <title>Option bin_opt_info</title> <p>Use the <c>bin_opt_info</c> option to have the compiler print a lot of - information about binary optimizations. It can be given either to the compiler or - <c>erlc</c></p> + information about binary optimizations. It can be given either to the + compiler or <c>erlc</c>:</p> <code type="erl"><![CDATA[ erlc +bin_opt_info Mod.erl]]></code> - <p>or passed via an environment variable</p> + <p>or passed through an environment variable:</p> <code type="erl"><![CDATA[ export ERL_COMPILER_OPTIONS=bin_opt_info]]></code> - <p>Note that the <c>bin_opt_info</c> is not meant to be a permanent option added - to your <c>Makefile</c>s, because it is not possible to eliminate all messages that - it generates. Therefore, passing the option through the environment is in most cases - the most practical approach.</p> + <p>Notice that the <c>bin_opt_info</c> is not meant to be a permanent + option added to your <c>Makefile</c>s, because all messages that it + generates cannot be eliminated. Therefore, passing the option through + the environment is in most cases the most practical approach.</p> - <p>The warnings will look like this:</p> + <p>The warnings look as follows:</p> <code type="erl"><![CDATA[ ./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned ./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed]]></code> - <p>To make it clearer exactly what code the warnings refer to, - in the examples that follow, the warnings are inserted as comments - after the clause they refer to:</p> + <p>To make it clearer exactly what code the warnings refer to, the + warnings in the following examples are inserted as comments + after the clause they refer to, for example:</p> <code type="erl"><![CDATA[ after_zero(<<0,T/binary>>) -> @@ -434,12 +455,12 @@ after_zero(<<_,T/binary>>) -> after_zero(<<>>) -> <<>>.]]></code> - <p>The warning for the first clause tells us that it is not possible to - delay the creation of a sub binary, because it will be returned. - The warning for the second clause tells us that a sub binary will not be + <p>The warning for the first clause says that the creation of a sub + binary cannot be delayed, because it will be returned. + The warning for the second clause says that a sub binary will not be created (yet).</p> - <p>It is time to revisit the earlier example of the code that could not + <p>Let us revisit the earlier example of the code that could not be optimized and find out why:</p> <code type="erl"><![CDATA[ @@ -456,16 +477,16 @@ non_opt_eq([_|_], <<_,_/binary>>) -> non_opt_eq([], <<>>) -> true.]]></code> - <p>The compiler emitted two warnings. The <c>INFO</c> warning refers to the function - <c>non_opt_eq/2</c> as a callee, indicating that any functions that call <c>non_opt_eq/2</c> - will not be able to make delayed sub binary optimization. - There is also a suggestion to change argument order. - The second warning (that happens to refer to the same line) refers to the construction of - the sub binary itself.</p> + <p>The compiler emitted two warnings. The <c>INFO</c> warning refers + to the function <c>non_opt_eq/2</c> as a callee, indicating that any + function that call <c>non_opt_eq/2</c> cannot make delayed sub binary + optimization. There is also a suggestion to change argument order. + The second warning (that happens to refer to the same line) refers to + the construction of the sub binary itself.</p> - <p>We will soon show another example that should make the distinction between <c>INFO</c> - and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but first we will heed the suggestion - to change argument order:</p> + <p>Soon another example will show the difference between the + <c>INFO</c> and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but + let us first follow the suggestion to change argument order:</p> <code type="erl"><![CDATA[ opt_eq(<<H,T1/binary>>, [H|T2]) -> @@ -485,15 +506,13 @@ match_body([0|_], <<H,_/binary>>) -> %% sub binary optimization; %% SUGGEST changing argument order done; -. -. -.]]></code> +...]]></code> <p>The warning means that <em>if</em> there is a call to <c>match_body/2</c> (from another clause in <c>match_body/2</c> or another function), the - delayed sub binary optimization will not be possible. There will be additional - warnings for any place where a sub binary is matched out at the end of and - passed as the second argument to <c>match_body/2</c>. For instance:</p> + delayed sub binary optimization will not be possible. More warnings will + occur for any place where a sub binary is matched out at the end of and + passed as the second argument to <c>match_body/2</c>, for example:</p> <code type="erl"><![CDATA[ match_head(List, <<_:10,Data/binary>>) -> @@ -504,10 +523,10 @@ match_head(List, <<_:10,Data/binary>>) -> </section> <section> - <title>Unused variables</title> + <title>Unused Variables</title> - <p>The compiler itself figures out if a variable is unused. The same - code is generated for each of the following functions</p> + <p>The compiler figures out if a variable is unused. The same + code is generated for each of the following functions:</p> <code type="erl"><![CDATA[ count1(<<_,T/binary>>, Count) -> count1(T, Count+1); @@ -519,11 +538,9 @@ count2(<<>>, Count) -> Count. count3(<<_H,T/binary>>, Count) -> count3(T, Count+1); count3(<<>>, Count) -> Count.]]></code> - <p>In each iteration, the first 8 bits in the binary will be skipped, not matched out.</p> - + <p>In each iteration, the first 8 bits in the binary will be skipped, + not matched out.</p> </section> - </section> - </chapter> diff --git a/system/doc/efficiency_guide/commoncaveats.xml b/system/doc/efficiency_guide/commoncaveats.xml index 551b0a03e6..71991d342f 100644 --- a/system/doc/efficiency_guide/commoncaveats.xml +++ b/system/doc/efficiency_guide/commoncaveats.xml @@ -18,7 +18,6 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> <title>Common Caveats</title> @@ -29,49 +28,50 @@ <file>commoncaveats.xml</file> </header> - <p>Here we list a few modules and BIFs to watch out for, and not only + <p>This section lists a few modules and BIFs to watch out for, not only from a performance point of view.</p> <section> - <title>The timer module</title> + <title>Timer Module</title> <p>Creating timers using <seealso marker="erts:erlang#erlang:send_after/3">erlang:send_after/3</seealso> - and <seealso marker="erts:erlang#erlang:start_timer/3">erlang:start_timer/3</seealso> + and + <seealso marker="erts:erlang#erlang:start_timer/3">erlang:start_timer/3</seealso> +, is much more efficient than using the timers provided by the - <seealso marker="stdlib:timer">timer</seealso> module. The - <c>timer</c> module uses a separate process to manage the timers, - and that process can easily become overloaded if many processes + <seealso marker="stdlib:timer">timer</seealso> module in STDLIB. + The <c>timer</c> module uses a separate process to manage the timers. + That process can easily become overloaded if many processes create and cancel timers frequently (especially when using the SMP emulator).</p> - <p>The functions in the <c>timer</c> module that do not manage timers (such as - <c>timer:tc/3</c> or <c>timer:sleep/1</c>), do not call the timer-server process - and are therefore harmless.</p> + <p>The functions in the <c>timer</c> module that do not manage timers + (such as <c>timer:tc/3</c> or <c>timer:sleep/1</c>), do not call the + timer-server process and are therefore harmless.</p> </section> <section> <title>list_to_atom/1</title> - <p>Atoms are not garbage-collected. Once an atom is created, it will never - be removed. The emulator will terminate if the limit for the number - of atoms (1048576 by default) is reached.</p> + <p>Atoms are not garbage-collected. Once an atom is created, it is never + removed. The emulator terminates if the limit for the number + of atoms (1,048,576 by default) is reached.</p> - <p>Therefore, converting arbitrary input strings to atoms could be - dangerous in a system that will run continuously. - If only certain well-defined atoms are allowed as input, you can use + <p>Therefore, converting arbitrary input strings to atoms can be + dangerous in a system that runs continuously. + If only certain well-defined atoms are allowed as input, <seealso marker="erts:erlang#list_to_existing_atom/1">list_to_existing_atom/1</seealso> + can be used to to guard against a denial-of-service attack. (All atoms that are allowed - must have been created earlier, for instance by simply using all of them + must have been created earlier, for example, by simply using all of them in a module and loading that module.)</p> <p>Using <c>list_to_atom/1</c> to construct an atom that is passed to - <c>apply/3</c> like this</p> - + <c>apply/3</c> as follows, is quite expensive and not recommended + in time-critical code:</p> <code type="erl"> -apply(list_to_atom("some_prefix"++Var), foo, Args)</code> - - <p>is quite expensive and is not recommended in time-critical code.</p> +apply(list_to_atom("some_prefix"++Var), foo, Args)</code> </section> <section> @@ -81,25 +81,25 @@ apply(list_to_atom("some_prefix"++Var), foo, Args)</code> length of the list, as opposed to <c>tuple_size/1</c>, <c>byte_size/1</c>, and <c>bit_size/1</c>, which all execute in constant time.</p> - <p>Normally you don't have to worry about the speed of <c>length/1</c>, - because it is efficiently implemented in C. In time critical-code, though, - you might want to avoid it if the input list could potentially be very long.</p> + <p>Normally, there is no need to worry about the speed of <c>length/1</c>, + because it is efficiently implemented in C. In time-critical code, + you might want to avoid it if the input list could potentially be very + long.</p> <p>Some uses of <c>length/1</c> can be replaced by matching. - For instance, this code</p> - + For example, the following code:</p> <code type="erl"> foo(L) when length(L) >= 3 -> ...</code> - <p>can be rewritten to</p> + <p>can be rewritten to:</p> <code type="erl"> foo([_,_,_|_]=L) -> ...</code> - <p>(One slight difference is that <c>length(L)</c> will fail if the <c>L</c> - is an improper list, while the pattern in the second code fragment will - accept an improper list.)</p> + <p>One slight difference is that <c>length(L)</c> fails if <c>L</c> + is an improper list, while the pattern in the second code fragment + accepts an improper list.</p> </section> <section> @@ -107,50 +107,49 @@ foo([_,_,_|_]=L) -> <p><seealso marker="erts:erlang#setelement/3">setelement/3</seealso> copies the tuple it modifies. Therefore, updating a tuple in a loop - using <c>setelement/3</c> will create a new copy of the tuple every time.</p> + using <c>setelement/3</c> creates a new copy of the tuple every time.</p> <p>There is one exception to the rule that the tuple is copied. If the compiler clearly can see that destructively updating the tuple would - give exactly the same result as if the tuple was copied, the call to - <c>setelement/3</c> will be replaced with a special destructive setelement - instruction. In the following code sequence</p> - + give the same result as if the tuple was copied, the call to + <c>setelement/3</c> is replaced with a special destructive <c>setelement</c> + instruction. In the following code sequence, the first <c>setelement/3</c> + call copies the tuple and modifies the ninth element:</p> <code type="erl"> multiple_setelement(T0) -> T1 = setelement(9, T0, bar), T2 = setelement(7, T1, foobar), setelement(5, T2, new_value).</code> - <p>the first <c>setelement/3</c> call will copy the tuple and modify the - ninth element. The two following <c>setelement/3</c> calls will modify + <p>The two following <c>setelement/3</c> calls modify the tuple in place.</p> - <p>For the optimization to be applied, <em>all</em> of the followings conditions + <p>For the optimization to be applied, <em>all</em> the followings conditions must be true:</p> <list type="bulleted"> <item>The indices must be integer literals, not variables or expressions.</item> <item>The indices must be given in descending order.</item> - <item>There must be no calls to other function in between the calls to + <item>There must be no calls to another function in between the calls to <c>setelement/3</c>.</item> <item>The tuple returned from one <c>setelement/3</c> call must only be used in the subsequent call to <c>setelement/3</c>.</item> </list> - <p>If it is not possible to structure the code as in the <c>multiple_setelement/1</c> + <p>If the code cannot be structured as in the <c>multiple_setelement/1</c> example, the best way to modify multiple elements in a large tuple is to - convert the tuple to a list, modify the list, and convert the list back to + convert the tuple to a list, modify the list, and convert it back to a tuple.</p> </section> <section> <title>size/1</title> - <p><c>size/1</c> returns the size for both tuples and binary.</p> + <p><c>size/1</c> returns the size for both tuples and binaries.</p> - <p>Using the new BIFs <c>tuple_size/1</c> and <c>byte_size/1</c> introduced - in R12B gives the compiler and run-time system more opportunities for - optimization. A further advantage is that the new BIFs could help Dialyzer + <p>Using the new BIFs <c>tuple_size/1</c> and <c>byte_size/1</c>, introduced + in R12B, gives the compiler and the runtime system more opportunities for + optimization. Another advantage is that the new BIFs can help Dialyzer to find more bugs in your program.</p> </section> @@ -159,22 +158,21 @@ multiple_setelement(T0) -> <p>It is usually more efficient to split a binary using matching instead of calling the <c>split_binary/2</c> function. Furthermore, mixing bit syntax matching and <c>split_binary/2</c> - may prevent some optimizations of bit syntax matching.</p> + can prevent some optimizations of bit syntax matching.</p> <p><em>DO</em></p> <code type="none"><![CDATA[ <<Bin1:Num/binary,Bin2/binary>> = Bin,]]></code> <p><em>DO NOT</em></p> <code type="none"> - {Bin1,Bin2} = split_binary(Bin, Num) - </code> + {Bin1,Bin2} = split_binary(Bin, Num)</code> </section> <section> - <title>The '--' operator</title> - <p>Note that the '<c>--</c>' operator has a complexity - proportional to the product of the length of its operands, - meaning that it will be very slow if both of its operands + <title>Operator "--"</title> + <p>The "<c>--</c>" operator has a complexity + proportional to the product of the length of its operands. + This means that the operator is very slow if both of its operands are long lists:</p> <p><em>DO NOT</em></p> @@ -182,42 +180,39 @@ multiple_setelement(T0) -> HugeList1 -- HugeList2]]></code> <p>Instead use the <seealso marker="stdlib:ordsets">ordsets</seealso> - module:</p> + module in STDLIB:</p> <p><em>DO</em></p> <code type="none"> HugeSet1 = ordsets:from_list(HugeList1), HugeSet2 = ordsets:from_list(HugeList2), - ordsets:subtract(HugeSet1, HugeSet2) - </code> + ordsets:subtract(HugeSet1, HugeSet2)</code> - <p>Obviously, that code will not work if the original order + <p>Obviously, that code does not work if the original order of the list is important. If the order of the list must be - preserved, do like this:</p> + preserved, do as follows:</p> <p><em>DO</em></p> <code type="none"><![CDATA[ Set = gb_sets:from_list(HugeList2), [E || E <- HugeList1, not gb_sets:is_element(E, Set)]]]></code> - <p>Subtle note 1: This code behaves differently from '<c>--</c>' - if the lists contain duplicate elements. (One occurrence - of an element in HugeList2 will remove <em>all</em> + <note><p>This code behaves differently from "<c>--</c>" + if the lists contain duplicate elements (one occurrence + of an element in HugeList2 removes <em>all</em> occurrences in HugeList1.)</p> + <p>Also, this code compares lists elements using the + "<c>==</c>" operator, while "<c>--</c>" uses the "<c>=:=</c>" operator. + If that difference is important, <c>sets</c> can be used instead of + <c>gb_sets</c>, but <c>sets:from_list/1</c> is much + slower than <c>gb_sets:from_list/1</c> for long lists.</p></note> - <p>Subtle note 2: This code compares lists elements using the - '<c>==</c>' operator, while '<c>--</c>' uses the '<c>=:=</c>'. If - that difference is important, <c>sets</c> can be used instead of - <c>gb_sets</c>, but note that <c>sets:from_list/1</c> is much - slower than <c>gb_sets:from_list/1</c> for long lists.</p> - - <p>Using the '<c>--</c>' operator to delete an element + <p>Using the "<c>--</c>" operator to delete an element from a list is not a performance problem:</p> <p><em>OK</em></p> <code type="none"> - HugeList1 -- [Element] - </code> + HugeList1 -- [Element]</code> </section> diff --git a/system/doc/efficiency_guide/drivers.xml b/system/doc/efficiency_guide/drivers.xml index dfc49bdf21..33d6333e7d 100644 --- a/system/doc/efficiency_guide/drivers.xml +++ b/system/doc/efficiency_guide/drivers.xml @@ -18,7 +18,6 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> <title>Drivers</title> @@ -29,26 +28,26 @@ <file>drivers.xml</file> </header> - <p>This chapter provides a (very) brief overview on how to write efficient - drivers. It is assumed that you already have a good understanding of + <p>This section provides a brief overview on how to write efficient drivers.</p> + <p>It is assumed that you have a good understanding of drivers.</p> <section> - <title>Drivers and concurrency</title> + <title>Drivers and Concurrency</title> - <p>The run-time system will always take a lock before running + <p>The runtime system always takes a lock before running any code in a driver.</p> - <p>By default, that lock will be at the driver level, meaning that + <p>By default, that lock is at the driver level, that is, if several ports have been opened to the same driver, only code for one port at the same time can be running.</p> - <p>A driver can be configured to instead have one lock for each port.</p> + <p>A driver can be configured to have one lock for each port instead.</p> - <p>If a driver is used in a functional way (i.e. it holds no state, + <p>If a driver is used in a functional way (that is, holds no state, but only does some heavy calculation and returns a result), several - ports with registered names can be opened beforehand and the port to - be used can be chosen based on the scheduler ID like this:</p> + ports with registered names can be opened beforehand, and the port to + be used can be chosen based on the scheduler ID as follows:</p> <code type="none"> -define(PORT_NAMES(), @@ -67,82 +66,82 @@ client_port() -> </section> <section> - <title>Avoiding copying of binaries when calling a driver</title> + <title>Avoiding Copying Binaries When Calling a Driver</title> <p>There are basically two ways to avoid copying a binary that is - sent to a driver.</p> - - <p>If the <c>Data</c> argument for - <seealso marker="erts:erlang#port_control/3">port_control/3</seealso> - is a binary, the driver will be passed a pointer to the contents of - the binary and the binary will not be copied. - If the <c>Data</c> argument is an iolist (list of binaries and lists), - all binaries in the iolist will be copied.</p> - - <p>Therefore, if you want to send both a pre-existing binary and some - additional data to a driver without copying the binary, you must call - <c>port_control/3</c> twice; once with the binary and once with the - additional data. However, that will only work if there is only one - process communicating with the port (because otherwise another process - could call the driver in-between the calls).</p> - - <p>Another way to avoid copying binaries is to implement an <c>outputv</c> - callback (instead of an <c>output</c> callback) in the driver. - If a driver has an <c>outputv</c> callback, refc binaries passed - in an iolist in the <c>Data</c> argument for - <seealso marker="erts:erlang#port_command/2">port_command/2</seealso> - will be passed as references to the driver.</p> + sent to a driver:</p> + + <list type="bulleted"> + <item><p>If the <c>Data</c> argument for + <seealso marker="erts:erlang#port_control/3">port_control/3</seealso> + is a binary, the driver will be passed a pointer to the contents of + the binary and the binary will not be copied. If the <c>Data</c> + argument is an iolist (list of binaries and lists), all binaries in + the iolist will be copied.</p> + + <p>Therefore, if you want to send both a pre-existing binary and some + extra data to a driver without copying the binary, you must call + <c>port_control/3</c> twice; once with the binary and once with the + extra data. However, that will only work if there is only one + process communicating with the port (because otherwise another process + can call the driver in-between the calls).</p></item> + + <item><p>Implement an <c>outputv</c> callback (instead of an + <c>output</c> callback) in the driver. If a driver has an + <c>outputv</c> callback, refc binaries passed in an iolist + in the <c>Data</c> argument for + <seealso marker="erts:erlang#port_command/2">port_command/2</seealso> + will be passed as references to the driver.</p></item> + </list> </section> <section> - <title>Returning small binaries from a driver</title> + <title>Returning Small Binaries from a Driver</title> - <p>The run-time system can represent binaries up to 64 bytes as - heap binaries. They will always be copied when sent in a messages, - but they will require less memory if they are not sent to another + <p>The runtime system can represent binaries up to 64 bytes as + heap binaries. They are always copied when sent in messages, + but they require less memory if they are not sent to another process and garbage collection is cheaper.</p> - <p>If you know that the binaries you return are always small, - you should use driver API calls that do not require a pre-allocated - binary, for instance + <p>If you know that the binaries you return are always small, you + are advised to use driver API calls that do not require a pre-allocated + binary, for example, <seealso marker="erts:erl_driver#driver_output">driver_output()</seealso> - or - <seealso marker="erts:erl_driver#erl_drv_output_term">erl_drv_output_term()</seealso> + or + <seealso marker="erts:erl_driver#erl_drv_output_term">erl_drv_output_term()</seealso>, using the <c>ERL_DRV_BUF2BINARY</c> format, - to allow the run-time to construct a heap binary.</p> + to allow the runtime to construct a heap binary.</p> </section> <section> - <title>Returning big binaries without copying from a driver</title> + <title>Returning Large Binaries without Copying from a Driver</title> - <p>To avoid copying data when a big binary is sent or returned from + <p>To avoid copying data when a large binary is sent or returned from the driver to an Erlang process, the driver must first allocate the binary and then send it to an Erlang process in some way.</p> - <p>Use <seealso marker="erts:erl_driver#driver_alloc_binary">driver_alloc_binary()</seealso> to allocate a binary.</p> + <p>Use + <seealso marker="erts:erl_driver#driver_alloc_binary">driver_alloc_binary()</seealso> + to allocate a binary.</p> <p>There are several ways to send a binary created with - <c>driver_alloc_binary()</c>.</p> + <c>driver_alloc_binary()</c>:</p> <list type="bulleted"> - <item><p>From the <c>control</c> callback, a binary can be returned provided - that - <seealso marker="erts:erl_driver#set_port_control_flags">set_port_control_flags()</seealso> - has been called with the flag value <c>PORT_CONTROL_FLAG_BINARY</c>.</p> - </item> + <item>From the <c>control</c> callback, a binary can be returned if + <seealso marker="erts:erl_driver#set_port_control_flags">set_port_control_flags()</seealso> + has been called with the flag value <c>PORT_CONTROL_FLAG_BINARY</c>.</item> - <item><p>A single binary can be sent with - <seealso marker="erts:erl_driver#driver_output_binary">driver_output_binary()</seealso>.</p></item> + <item>A single binary can be sent with + <seealso marker="erts:erl_driver#driver_output_binary">driver_output_binary()</seealso>.</item> - <item><p>Using + <item>Using <seealso marker="erts:erl_driver#erl_drv_output_term">erl_drv_output_term()</seealso> or <seealso marker="erts:erl_driver#erl_drv_send_term">erl_drv_send_term()</seealso>, - a binary can be included in an Erlang term.</p> - </item> + a binary can be included in an Erlang term.</item> </list> </section> - </chapter> diff --git a/system/doc/efficiency_guide/functions.xml b/system/doc/efficiency_guide/functions.xml index ec1a45eaa9..bd23c9d90d 100644 --- a/system/doc/efficiency_guide/functions.xml +++ b/system/doc/efficiency_guide/functions.xml @@ -18,7 +18,6 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> <title>Functions</title> @@ -30,17 +29,18 @@ </header> <section> - <title>Pattern matching</title> - <p>Pattern matching in function head and in <c>case</c> and <c>receive</c> - clauses are optimized by the compiler. With a few exceptions, there is nothing - to gain by rearranging clauses.</p> + <title>Pattern Matching</title> + <p>Pattern matching in function head as well as in <c>case</c> and + <c>receive</c> clauses are optimized by the compiler. With a few + exceptions, there is nothing to gain by rearranging clauses.</p> <p>One exception is pattern matching of binaries. The compiler - will not rearrange clauses that match binaries. Placing the - clause that matches against the empty binary <em>last</em> will usually - be slightly faster than placing it <em>first</em>.</p> + does not rearrange clauses that match binaries. Placing the + clause that matches against the empty binary <em>last</em> is usually + slightly faster than placing it <em>first</em>.</p> - <p>Here is a rather contrived example to show another exception:</p> + <p>The following is a rather unnatural example to show another + exception:</p> <p><em>DO NOT</em></p> <code type="erl"> @@ -53,27 +53,30 @@ atom_map1(five) -> 5; atom_map1(six) -> 6.</code> <p>The problem is the clause with the variable <c>Int</c>. - Since a variable can match anything, including the atoms - <c>four</c>, <c>five</c>, and <c>six</c> that the following clauses - also will match, the compiler must generate sub-optimal code that will - execute as follows:</p> + As a variable can match anything, including the atoms + <c>four</c>, <c>five</c>, and <c>six</c>, which the following clauses + also match, the compiler must generate suboptimal code that + executes as follows:</p> - <p>First the input value is compared to <c>one</c>, <c>two</c>, and + <list type="bulleted"> + <item>First, the input value is compared to <c>one</c>, <c>two</c>, and <c>three</c> (using a single instruction that does a binary search; thus, quite efficient even if there are many values) to select which - one of the first three clauses to execute (if any).</p> + one of the first three clauses to execute (if any).</item> + + <item>>If none of the first three clauses match, the fourth clause + match as a variable always matches.</item> - <p>If none of the first three clauses matched, the fourth clause - will match since a variable always matches. If the guard test - <c>is_integer(Int)</c> succeeds, the fourth clause will be - executed.</p> + <item>If the guard test <c>is_integer(Int)</c> succeeds, the fourth + clause is executed.</item> - <p>If the guard test failed, the input value is compared to + <item>If the guard test fails, the input value is compared to <c>four</c>, <c>five</c>, and <c>six</c>, and the appropriate clause - is selected. (There will be a <c>function_clause</c> exception if - none of the values matched.)</p> + is selected. (There is a <c>function_clause</c> exception if none of + the values matched.)</item> + </list> - <p>Rewriting to either</p> + <p>Rewriting to either:</p> <p><em>DO</em></p> <code type="erl"><![CDATA[ @@ -85,7 +88,7 @@ atom_map2(five) -> 5; atom_map2(six) -> 6; atom_map2(Int) when is_integer(Int) -> Int.]]></code> - <p>or</p> + <p>or:</p> <p><em>DO</em></p> <code type="erl"><![CDATA[ @@ -97,9 +100,9 @@ atom_map3(four) -> 4; atom_map3(five) -> 5; atom_map3(six) -> 6.]]></code> - <p>will give slightly more efficient matching code.</p> + <p>gives slightly more efficient matching code.</p> - <p>Here is a less contrived example:</p> + <p>Another example:</p> <p><em>DO NOT</em></p> <code type="erl"><![CDATA[ @@ -116,7 +119,8 @@ map_pairs1(Map, [X|Xs], [Y|Ys]) -> match anything, the compiler is not allowed to rearrange the clauses, but must generate code that matches them in the order written.</p> - <p>If the function is rewritten like this</p> + <p>If the function is rewritten as follows, the compiler is free to + rearrange the clauses:</p> <p><em>DO</em></p> <code type="erl"><![CDATA[ @@ -127,8 +131,7 @@ map_pairs2(_Map, [_|_]=Xs, [] ) -> map_pairs2(Map, [X|Xs], [Y|Ys]) -> [Map(X, Y)|map_pairs2(Map, Xs, Ys)].]]></code> - <p>the compiler is free to rearrange the clauses. It will generate code - similar to this</p> + <p>The compiler will generate code similar to this:</p> <p><em>DO NOT (already done by the compiler)</em></p> <code type="erl"><![CDATA[ @@ -145,31 +148,35 @@ explicit_map_pairs(Map, Xs0, Ys0) -> Ys0 end.]]></code> - <p>which should be slightly faster for presumably the most common case + <p>This is slightly faster for probably the most common case that the input lists are not empty or very short. - (Another advantage is that Dialyzer is able to deduce a better type - for the variable <c>Xs</c>.)</p> + (Another advantage is that Dialyzer can deduce a better type + for the <c>Xs</c> variable.)</p> </section> <section> - <title>Function Calls </title> + <title>Function Calls</title> - <p>Here is an intentionally rough guide to the relative costs of - different kinds of calls. It is based on benchmark figures run on + <p>This is an intentionally rough guide to the relative costs of + different calls. It is based on benchmark figures run on Solaris/Sparc:</p> <list type="bulleted"> <item>Calls to local or external functions (<c>foo()</c>, <c>m:foo()</c>) - are the fastest kind of calls.</item> + are the fastest calls.</item> + <item>Calling or applying a fun (<c>Fun()</c>, <c>apply(Fun, [])</c>) - is about <em>three times</em> as expensive as calling a local function.</item> + is about <em>three times</em> as expensive as calling a local + function.</item> + <item>Applying an exported function (<c>Mod:Name()</c>, - <c>apply(Mod, Name, [])</c>) is about twice as expensive as calling a fun, - or about <em>six times</em> as expensive as calling a local function.</item> + <c>apply(Mod, Name, [])</c>) is about twice as expensive as calling + a fun or about <em>six times</em> as expensive as calling a local + function.</item> </list> <section> - <title>Notes and implementation details</title> + <title>Notes and Implementation Details</title> <p>Calling and applying a fun does not involve any hash-table lookup. A fun contains an (indirect) pointer to the function that implements @@ -178,42 +185,44 @@ explicit_map_pairs(Map, Xs0, Ys0) -> <warning><p><em>Tuples are not fun(s)</em>. A "tuple fun", <c>{Module,Function}</c>, is not a fun. The cost for calling a "tuple fun" is similar to that - of <c>apply/3</c> or worse. Using "tuple funs" is <em>strongly discouraged</em>, - as they may not be supported in a future release, - and because there exists a superior alternative since the R10B - release, namely the <c>fun Module:Function/Arity</c> syntax.</p></warning> + of <c>apply/3</c> or worse. + Using "tuple funs" is <em>strongly discouraged</em>, + as they might not be supported in a future Erlang/OTP release, + and because there exists a superior alternative from R10B, + namely the <c>fun Module:Function/Arity</c> syntax.</p></warning> <p><c>apply/3</c> must look up the code for the function to execute - in a hash table. Therefore, it will always be slower than a + in a hash table. It is therefore always slower than a direct call or a fun call.</p> <p>It no longer matters (from a performance point of view) - whether you write</p> + whether you write:</p> <code type="erl"> Module:Function(Arg1, Arg2)</code> - <p>or</p> + <p>or:</p> <code type="erl"> apply(Module, Function, [Arg1,Arg2])</code> - <p>(The compiler internally rewrites the latter code into the former.)</p> + <p>The compiler internally rewrites the latter code into the + former.</p> - <p>The following code</p> + <p>The following code is slightly slower because the shape of the + list of arguments is unknown at compile time.</p> <code type="erl"> apply(Module, Function, Arguments)</code> - <p>is slightly slower because the shape of the list of arguments - is not known at compile time.</p> </section> </section> <section> - <title>Memory usage in recursion</title> - <p>When writing recursive functions it is preferable to make them - tail-recursive so that they can execute in constant memory space.</p> + <title>Memory Usage in Recursion</title> + <p>When writing recursive functions, it is preferable to make them + tail-recursive so that they can execute in constant memory space:</p> + <p><em>DO</em></p> <code type="none"> list_length(List) -> @@ -224,13 +233,14 @@ list_length([], AccLen) -> list_length([_|Tail], AccLen) -> list_length(Tail, AccLen + 1). % Tail-recursive</code> + <p><em>DO NOT</em></p> + <code type="none"> list_length([]) -> 0. % Base case list_length([_ | Tail]) -> list_length(Tail) + 1. % Not tail-recursive</code> </section> - </chapter> diff --git a/system/doc/efficiency_guide/introduction.xml b/system/doc/efficiency_guide/introduction.xml index 9726d3ad11..a8360f1cdd 100644 --- a/system/doc/efficiency_guide/introduction.xml +++ b/system/doc/efficiency_guide/introduction.xml @@ -18,7 +18,6 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> <title>Introduction</title> @@ -32,38 +31,39 @@ <section> <title>Purpose</title> - <quote><p>Premature optimization is the root of all evil. -- D.E. Knuth</p></quote> + <quote><p>"Premature optimization is the root of all evil" + (D.E. Knuth)</p></quote> - <p>Efficient code can be well-structured and clean code, based on + <p>Efficient code can be well-structured and clean, based on a sound overall architecture and sound algorithms. Efficient code can be highly implementation-code that bypasses documented interfaces and takes advantage of obscure quirks in the current implementation.</p> - <p>Ideally, your code should only contain the first kind of efficient - code. If that turns out to be too slow, you should profile the application + <p>Ideally, your code only contains the first type of efficient + code. If that turns out to be too slow, profile the application to find out where the performance bottlenecks are and optimize only the - bottlenecks. Other code should stay as clean as possible.</p> + bottlenecks. Let other code stay as clean as possible.</p> - <p>Fortunately, compiler and run-time optimizations introduced in - R12B makes it easier to write code that is both clean and - efficient. For instance, the ugly workarounds needed in R11B and earlier + <p>Fortunately, compiler and runtime optimizations introduced in + Erlang/OTP R12B makes it easier to write code that is both clean and + efficient. For example, the ugly workarounds needed in R11B and earlier releases to get the most speed out of binary pattern matching are no longer necessary. In fact, the ugly code is slower than the clean code (because the clean code has become faster, not because the uglier code has become slower).</p> - <p>This Efficiency Guide cannot really learn you how to write efficient + <p>This Efficiency Guide cannot really teach you how to write efficient code. It can give you a few pointers about what to avoid and what to use, and some understanding of how certain language features are implemented. - We have generally not included general tips about optimization that will - work in any language, such as moving common calculations out of loops.</p> + This guide does not include general tips about optimization that + works in any language, such as moving common calculations out of loops.</p> </section> <section> <title>Prerequisites</title> - <p>It is assumed that the reader is familiar with the Erlang - programming language and concepts of OTP.</p> + <p>It is assumed that you are familiar with the Erlang programming + language and the OTP concepts.</p> </section> </chapter> diff --git a/system/doc/efficiency_guide/listhandling.xml b/system/doc/efficiency_guide/listhandling.xml index 9112738b18..b950f55ad1 100644 --- a/system/doc/efficiency_guide/listhandling.xml +++ b/system/doc/efficiency_guide/listhandling.xml @@ -18,10 +18,9 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> - <title>List handling</title> + <title>List Handling</title> <prepared>Bjorn Gustavsson</prepared> <docno></docno> <date>2007-11-16</date> @@ -30,19 +29,18 @@ </header> <section> - <title>Creating a list</title> + <title>Creating a List</title> - <p>Lists can only be built starting from the end and attaching - list elements at the beginning. If you use the <c>++</c> operator - like this</p> + <p>Lists can only be built starting from the end and attaching list + elements at the beginning. If you use the "<c>++</c>" operator as + follows, a new list is created that is a copy of the elements in + <c>List1</c>, followed by <c>List2</c>:</p> <code type="erl"> List1 ++ List2</code> - <p>you will create a new list which is copy of the elements in <c>List1</c>, - followed by <c>List2</c>. Looking at how <c>lists:append/1</c> or <c>++</c> would be - implemented in plain Erlang, it can be seen clearly that the first list - is copied:</p> + <p>Looking at how <c>lists:append/1</c> or <c>++</c> would be + implemented in plain Erlang, clearly the first list is copied:</p> <code type="erl"> append([H|T], Tail) -> @@ -50,12 +48,12 @@ append([H|T], Tail) -> append([], Tail) -> Tail.</code> - <p>So the important thing when recursing and building a list is to - make sure that you attach the new elements to the beginning of the list, - so that you build <em>a</em> list, and not hundreds or thousands of - copies of the growing result list.</p> + <p>When recursing and building a list, it is important to ensure + that you attach the new elements to the beginning of the list. In + this way, you will build <em>one</em> list, not hundreds or thousands + of copies of the growing result list.</p> - <p>Let us first look at how it should not be done:</p> + <p>Let us first see how it is not to be done:</p> <p><em>DO NOT</em></p> <code type="erl"><![CDATA[ @@ -67,11 +65,11 @@ bad_fib(0, _Current, _Next, Fibs) -> bad_fib(N, Current, Next, Fibs) -> bad_fib(N - 1, Next, Current + Next, Fibs ++ [Current]).]]></code> - <p>Here we are not a building a list; in each iteration step we - create a new list that is one element longer than the new previous list.</p> + <p>Here more than one list is built. In each iteration step a new list + is created that is one element longer than the new previous list.</p> - <p>To avoid copying the result in each iteration, we must build the list in - reverse order and reverse the list when we are done:</p> + <p>To avoid copying the result in each iteration, build the list in + reverse order and reverse the list when you are done:</p> <p><em>DO</em></p> <code type="erl"><![CDATA[ @@ -86,49 +84,45 @@ tail_recursive_fib(N, Current, Next, Fibs) -> </section> <section> - <title>List comprehensions</title> + <title>List Comprehensions</title> <p>Lists comprehensions still have a reputation for being slow. They used to be implemented using funs, which used to be slow.</p> - <p>In recent Erlang/OTP releases (including R12B), a list comprehension</p> + <p>In recent Erlang/OTP releases (including R12B), a list comprehension:</p> <code type="erl"><![CDATA[ [Expr(E) || E <- List]]]></code> - <p>is basically translated to a local function</p> + <p>is basically translated to a local function:</p> <code type="erl"> 'lc^0'([E|Tail], Expr) -> [Expr(E)|'lc^0'(Tail, Expr)]; 'lc^0'([], _Expr) -> [].</code> - <p>In R12B, if the result of the list comprehension will <em>obviously</em> not be used, - a list will not be constructed. For instance, in this code</p> + <p>In R12B, if the result of the list comprehension will <em>obviously</em> + not be used, a list will not be constructed. For example, in this code:</p> <code type="erl"><![CDATA[ [io:put_chars(E) || E <- List], ok.]]></code> - <p>or in this code</p> + <p>or in this code:</p> <code type="erl"><![CDATA[ -. -. -. +... case Var of ... -> [io:put_chars(E) || E <- List]; ... -> end, some_function(...), -. -. -.]]></code> +...]]></code> - <p>the value is neither assigned to a variable, nor passed to another function, - nor returned, so there is no need to construct a list and the compiler will simplify - the code for the list comprehension to</p> + <p>the value is not assigned to a variable, not passed to another function, + and not returned. This means that there is no need to construct a list and + the compiler will simplify the code for the list comprehension to:</p> <code type="erl"> 'lc^0'([E|Tail], Expr) -> @@ -139,14 +133,15 @@ some_function(...), </section> <section> - <title>Deep and flat lists</title> + <title>Deep and Flat Lists</title> <p><seealso marker="stdlib:lists#flatten/1">lists:flatten/1</seealso> - builds an entirely new list. Therefore, it is expensive, and even - <em>more</em> expensive than the <c>++</c> (which copies its left argument, - but not its right argument).</p> + builds an entirely new list. It is therefore expensive, and even + <em>more</em> expensive than the <c>++</c> operator (which copies its + left argument, but not its right argument).</p> - <p>In the following situations, you can easily avoid calling <c>lists:flatten/1</c>:</p> + <p>In the following situations, you can easily avoid calling + <c>lists:flatten/1</c>:</p> <list type="bulleted"> <item>When sending data to a port. Ports understand deep lists @@ -155,16 +150,19 @@ some_function(...), <item>When calling BIFs that accept deep lists, such as <seealso marker="erts:erlang#list_to_binary/1">list_to_binary/1</seealso> or <seealso marker="erts:erlang#iolist_to_binary/1">iolist_to_binary/1</seealso>.</item> - <item>When you know that your list is only one level deep, you can can use + <item>When you know that your list is only one level deep, you can use <seealso marker="stdlib:lists#append/1">lists:append/1</seealso>.</item> </list> - <p><em>Port example</em></p> + <section> + <title>Port Example</title> + <p><em>DO</em></p> <pre> ... port_command(Port, DeepList) ...</pre> + <p><em>DO NOT</em></p> <pre> ... @@ -180,7 +178,7 @@ some_function(...), port_command(Port, TerminatedStr) ...</pre> - <p>Instead do like this:</p> + <p>Instead:</p> <p><em>DO</em></p> <pre> @@ -188,47 +186,53 @@ some_function(...), TerminatedStr = [String, 0], % String="foo" => [[$f, $o, $o], 0] port_command(Port, TerminatedStr) ...</pre> + </section> + + <section> + <title>Append Example</title> - <p><em>Append example</em></p> <p><em>DO</em></p> <pre> > lists:append([[1], [2], [3]]). [1,2,3] ></pre> + <p><em>DO NOT</em></p> <pre> > lists:flatten([[1], [2], [3]]). [1,2,3] ></pre> + </section> </section> <section> - <title>Why you should not worry about recursive lists functions</title> + <title>Recursive List Functions</title> - <p>In the performance myth chapter, the following myth was exposed: - <seealso marker="myths#tail_recursive">Tail-recursive functions - are MUCH faster than recursive functions</seealso>.</p> + <p>In Section 7.2, the following myth was exposed: + <seealso marker="myths#tail_recursive">Tail-Recursive Functions + are Much Faster Than Recursive Functions</seealso>.</p> <p>To summarize, in R12B there is usually not much difference between a body-recursive list function and tail-recursive function that reverses the list at the end. Therefore, concentrate on writing beautiful code - and forget about the performance of your list functions. In the time-critical - parts of your code (and only there), <em>measure</em> before rewriting - your code.</p> - - <p><em>Important note</em>: This section talks about lists functions that - <em>construct</em> lists. A tail-recursive function that does not construct - a list runs in constant space, while the corresponding body-recursive - function uses stack space proportional to the length of the list. - For instance, a function that sums a list of integers, should <em>not</em> be - written like this</p> + and forget about the performance of your list functions. In the + time-critical parts of your code (and only there), <em>measure</em> + before rewriting your code.</p> + + <note><p>This section is about list functions that <em>construct</em> + lists. A tail-recursive function that does not construct a list runs + in constant space, while the corresponding body-recursive function + uses stack space proportional to the length of the list.</p></note> + + <p>For example, a function that sums a list of integers, is + <em>not</em> to be written as follows:</p> <p><em>DO NOT</em></p> <code type="erl"> recursive_sum([H|T]) -> H+recursive_sum(T); recursive_sum([]) -> 0.</code> - <p>but like this</p> + <p>Instead:</p> <p><em>DO</em></p> <code type="erl"> diff --git a/system/doc/efficiency_guide/myths.xml b/system/doc/efficiency_guide/myths.xml index b1108dbab2..70d2dae88e 100644 --- a/system/doc/efficiency_guide/myths.xml +++ b/system/doc/efficiency_guide/myths.xml @@ -31,47 +31,48 @@ <file>myths.xml</file> </header> + <marker id="myths"></marker> <p>Some truths seem to live on well beyond their best-before date, - perhaps because "information" spreads more rapidly from person-to-person - faster than a single release note that notes, for instance, that funs + perhaps because "information" spreads faster from person-to-person + than a single release note that says, for example, that funs have become faster.</p> - <p>Here we try to kill the old truths (or semi-truths) that have + <p>This section tries to kill the old truths (or semi-truths) that have become myths.</p> <section> - <title>Myth: Funs are slow</title> - <p>Yes, funs used to be slow. Very slow. Slower than <c>apply/3</c>. + <title>Myth: Funs are Slow</title> + <p>Funs used to be very slow, slower than <c>apply/3</c>. Originally, funs were implemented using nothing more than compiler trickery, ordinary tuples, <c>apply/3</c>, and a great deal of ingenuity.</p> - <p>But that is ancient history. Funs was given its own data type - in the R6B release and was further optimized in the R7B release. - Now the cost for a fun call falls roughly between the cost for a call to - local function and <c>apply/3</c>.</p> + <p>But that is history. Funs was given its own data type + in R6B and was further optimized in R7B. + Now the cost for a fun call falls roughly between the cost for a call + to a local function and <c>apply/3</c>.</p> </section> <section> - <title>Myth: List comprehensions are slow</title> + <title>Myth: List Comprehensions are Slow</title> <p>List comprehensions used to be implemented using funs, and in the - bad old days funs were really slow.</p> + old days funs were indeed slow.</p> - <p>Nowadays the compiler rewrites list comprehensions into an ordinary - recursive function. Of course, using a tail-recursive function with + <p>Nowadays, the compiler rewrites list comprehensions into an ordinary + recursive function. Using a tail-recursive function with a reverse at the end would be still faster. Or would it? That leads us to the next myth.</p> </section> <section> - <title>Myth: Tail-recursive functions are MUCH faster - than recursive functions</title> + <title>Myth: Tail-Recursive Functions are Much Faster + Than Recursive Functions</title> <p><marker id="tail_recursive"></marker>According to the myth, recursive functions leave references - to dead terms on the stack and the garbage collector will have to - copy all those dead terms, while tail-recursive functions immediately + to dead terms on the stack and the garbage collector has to copy + all those dead terms, while tail-recursive functions immediately discard those terms.</p> <p>That used to be true before R7B. In R7B, the compiler started @@ -79,48 +80,47 @@ be used with an empty list, so that the garbage collector would not keep dead values any longer than necessary.</p> - <p>Even after that optimization, a tail-recursive function would - still most of the time be faster than a body-recursive function. Why?</p> + <p>Even after that optimization, a tail-recursive function is + still most of the times faster than a body-recursive function. Why?</p> <p>It has to do with how many words of stack that are used in each - recursive call. In most cases, a recursive function would use more words + recursive call. In most cases, a recursive function uses more words on the stack for each recursion than the number of words a tail-recursive - would allocate on the heap. Since more memory is used, the garbage - collector will be invoked more frequently, and it will have more work traversing + would allocate on the heap. As more memory is used, the garbage + collector is invoked more frequently, and it has more work traversing the stack.</p> - <p>In R12B and later releases, there is an optimization that will + <p>In R12B and later releases, there is an optimization that in many cases reduces the number of words used on the stack in - body-recursive calls, so that a body-recursive list function and + body-recursive calls. A body-recursive list function and a tail-recursive function that calls <seealso - marker="stdlib:lists#reverse/1">lists:reverse/1</seealso> at the - end will use exactly the same amount of memory. + marker="stdlib:lists#reverse/1">lists:reverse/1</seealso> at + the end will use the same amount of memory. <c>lists:map/2</c>, <c>lists:filter/2</c>, list comprehensions, and many other recursive functions now use the same amount of space as their tail-recursive equivalents.</p> - <p>So which is faster?</p> + <p>So, which is faster? + It depends. On Solaris/Sparc, the body-recursive function seems to + be slightly faster, even for lists with a lot of elements. On the x86 + architecture, tail-recursion was up to about 30% faster.</p> - <p>It depends. On Solaris/Sparc, the body-recursive function seems to - be slightly faster, even for lists with very many elements. On the x86 - architecture, tail-recursion was up to about 30 percent faster.</p> - - <p>So the choice is now mostly a matter of taste. If you really do need + <p>So, the choice is now mostly a matter of taste. If you really do need the utmost speed, you must <em>measure</em>. You can no longer be - absolutely sure that the tail-recursive list function will be the fastest - in all circumstances.</p> + sure that the tail-recursive list function always is the fastest.</p> - <p>Note: A tail-recursive function that does not need to reverse the - list at the end is, of course, faster than a body-recursive function, + <note><p>A tail-recursive function that does not need to reverse the + list at the end is faster than a body-recursive function, as are tail-recursive functions that do not construct any terms at all - (for instance, a function that sums all integers in a list).</p> + (for example, a function that sums all integers in a list).</p></note> </section> <section> - <title>Myth: '++' is always bad</title> + <title>Myth: Operator "++" is Always Bad</title> - <p>The <c>++</c> operator has, somewhat undeservedly, got a very bad reputation. - It probably has something to do with code like</p> + <p>The <c>++</c> operator has, somewhat undeservedly, got a bad reputation. + It probably has something to do with code like the following, + which is the most inefficient way there is to reverse a list:</p> <p><em>DO NOT</em></p> <code type="erl"> @@ -129,12 +129,10 @@ naive_reverse([H|T]) -> naive_reverse([]) -> [].</code> - <p>which is the most inefficient way there is to reverse a list. - Since the <c>++</c> operator copies its left operand, the result - will be copied again and again and again... leading to quadratic - complexity.</p> + <p>As the <c>++</c> operator copies its left operand, the result + is copied repeatedly, leading to quadratic complexity.</p> - <p>On the other hand, using <c>++</c> like this</p> + <p>But using <c>++</c> as follows is not bad:</p> <p><em>OK</em></p> <code type="erl"> @@ -143,11 +141,11 @@ naive_but_ok_reverse([H|T], Acc) -> naive_but_ok_reverse([], Acc) -> Acc.</code> - <p>is not bad. Each list element will only be copied once. + <p>Each list element is copied only once. The growing result <c>Acc</c> is the right operand - for the <c>++</c> operator, and it will <em>not</em> be copied.</p> + for the <c>++</c> operator, and it is <em>not</em> copied.</p> - <p>Of course, experienced Erlang programmers would actually write</p> + <p>Experienced Erlang programmers would write as follows:</p> <p><em>DO</em></p> <code type="erl"> @@ -156,32 +154,34 @@ vanilla_reverse([H|T], Acc) -> vanilla_reverse([], Acc) -> Acc.</code> - <p>which is slightly more efficient because you don't build a - list element only to directly copy it. (Or it would be more efficient - if the the compiler did not automatically rewrite <c>[H]++Acc</c> + <p>This is slightly more efficient because here you do not build a + list element only to copy it directly. (Or it would be more efficient + if the compiler did not automatically rewrite <c>[H]++Acc</c> to <c>[H|Acc]</c>.)</p> </section> <section> - <title>Myth: Strings are slow</title> - - <p>Actually, string handling could be slow if done improperly. - In Erlang, you'll have to think a little more about how the strings - are used and choose an appropriate representation and use - the <seealso marker="stdlib:re">re</seealso> module instead of the obsolete - <c>regexp</c> module if you are going to use regular expressions.</p> + <title>Myth: Strings are Slow</title> + + <p>String handling can be slow if done improperly. + In Erlang, you need to think a little more about how the strings + are used and choose an appropriate representation. If you + use regular expressions, use the + <seealso marker="stdlib:re">re</seealso> module in STDLIB + instead of the obsolete <c>regexp</c> module.</p> </section> <section> - <title>Myth: Repairing a Dets file is very slow</title> + <title>Myth: Repairing a Dets File is Very Slow</title> <p>The repair time is still proportional to the number of records - in the file, but Dets repairs used to be much, much slower in the past. + in the file, but Dets repairs used to be much slower in the past. Dets has been massively rewritten and improved.</p> </section> <section> - <title>Myth: BEAM is a stack-based byte-code virtual machine (and therefore slow)</title> + <title>Myth: BEAM is a Stack-Based Byte-Code Virtual Machine + (and Therefore Slow)</title> <p>BEAM is a register-based virtual machine. It has 1024 virtual registers that are used for holding temporary values and for passing arguments when @@ -193,11 +193,11 @@ vanilla_reverse([], Acc) -> </section> <section> - <title>Myth: Use '_' to speed up your program when a variable is not used</title> + <title>Myth: Use "_" to Speed Up Your Program When a Variable + is Not Used</title> - <p>That was once true, but since R6B the BEAM compiler is quite capable of seeing itself + <p>That was once true, but from R6B the BEAM compiler can see that a variable is not used.</p> </section> - </chapter> diff --git a/system/doc/efficiency_guide/processes.xml b/system/doc/efficiency_guide/processes.xml index 86951e2dcc..3bdc314235 100644 --- a/system/doc/efficiency_guide/processes.xml +++ b/system/doc/efficiency_guide/processes.xml @@ -30,15 +30,15 @@ </header> <section> - <title>Creation of an Erlang process</title> + <title>Creating an Erlang Process</title> - <p>An Erlang process is lightweight compared to operating - systems threads and processes.</p> + <p>An Erlang process is lightweight compared to threads and + processes in operating systems.</p> <p>A newly spawned Erlang process uses 309 words of memory in the non-SMP emulator without HiPE support. (SMP support - and HiPE support will both add to this size.) The size can - be found out like this:</p> + and HiPE support both add to this size.) The size can + be found as follows:</p> <pre> Erlang (BEAM) emulator version 5.6 [async-threads:0] [kernel-poll:false] @@ -51,11 +51,11 @@ Eshell V5.6 (abort with ^G) 3> <input>Bytes div erlang:system_info(wordsize).</input> 309</pre> - <p>The size includes 233 words for the heap area (which includes the stack). - The garbage collector will increase the heap as needed.</p> + <p>The size includes 233 words for the heap area (which includes the + stack). The garbage collector increases the heap as needed.</p> <p>The main (outer) loop for a process <em>must</em> be tail-recursive. - If not, the stack will grow until the process terminates.</p> + Otherwise, the stack grows until the process terminates.</p> <p><em>DO NOT</em></p> <code type="erl"> @@ -74,7 +74,7 @@ loop() -> <p>The call to <c>io:format/2</c> will never be executed, but a return address will still be pushed to the stack each time <c>loop/0</c> is called recursively. The correct tail-recursive - version of the function looks like this:</p> + version of the function looks as follows:</p> <p><em>DO</em></p> <code type="erl"> @@ -90,92 +90,98 @@ loop() -> end.</code> <section> - <title>Initial heap size</title> + <title>Initial Heap Size</title> <p>The default initial heap size of 233 words is quite conservative - in order to support Erlang systems with hundreds of thousands or - even millions of processes. The garbage collector will grow and - shrink the heap as needed.</p> + to support Erlang systems with hundreds of thousands or + even millions of processes. The garbage collector grows and + shrinks the heap as needed.</p> <p>In a system that use comparatively few processes, performance - <em>might</em> be improved by increasing the minimum heap size using either - the <c>+h</c> option for + <em>might</em> be improved by increasing the minimum heap size + using either the <c>+h</c> option for <seealso marker="erts:erl">erl</seealso> or on a process-per-process basis using the <c>min_heap_size</c> option for <seealso marker="erts:erlang#spawn_opt/4">spawn_opt/4</seealso>.</p> - <p>The gain is twofold: Firstly, although the garbage collector will - grow the heap, it will grow it step by step, which will be more - costly than directly establishing a larger heap when the process - is spawned. Secondly, the garbage collector may also shrink the - heap if it is much larger than the amount of data stored on it; - setting the minimum heap size will prevent that.</p> - - <warning><p>The emulator will probably use more memory, and because garbage - collections occur less frequently, huge binaries could be + <p>The gain is twofold:</p> + <list type="bulleted"> + <item>Although the garbage collector grows the heap, it grows it + step-by-step, which is more costly than directly establishing a + larger heap when the process is spawned.</item> + <item>The garbage collector can also shrink the heap if it is + much larger than the amount of data stored on it; + setting the minimum heap size prevents that.</item> + </list> + + <warning><p>The emulator probably uses more memory, and because garbage + collections occur less frequently, huge binaries can be kept much longer.</p></warning> <p>In systems with many processes, computation tasks that run - for a short time could be spawned off into a new process with - a higher minimum heap size. When the process is done, it will - send the result of the computation to another process and terminate. - If the minimum heap size is calculated properly, the process may not - have to do any garbage collections at all. - <em>This optimization should not be attempted + for a short time can be spawned off into a new process with + a higher minimum heap size. When the process is done, it sends + the result of the computation to another process and terminates. + If the minimum heap size is calculated properly, the process might + not have to do any garbage collections at all. + <em>This optimization is not to be attempted without proper measurements.</em></p> </section> - </section> <section> - <title>Process messages</title> + <title>Process Messages</title> - <p>All data in messages between Erlang processes is copied, with - the exception of + <p>All data in messages between Erlang processes is copied, + except for <seealso marker="binaryhandling#refc_binary">refc binaries</seealso> on the same Erlang node.</p> <p>When a message is sent to a process on another Erlang node, - it will first be encoded to the Erlang External Format before - being sent via an TCP/IP socket. The receiving Erlang node decodes - the message and distributes it to the right process.</p> + it is first encoded to the Erlang External Format before + being sent through a TCP/IP socket. The receiving Erlang node decodes + the message and distributes it to the correct process.</p> <section> - <title>The constant pool</title> + <title>Constant Pool</title> <p>Constant Erlang terms (also called <em>literals</em>) are now kept in constant pools; each loaded module has its own pool. - The following function</p> + The following function does no longer build the tuple every time + it is called (only to have it discarded the next time the garbage + collector was run), but the tuple is located in the module's + constant pool:</p> <p><em>DO</em> (in R12B and later)</p> <code type="erl"> days_in_month(M) -> - element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).</code> - - <p>will no longer build the tuple every time it is called (only - to have it discarded the next time the garbage collector was run), but - the tuple will be located in the module's constant pool.</p> + element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).</code> <p>But if a constant is sent to another process (or stored in - an ETS table), it will be <em>copied</em>. - The reason is that the run-time system must be able - to keep track of all references to constants in order to properly - unload code containing constants. (When the code is unloaded, - the constants will be copied to the heap of the processes that refer + an Ets table), it is <em>copied</em>. + The reason is that the runtime system must be able + to keep track of all references to constants to unload code + containing constants properly. (When the code is unloaded, + the constants are copied to the heap of the processes that refer to them.) The copying of constants might be eliminated in a future - release.</p> + Erlang/OTP release.</p> </section> <section> - <title>Loss of sharing</title> + <title>Loss of Sharing</title> - <p>Shared sub-terms are <em>not</em> preserved when a term is sent - to another process, passed as the initial process arguments in - the <c>spawn</c> call, or stored in an ETS table. - That is an optimization. Most applications do not send messages - with shared sub-terms.</p> + <p>Shared subterms are <em>not</em> preserved in the following + cases:</p> + <list type="bulleted"> + <item>When a term is sent to another process</item> + <item>When a term is passed as the initial process arguments in + the <c>spawn</c> call</item> + <item>When a term is stored in an Ets table</item> + </list> + <p>That is an optimization. Most applications do not send messages + with shared subterms.</p> - <p>Here is an example of how a shared sub-term can be created:</p> + <p>The following example shows how a shared subterm can be created:</p> <code type="erl"> kilo_byte() -> @@ -186,32 +192,32 @@ kilo_byte(0, Acc) -> kilo_byte(N, Acc) -> kilo_byte(N-1, [Acc|Acc]).</code> - <p><c>kilo_byte/0</c> creates a deep list. If we call - <c>list_to_binary/1</c>, we can convert the deep list to a binary - of 1024 bytes:</p> + <p><c>kilo_byte/1</c> creates a deep list. + If <c>list_to_binary/1</c> is called, the deep list can be + converted to a binary of 1024 bytes:</p> <pre> 1> <input>byte_size(list_to_binary(efficiency_guide:kilo_byte())).</input> 1024</pre> - <p>Using the <c>erts_debug:size/1</c> BIF we can see that the + <p>Using the <c>erts_debug:size/1</c> BIF, it can be seen that the deep list only requires 22 words of heap space:</p> <pre> 2> <input>erts_debug:size(efficiency_guide:kilo_byte()).</input> 22</pre> - <p>Using the <c>erts_debug:flat_size/1</c> BIF, we can calculate - the size of the deep list if sharing is ignored. It will be + <p>Using the <c>erts_debug:flat_size/1</c> BIF, the size of the + deep list can be calculated if sharing is ignored. It becomes the size of the list when it has been sent to another process - or stored in an ETS table:</p> + or stored in an Ets table:</p> <pre> 3> <input>erts_debug:flat_size(efficiency_guide:kilo_byte()).</input> 4094</pre> - <p>We can verify that sharing will be lost if we insert the - data into an ETS table:</p> + <p>It can be verified that sharing will be lost if the data is + inserted into an Ets table:</p> <pre> 4> <input>T = ets:new(tab, []).</input> @@ -223,21 +229,21 @@ true 7> <input>erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).</input> 4094</pre> - <p>When the data has passed through an ETS table, + <p>When the data has passed through an Ets table, <c>erts_debug:size/1</c> and <c>erts_debug:flat_size/1</c> return the same value. Sharing has been lost.</p> - <p>In a future release of Erlang/OTP, we might implement a - way to (optionally) preserve sharing. We have no plans to make - preserving of sharing the default behaviour, since that would + <p>In a future Erlang/OTP release, it might be implemented a + way to (optionally) preserve sharing. There are no plans to make + preserving of sharing the default behaviour, as that would penalize the vast majority of Erlang applications.</p> </section> </section> <section> - <title>The SMP emulator</title> + <title>SMP Emulator</title> - <p>The SMP emulator (introduced in R11B) will take advantage of a + <p>The SMP emulator (introduced in R11B) takes advantage of a multi-core or multi-CPU computer by running several Erlang scheduler threads (typically, the same as the number of cores). Each scheduler thread schedules Erlang processes in the same way as the Erlang scheduler @@ -247,11 +253,11 @@ true <em>must have more than one runnable Erlang process</em> most of the time. Otherwise, the Erlang emulator can still only run one Erlang process at the time, but you must still pay the overhead for locking. Although - we try to reduce the locking overhead as much as possible, it will never - become exactly zero.</p> + Erlang/OTP tries to reduce the locking overhead as much as possible, + it will never become exactly zero.</p> - <p>Benchmarks that may seem to be concurrent are often sequential. - The estone benchmark, for instance, is entirely sequential. So is also + <p>Benchmarks that appear to be concurrent are often sequential. + The estone benchmark, for example, is entirely sequential. So is the most common implementation of the "ring benchmark"; usually one process is active, while the others wait in a <c>receive</c> statement.</p> @@ -259,6 +265,5 @@ true can be used to profile your application to see how much potential (or lack thereof) it has for concurrency.</p> </section> - </chapter> diff --git a/system/doc/efficiency_guide/profiling.xml b/system/doc/efficiency_guide/profiling.xml index b93c884270..5df12eefe0 100644 --- a/system/doc/efficiency_guide/profiling.xml +++ b/system/doc/efficiency_guide/profiling.xml @@ -30,190 +30,197 @@ </header> <section> - <title>Do not guess about performance - profile</title> + <title>Do Not Guess About Performance - Profile</title> <p>Even experienced software developers often guess wrong about where - the performance bottlenecks are in their programs.</p> - - <p>Therefore, profile your program to see where the performance + the performance bottlenecks are in their programs. Therefore, profile + your program to see where the performance bottlenecks are and concentrate on optimizing them.</p> - <p>Erlang/OTP contains several tools to help finding bottlenecks.</p> + <p>Erlang/OTP contains several tools to help finding bottlenecks:</p> + + <list type="bulleted"> + <item><c>fprof</c> provides the most detailed information about + where the program time is spent, but it significantly slows down the + program it profiles.</item> - <p><c>fprof</c> provide the most detailed information - about where the time is spent, but it significantly slows down the - program it profiles.</p> + <item><p><c>eprof</c> provides time information of each function + used in the program. No call graph is produced, but <c>eprof</c> has + considerable less impact on the program it profiles.</p> + <p>If the program is too large to be profiled by <c>fprof</c> or + <c>eprof</c>, the <c>cover</c> and <c>cprof</c> tools can be used + to locate code parts that are to be more thoroughly profiled using + <c>fprof</c> or <c>eprof</c>.</p></item> - <p><c>eprof</c> provides time information of each function used - in the program. No callgraph is produced but <c>eprof</c> has - considerable less impact on the program profiled.</p> + <item><c>cover</c> provides execution counts per line per + process, with less overhead than <c>fprof</c>. Execution counts + can, with some caution, be used to locate potential performance + bottlenecks.</item> - <p>If the program is too big to be profiled by <c>fprof</c> or <c>eprof</c>, - <c>cover</c> and <c>cprof</c> could be used to locate parts of the - code that should be more thoroughly profiled using <c>fprof</c> or - <c>eprof</c>.</p> + <item><c>cprof</c> is the most lightweight tool, but it only + provides execution counts on a function basis (for all processes, + not per process).</item> + </list> - <p><c>cover</c> provides execution counts per line per process, - with less overhead than <c>fprof</c>. Execution counts can - with some caution be used to locate potential performance bottlenecks. - The most lightweight tool is <c>cprof</c>, but it only provides execution - counts on a function basis (for all processes, not per process).</p> + <p>The tools are further described in + <seealso marker="#profiling_tools">Tools</seealso>.</p> </section> <section> - <title>Big systems</title> - <p>If you have a big system it might be interesting to run profiling + <title>Large Systems</title> + <p>For a large system, it can be interesting to run profiling on a simulated and limited scenario to start with. But bottlenecks - have a tendency to only appear or cause problems when - there are many things going on at the same time, and when there - are many nodes involved. Therefore it is desirable to also run + have a tendency to appear or cause problems only when + many things are going on at the same time, and when + many nodes are involved. Therefore, it is also desirable to run profiling in a system test plant on a real target system.</p> - <p>When your system is big you do not want to run the profiling - tools on the whole system. You want to concentrate on processes - and modules that you know are central and stand for a big part of the - execution.</p> + + <p>For a large system, you do not want to run the profiling + tools on the whole system. Instead you want to concentrate on + central processes and modules, which contribute for a big part + of the execution.</p> </section> <section> - <title>What to look for</title> - <p>When analyzing the result file from the profiling activity - you should look for functions that are called many + <title>What to Look For</title> + <p>When analyzing the result file from the profiling activity, + look for functions that are called many times and have a long "own" execution time (time excluding calls - to other functions). Functions that just are called very - many times can also be interesting, as even small things can add - up to quite a bit if they are repeated often. Then you need to - ask yourself what can I do to reduce this time. Appropriate - types of questions to ask yourself are: </p> + to other functions). Functions that are called a lot of + times can also be interesting, as even small things can add + up to quite a bit if repeated often. Also + ask yourself what you can do to reduce this time. The following + are appropriate types of questions to ask yourself:</p> + <list type="bulleted"> - <item>Can I reduce the number of times the function is called?</item> - <item>Are there tests that can be run less often if I change - the order of tests?</item> - <item>Are there redundant tests that can be removed? </item> - <item>Is there some expression calculated giving the same result - each time? </item> - <item>Are there other ways of doing this that are equivalent and + <item>Is it possible to reduce the number of times the function + is called?</item> + <item>Can any test be run less often if the order of tests is + changed?</item> + <item>Can any redundant tests be removed?</item> + <item>Does any calculated expression give the same result + each time?</item> + <item>Are there other ways to do this that are equivalent and more efficient?</item> - <item>Can I use another internal data representation to make - things more efficient? </item> + <item>Can another internal data representation be used to make + things more efficient?</item> </list> - <p>These questions are not always trivial to answer. You might - need to do some benchmarks to back up your theory, to avoid - making things slower if your theory is wrong. See <seealso marker="#benchmark">benchmarking</seealso>.</p> + + <p>These questions are not always trivial to answer. Some + benchmarks might be needed to back up your theory and to avoid + making things slower if your theory is wrong. For details, see + <seealso marker="#benchmark">Benchmarking</seealso>.</p> </section> <section> <title>Tools</title> - + <marker id="profiling_tools"></marker> <section> <title>fprof</title> - <p> - <c>fprof</c> measures the execution time for each function, - both own time i.e how much time a function has used for its - own execution, and accumulated time i.e. including called - functions. The values are displayed per process. You also get - to know how many times each function has been - called. <c>fprof</c> is based on trace to file in order to - minimize runtime performance impact. Using fprof is just a - matter of calling a few library functions, see - <seealso marker="tools:fprof">fprof</seealso> - manual page under the application tools.<c> fprof</c> was introduced in - version R8 of Erlang/OTP. - </p> + <p><c>fprof</c> measures the execution time for each function, + both own time, that is, how much time a function has used for its + own execution, and accumulated time, that is, including called + functions. The values are displayed per process. You also get + to know how many times each function has been called.</p> + + <p><c>fprof</c> is based on trace to file to minimize runtime + performance impact. Using <c>fprof</c> is just a matter of + calling a few library functions, see the + <seealso marker="tools:fprof">fprof</seealso> manual page in + <c>tools</c> .<c>fprof</c> was introduced in R8.</p> </section> - <section> - <title>eprof</title> - <p> - <c>eprof</c> is based on the Erlang trace_info BIFs. Eprof shows how much time has been used by - each process, and in which function calls this time has been - spent. Time is shown as percentage of total time and absolute time. - See <seealso marker="tools:eprof">eprof</seealso> for - additional information. - </p> - </section> + <section> + <title>eprof</title> + <p><c>eprof</c> is based on the Erlang <c>trace_info</c> BIFs. + <c>eprof</c> shows how much time has been used by each process, + and in which function calls this time has been spent. Time is + shown as percentage of total time and absolute time. For more + information, see the <seealso marker="tools:eprof">eprof</seealso> + manual page in <c>tools</c>.</p> + </section> <section> <title>cover</title> - <p> - <c>cover</c>'s primary use is coverage analysis to verify - test cases, making sure all relevant code is covered. - <c>cover</c> counts how many times each executable line of - code is executed when a program is run. This is done on a per - module basis. Of course this information can be used to - determine what code is run very frequently and could therefore - be subject for optimization. Using cover is just a matter of - calling a few library functions, see - <seealso marker="tools:cover">cover</seealso> - manual page under the application tools.</p> + <p>The primary use of <c>cover</c> is coverage analysis to verify + test cases, making sure that all relevant code is covered. + <c>cover</c> counts how many times each executable line of code + is executed when a program is run, on a per module basis.</p> + <p>Clearly, this information can be used to determine what + code is run very frequently and can therefore be subject for + optimization. Using <c>cover</c> is just a matter of calling a + few library functions, see the + <seealso marker="tools:cover">cover</seealso> manual page in + <c>tools</c>.</p> </section> <section> <title>cprof</title> <p><c>cprof</c> is something in between <c>fprof</c> and - <c>cover</c> regarding features. It counts how many times each - function is called when the program is run, on a per module - basis. <c>cprof</c> has a low performance degradation effect (versus - <c>fprof</c>) and does not need to recompile - any modules to profile (versus <c>cover</c>). - See <seealso marker="tools:cprof">cprof</seealso> manual page for additional - information. - </p> + <c>cover</c> regarding features. It counts how many times each + function is called when the program is run, on a per module + basis. <c>cprof</c> has a low performance degradation effect + (compared with <c>fprof</c>) and does not need to recompile + any modules to profile (compared with <c>cover</c>). + For more information, see the + <seealso marker="tools:cprof">cprof</seealso> manual page in + <c>tools</c>.</p> </section> <section> - <title>Tool summarization</title> + <title>Tool Summary</title> <table> <row> - <cell align="center" valign="middle">Tool</cell> - <cell align="center" valign="middle">Results</cell> - <cell align="center" valign="middle">Size of result</cell> - <cell align="center" valign="middle">Effects on program execution time</cell> - <cell align="center" valign="middle">Records number of calls</cell> - <cell align="center" valign="middle">Records Execution time</cell> - <cell align="center" valign="middle">Records called by</cell> - <cell align="center" valign="middle">Records garbage collection</cell> + <cell><em>Tool</em></cell> + <cell><em>Results</em></cell> + <cell><em>Size of Result</em></cell> + <cell><em>Effects on Program Execution Time</em></cell> + <cell><em>Records Number of Calls</em></cell> + <cell><em>Records Execution Time</em></cell> + <cell><em>Records Called by</em></cell> + <cell><em>Records Garbage Collection</em></cell> </row> <row> - <cell align="left" valign="middle"><c>fprof </c></cell> - <cell align="left" valign="middle">per process to screen/file </cell> - <cell align="left" valign="middle">large </cell> - <cell align="left" valign="middle">significant slowdown </cell> - <cell align="left" valign="middle">yes </cell> - <cell align="left" valign="middle">total and own</cell> - <cell align="left" valign="middle">yes </cell> - <cell align="left" valign="middle">yes </cell> + <cell><c>fprof</c></cell> + <cell>Per process to screen/file</cell> + <cell>Large</cell> + <cell>Significant slowdown</cell> + <cell>Yes</cell> + <cell>Total and own</cell> + <cell>Yes</cell> + <cell>Yes</cell> </row> <row> - <cell align="left" valign="middle"><c>eprof </c></cell> - <cell align="left" valign="middle">per process/function to screen/file </cell> - <cell align="left" valign="middle">medium </cell> - <cell align="left" valign="middle">small slowdown </cell> - <cell align="left" valign="middle">yes </cell> - <cell align="left" valign="middle">only total </cell> - <cell align="left" valign="middle">no </cell> - <cell align="left" valign="middle">no </cell> + <cell><c>eprof</c></cell> + <cell>Per process/function to screen/file</cell> + <cell>Medium</cell> + <cell>Small slowdown</cell> + <cell>Yes</cell> + <cell>Only total</cell> + <cell>No</cell> + <cell>No</cell> </row> <row> - <cell align="left" valign="middle"><c>cover </c></cell> - <cell align="left" valign="middle">per module to screen/file</cell> - <cell align="left" valign="middle">small </cell> - <cell align="left" valign="middle">moderate slowdown</cell> - <cell align="left" valign="middle">yes, per line </cell> - <cell align="left" valign="middle">no </cell> - <cell align="left" valign="middle">no </cell> - <cell align="left" valign="middle">no </cell> + <cell><c>cover</c></cell> + <cell>Per module to screen/file</cell> + <cell>Small</cell> + <cell>Moderate slowdown</cell> + <cell>Yes, per line</cell> + <cell>No</cell> + <cell>No</cell> + <cell>No</cell> </row> <row> - <cell align="left" valign="middle"><c>cprof </c></cell> - <cell align="left" valign="middle">per module to caller</cell> - <cell align="left" valign="middle">small </cell> - <cell align="left" valign="middle">small slowdown </cell> - <cell align="left" valign="middle">yes </cell> - <cell align="left" valign="middle">no </cell> - <cell align="left" valign="middle">no </cell> - <cell align="left" valign="middle">no </cell> + <cell><c>cprof</c></cell> + <cell>Per module to caller</cell> + <cell>Small</cell> + <cell>Small slowdown</cell> + <cell>Yes</cell> + <cell>No</cell> + <cell>No</cell> + <cell>No</cell> </row> - <tcaption></tcaption> + <tcaption>Tool Summary</tcaption> </table> </section> </section> @@ -226,49 +233,51 @@ implementation of a given algorithm or function is the fastest. Benchmarking is far from an exact science. Today's operating systems generally run background tasks that are difficult to turn off. - Caches and multiple CPU cores doesn't make it any easier. - It would be best to run Unix-computers in single-user mode when + Caches and multiple CPU cores does not facilitate benchmarking. + It would be best to run UNIX computers in single-user mode when benchmarking, but that is inconvenient to say the least for casual testing.</p> <p>Benchmarks can measure wall-clock time or CPU time.</p> - <p><seealso marker="stdlib:timer#tc/3">timer:tc/3</seealso> measures + <list type="bulleted"> + <item><seealso marker="stdlib:timer#tc/3">timer:tc/3</seealso> measures wall-clock time. The advantage with wall-clock time is that I/O, - swapping, and other activities in the operating-system kernel are + swapping, and other activities in the operating system kernel are included in the measurements. The disadvantage is that the - the measurements will vary wildly. Usually it is best to run the - benchmark several times and note the shortest time - that time should + measurements vary a lot. Usually it is best to run the + benchmark several times and note the shortest time, which is to be the minimum time that is possible to achieve under the best of - circumstances.</p> + circumstances.</item> - <p><seealso marker="erts:erlang#statistics/1">statistics/1</seealso> - with the argument <c>runtime</c> measures CPU time spent in the Erlang - virtual machine. The advantage is that the results are more + <item><seealso marker="erts:erlang#statistics/1">statistics/1</seealso> + with argument <c>runtime</c> measures CPU time spent in the Erlang + virtual machine. The advantage with CPU time is that the results are more consistent from run to run. The disadvantage is that the time spent in the operating system kernel (such as swapping and I/O) - are not included. Therefore, measuring CPU time is misleading if - any I/O (file or socket) is involved.</p> + is not included. Therefore, measuring CPU time is misleading if + any I/O (file or socket) is involved.</item> + </list> <p>It is probably a good idea to do both wall-clock measurements and CPU time measurements.</p> - <p>Some additional advice:</p> + <p>Some final advice:</p> <list type="bulleted"> - <item>The granularity of both types of measurement could be quite - high so you should make sure that each individual measurement + <item>The granularity of both measurement types can be high. + Therefore, ensure that each individual measurement lasts for at least several seconds.</item> - <item>To make the test fair, each new test run should run in its own, + <item>To make the test fair, each new test run is to run in its own, newly created Erlang process. Otherwise, if all tests run in the - same process, the later tests would start out with larger heap sizes - and therefore probably do less garbage collections. You could - also consider restarting the Erlang emulator between each test.</item> + same process, the later tests start out with larger heap sizes + and therefore probably do fewer garbage collections. + Also consider restarting the Erlang emulator between each test.</item> <item>Do not assume that the fastest implementation of a given algorithm - on computer architecture X also is the fastest on computer architecture Y.</item> - + on computer architecture X is also the fastest on computer architecture + Y.</item> </list> </section> </chapter> diff --git a/system/doc/efficiency_guide/tablesDatabases.xml b/system/doc/efficiency_guide/tablesDatabases.xml index 94c921fa1c..215c2afa1f 100644 --- a/system/doc/efficiency_guide/tablesDatabases.xml +++ b/system/doc/efficiency_guide/tablesDatabases.xml @@ -18,10 +18,9 @@ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. - </legalnotice> - <title>Tables and databases</title> + <title>Tables and Databases</title> <prepared>Ingela Anderton</prepared> <docno></docno> <date>2001-08-07</date> @@ -30,46 +29,45 @@ </header> <section> - <title>Ets, Dets and Mnesia</title> + <title>Ets, Dets, and Mnesia</title> <p>Every example using Ets has a corresponding example in - Mnesia. In general all Ets examples also apply to Dets tables.</p> + Mnesia. In general, all Ets examples also apply to Dets tables.</p> <section> - <title>Select/Match operations</title> - <p>Select/Match operations on Ets and Mnesia tables can become + <title>Select/Match Operations</title> + <p>Select/match operations on Ets and Mnesia tables can become very expensive operations. They usually need to scan the complete - table. You should try to structure your - data so that you minimize the need for select/match - operations. However, if you really need a select/match operation, - it will still be more efficient than using <c>tab2list</c>. - Examples of this and also of ways to avoid select/match will be provided in - some of the following sections. The functions - <c>ets:select/2</c> and <c>mnesia:select/3</c> should be preferred over - <c>ets:match/2</c>,<c>ets:match_object/2</c>, and <c>mnesia:match_object/3</c>.</p> - <note> - <p>There are exceptions when the complete table is not - scanned, for instance if part of the key is bound when searching an - <c>ordered_set</c> table, or if it is a Mnesia - table and there is a secondary index on the field that is - selected/matched. If the key is fully bound there will, of course, be - no point in doing a select/match, unless you have a bag table and - you are only interested in a sub-set of the elements with - the specific key.</p> - </note> - <p>When creating a record to be used in a select/match operation you - want most of the fields to have the value '_'. The easiest and fastest way - to do that is as follows:</p> + table. Try to structure the data to minimize the need for select/match + operations. However, if you require a select/match operation, + it is still more efficient than using <c>tab2list</c>. + Examples of this and of how to avoid select/match are provided in + the following sections. The functions + <c>ets:select/2</c> and <c>mnesia:select/3</c> are to be preferred + over <c>ets:match/2</c>, <c>ets:match_object/2</c>, and + <c>mnesia:match_object/3</c>.</p> + <p>In some circumstances, the select/match operations do not need + to scan the complete table. + For example, if part of the key is bound when searching an + <c>ordered_set</c> table, or if it is a Mnesia + table and there is a secondary index on the field that is + selected/matched. If the key is fully bound, there is + no point in doing a select/match, unless you have a bag table + and are only interested in a subset of the elements with + the specific key.</p> + <p>When creating a record to be used in a select/match operation, you + want most of the fields to have the value "_". The easiest and + fastest way to do that is as follows:</p> <pre> #person{age = 42, _ = '_'}. </pre> </section> <section> - <title>Deleting an element</title> - <p>The delete operation is considered + <title>Deleting an Element</title> + <p>The <c>delete</c> operation is considered successful if the element was not present in the table. Hence all attempts to check that the element is present in the Ets/Mnesia table before deletion are unnecessary. Here follows - an example for Ets tables.</p> + an example for Ets tables:</p> <p><em>DO</em></p> <pre> ... @@ -88,14 +86,16 @@ end, </section> <section> - <title>Data fetching</title> - <p>Do not fetch data that you already have! Consider that you - have a module that handles the abstract data type Person. You - export the interface function <c>print_person/1</c> that uses the internal functions - <c>print_name/1</c>, <c>print_age/1</c>, <c>print_occupation/1</c>.</p> + <title>Fetching Data</title> + <p>Do not fetch data that you already have.</p> + <p>Consider that you have a module that handles the abstract data + type <c>Person</c>. You export the interface function + <c>print_person/1</c>, which uses the internal functions + <c>print_name/1</c>, <c>print_age/1</c>, and + <c>print_occupation/1</c>.</p> <note> - <p>If the functions <c>print_name/1</c> and so on, had been interface - functions the matter comes in to a whole new light, as you + <p>If the function <c>print_name/1</c>, and so on, had been interface + functions, the situation would have been different, as you do not want the user of the interface to know about the internal data representation. </p> </note> @@ -136,7 +136,7 @@ print_person(PersonId) -> io:format("No person with ID = ~p~n", [PersonID]) end. -%%% Internal functions +%%% Internal functionss print_name(PersonID) -> [Person] = ets:lookup(person, PersonId), io:format("No person ~p~n", [Person#person.name]). @@ -151,31 +151,31 @@ print_occupation(PersonID) -> </section> <section> - <title>Non-persistent data storage </title> + <title>Non-Persistent Database Storage</title> <p>For non-persistent database storage, prefer Ets tables over - Mnesia local_content tables. Even the Mnesia <c>dirty_write</c> + Mnesia <c>local_content</c> tables. Even the Mnesia <c>dirty_write</c> operations carry a fixed overhead compared to Ets writes. Mnesia must check if the table is replicated or has indices, this involves at least one Ets lookup for each - <c>dirty_write</c>. Thus, Ets writes will always be faster than + <c>dirty_write</c>. Thus, Ets writes is always faster than Mnesia writes.</p> </section> <section> <title>tab2list</title> - <p>Assume we have an Ets-table, which uses <c>idno</c> as key, - and contains:</p> + <p>Assuming an Ets table that uses <c>idno</c> as key + and contains the following:</p> <pre> [#person{idno = 1, name = "Adam", age = 31, occupation = "mailman"}, #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"}, #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"}, #person{idno = 4, name = "Carl", age = 25, occupation = "mailman"}]</pre> - <p>If we <em>must</em> return all data stored in the Ets-table we - can use <c>ets:tab2list/1</c>. However, usually we are only + <p>If you <em>must</em> return all data stored in the Ets table, you + can use <c>ets:tab2list/1</c>. However, usually you are only interested in a subset of the information in which case - <c>ets:tab2list/1</c> is expensive. If we only want to extract - one field from each record, e.g., the age of every person, we - should use:</p> + <c>ets:tab2list/1</c> is expensive. If you only want to extract + one field from each record, for example, the age of every person, + then:</p> <p><em>DO</em></p> <pre> ... @@ -192,8 +192,8 @@ ets:select(Tab,[{ #person{idno='_', TabList = ets:tab2list(Tab), lists:map(fun(X) -> X#person.age end, TabList), ...</pre> - <p>If we are only interested in the age of all persons named - Bryan, we should:</p> + <p>If you are only interested in the age of all persons named + "Bryan", then:</p> <p><em>DO</em></p> <pre> ... @@ -224,8 +224,8 @@ BryanList = lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList), lists:map(fun(X) -> X#person.age end, BryanList), ...</pre> - <p>If we need all information stored in the Ets table about - persons named Bryan we should:</p> + <p>If you need all information stored in the Ets table about + persons named "Bryan", then:</p> <p><em>DO</em></p> <pre> ... @@ -243,60 +243,60 @@ lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList), </section> <section> - <title>Ordered_set tables</title> - <p>If the data in the table should be accessed so that the order + <title>Ordered_set Tables</title> + <p>If the data in the table is to be accessed so that the order of the keys in the table is significant, the table type - <c>ordered_set</c> could be used instead of the more usual + <c>ordered_set</c> can be used instead of the more usual <c>set</c> table type. An <c>ordered_set</c> is always - traversed in Erlang term order with regard to the key field - so that return values from functions such as <c>select</c>, + traversed in Erlang term order regarding the key field + so that the return values from functions such as <c>select</c>, <c>match_object</c>, and <c>foldl</c> are ordered by the key values. Traversing an <c>ordered_set</c> with the <c>first</c> and <c>next</c> operations also returns the keys ordered.</p> <note> <p>An <c>ordered_set</c> only guarantees that - objects are processed in <em>key</em> order. Results from functions as - <c>ets:select/2</c> appear in the <em>key</em> order even if + objects are processed in <em>key</em> order. + Results from functions such as + <c>ets:select/2</c> appear in <em>key</em> order even if the key is not included in the result.</p> </note> </section> </section> <section> - <title>Ets specific</title> + <title>Ets-Specific</title> <section> - <title>Utilizing the keys of the Ets table</title> - <p>An Ets table is a single key table (either a hash table or a - tree ordered by the key) and should be used as one. In other + <title>Using Keys of Ets Table</title> + <p>An Ets table is a single-key table (either a hash table or a + tree ordered by the key) and is to be used as one. In other words, use the key to look up things whenever possible. A - lookup by a known key in a set Ets table is constant and for a - ordered_set Ets table it is O(logN). A key lookup is always + lookup by a known key in a <c>set</c> Ets table is constant and for + an <c>ordered_set</c> Ets table it is O(logN). A key lookup is always preferable to a call where the whole table has to be - scanned. In the examples above, the field <c>idno</c> is the + scanned. In the previous examples, the field <c>idno</c> is the key of the table and all lookups where only the name is known - will result in a complete scan of the (possibly large) table + result in a complete scan of the (possibly large) table for a matching result.</p> <p>A simple solution would be to use the <c>name</c> field as the key instead of the <c>idno</c> field, but that would cause - problems if the names were not unique. A more general solution - would be to create a second table with <c>name</c> as key and - <c>idno</c> as data, i.e. to index (invert) the table with regards - to the <c>name</c> field. The second table would of course have to be - kept consistent with the master table. Mnesia could do this - for you, but a home brew index table could be very efficient + problems if the names were not unique. A more general solution would + be to create a second table with <c>name</c> as key and + <c>idno</c> as data, that is, to index (invert) the table regarding + the <c>name</c> field. Clearly, the second table would have to be + kept consistent with the master table. Mnesia can do this + for you, but a home brew index table can be very efficient compared to the overhead involved in using Mnesia.</p> <p>An index table for the table in the previous examples would - have to be a bag (as keys would appear more than once) and could + have to be a bag (as keys would appear more than once) and can have the following contents:</p> <pre> - [#index_entry{name="Adam", idno=1}, #index_entry{name="Bryan", idno=2}, #index_entry{name="Bryan", idno=3}, #index_entry{name="Carl", idno=4}]</pre> - <p>Given this index table a lookup of the <c>age</c> fields for - all persons named "Bryan" could be done like this:</p> + <p>Given this index table, a lookup of the <c>age</c> fields for + all persons named "Bryan" can be done as follows:</p> <pre> ... MatchingIDs = ets:lookup(IndexTable,"Bryan"), @@ -306,30 +306,31 @@ lists:map(fun(#index_entry{idno = ID}) -> end, MatchingIDs), ...</pre> - <p>Note that the code above never uses <c>ets:match/2</c> but - instead utilizes the <c>ets:lookup/2</c> call. The + <p>Notice that this code never uses <c>ets:match/2</c> but + instead uses the <c>ets:lookup/2</c> call. The <c>lists:map/2</c> call is only used to traverse the <c>idno</c>s - matching the name "Bryan" in the table; therefore the number of lookups + matching the name "Bryan" in the table; thus the number of lookups in the master table is minimized.</p> <p>Keeping an index table introduces some overhead when - inserting records in the table, therefore the number of operations - gained from the table has to be weighted against the number of - operations inserting objects in the table. However, note that the gain when - the key can be used to lookup elements is significant.</p> + inserting records in the table. The number of operations gained + from the table must therefore be compared against the number of + operations inserting objects in the table. However, notice that the + gain is significant when the key can be used to lookup elements.</p> </section> </section> <section> - <title>Mnesia specific</title> + <title>Mnesia-Specific</title> <section> - <title>Secondary index</title> + <title>Secondary Index</title> <p>If you frequently do a lookup on a field that is not the - key of the table, you will lose performance using - "mnesia:select/match_object" as this function will traverse the - whole table. You may create a secondary index instead and + key of the table, you lose performance using + "mnesia:select/match_object" as this function traverses the + whole table. You can create a secondary index instead and use "mnesia:index_read" to get faster access, however this - will require more memory. Example:</p> + requires more memory.</p> + <p><em>Example</em></p> <pre> -record(person, {idno, name, age, occupation}). ... @@ -347,14 +348,15 @@ PersonsAge42 = <section> <title>Transactions </title> - <p>Transactions is a way to guarantee that the distributed + <p>Using transactions is a way to guarantee that the distributed Mnesia database remains consistent, even when many different - processes update it in parallel. However if you have - real time requirements it is recommended to use dirty - operations instead of transactions. When using the dirty - operations you lose the consistency guarantee, this is usually + processes update it in parallel. However, if you have + real-time requirements it is recommended to use <c>dirty</c> + operations instead of transactions. When using <c>dirty</c> + operations, you lose the consistency guarantee; this is usually solved by only letting one process update the table. Other - processes have to send update requests to that process.</p> + processes must send update requests to that process.</p> + <p><em>Example</em></p> <pre> ... % Using transaction |