1 files changed, 239 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/commoncaveats.xml b/system/doc/efficiency_guide/commoncaveats.xml
new file mode 100644
index 0000000000..e18e5aa510
--- /dev/null
+++ b/system/doc/efficiency_guide/commoncaveats.xml
@@ -0,0 +1,239 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2001</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+    
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+    
+    </legalnotice>
+
+    <title>Common Caveats</title>
+    <prepared>Bjorn Gustavsson</prepared>
+    <docno></docno>
+    <date>2001-08-08</date>
+    <rev></rev>
+    <file>commoncaveats.xml</file>
+  </header>
+
+  <p>Here we list a few modules and BIFs to watch out for, and not only
+  from a performance point of view.</p>
+
+  <section>
+     <title>The regexp module</title>
+
+     <p>The regular expression functions in the
+        <seealso marker="stdlib:regexp">regexp</seealso>
+	module are written in Erlang, not in C, and were
+	meant for occasional use on small amounts of data,
+	for instance for validation of configuration files
+	when starting an application.</p>
+
+      <p>Use the <seealso marker="stdlib:re">re</seealso> module
+      (introduced in R13A) instead, especially in time-critical code.</p>
+  </section>
+
+  <section>
+     <title>The timer module</title>
+
+     <p>Creating timers using <seealso
+     marker="erts:erlang#erlang:send_after/3">erlang:send_after/3</seealso>
+     and <seealso marker="erts:erlang#erlang:start_timer/3">erlang:start_timer/3</seealso>
+     is much more efficient than using the timers provided by the
+     <seealso marker="stdlib:timer">timer</seealso> module. The
+     <c>timer</c> module uses a separate process to manage the timers,
+     and that process can easily become overloaded if many processes
+     create and cancel timers frequently (especially when using the
+     SMP emulator).</p>
+
+     <p>The functions in the <c>timer</c> module that do not manage timers (such as
+     <c>timer:tc/3</c> or <c>timer:sleep/1</c>), do not call the timer-server process
+     and are therefore harmless.</p>
+  </section>
+
+  <section>
+    <title>list_to_atom/1</title>
+
+    <p>Atoms are not garbage-collected. Once an atom is created, it will never
+    be removed. The emulator will terminate if the limit for the number
+    of atoms (1048576) is reached.</p>
+
+    <p>Therefore, converting arbitrary input strings to atoms could be
+    dangerous in a system that will run continuously.
+    If only certain well-defined atoms are allowed as input, you can use
+    <seealso marker="erts:erlang#list_to_existing_atom/1">list_to_existing_atom/1</seealso>
+    to guard against a denial-of-service attack. (All atoms that are allowed
+    must have been created earlier, for instance by simply using all of them
+    in a module and loading that module.)</p>
+
+    <p>Using <c>list_to_atom/1</c> to construct an atom that is passed to
+    <c>apply/3</c> like this</p>
+
+    <code type="erl">
+apply(list_to_atom("some_prefix"++Var), foo, Args)</code>    
+
+    <p>is quite expensive and is not recommended in time-critical code.</p>
+  </section>
+
+  <section>
+    <title>length/1</title>
+
+    <p>The time for calculating the length of a list is proportional to the
+    length of the list, as opposed to <c>tuple_size/1</c>, <c>byte_size/1</c>,
+    and <c>bit_size/1</c>, which all execute in constant time.</p>
+
+    <p>Normally you don't have to worry about the speed of <c>length/1</c>,
+    because it is efficiently implemented in C. In time critical-code, though,
+    you might want to avoid it if the input list could potentially be very long.</p>
+
+    <p>Some uses of <c>length/1</c> can be replaced by matching.
+    For instance, this code</p>
+
+    <code type="erl">
+foo(L) when length(L) >= 3 ->
+    ...</code>    
+
+    <p>can be rewritten to</p>
+    <code type="erl">
+foo([_,_,_|_]=L) ->
+   ...</code>    
+
+   <p>(One slight difference is that <c>length(L)</c> will fail if the <c>L</c>
+   is an improper list, while the pattern in the second code fragment will
+   accept an improper list.)</p>
+  </section>
+
+  <section>
+    <title>setelement/3</title>
+
+    <p><seealso marker="erts:erlang#setelement/3">setelement/3</seealso>
+    copies the tuple it modifies. Therefore, updating a tuple in a loop
+    using <c>setelement/3</c> will create a new copy of the tuple every time.</p>
+    
+    <p>There is one exception to the rule that the tuple is copied.
+    If the compiler clearly can see that destructively updating the tuple would
+    give exactly the same result as if the tuple was copied, the call to
+    <c>setelement/3</c> will be replaced with a special destructive setelement
+    instruction. In the following code sequence</p>
+
+    <code type="erl">
+multiple_setelement(T0) ->
+    T1 = setelement(9, T0, bar),
+    T2 = setelement(7, T1, foobar),
+    setelement(5, T2, new_value).</code>    
+
+    <p>the first <c>setelement/3</c> call will copy the tuple and modify the
+    ninth element. The two following <c>setelement/3</c> calls will modify
+    the tuple in place.</p>
+
+    <p>For the optimization to be applied, <em>all</em> of the followings conditions
+    must be true:</p>
+
+    <list type="bulleted">
+    <item>The indices must be integer literals, not variables or expressions.</item>
+    <item>The indices must be given in descending order.</item>
+    <item>There must be no calls to other function in between the calls to
+    <c>setelement/3</c>.</item>
+    <item>The tuple returned from one <c>setelement/3</c> call must only be used
+    in the subsequent call to <c>setelement/3</c>.</item>
+    </list>
+
+    <p>If it is not possible to structure the code as in the <c>multiple_setelement/1</c>
+    example, the best way to modify multiple elements in a large tuple is to
+    convert the tuple to a list, modify the list, and convert the list back to
+    a tuple.</p>
+  </section>
+
+  <section>
+    <title>size/1</title>
+
+    <p><c>size/1</c> returns the size for both tuples and binary.</p>
+
+    <p>Using the new BIFs <c>tuple_size/1</c> and <c>byte_size/1</c> introduced
+    in R12B gives the compiler and run-time system more opportunities for
+    optimization. A further advantage is that the new BIFs could help Dialyzer
+    find more bugs in your program.</p>
+  </section>
+
+  <section>
+    <title>split_binary/2</title>
+      <p>It is usually more efficient to split a binary using matching
+      instead of calling the <c>split_binary/2</c> function.
+      Furthermore, mixing bit syntax matching and <c>split_binary/2</c>
+      may prevent some optimizations of bit syntax matching.</p>
+
+        <p><em>DO</em></p>
+        <code type="none"><![CDATA[
+        <<Bin1:Num/binary,Bin2/binary>> = Bin,]]></code>
+        <p><em>DO NOT</em></p>
+        <code type="none">
+        {Bin1,Bin2} = split_binary(Bin, Num)
+        </code>
+   </section>
+
+  <section>
+    <title>The '--' operator</title>
+     <p>Note that the '<c>--</c>' operator has a complexity
+     proportional to the product of the length of its operands,
+     meaning that it will be very slow if both of its operands
+     are long lists:</p>
+
+        <p><em>DO NOT</em></p>
+        <code type="none"><![CDATA[
+        HugeList1 -- HugeList2]]></code>
+
+     <p>Instead use the <seealso marker="stdlib:ordsets">ordsets</seealso>
+     module:</p>
+
+        <p><em>DO</em></p>
+        <code type="none">
+        HugeSet1 = ordsets:from_list(HugeList1),
+        HugeSet2 = ordsets:from_list(HugeList2),
+        ordsets:subtract(HugeSet1, HugeSet2)
+        </code>
+
+     <p>Obviously, that code will not work if the original order
+     of the list is important. If the order of the list must be
+     preserved, do like this:</p>
+
+        <p><em>DO</em></p>
+        <code type="none"><![CDATA[
+        Set = gb_sets:from_list(HugeList2),
+        [E || E <- HugeList1, not gb_sets:is_element(E, Set)]]]></code>
+
+     <p>Subtle note 1: This code behaves differently from '<c>--</c>'
+     if the lists contain duplicate elements. (One occurrence
+     of an element in HugeList2 will remove <em>all</em>
+     occurrences in HugeList1.)</p>
+
+     <p>Subtle note 2: This code compares lists elements using the
+     '<c>==</c>' operator, while '<c>--</c>' uses the '<c>=:=</c>'. If
+     that difference is important, <c>sets</c> can be used instead of
+     <c>gb_sets</c>, but note that <c>sets:from_list/1</c> is much
+     slower than <c>gb_sets:from_list/1</c> for long lists.</p>
+
+     <p>Using the '<c>--</c>' operator to delete an element
+     from a list is not a performance problem:</p>
+
+        <p><em>OK</em></p>
+        <code type="none">
+        HugeList1 -- [Element]
+        </code>
+
+   </section>
+
+</chapter>
+