1 files changed, 528 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/binaryhandling.xml b/system/doc/efficiency_guide/binaryhandling.xml
new file mode 100644
index 0000000000..8746de4b60
--- /dev/null
+++ b/system/doc/efficiency_guide/binaryhandling.xml
@@ -0,0 +1,528 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2007</year>
+      <year>2007</year>
+      <holder>Ericsson AB, All Rights Reserved</holder>
+    </copyright>
+    <legalnotice>
+  The contents of this file are subject to the Erlang Public License,
+  Version 1.1, (the "License"); you may not use this file except in
+  compliance with the License. You should have received a copy of the
+  Erlang Public License along with this software. If not, it can be
+  retrieved online at http://www.erlang.org/.
+
+  Software distributed under the License is distributed on an "AS IS"
+  basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+  the License for the specific language governing rights and limitations
+  under the License.
+
+  The Initial Developer of the Original Code is Ericsson AB.
+    </legalnotice>
+
+    <title>Constructing and matching binaries</title>
+    <prepared>Bjorn Gustavsson</prepared>
+    <docno></docno>
+    <date>2007-10-12</date>
+    <rev></rev>
+    <file>binaryhandling.xml</file>
+  </header>
+
+  <p>In R12B, the most natural way to write binary construction and matching is now
+  significantly faster than in earlier releases.</p>
+
+  <p>To construct at binary, you can simply write</p>
+
+  <p><em>DO</em> (in R12B) / <em>REALLY DO NOT</em> (in earlier releases)</p>
+  <code type="erl"><![CDATA[
+my_list_to_binary(List) ->
+    my_list_to_binary(List, <<>>).
+
+my_list_to_binary([H|T], Acc) ->
+    my_list_to_binary(T, <<Acc/binary,H>>);
+my_list_to_binary([], Acc) ->
+    Acc.]]></code>  
+
+  <p>In releases before R12B, <c>Acc</c> would be copied in every iteration.
+  In R12B, <c>Acc</c> will be copied only in the first iteration and extra
+  space will be allocated at the end of the copied binary. In the next iteration,
+  <c>H</c> will be written in to the extra space. When the extra space runs out,
+  the binary will be reallocated with more extra space.</p>
+
+  <p>The extra space allocated (or reallocated) will be twice the size of the
+  existing binary data, or 256, whichever is larger.</p>
+
+  <p>The most natural way to match binaries is now the fastest:</p>
+
+  <p><em>DO</em> (in R12B)</p>
+  <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+    [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>  
+
+  <section>
+    <title>How binaries are implemented</title>
+
+    <p>Internally, binaries and bitstrings are implemented in the same way.
+    In this section, we will call them <em>binaries</em> since that is what
+    they are called in the emulator source code.</p>
+
+    <p>There are four types of binary objects internally. Two of them are
+    containers for binary data and two of them are merely references to
+    a part of a binary.</p>
+
+    <p>The binary containers are called <em>refc binaries</em>
+    (short for <em>reference-counted binaries</em>) and <em>heap binaries</em>.</p>
+
+    <p><marker id="refc_binary"></marker><em>Refc binaries</em>
+    consist of two parts: an object stored on
+    the process heap, called a <em>ProcBin</em>, and the binary object itself
+    stored outside all process heaps.</p>
+
+    <p>The binary object can be referenced by any number of ProcBins from any
+    number of processes; the object contains a reference counter to keep track
+    of the number of references, so that it can be removed when the last
+    reference disappears.</p>
+
+    <p>All ProcBin objects in a process are part of a linked list, so that
+    the garbage collector can keep track of them and decrement the reference
+    counters in the binary when a ProcBin disappears.</p>
+
+    <p><marker id="heap_binary"></marker><em>Heap binaries</em> are small binaries,
+    up to 64 bytes, that are stored directly on the process heap.
+    They will be copied when the process
+    is garbage collected and when they are sent as a message. They don't
+    require any special handling by the garbage collector.</p>
+
+    <p>There are two types of reference objects that can reference part of
+    a refc binary or heap binary. They are called <em>sub binaries</em> and
+    <em>match contexts</em>.</p>
+
+    <p><marker id="sub_binary"></marker>A <em>sub binary</em>
+    is created by <c>split_binary/2</c> and when
+    a binary is matched out in a binary pattern. A sub binary is a reference
+    into a part of another binary (refc or heap binary, never into a another
+    sub binary). Therefore, matching out a binary is relatively cheap because
+    the actual binary data is never copied.</p>
+
+    <p><marker id="match_context"></marker>A <em>match context</em> is
+    similar to a sub binary, but is optimized
+    for binary matching; for instance, it contains a direct pointer to the binary
+    data. For each field that is matched out of a binary, the position in the
+    match context will be incremented.</p>
+
+    <p>In R11B, a match context was only using during a binary matching
+    operation.</p>
+
+    <p>In R12B, the compiler tries to avoid generating code that
+    creates a sub binary, only to shortly afterwards create a new match
+    context and discard the sub binary. Instead of creating a sub binary,
+    the match context is kept.</p>
+
+    <p>The compiler can only do this optimization if it can know for sure
+    that the match context will not be shared. If it would be shared, the
+    functional properties (also called referential transparency) of Erlang
+    would break.</p>
+  </section>
+
+  <section>
+    <title>Constructing binaries</title>
+
+    <p>In R12B, appending to a binary or bitstring</p>
+
+  <code type="erl"><![CDATA[
+<<Binary/binary, ...>>
+<<Binary/bitstring, ...>>]]></code>
+
+    <p>is specially optimized by the <em>run-time system</em>.
+    Because the run-time system handles the optimization (instead of
+    the compiler), there are very few circumstances in which the optimization
+    will not work.</p>
+
+    <p>To explain how it works, we will go through this code</p>
+
+  <code type="erl"><![CDATA[
+Bin0 = <<0>>,                    %% 1
+Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
+Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
+Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
+Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
+{Bin4,Bin3}                      %% 6]]></code>
+
+    <p>line by line.</p>
+
+    <p>The first line (marked with the <c>%% 1</c> comment), assigns
+    a <seealso marker="#heap_binary">heap binary</seealso> to
+    the variable <c>Bin0</c>.</p>
+
+    <p>The second line is an append operation. Since <c>Bin0</c>
+    has not been involved in an append operation,
+    a new <seealso marker="#refc_binary">refc binary</seealso>
+    will be created and the contents of <c>Bin0</c> will be copied
+    into it. The <em>ProcBin</em> part of the refc binary will have
+    its size set to the size of the data stored in the binary, while
+    the binary object will have extra space allocated. 
+    The size of the binary object will be either twice the
+    size of <c>Bin0</c> or 256, whichever is larger. In this case
+    it will be 256.</p>
+
+    <p>It gets more interesting in the third line.
+    <c>Bin1</c> <em>has</em> been used in an append operation,
+    and it has 255 bytes of unused storage at the end, so the three new bytes
+    will be stored there.</p>
+
+    <p>Same thing in the fourth line. There are 252 bytes left,
+    so there is no problem storing another three bytes.</p>
+
+    <p>But in the fifth line something <em>interesting</em> happens.
+    Note that we don't append to the previous result in <c>Bin3</c>,
+    but to <c>Bin1</c>. We expect that <c>Bin4</c> will be assigned
+    the value <c>&lt;&lt;0,1,2,3,17&gt;&gt;</c>. We also expect that
+    <c>Bin3</c> will retain its value
+    (<c>&lt;&lt;0,1,2,3,4,5,6,7,8,9&gt;&gt;</c>).
+    Clearly, the run-time system cannot write the byte <c>17</c> into the binary,
+    because that would change the value of <c>Bin3</c> to
+    <c>&lt;&lt;0,1,2,3,4,17,6,7,8,9&gt;&gt;</c>.</p>
+
+    <p>What will happen?</p>
+
+    <p>The run-time system will see that <c>Bin1</c> is the result
+    from a previous append operation (not from the latest append operation),
+    so it will <em>copy</em> the contents of <c>Bin1</c> to a new binary
+    and reserve extra storage and so on. (We will not explain here how the
+    run-time system can know that it is not allowed to write into <c>Bin1</c>;
+    it is left as an exercise to the curious reader to figure out how it is
+    done by reading the emulator sources, primarily <c>erl_bits.c</c>.)</p>
+
+    <section>
+      <title>Circumstances that force copying</title>
+
+      <p>The optimization of the binary append operation requires that
+      there is a <em>single</em> ProcBin and a <em>single reference</em> to the
+      ProcBin for the binary. The reason is that the binary object can be
+      moved (reallocated) during an append operation, and when that happens
+      the pointer in the ProcBin must be updated. If there would be more than
+      on ProcBin pointing to the binary object, it would not be possible to
+      find and update all of them.</p>
+
+      <p>Therefore, certain operations on a binary will mark it so that
+      any future append operation will be forced to copy the binary.
+      In most cases, the binary object will be shrunk at the same time 
+      to reclaim the extra space allocated for growing.</p>
+
+      <p>When appending to a binary</p>
+      
+  <code type="erl"><![CDATA[
+Bin = <<Bin0,...>>]]></code>
+
+      <p>only the binary returned from the latest append operation will
+      support further cheap append operations. In the code fragment above,
+      appending to <c>Bin</c> will be cheap, while appending to <c>Bin0</c>
+      will force the creation of a new binary and copying of the contents
+      of <c>Bin0</c>.</p>
+
+      <p>If a binary is sent as a message to a process or port, the binary
+      will be shrunk and any further append operation will copy the binary
+      data into a new binary. For instance, in the following code fragment</p>
+
+  <code type="erl"><![CDATA[
+Bin1 = <<Bin0,...>>,
+PortOrPid ! Bin1,
+Bin = <<Bin1,...>>  %% Bin1 will be COPIED
+]]></code>
+
+      <p><c>Bin1</c> will be copied in the third line.</p>
+
+      <p>The same thing happens if you insert a binary into an <em>ets</em>
+      table or send it to a port using <c>erlang:port_command/2</c>.</p>
+
+      <p>Matching a binary will also cause it to shrink and the next append
+      operation will copy the binary data:</p>
+
+  <code type="erl"><![CDATA[
+Bin1 = <<Bin0,...>>,
+<<X,Y,Z,T/binary>> = Bin1,
+Bin = <<Bin1,...>>  %% Bin1 will be COPIED
+]]></code>
+
+      <p>The reason is that a <seealso marker="#match_context">match context</seealso>
+      contains a direct pointer to the binary data.</p>
+
+      <p>If a process simply keeps binaries (either in "loop data" or in the process
+      dictionary), the garbage collector may eventually shrink the binaries.
+      If only one such binary is kept, it will not be shrunk. If the process later
+      appends to a binary that has been shrunk, the binary object will be reallocated
+      to make place for the data to be appended.</p>
+    </section>
+
+  </section>
+
+  <section>
+    <title>Matching binaries</title>
+
+    <p>We will revisit the example shown earlier</p>
+
+  <p><em>DO</em> (in R12B)</p>
+  <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+    [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>  
+
+  <p>too see what is happening under the hood.</p>
+
+  <p>The very first time <c>my_binary_to_list/1</c> is called,
+  a <seealso marker="#match_context">match context</seealso>
+  will be created. The match context will point to the first
+  byte of the binary. One byte will be matched out and the match context
+  will be updated to point to the second byte in the binary.</p>
+
+  <p>In R11B, at this point a <seealso marker="#sub_binary">sub binary</seealso>
+  would be created. In R12B,
+  the compiler sees that there is no point in creating a sub binary,
+  because there will soon be a call to a function (in this case,
+  to <c>my_binary_to_list/1</c> itself) that will immediately
+  create a new match context and discard the sub binary.</p>
+
+  <p>Therefore, in R12B, <c>my_binary_to_list/1</c> will call itself
+  with the match context instead of with a sub binary. The instruction
+  that initializes the matching operation will basically do nothing
+  when it sees that it was passed a match context instead of a binary.</p>
+
+  <p>When the end of the binary is reached and second clause matches,
+  the match context will simply be discarded (removed in the next
+  garbage collection, since there is no longer any reference to it).</p>
+
+  <p>To summarize, <c>my_binary_to_list/1</c> in R12B only needs to create
+  <em>one</em> match context and no sub binaries. In R11B, if the binary
+  contains <em>N</em> bytes, <em>N+1</em> match contexts and <em>N</em>
+  sub binaries will be created.</p>
+
+  <p>In R11B, the fastest way to match binaries is:</p>
+
+  <p><em>DO NOT</em> (in R12B)</p>
+  <code type="erl"><![CDATA[
+my_complicated_binary_to_list(Bin) ->
+    my_complicated_binary_to_list(Bin, 0).
+
+my_complicated_binary_to_list(Bin, Skip) ->
+    case Bin of
+	<<_:Skip/binary,Byte,_/binary>> ->
+	    [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
+	<<_:Skip/binary>> ->
+	    []
+    end.]]></code>
+
+  <p>This function cleverly avoids building sub binaries, but it cannot
+  avoid building a match context in each recursion step. Therefore, in both R11B and R12B,
+  <c>my_complicated_binary_to_list/1</c> builds <em>N+1</em> match
+  contexts. (In a future release, the compiler might be able to generate code
+  that reuses the match context, but don't hold your breath.)</p>
+
+  <p>Returning to <c>my_binary_to_list/1</c>, note that the match context was
+  discarded when the entire binary had been traversed. What happens if
+  the iteration stops before it has reached the end of the binary? Will
+  the optimization still work?</p>
+
+  <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+    T;
+after_zero(<<_,T/binary>>) ->
+    after_zero(T);
+after_zero(<<>>) ->
+    <<>>.
+  ]]></code>
+
+  <p>Yes, it will. The compiler will remove the building of the sub binary in the
+  second clause</p>
+
+  <code type="erl"><![CDATA[
+.
+.
+.
+after_zero(<<_,T/binary>>) ->
+    after_zero(T);
+.
+.
+.]]></code>
+
+  <p>but will generate code that builds a sub binary in the first clause</p>
+
+  <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+    T;
+.
+.
+.]]></code>
+
+  <p>Therefore, <c>after_zero/1</c> will build one match context and one sub binary
+  (assuming it is passed a binary that contains a zero byte).</p>
+
+  <p>Code like the following will also be optimized:</p>
+
+  <code type="erl"><![CDATA[
+all_but_zeroes_to_list(Buffer, Acc, 0) ->
+    {lists:reverse(Acc),Buffer};
+all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
+    all_but_zeroes_to_list(T, Acc, Remaining-1);
+all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
+    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).]]></code>
+
+  <p>The compiler will remove building of sub binaries in the second and third clauses,
+  and it will add an instruction to the first clause that will convert <c>Buffer</c>
+  from a match context to a sub binary (or do nothing if <c>Buffer</c> already is a binary).</p>
+
+  <p>Before you begin to think that the compiler can optimize any binary patterns,
+  here is a function that the compiler (currently, at least) is not able to optimize:</p>
+
+  <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+    non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+    false;
+non_opt_eq([], <<>>) ->
+    true.]]></code>
+
+  <p>It was briefly mentioned earlier that the compiler can only delay creation of
+  sub binaries if it can be sure that the binary will not be shared. In this case,
+  the compiler cannot be sure.</p>
+
+  <p>We will soon show how to rewrite <c>non_opt_eq/2</c> so that the delayed sub binary
+  optimization can be applied, and more importantly, we will show how you can find out
+  whether your code can be optimized.</p>
+
+  <section>
+    <title>The bin_opt_info option</title>
+
+    <p>Use the <c>bin_opt_info</c> option to have the compiler print a lot of 
+    information about binary optimizations. It can be given either to the compiler or
+    <c>erlc</c></p>
+
+  <code type="erl"><![CDATA[
+erlc +bin_opt_info Mod.erl]]></code>
+
+    <p>or passed via an environment variable</p>
+
+  <code type="erl"><![CDATA[
+export ERL_COMPILER_OPTIONS=bin_opt_info]]></code>
+
+    <p>Note that the <c>bin_opt_info</c> is not meant to be a permanent option added
+    to your <c>Makefile</c>s, because it is not possible to eliminate all messages that
+    it generates. Therefore, passing the option through the environment is in most cases
+    the most practical approach.</p>
+
+    <p>The warnings will look like this:</p>
+
+  <code type="erl"><![CDATA[
+./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
+./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed]]></code>
+
+    <p>To make it clearer exactly what code the warnings refer to,
+    in the examples that follow, the warnings are inserted as comments
+    after the clause they refer to:</p>
+
+  <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+         %% NOT OPTIMIZED: sub binary is used or returned
+    T;
+after_zero(<<_,T/binary>>) ->
+         %% OPTIMIZED: creation of sub binary delayed
+    after_zero(T);
+after_zero(<<>>) ->
+    <<>>.]]></code>
+
+    <p>The warning for the first clause tells us that it is not possible to
+    delay the creation of a sub binary, because it will be returned.
+    The warning for the second clause tells us that a sub binary will not be
+    created (yet).</p>
+
+    <p>It is time to revisit the earlier example of the code that could not
+    be optimized and find out why:</p>
+
+  <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+        %% INFO: matching anything else but a plain variable to
+	%%    the left of binary pattern will prevent delayed 
+	%%    sub binary optimization;
+	%%    SUGGEST changing argument order
+        %% NOT OPTIMIZED: called function non_opt_eq/2 does not
+	%%    begin with a suitable binary matching instruction
+    non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+    false;
+non_opt_eq([], <<>>) ->
+    true.]]></code>
+
+    <p>The compiler emitted two warnings. The <c>INFO</c> warning refers to the function
+    <c>non_opt_eq/2</c> as a callee, indicating that any functions that call <c>non_opt_eq/2</c>
+    will not be able to make delayed sub binary optimization.
+    There is also a suggestion to change argument order.
+    The second warning (that happens to refer to the same line) refers to the construction of
+    the sub binary itself.</p>
+    
+    <p>We will soon show another example that should make the distinction between <c>INFO</c>
+    and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but first we will heed the suggestion
+    to change argument order:</p>
+
+  <code type="erl"><![CDATA[
+opt_eq(<<H,T1/binary>>, [H|T2]) ->
+        %% OPTIMIZED: creation of sub binary delayed
+    opt_eq(T1, T2);
+opt_eq(<<_,_/binary>>, [_|_]) ->
+    false;
+opt_eq(<<>>, []) ->
+    true.]]></code>
+
+    <p>The compiler gives a warning for the following code fragment:</p>
+
+  <code type="erl"><![CDATA[
+match_body([0|_], <<H,_/binary>>) ->
+        %% INFO: matching anything else but a plain variable to
+	%%    the left of binary pattern will prevent delayed 
+	%%    sub binary optimization;
+	%%    SUGGEST changing argument order
+    done;
+.
+.
+.]]></code>
+
+    <p>The warning means that <em>if</em> there is a call to <c>match_body/2</c>
+    (from another clause in <c>match_body/2</c> or another function), the
+    delayed sub binary optimization will not be possible. There will be additional
+    warnings for any place where a sub binary is matched out at the end of and
+    passed as the second argument to <c>match_body/2</c>. For instance:</p>
+
+  <code type="erl"><![CDATA[
+match_head(List, <<_:10,Data/binary>>) ->
+        %% NOT OPTIMIZED: called function match_body/2 does not
+	%%     begin with a suitable binary matching instruction
+    match_body(List, Data).]]></code>
+
+  </section>
+
+  <section>
+    <title>Unused variables</title>
+
+    <p>The compiler itself figures out if a variable is unused. The same
+    code is generated for each of the following functions</p>
+
+  <code type="erl"><![CDATA[
+count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
+count1(<<>>, Count) -> Count.
+
+count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
+count2(<<>>, Count) -> Count.
+
+count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
+count3(<<>>, Count) -> Count.]]></code>
+
+  <p>In each iteration, the first 8 bits in the binary will be skipped, not matched out.</p>
+
+  </section>
+
+  </section>
+
+</chapter>
+