aboutsummaryrefslogtreecommitdiffstats
path: root/system/doc/efficiency_guide/binaryhandling.xml
diff options
context:
space:
mode:
authorErlang/OTP <[email protected]>2009-11-20 14:54:40 +0000
committerErlang/OTP <[email protected]>2009-11-20 14:54:40 +0000
commit84adefa331c4159d432d22840663c38f155cd4c1 (patch)
treebff9a9c66adda4df2106dfd0e5c053ab182a12bd /system/doc/efficiency_guide/binaryhandling.xml
downloadotp-84adefa331c4159d432d22840663c38f155cd4c1.tar.gz
otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.bz2
otp-84adefa331c4159d432d22840663c38f155cd4c1.zip
The R13B03 release.OTP_R13B03
Diffstat (limited to 'system/doc/efficiency_guide/binaryhandling.xml')
-rw-r--r--system/doc/efficiency_guide/binaryhandling.xml528
1 files changed, 528 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/binaryhandling.xml b/system/doc/efficiency_guide/binaryhandling.xml
new file mode 100644
index 0000000000..8746de4b60
--- /dev/null
+++ b/system/doc/efficiency_guide/binaryhandling.xml
@@ -0,0 +1,528 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+ <header>
+ <copyright>
+ <year>2007</year>
+ <year>2007</year>
+ <holder>Ericsson AB, All Rights Reserved</holder>
+ </copyright>
+ <legalnotice>
+ The contents of this file are subject to the Erlang Public License,
+ Version 1.1, (the "License"); you may not use this file except in
+ compliance with the License. You should have received a copy of the
+ Erlang Public License along with this software. If not, it can be
+ retrieved online at http://www.erlang.org/.
+
+ Software distributed under the License is distributed on an "AS IS"
+ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ the License for the specific language governing rights and limitations
+ under the License.
+
+ The Initial Developer of the Original Code is Ericsson AB.
+ </legalnotice>
+
+ <title>Constructing and matching binaries</title>
+ <prepared>Bjorn Gustavsson</prepared>
+ <docno></docno>
+ <date>2007-10-12</date>
+ <rev></rev>
+ <file>binaryhandling.xml</file>
+ </header>
+
+ <p>In R12B, the most natural way to write binary construction and matching is now
+ significantly faster than in earlier releases.</p>
+
+ <p>To construct at binary, you can simply write</p>
+
+ <p><em>DO</em> (in R12B) / <em>REALLY DO NOT</em> (in earlier releases)</p>
+ <code type="erl"><![CDATA[
+my_list_to_binary(List) ->
+ my_list_to_binary(List, <<>>).
+
+my_list_to_binary([H|T], Acc) ->
+ my_list_to_binary(T, <<Acc/binary,H>>);
+my_list_to_binary([], Acc) ->
+ Acc.]]></code>
+
+ <p>In releases before R12B, <c>Acc</c> would be copied in every iteration.
+ In R12B, <c>Acc</c> will be copied only in the first iteration and extra
+ space will be allocated at the end of the copied binary. In the next iteration,
+ <c>H</c> will be written in to the extra space. When the extra space runs out,
+ the binary will be reallocated with more extra space.</p>
+
+ <p>The extra space allocated (or reallocated) will be twice the size of the
+ existing binary data, or 256, whichever is larger.</p>
+
+ <p>The most natural way to match binaries is now the fastest:</p>
+
+ <p><em>DO</em> (in R12B)</p>
+ <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+ [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>
+
+ <section>
+ <title>How binaries are implemented</title>
+
+ <p>Internally, binaries and bitstrings are implemented in the same way.
+ In this section, we will call them <em>binaries</em> since that is what
+ they are called in the emulator source code.</p>
+
+ <p>There are four types of binary objects internally. Two of them are
+ containers for binary data and two of them are merely references to
+ a part of a binary.</p>
+
+ <p>The binary containers are called <em>refc binaries</em>
+ (short for <em>reference-counted binaries</em>) and <em>heap binaries</em>.</p>
+
+ <p><marker id="refc_binary"></marker><em>Refc binaries</em>
+ consist of two parts: an object stored on
+ the process heap, called a <em>ProcBin</em>, and the binary object itself
+ stored outside all process heaps.</p>
+
+ <p>The binary object can be referenced by any number of ProcBins from any
+ number of processes; the object contains a reference counter to keep track
+ of the number of references, so that it can be removed when the last
+ reference disappears.</p>
+
+ <p>All ProcBin objects in a process are part of a linked list, so that
+ the garbage collector can keep track of them and decrement the reference
+ counters in the binary when a ProcBin disappears.</p>
+
+ <p><marker id="heap_binary"></marker><em>Heap binaries</em> are small binaries,
+ up to 64 bytes, that are stored directly on the process heap.
+ They will be copied when the process
+ is garbage collected and when they are sent as a message. They don't
+ require any special handling by the garbage collector.</p>
+
+ <p>There are two types of reference objects that can reference part of
+ a refc binary or heap binary. They are called <em>sub binaries</em> and
+ <em>match contexts</em>.</p>
+
+ <p><marker id="sub_binary"></marker>A <em>sub binary</em>
+ is created by <c>split_binary/2</c> and when
+ a binary is matched out in a binary pattern. A sub binary is a reference
+ into a part of another binary (refc or heap binary, never into a another
+ sub binary). Therefore, matching out a binary is relatively cheap because
+ the actual binary data is never copied.</p>
+
+ <p><marker id="match_context"></marker>A <em>match context</em> is
+ similar to a sub binary, but is optimized
+ for binary matching; for instance, it contains a direct pointer to the binary
+ data. For each field that is matched out of a binary, the position in the
+ match context will be incremented.</p>
+
+ <p>In R11B, a match context was only using during a binary matching
+ operation.</p>
+
+ <p>In R12B, the compiler tries to avoid generating code that
+ creates a sub binary, only to shortly afterwards create a new match
+ context and discard the sub binary. Instead of creating a sub binary,
+ the match context is kept.</p>
+
+ <p>The compiler can only do this optimization if it can know for sure
+ that the match context will not be shared. If it would be shared, the
+ functional properties (also called referential transparency) of Erlang
+ would break.</p>
+ </section>
+
+ <section>
+ <title>Constructing binaries</title>
+
+ <p>In R12B, appending to a binary or bitstring</p>
+
+ <code type="erl"><![CDATA[
+<<Binary/binary, ...>>
+<<Binary/bitstring, ...>>]]></code>
+
+ <p>is specially optimized by the <em>run-time system</em>.
+ Because the run-time system handles the optimization (instead of
+ the compiler), there are very few circumstances in which the optimization
+ will not work.</p>
+
+ <p>To explain how it works, we will go through this code</p>
+
+ <code type="erl"><![CDATA[
+Bin0 = <<0>>, %% 1
+Bin1 = <<Bin0/binary,1,2,3>>, %% 2
+Bin2 = <<Bin1/binary,4,5,6>>, %% 3
+Bin3 = <<Bin2/binary,7,8,9>>, %% 4
+Bin4 = <<Bin1/binary,17>>, %% 5 !!!
+{Bin4,Bin3} %% 6]]></code>
+
+ <p>line by line.</p>
+
+ <p>The first line (marked with the <c>%% 1</c> comment), assigns
+ a <seealso marker="#heap_binary">heap binary</seealso> to
+ the variable <c>Bin0</c>.</p>
+
+ <p>The second line is an append operation. Since <c>Bin0</c>
+ has not been involved in an append operation,
+ a new <seealso marker="#refc_binary">refc binary</seealso>
+ will be created and the contents of <c>Bin0</c> will be copied
+ into it. The <em>ProcBin</em> part of the refc binary will have
+ its size set to the size of the data stored in the binary, while
+ the binary object will have extra space allocated.
+ The size of the binary object will be either twice the
+ size of <c>Bin0</c> or 256, whichever is larger. In this case
+ it will be 256.</p>
+
+ <p>It gets more interesting in the third line.
+ <c>Bin1</c> <em>has</em> been used in an append operation,
+ and it has 255 bytes of unused storage at the end, so the three new bytes
+ will be stored there.</p>
+
+ <p>Same thing in the fourth line. There are 252 bytes left,
+ so there is no problem storing another three bytes.</p>
+
+ <p>But in the fifth line something <em>interesting</em> happens.
+ Note that we don't append to the previous result in <c>Bin3</c>,
+ but to <c>Bin1</c>. We expect that <c>Bin4</c> will be assigned
+ the value <c>&lt;&lt;0,1,2,3,17&gt;&gt;</c>. We also expect that
+ <c>Bin3</c> will retain its value
+ (<c>&lt;&lt;0,1,2,3,4,5,6,7,8,9&gt;&gt;</c>).
+ Clearly, the run-time system cannot write the byte <c>17</c> into the binary,
+ because that would change the value of <c>Bin3</c> to
+ <c>&lt;&lt;0,1,2,3,4,17,6,7,8,9&gt;&gt;</c>.</p>
+
+ <p>What will happen?</p>
+
+ <p>The run-time system will see that <c>Bin1</c> is the result
+ from a previous append operation (not from the latest append operation),
+ so it will <em>copy</em> the contents of <c>Bin1</c> to a new binary
+ and reserve extra storage and so on. (We will not explain here how the
+ run-time system can know that it is not allowed to write into <c>Bin1</c>;
+ it is left as an exercise to the curious reader to figure out how it is
+ done by reading the emulator sources, primarily <c>erl_bits.c</c>.)</p>
+
+ <section>
+ <title>Circumstances that force copying</title>
+
+ <p>The optimization of the binary append operation requires that
+ there is a <em>single</em> ProcBin and a <em>single reference</em> to the
+ ProcBin for the binary. The reason is that the binary object can be
+ moved (reallocated) during an append operation, and when that happens
+ the pointer in the ProcBin must be updated. If there would be more than
+ on ProcBin pointing to the binary object, it would not be possible to
+ find and update all of them.</p>
+
+ <p>Therefore, certain operations on a binary will mark it so that
+ any future append operation will be forced to copy the binary.
+ In most cases, the binary object will be shrunk at the same time
+ to reclaim the extra space allocated for growing.</p>
+
+ <p>When appending to a binary</p>
+
+ <code type="erl"><![CDATA[
+Bin = <<Bin0,...>>]]></code>
+
+ <p>only the binary returned from the latest append operation will
+ support further cheap append operations. In the code fragment above,
+ appending to <c>Bin</c> will be cheap, while appending to <c>Bin0</c>
+ will force the creation of a new binary and copying of the contents
+ of <c>Bin0</c>.</p>
+
+ <p>If a binary is sent as a message to a process or port, the binary
+ will be shrunk and any further append operation will copy the binary
+ data into a new binary. For instance, in the following code fragment</p>
+
+ <code type="erl"><![CDATA[
+Bin1 = <<Bin0,...>>,
+PortOrPid ! Bin1,
+Bin = <<Bin1,...>> %% Bin1 will be COPIED
+]]></code>
+
+ <p><c>Bin1</c> will be copied in the third line.</p>
+
+ <p>The same thing happens if you insert a binary into an <em>ets</em>
+ table or send it to a port using <c>erlang:port_command/2</c>.</p>
+
+ <p>Matching a binary will also cause it to shrink and the next append
+ operation will copy the binary data:</p>
+
+ <code type="erl"><![CDATA[
+Bin1 = <<Bin0,...>>,
+<<X,Y,Z,T/binary>> = Bin1,
+Bin = <<Bin1,...>> %% Bin1 will be COPIED
+]]></code>
+
+ <p>The reason is that a <seealso marker="#match_context">match context</seealso>
+ contains a direct pointer to the binary data.</p>
+
+ <p>If a process simply keeps binaries (either in "loop data" or in the process
+ dictionary), the garbage collector may eventually shrink the binaries.
+ If only one such binary is kept, it will not be shrunk. If the process later
+ appends to a binary that has been shrunk, the binary object will be reallocated
+ to make place for the data to be appended.</p>
+ </section>
+
+ </section>
+
+ <section>
+ <title>Matching binaries</title>
+
+ <p>We will revisit the example shown earlier</p>
+
+ <p><em>DO</em> (in R12B)</p>
+ <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+ [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>
+
+ <p>too see what is happening under the hood.</p>
+
+ <p>The very first time <c>my_binary_to_list/1</c> is called,
+ a <seealso marker="#match_context">match context</seealso>
+ will be created. The match context will point to the first
+ byte of the binary. One byte will be matched out and the match context
+ will be updated to point to the second byte in the binary.</p>
+
+ <p>In R11B, at this point a <seealso marker="#sub_binary">sub binary</seealso>
+ would be created. In R12B,
+ the compiler sees that there is no point in creating a sub binary,
+ because there will soon be a call to a function (in this case,
+ to <c>my_binary_to_list/1</c> itself) that will immediately
+ create a new match context and discard the sub binary.</p>
+
+ <p>Therefore, in R12B, <c>my_binary_to_list/1</c> will call itself
+ with the match context instead of with a sub binary. The instruction
+ that initializes the matching operation will basically do nothing
+ when it sees that it was passed a match context instead of a binary.</p>
+
+ <p>When the end of the binary is reached and second clause matches,
+ the match context will simply be discarded (removed in the next
+ garbage collection, since there is no longer any reference to it).</p>
+
+ <p>To summarize, <c>my_binary_to_list/1</c> in R12B only needs to create
+ <em>one</em> match context and no sub binaries. In R11B, if the binary
+ contains <em>N</em> bytes, <em>N+1</em> match contexts and <em>N</em>
+ sub binaries will be created.</p>
+
+ <p>In R11B, the fastest way to match binaries is:</p>
+
+ <p><em>DO NOT</em> (in R12B)</p>
+ <code type="erl"><![CDATA[
+my_complicated_binary_to_list(Bin) ->
+ my_complicated_binary_to_list(Bin, 0).
+
+my_complicated_binary_to_list(Bin, Skip) ->
+ case Bin of
+ <<_:Skip/binary,Byte,_/binary>> ->
+ [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
+ <<_:Skip/binary>> ->
+ []
+ end.]]></code>
+
+ <p>This function cleverly avoids building sub binaries, but it cannot
+ avoid building a match context in each recursion step. Therefore, in both R11B and R12B,
+ <c>my_complicated_binary_to_list/1</c> builds <em>N+1</em> match
+ contexts. (In a future release, the compiler might be able to generate code
+ that reuses the match context, but don't hold your breath.)</p>
+
+ <p>Returning to <c>my_binary_to_list/1</c>, note that the match context was
+ discarded when the entire binary had been traversed. What happens if
+ the iteration stops before it has reached the end of the binary? Will
+ the optimization still work?</p>
+
+ <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+ T;
+after_zero(<<_,T/binary>>) ->
+ after_zero(T);
+after_zero(<<>>) ->
+ <<>>.
+ ]]></code>
+
+ <p>Yes, it will. The compiler will remove the building of the sub binary in the
+ second clause</p>
+
+ <code type="erl"><![CDATA[
+.
+.
+.
+after_zero(<<_,T/binary>>) ->
+ after_zero(T);
+.
+.
+.]]></code>
+
+ <p>but will generate code that builds a sub binary in the first clause</p>
+
+ <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+ T;
+.
+.
+.]]></code>
+
+ <p>Therefore, <c>after_zero/1</c> will build one match context and one sub binary
+ (assuming it is passed a binary that contains a zero byte).</p>
+
+ <p>Code like the following will also be optimized:</p>
+
+ <code type="erl"><![CDATA[
+all_but_zeroes_to_list(Buffer, Acc, 0) ->
+ {lists:reverse(Acc),Buffer};
+all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
+ all_but_zeroes_to_list(T, Acc, Remaining-1);
+all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
+ all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).]]></code>
+
+ <p>The compiler will remove building of sub binaries in the second and third clauses,
+ and it will add an instruction to the first clause that will convert <c>Buffer</c>
+ from a match context to a sub binary (or do nothing if <c>Buffer</c> already is a binary).</p>
+
+ <p>Before you begin to think that the compiler can optimize any binary patterns,
+ here is a function that the compiler (currently, at least) is not able to optimize:</p>
+
+ <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+ non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+ false;
+non_opt_eq([], <<>>) ->
+ true.]]></code>
+
+ <p>It was briefly mentioned earlier that the compiler can only delay creation of
+ sub binaries if it can be sure that the binary will not be shared. In this case,
+ the compiler cannot be sure.</p>
+
+ <p>We will soon show how to rewrite <c>non_opt_eq/2</c> so that the delayed sub binary
+ optimization can be applied, and more importantly, we will show how you can find out
+ whether your code can be optimized.</p>
+
+ <section>
+ <title>The bin_opt_info option</title>
+
+ <p>Use the <c>bin_opt_info</c> option to have the compiler print a lot of
+ information about binary optimizations. It can be given either to the compiler or
+ <c>erlc</c></p>
+
+ <code type="erl"><![CDATA[
+erlc +bin_opt_info Mod.erl]]></code>
+
+ <p>or passed via an environment variable</p>
+
+ <code type="erl"><![CDATA[
+export ERL_COMPILER_OPTIONS=bin_opt_info]]></code>
+
+ <p>Note that the <c>bin_opt_info</c> is not meant to be a permanent option added
+ to your <c>Makefile</c>s, because it is not possible to eliminate all messages that
+ it generates. Therefore, passing the option through the environment is in most cases
+ the most practical approach.</p>
+
+ <p>The warnings will look like this:</p>
+
+ <code type="erl"><![CDATA[
+./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
+./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed]]></code>
+
+ <p>To make it clearer exactly what code the warnings refer to,
+ in the examples that follow, the warnings are inserted as comments
+ after the clause they refer to:</p>
+
+ <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+ %% NOT OPTIMIZED: sub binary is used or returned
+ T;
+after_zero(<<_,T/binary>>) ->
+ %% OPTIMIZED: creation of sub binary delayed
+ after_zero(T);
+after_zero(<<>>) ->
+ <<>>.]]></code>
+
+ <p>The warning for the first clause tells us that it is not possible to
+ delay the creation of a sub binary, because it will be returned.
+ The warning for the second clause tells us that a sub binary will not be
+ created (yet).</p>
+
+ <p>It is time to revisit the earlier example of the code that could not
+ be optimized and find out why:</p>
+
+ <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+ %% INFO: matching anything else but a plain variable to
+ %% the left of binary pattern will prevent delayed
+ %% sub binary optimization;
+ %% SUGGEST changing argument order
+ %% NOT OPTIMIZED: called function non_opt_eq/2 does not
+ %% begin with a suitable binary matching instruction
+ non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+ false;
+non_opt_eq([], <<>>) ->
+ true.]]></code>
+
+ <p>The compiler emitted two warnings. The <c>INFO</c> warning refers to the function
+ <c>non_opt_eq/2</c> as a callee, indicating that any functions that call <c>non_opt_eq/2</c>
+ will not be able to make delayed sub binary optimization.
+ There is also a suggestion to change argument order.
+ The second warning (that happens to refer to the same line) refers to the construction of
+ the sub binary itself.</p>
+
+ <p>We will soon show another example that should make the distinction between <c>INFO</c>
+ and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but first we will heed the suggestion
+ to change argument order:</p>
+
+ <code type="erl"><![CDATA[
+opt_eq(<<H,T1/binary>>, [H|T2]) ->
+ %% OPTIMIZED: creation of sub binary delayed
+ opt_eq(T1, T2);
+opt_eq(<<_,_/binary>>, [_|_]) ->
+ false;
+opt_eq(<<>>, []) ->
+ true.]]></code>
+
+ <p>The compiler gives a warning for the following code fragment:</p>
+
+ <code type="erl"><![CDATA[
+match_body([0|_], <<H,_/binary>>) ->
+ %% INFO: matching anything else but a plain variable to
+ %% the left of binary pattern will prevent delayed
+ %% sub binary optimization;
+ %% SUGGEST changing argument order
+ done;
+.
+.
+.]]></code>
+
+ <p>The warning means that <em>if</em> there is a call to <c>match_body/2</c>
+ (from another clause in <c>match_body/2</c> or another function), the
+ delayed sub binary optimization will not be possible. There will be additional
+ warnings for any place where a sub binary is matched out at the end of and
+ passed as the second argument to <c>match_body/2</c>. For instance:</p>
+
+ <code type="erl"><![CDATA[
+match_head(List, <<_:10,Data/binary>>) ->
+ %% NOT OPTIMIZED: called function match_body/2 does not
+ %% begin with a suitable binary matching instruction
+ match_body(List, Data).]]></code>
+
+ </section>
+
+ <section>
+ <title>Unused variables</title>
+
+ <p>The compiler itself figures out if a variable is unused. The same
+ code is generated for each of the following functions</p>
+
+ <code type="erl"><![CDATA[
+count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
+count1(<<>>, Count) -> Count.
+
+count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
+count2(<<>>, Count) -> Count.
+
+count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
+count3(<<>>, Count) -> Count.]]></code>
+
+ <p>In each iteration, the first 8 bits in the binary will be skipped, not matched out.</p>
+
+ </section>
+
+ </section>
+
+</chapter>
+