Diffstat (limited to 'system/doc/efficiency_guide/binaryhandling.xml')
-rw-r--r-- | system/doc/efficiency_guide/binaryhandling.xml | 528 |
1 files changed, 528 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/binaryhandling.xml b/system/doc/efficiency_guide/binaryhandling.xml
new file mode 100644
index 0000000000..8746de4b60
--- /dev/null
+++ b/system/doc/efficiency_guide/binaryhandling.xml
@@ -0,0 +1,528 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2007</year>
+      <year>2007</year>
+      <holder>Ericsson AB, All Rights Reserved</holder>
+    </copyright>
+    <legalnotice>
+  The contents of this file are subject to the Erlang Public License,
+  Version 1.1, (the "License"); you may not use this file except in
+  compliance with the License. You should have received a copy of the
+  Erlang Public License along with this software. If not, it can be
+  retrieved online at http://www.erlang.org/.
+
+  Software distributed under the License is distributed on an "AS IS"
+  basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+  the License for the specific language governing rights and limitations
+  under the License.
+
+  The Initial Developer of the Original Code is Ericsson AB.
+    </legalnotice>
+
+    <title>Constructing and matching binaries</title>
+    <prepared>Bjorn Gustavsson</prepared>
+    <docno></docno>
+    <date>2007-10-12</date>
+    <rev></rev>
+    <file>binaryhandling.xml</file>
+  </header>
+
+  <p>In R12B, the most natural way to write binary construction and matching is now
+  significantly faster than in earlier releases.</p>
+
+  <p>To construct a binary, you can simply write</p>
+
+  <p><em>DO</em> (in R12B) / <em>REALLY DO NOT</em> (in earlier releases)</p>
+  <code type="erl"><![CDATA[
+my_list_to_binary(List) ->
+    my_list_to_binary(List, <<>>).
+
+my_list_to_binary([H|T], Acc) ->
+    my_list_to_binary(T, <<Acc/binary,H>>);
+my_list_to_binary([], Acc) ->
+    Acc.]]></code>
+
+  <p>In releases before R12B, <c>Acc</c> would be copied in every iteration.
+  In R12B, <c>Acc</c> will be copied only in the first iteration and extra
+  space will be allocated at the end of the copied binary. In the next iteration,
+  <c>H</c> will be written into the extra space. When the extra space runs out,
+  the binary will be reallocated with more extra space.</p>
+
+  <p>The extra space allocated (or reallocated) will be twice the size of the
+  existing binary data, or 256, whichever is larger.</p>
+
+  <p>The most natural way to match binaries is now the fastest:</p>
+
+  <p><em>DO</em> (in R12B)</p>
+  <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+    [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>
+
+  <section>
+    <title>How binaries are implemented</title>
+
+    <p>Internally, binaries and bitstrings are implemented in the same way.
+    In this section, we will call them <em>binaries</em> since that is what
+    they are called in the emulator source code.</p>
+
+    <p>There are four types of binary objects internally. Two of them are
+    containers for binary data and two of them are merely references to
+    a part of a binary.</p>
+
+    <p>The binary containers are called <em>refc binaries</em>
+    (short for <em>reference-counted binaries</em>) and <em>heap binaries</em>.</p>
+
+    <p><marker id="refc_binary"></marker><em>Refc binaries</em>
+    consist of two parts: an object stored on
+    the process heap, called a <em>ProcBin</em>, and the binary object itself
+    stored outside all process heaps.</p>
+
+    <p>The binary object can be referenced by any number of ProcBins from any
+    number of processes; the object contains a reference counter to keep track
+    of the number of references, so that it can be removed when the last
+    reference disappears.</p>
+
+    <p>All ProcBin objects in a process are part of a linked list, so that
+    the garbage collector can keep track of them and decrement the reference
+    counters in the binary when a ProcBin disappears.</p>
+
+    <p><marker id="heap_binary"></marker><em>Heap binaries</em>
+    are small binaries,
+    up to 64 bytes, that are stored directly on the process heap.
+    They will be copied when the process
+    is garbage collected and when they are sent as a message. They don't
+    require any special handling by the garbage collector.</p>
+
+    <p>There are two types of reference objects that can reference part of
+    a refc binary or heap binary. They are called <em>sub binaries</em> and
+    <em>match contexts</em>.</p>
+
+    <p><marker id="sub_binary"></marker>A <em>sub binary</em>
+    is created by <c>split_binary/2</c> and when
+    a binary is matched out in a binary pattern. A sub binary is a reference
+    into a part of another binary (refc or heap binary, never into another
+    sub binary). Therefore, matching out a binary is relatively cheap because
+    the actual binary data is never copied.</p>
+
+    <p><marker id="match_context"></marker>A <em>match context</em> is
+    similar to a sub binary, but is optimized
+    for binary matching; for instance, it contains a direct pointer to the binary
+    data. For each field that is matched out of a binary, the position in the
+    match context will be incremented.</p>
+
+    <p>In R11B, a match context was only used during a binary matching
+    operation.</p>
+
+    <p>In R12B, the compiler tries to avoid generating code that
+    creates a sub binary, only to shortly afterwards create a new match
+    context and discard the sub binary. Instead of creating a sub binary,
+    the match context is kept.</p>
+
+    <p>The compiler can only do this optimization if it can know for sure
+    that the match context will not be shared. If it were shared, the
+    functional properties (also called referential transparency) of Erlang
+    would break.</p>
+  </section>
+
+  <section>
+    <title>Constructing binaries</title>
+
+    <p>In R12B, appending to a binary or bitstring</p>
+
+    <code type="erl"><![CDATA[
+<<Binary/binary, ...>>
+<<Binary/bitstring, ...>>]]></code>
+
+    <p>is specially optimized by the <em>run-time system</em>.
+    Because the run-time system handles the optimization (instead of
+    the compiler), there are very few circumstances in which the optimization
+    will not work.</p>
+
+    <p>To explain how it works, we will go through this code</p>
+
+    <code type="erl"><![CDATA[
+Bin0 = <<0>>,                    %% 1
+Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
+Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
+Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
+Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
+{Bin4,Bin3}                      %% 6]]></code>
+
+    <p>line by line.</p>
+
+    <p>The first line (marked with the <c>%% 1</c> comment), assigns
+    a <seealso marker="#heap_binary">heap binary</seealso> to
+    the variable <c>Bin0</c>.</p>
+
+    <p>The second line is an append operation. Since <c>Bin0</c>
+    has not been involved in an append operation,
+    a new <seealso marker="#refc_binary">refc binary</seealso>
+    will be created and the contents of <c>Bin0</c> will be copied
+    into it. The <em>ProcBin</em> part of the refc binary will have
+    its size set to the size of the data stored in the binary, while
+    the binary object will have extra space allocated.
+    The size of the binary object will be either twice the
+    size of <c>Bin0</c> or 256, whichever is larger. In this case
+    it will be 256.</p>
+
+    <p>It gets more interesting in the third line.
+    <c>Bin1</c> <em>has</em> been used in an append operation,
+    and it has 252 bytes of unused storage at the end, so the three new bytes
+    will be stored there.</p>
+
+    <p>Same thing in the fourth line. There are 249 bytes left,
+    so there is no problem storing another three bytes.</p>
+
+    <p>But in the fifth line something <em>interesting</em> happens.
+    Note that we don't append to the previous result in <c>Bin3</c>,
+    but to <c>Bin1</c>. We expect that <c>Bin4</c> will be assigned
+    the value <c><<0,1,2,3,17>></c>. We also expect that
+    <c>Bin3</c> will retain its value
+    (<c><<0,1,2,3,4,5,6,7,8,9>></c>).
+    Clearly, the run-time system cannot write the byte <c>17</c> into the binary,
+    because that would change the value of <c>Bin3</c> to
+    <c><<0,1,2,3,17,5,6,7,8,9>></c>.</p>
+
+    <p>What will happen?</p>
+
+    <p>The run-time system will see that <c>Bin1</c> is the result
+    from a previous append operation (not from the latest append operation),
+    so it will <em>copy</em> the contents of <c>Bin1</c> to a new binary
+    and reserve extra storage and so on. (We will not explain here how the
+    run-time system can know that it is not allowed to write into <c>Bin1</c>;
+    it is left as an exercise to the curious reader to figure out how it is
+    done by reading the emulator sources, primarily <c>erl_bits.c</c>.)</p>
+
+    <section>
+      <title>Circumstances that force copying</title>
+
+      <p>The optimization of the binary append operation requires that
+      there is a <em>single</em> ProcBin and a <em>single reference</em> to the
+      ProcBin for the binary. The reason is that the binary object can be
+      moved (reallocated) during an append operation, and when that happens
+      the pointer in the ProcBin must be updated. If there were more than
+      one ProcBin pointing to the binary object, it would not be possible to
+      find and update all of them.</p>
+
+      <p>Therefore, certain operations on a binary will mark it so that
+      any future append operation will be forced to copy the binary.
+      In most cases, the binary object will be shrunk at the same time
+      to reclaim the extra space allocated for growing.</p>
+
+      <p>When appending to a binary</p>
+
+      <code type="erl"><![CDATA[
+Bin = <<Bin0/binary,...>>]]></code>
+
+      <p>only the binary returned from the latest append operation will
+      support further cheap append operations.
+      In the code fragment above,
+      appending to <c>Bin</c> will be cheap, while appending to <c>Bin0</c>
+      will force the creation of a new binary and copying of the contents
+      of <c>Bin0</c>.</p>
+
+      <p>If a binary is sent as a message to a process or port, the binary
+      will be shrunk and any further append operation will copy the binary
+      data into a new binary. For instance, in the following code fragment</p>
+
+      <code type="erl"><![CDATA[
+Bin1 = <<Bin0/binary,...>>,
+PortOrPid ! Bin1,
+Bin = <<Bin1/binary,...>>  %% Bin1 will be COPIED
+]]></code>
+
+      <p><c>Bin1</c> will be copied in the third line.</p>
+
+      <p>The same thing happens if you insert a binary into an <em>ets</em>
+      table or send it to a port using <c>erlang:port_command/2</c>.</p>
+
+      <p>Matching a binary will also cause it to shrink and the next append
+      operation will copy the binary data:</p>
+
+      <code type="erl"><![CDATA[
+Bin1 = <<Bin0/binary,...>>,
+<<X,Y,Z,T/binary>> = Bin1,
+Bin = <<Bin1/binary,...>>  %% Bin1 will be COPIED
+]]></code>
+
+      <p>The reason is that a <seealso marker="#match_context">match context</seealso>
+      contains a direct pointer to the binary data.</p>
+
+      <p>If a process simply keeps binaries (either in "loop data" or in the process
+      dictionary), the garbage collector may eventually shrink the binaries.
+      If only one such binary is kept, it will not be shrunk. If the process later
+      appends to a binary that has been shrunk, the binary object will be reallocated
+      to make room for the data to be appended.</p>
+    </section>
+
+  </section>
+
+  <section>
+    <title>Matching binaries</title>
+
+    <p>We will revisit the example shown earlier</p>
+
+    <p><em>DO</em> (in R12B)</p>
+    <code type="erl"><![CDATA[
+my_binary_to_list(<<H,T/binary>>) ->
+    [H|my_binary_to_list(T)];
+my_binary_to_list(<<>>) -> [].]]></code>
+
+    <p>to see what is happening under the hood.</p>
+
+    <p>The very first time <c>my_binary_to_list/1</c> is called,
+    a <seealso marker="#match_context">match context</seealso>
+    will be created.
+    The match context will point to the first
+    byte of the binary. One byte will be matched out and the match context
+    will be updated to point to the second byte in the binary.</p>
+
+    <p>In R11B, at this point a <seealso marker="#sub_binary">sub binary</seealso>
+    would be created. In R12B,
+    the compiler sees that there is no point in creating a sub binary,
+    because there will soon be a call to a function (in this case,
+    to <c>my_binary_to_list/1</c> itself) that will immediately
+    create a new match context and discard the sub binary.</p>
+
+    <p>Therefore, in R12B, <c>my_binary_to_list/1</c> will call itself
+    with the match context instead of with a sub binary. The instruction
+    that initializes the matching operation will basically do nothing
+    when it sees that it was passed a match context instead of a binary.</p>
+
+    <p>When the end of the binary is reached and the second clause matches,
+    the match context will simply be discarded (removed in the next
+    garbage collection, since there is no longer any reference to it).</p>
+
+    <p>To summarize, <c>my_binary_to_list/1</c> in R12B only needs to create
+    <em>one</em> match context and no sub binaries. In R11B, if the binary
+    contains <em>N</em> bytes, <em>N+1</em> match contexts and <em>N</em>
+    sub binaries will be created.</p>
+
+    <p>In R11B, the fastest way to match binaries is:</p>
+
+    <p><em>DO NOT</em> (in R12B)</p>
+    <code type="erl"><![CDATA[
+my_complicated_binary_to_list(Bin) ->
+    my_complicated_binary_to_list(Bin, 0).
+
+my_complicated_binary_to_list(Bin, Skip) ->
+    case Bin of
+        <<_:Skip/binary,Byte,_/binary>> ->
+            [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
+        <<_:Skip/binary>> ->
+            []
+    end.]]></code>
+
+    <p>This function cleverly avoids building sub binaries, but it cannot
+    avoid building a match context in each recursion step. Therefore, in both R11B and R12B,
+    <c>my_complicated_binary_to_list/1</c> builds <em>N+1</em> match
+    contexts.
+    (In a future release, the compiler might be able to generate code
+    that reuses the match context, but don't hold your breath.)</p>
+
+    <p>Returning to <c>my_binary_to_list/1</c>, note that the match context was
+    discarded when the entire binary had been traversed. What happens if
+    the iteration stops before it has reached the end of the binary? Will
+    the optimization still work?</p>
+
+    <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+    T;
+after_zero(<<_,T/binary>>) ->
+    after_zero(T);
+after_zero(<<>>) ->
+    <<>>.
+  ]]></code>
+
+    <p>Yes, it will. The compiler will remove the building of the sub binary in the
+    second clause</p>
+
+    <code type="erl"><![CDATA[
+.
+.
+.
+after_zero(<<_,T/binary>>) ->
+    after_zero(T);
+.
+.
+.]]></code>
+
+    <p>but will generate code that builds a sub binary in the first clause</p>
+
+    <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+    T;
+.
+.
+.]]></code>
+
+    <p>Therefore, <c>after_zero/1</c> will build one match context and one sub binary
+    (assuming it is passed a binary that contains a zero byte).</p>
+
+    <p>Code like the following will also be optimized:</p>
+
+    <code type="erl"><![CDATA[
+all_but_zeroes_to_list(Buffer, Acc, 0) ->
+    {lists:reverse(Acc),Buffer};
+all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
+    all_but_zeroes_to_list(T, Acc, Remaining-1);
+all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
+    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).]]></code>
+
+    <p>The compiler will remove building of sub binaries in the second and third clauses,
+    and it will add an instruction to the first clause that will convert <c>Buffer</c>
+    from a match context to a sub binary (or do nothing if <c>Buffer</c> already is a binary).</p>
+
+    <p>Before you begin to think that the compiler can optimize any binary patterns,
+    here is a function that the compiler (currently, at least) is not able to optimize:</p>
+
+    <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+    non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+    false;
+non_opt_eq([], <<>>) ->
+    true.]]></code>
+
+    <p>It was briefly mentioned earlier that the compiler can only delay creation of
+    sub binaries if it can be sure that the binary will not be shared. In this case,
+    the compiler cannot be sure.</p>
+
+    <p>We will soon show how to rewrite <c>non_opt_eq/2</c> so that the delayed sub binary
+    optimization can be applied, and more importantly, we will show how you can find out
+    whether your code can be optimized.</p>
+
+    <section>
+      <title>The bin_opt_info option</title>
+
+      <p>Use the <c>bin_opt_info</c> option to have the compiler print a lot of
+      information about binary optimizations. It can be given either to the compiler or
+      <c>erlc</c></p>
+
+      <code type="erl"><![CDATA[
+erlc +bin_opt_info Mod.erl]]></code>
+
+      <p>or passed via an environment variable</p>
+
+      <code type="erl"><![CDATA[
+export ERL_COMPILER_OPTIONS=bin_opt_info]]></code>
+
+      <p>Note that the <c>bin_opt_info</c> is not meant to be a permanent option added
+      to your <c>Makefile</c>s, because it is not possible to eliminate all messages that
+      it generates.
+      Therefore, passing the option through the environment is in most cases
+      the most practical approach.</p>
+
+      <p>The warnings will look like this:</p>
+
+      <code type="erl"><![CDATA[
+./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
+./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed]]></code>
+
+      <p>To make it clearer exactly what code the warnings refer to,
+      in the examples that follow, the warnings are inserted as comments
+      after the clause they refer to:</p>
+
+      <code type="erl"><![CDATA[
+after_zero(<<0,T/binary>>) ->
+    %% NOT OPTIMIZED: sub binary is used or returned
+    T;
+after_zero(<<_,T/binary>>) ->
+    %% OPTIMIZED: creation of sub binary delayed
+    after_zero(T);
+after_zero(<<>>) ->
+    <<>>.]]></code>
+
+      <p>The warning for the first clause tells us that it is not possible to
+      delay the creation of a sub binary, because it will be returned.
+      The warning for the second clause tells us that a sub binary will not be
+      created (yet).</p>
+
+      <p>It is time to revisit the earlier example of the code that could not
+      be optimized and find out why:</p>
+
+      <code type="erl"><![CDATA[
+non_opt_eq([H|T1], <<H,T2/binary>>) ->
+    %% INFO: matching anything else but a plain variable to
+    %%    the left of binary pattern will prevent delayed
+    %%    sub binary optimization;
+    %%    SUGGEST changing argument order
+    %% NOT OPTIMIZED: called function non_opt_eq/2 does not
+    %%    begin with a suitable binary matching instruction
+    non_opt_eq(T1, T2);
+non_opt_eq([_|_], <<_,_/binary>>) ->
+    false;
+non_opt_eq([], <<>>) ->
+    true.]]></code>
+
+      <p>The compiler emitted two warnings. The <c>INFO</c> warning refers to the function
+      <c>non_opt_eq/2</c> as a callee, indicating that any functions that call <c>non_opt_eq/2</c>
+      will not be able to use the delayed sub binary optimization.
+      There is also a suggestion to change argument order.
+      The second warning (that happens to refer to the same line) refers to the construction of
+      the sub binary itself.</p>
+
+      <p>We will soon show another example that should make the distinction between <c>INFO</c>
+      and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but first we will heed the suggestion
+      to change argument order:</p>
+
+      <code type="erl"><![CDATA[
+opt_eq(<<H,T1/binary>>, [H|T2]) ->
+    %% OPTIMIZED: creation of sub binary delayed
+    opt_eq(T1, T2);
+opt_eq(<<_,_/binary>>, [_|_]) ->
+    false;
+opt_eq(<<>>, []) ->
+    true.]]></code>
+
+      <p>The compiler gives a warning for the following code fragment:</p>
+
+      <code type="erl"><![CDATA[
+match_body([0|_], <<H,_/binary>>) ->
+    %% INFO: matching anything else but a plain variable to
+    %%    the left of binary pattern will prevent delayed
+    %%    sub binary optimization;
+    %%    SUGGEST changing argument order
+    done;
+.
+.
+.]]></code>
+
+      <p>The warning means that <em>if</em> there is a call to <c>match_body/2</c>
+      (from another clause in <c>match_body/2</c> or another function), the
+      delayed sub binary optimization will not be possible. There will be additional
+      warnings for any place where a sub binary is matched out at the end of a binary
+      pattern and passed as the second argument to <c>match_body/2</c>. For instance:</p>
+
+      <code type="erl"><![CDATA[
+match_head(List, <<_:10,Data/binary>>) ->
+    %% NOT OPTIMIZED: called function match_body/2 does not
+    %%    begin with a suitable binary matching instruction
+    match_body(List, Data).]]></code>
+
+    </section>
+
+    <section>
+      <title>Unused variables</title>
+
+      <p>The compiler itself figures out if a variable is unused. The same
+      code is generated for each of the following functions</p>
+
+      <code type="erl"><![CDATA[
+count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
+count1(<<>>, Count) -> Count.
+
+count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
+count2(<<>>, Count) -> Count.
+
+count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
+count3(<<>>, Count) -> Count.]]></code>
+
+      <p>In each iteration, the first 8 bits in the binary will be skipped, not matched out.</p>
+
+    </section>
+
+  </section>
+
+</chapter>
+