In R12B, the most natural way to write binary construction and matching is now significantly faster than in earlier releases.
To construct a binary, you can simply write

DO (in R12B) / REALLY DO NOT (in earlier releases)

my_list_to_binary(List) ->
    my_list_to_binary(List, <<>>).

my_list_to_binary([H|T], Acc) ->
    my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
    Acc.
In releases before R12B, Acc would be copied in every iteration. In R12B, Acc will be copied only in the first iteration and extra space will be allocated at the end of the copied binary. In the next iteration, H will be written into the extra space. When the extra space runs out, the binary will be reallocated with more extra space.
The extra space allocated (or reallocated) will be twice the size of the existing binary data, or 256, whichever is larger.
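The growth rule above can be sketched as a tiny helper (extra_space/1 is an illustrative name of ours, not a function in the run-time system):

```erlang
%% Sketch of the documented growth rule: the extra space reserved
%% when the binary is allocated (or reallocated) is twice the size
%% of the existing binary data, or 256 bytes, whichever is larger.
extra_space(DataSize) when 2 * DataSize > 256 ->
    2 * DataSize;
extra_space(_DataSize) ->
    256.
```

For example, extra_space(4) is 256, while extra_space(1000) is 2000.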
The most natural way to match binaries is now the fastest:
DO (in R12B)
my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
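Assuming both functions above are compiled into the same module, a quick round-trip shows them at work:

```erlang
%% Round-trip using my_list_to_binary/1 and my_binary_to_list/1
%% from the examples above (module wrapper assumed).
round_trip() ->
    Bin = my_list_to_binary([1,2,3]),   %% Bin is <<1,2,3>>
    [1,2,3] = my_binary_to_list(Bin),   %% each byte becomes a list element
    ok.
```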
Internally, binaries and bitstrings are implemented in the same way. In this section, we will call them binaries since that is what they are called in the emulator source code.
There are four types of binary objects internally. Two of them are containers for binary data and two of them are merely references to a part of a binary.
The binary containers are called refc binaries (short for reference-counted binaries) and heap binaries.
Refc binaries consist of two parts: an object stored on the process heap, called a ProcBin, and the binary object itself, stored outside all process heaps. The binary object can be referenced by any number of ProcBins from any number of processes; the object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.
Heap binaries are small binaries, up to 64 bytes, that are stored directly on the process heap. They are copied when the process is garbage collected and when they are sent as a message. They do not require any special handling by the garbage collector.

There are two types of reference objects that can reference part of a refc binary or heap binary. They are called sub binaries and match contexts.

A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, never into another sub binary), so matching out a binary is relatively cheap because the actual binary data is never copied.

A match context is similar to a sub binary, but is optimized for binary matching; for instance, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context is incremented.

In R11B, a match context was only used during a binary matching operation.
In R12B, the compiler tries to avoid generating code that creates a sub binary, only to shortly afterwards create a new match context and discard the sub binary. Instead of creating a sub binary, the match context is kept.
The compiler can only do this optimization if it can know for sure that the match context will not be shared. If it would be shared, the functional properties (also called referential transparency) of Erlang would break.
In R12B, appending to a binary or bitstring

<<Binary/binary, ...>>
<<Binary/bitstring, ...>>
is specially optimized by the run-time system. Because the run-time system handles the optimization (instead of the compiler), there are very few circumstances in which the optimization will not work.
To explain how it works, we will go through this code
Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
{Bin4,Bin3}                      %% 6
line by line.
The first line (marked with the %% 1 comment) assigns a heap binary to the variable Bin0.
The second line is an append operation. Since Bin0 has not been involved in an append operation, a new refc binary will be created and the contents of Bin0 will be copied into it. The ProcBin part of the refc binary will have its size set to the size of the data stored in the binary, while the binary object will have extra space allocated. The size of the binary object will be either twice the size of Bin0 or 256, whichever is larger. In this case it will be 256.
It gets more interesting in the third line. Bin1 has been used in an append operation, and it has 255 bytes of unused storage at the end, so the three new bytes will be stored there.
Same thing in the fourth line. There are 252 bytes left, so there is no problem storing another three bytes.
But in the fifth line something interesting happens.
Note that we don't append to the previous result in Bin3, but to Bin1. We expect that Bin4 will be assigned the value <<0,1,2,3,17>>. We also expect that Bin3 will retain its value <<0,1,2,3,4,5,6,7,8,9>>. Clearly, the run-time system cannot write the byte 17 into the binary, because that would change the value of Bin3 to <<0,1,2,3,17,5,6,7,8,9>>.
What will happen?
The run-time system will see that Bin1 is the result from a previous append operation (not from the latest one), so it will copy the contents of Bin1 to a new binary, reserve extra storage, and so on. (We will not explain here how the run-time system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.)
The optimization of the binary append operation requires that there is a single ProcBin and a single reference to the ProcBin for the binary. The reason is that the binary object can be moved (reallocated) during an append operation, and when that happens the pointer in the ProcBin must be updated. If there were more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.
Therefore, certain operations on a binary will mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.
When appending to a binary

Bin = <<Bin0,...>>

only the binary returned from the latest append operation will support further cheap append operations. In the code fragment above, appending to Bin will be cheap, while appending to Bin0 will incur a copy of the binary data.
If a binary is sent as a message to a process or port, the binary will be shrunk and any further append operation will copy the binary data into a new binary. For instance, in the following code fragment
Bin1 = <<Bin0,...>>,
PortOrPid ! Bin1,
Bin = <<Bin1,...>>  %% Bin1 will be COPIED

Bin1 will be copied in the third line.
The same thing happens if you insert a binary into an ets table or send it to a port using erlang:port_command/2.
Matching a binary will also cause it to shrink and the next append operation will copy the binary data:
Bin1 = <<Bin0,...>>,
<<X,Y,Z,T/binary>> = Bin1,
Bin = <<Bin1,...>>  %% Bin1 will be COPIED

The reason is that a match context contains a direct pointer to the binary data.
If a process simply keeps binaries (either in "loop data" or in the process dictionary), the garbage collector may eventually shrink the binaries. If only one such binary is kept, it will not be shrunk. If the process later appends to a binary that has been shrunk, the binary object will be reallocated to make room for the data to be appended.
We will revisit the example shown earlier
DO (in R12B)
my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].

to see what is happening under the hood.
The very first time my_binary_to_list/1 is called, a match context will be created. The match context will point to the first byte of the binary. One byte will be matched out and the match context will be updated to point to the second byte in the binary.
In R11B, at this point a sub binary would be created. In R12B, the compiler sees that there is no point in creating a sub binary, because there will soon be a call to a function (in this case, to my_binary_to_list/1 itself) that will immediately create a new match context and discard the sub binary.
Therefore, in R12B, my_binary_to_list/1 will call itself with the match context instead of with a sub binary. The instruction that initializes the matching operation will basically do nothing when it sees that it was passed a match context instead of a binary.
When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next garbage collection, since there is no longer any reference to it).
To summarize, my_binary_to_list/1 in R12B only needs to create one match context and no sub binaries. In R11B, if the binary contains N bytes, N+1 match contexts and N sub binaries will be created.
In R11B, the fastest way to match binaries is:
DO NOT (in R12B)
my_complicated_binary_to_list(Bin) ->
    my_complicated_binary_to_list(Bin, 0).

my_complicated_binary_to_list(Bin, Skip) ->
    case Bin of
        <<_:Skip/binary,Byte,_/binary>> ->
            [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
        <<_:Skip/binary>> ->
            []
    end.
This function cleverly avoids building sub binaries, but it cannot
avoid building a match context in each recursion step. Therefore, in both R11B and R12B, my_complicated_binary_to_list/2 builds N+1 match contexts. (In a future release, the compiler might be able to generate code that reuses the match context, but don't hold your breath.)
Returning to my_binary_to_list/1, note that the match context was discarded when the entire binary had been matched out. What happens if the iteration stops before it has reached the end of the binary? Will the optimization still work?

after_zero(<<0,T/binary>>) ->
    T;
after_zero(<<_,T/binary>>) ->
    after_zero(T);
after_zero(<<>>) ->
    <<>>.
Yes, it will. The compiler will remove the building of the sub binary in the second clause
after_zero(<<_,T/binary>>) ->
after_zero(T);
.
.
.
but will generate code that builds a sub binary in the first clause
after_zero(<<0,T/binary>>) ->
T;
.
.
.
Therefore, after_zero/1 will build one match context and one sub binary (assuming it is passed a binary that contains a zero byte).
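A small usage sketch (assuming the after_zero/1 definition above) makes the claim concrete: only the clause that returns the tail builds a sub binary.

```erlang
%% after_zero/1 returns everything after the first zero byte,
%% or <<>> if the input contains no zero byte.
after_zero_examples() ->
    <<3,4>> = after_zero(<<1,2,0,3,4>>),  %% one sub binary built here
    <<>>    = after_zero(<<1,2,3>>),      %% no zero byte: empty binary
    ok.
```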
Code like the following will also be optimized:
all_but_zeroes_to_list(Buffer, Acc, 0) ->
    {lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).
The compiler will remove building of sub binaries in the second and third clauses,
and it will add an instruction to the first clause that will convert Buffer from a match context to a sub binary (or do nothing if Buffer already is a binary).
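A usage sketch (assuming the all_but_zeroes_to_list/3 definition above): the first Remaining bytes are consumed, zero bytes are dropped, and the unconsumed rest is returned as the buffer.

```erlang
%% Consume 4 bytes of <<1,0,2,5,6>>: 1, 2 and 5 are kept, the zero
%% is dropped, and <<6>> is returned as the remaining buffer.
all_but_zeroes_example() ->
    {[1,2,5],<<6>>} = all_but_zeroes_to_list(<<1,0,2,5,6>>, [], 4),
    ok.
```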
Before you begin to think that the compiler can optimize any binary patterns, here is a function that the compiler (currently, at least) is not able to optimize:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
    non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
    false;
non_opt_eq([], <<>>) ->
    true.
It was briefly mentioned earlier that the compiler can only delay creation of sub binaries if it can be sure that the binary will not be shared. In this case, the compiler cannot be sure.
We will soon show how to rewrite non_opt_eq/2 so that the delayed sub binary optimization can be applied, and more importantly, we will show how you can find out whether your code can be optimized.
Use the bin_opt_info option to have the compiler print a lot of information about binary matching optimizations. It can be given directly to the compiler

erlc +bin_opt_info Mod.erl

or passed via an environment variable

export ERL_COMPILER_OPTIONS=bin_opt_info
Note that the bin_opt_info option is not meant to be a permanent option added to your Makefiles, because it is not possible to eliminate all messages that it generates. Therefore, passing the option through the environment is in most cases the most practical approach.
The warnings will look like this:

./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed
To make it clearer exactly what code the warnings refer to, in the examples that follow, the warnings are inserted as comments after the clause they refer to:
after_zero(<<0,T/binary>>) ->
%% NOT OPTIMIZED: sub binary is used or returned
T;
after_zero(<<_,T/binary>>) ->
%% OPTIMIZED: creation of sub binary delayed
after_zero(T);
after_zero(<<>>) ->
<<>>.
The warning for the first clause tells us that it is not possible to delay the creation of a sub binary, because it will be returned. The warning for the second clause tells us that a sub binary will not be created (yet).
It is time to revisit the earlier example of the code that could not be optimized and find out why:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
%% NOT OPTIMIZED: called function non_opt_eq/2 does not
%% begin with a suitable binary matching instruction
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
The compiler emitted two warnings. The INFO warning refers to the function non_opt_eq/2 as a callee, indicating that any function that calls non_opt_eq/2 will not be able to make the delayed sub binary optimization; there is also a suggestion to change the argument order. The second warning (which happens to refer to the same line) refers to the construction of the sub binary itself.
We will soon show another example that should make the distinction between INFO and NOT OPTIMIZED warnings somewhat clearer, but first we will heed the suggestion and change the argument order:

opt_eq(<<H,T1/binary>>, [H|T2]) ->
%% OPTIMIZED: creation of sub binary delayed
opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
false;
opt_eq(<<>>, []) ->
true.
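Usage is the same as for non_opt_eq/2, only with the arguments swapped (sketch assuming the opt_eq/2 definition above):

```erlang
%% opt_eq/2 compares a binary with a list, element by element.
opt_eq_examples() ->
    true  = opt_eq(<<1,2,3>>, [1,2,3]),
    false = opt_eq(<<1,2,3>>, [1,2,4]),
    ok.
```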
The compiler gives a warning for the following code fragment:
match_body([0|_], <<H,_/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
done;
.
.
.
The warning means that if there is a call to match_body/2 (from another clause in match_body/2 or from another function), the delayed sub binary optimization will not be possible. There will be additional warnings for any place where a binary is matched out at the end and passed as the second argument to match_body/2. For instance:

match_head(List, <<_:10,Data/binary>>) ->
%% NOT OPTIMIZED: called function match_body/2 does not
%% begin with a suitable binary matching instruction
match_body(List, Data).
The compiler itself figures out if a variable is unused. The same code is generated for each of the following functions
count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.

count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
count2(<<>>, Count) -> Count.

count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
count3(<<>>, Count) -> Count.
In each iteration, the first 8 bits in the binary will be skipped, not matched out.
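As a sketch (assuming the three count functions above are in scope), all variants are interchangeable at run time:

```erlang
%% All three functions count the bytes of a binary the same way.
count_examples() ->
    3 = count1(<<7,8,9>>, 0),
    3 = count2(<<7,8,9>>, 0),
    3 = count3(<<7,8,9>>, 0),
    ok.
```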