Binaries can be efficiently built in the following way:

DO

my_list_to_binary(List) ->
    my_list_to_binary(List, <<>>).

my_list_to_binary([H|T], Acc) ->
    my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
    Acc.
Binaries can be efficiently matched like this:

DO

my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
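As a quick illustration (ours, not part of the original guide), the two functions above round-trip a list of bytes; the module name bin_example is an assumption made for the example:

```erlang
-module(bin_example).
-export([my_list_to_binary/1, my_binary_to_list/1]).

%% Builds the binary by appending to the accumulator, which the
%% runtime system optimizes (see the append optimization below).
my_list_to_binary(List) ->
    my_list_to_binary(List, <<>>).

my_list_to_binary([H|T], Acc) ->
    my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
    Acc.

%% Matches out one byte at a time; the compiler keeps a single
%% match context for the whole traversal.
my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
```

For example, `bin_example:my_list_to_binary([1,2,3])` evaluates to `<<1,2,3>>`, and `bin_example:my_binary_to_list(<<1,2,3>>)` gives back `[1,2,3]`.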
Internally, binaries and bitstrings are implemented in the same way. In this section, they are called binaries because that is what they are called in the emulator source code.
Four types of binary objects are available internally:

Two are containers for binary data and are called:

- Refc binaries (short for reference-counted binaries)
- Heap binaries

Two are merely references to a part of a binary and are called:

- Sub binaries
- Match contexts
Refc binaries consist of two parts:

- An object stored on the process heap, called a ProcBin
- The binary object itself, stored outside all process heaps
The binary object can be referenced by any number of ProcBins from any number of processes. The object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.
Heap binaries are small binaries, up to 64 bytes, and are stored directly on the process heap. They are copied when the process is garbage-collected and when they are sent as a message. They do not require any special handling by the garbage collector.
The reference objects sub binaries and match contexts can reference part of a refc binary or heap binary.

A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, but never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.
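This sharing can be observed from Erlang code with binary:referenced_byte_size/1 from the standard binary module. A minimal sketch (ours; the module name subbin_demo is an assumption):

```erlang
-module(subbin_demo).
-export([demo/0]).

%% Matching out the tail of a large (refc) binary creates a sub binary:
%% it is only 999 bytes long, but it references (and keeps alive) the
%% whole 1000-byte binary object underneath.
demo() ->
    Large = binary:copy(<<1>>, 1000),
    <<_Skip, Tail/binary>> = Large,
    {byte_size(Tail), binary:referenced_byte_size(Tail)}.
```

demo() returns {999,1000}, showing that Tail is a reference into Large rather than a copy of its data.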
A match context is similar to a sub binary, but is optimized for binary matching. For example, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context is incremented.
The compiler tries to avoid generating code that creates a sub binary, only to shortly afterwards create a new match context and discard the sub binary. Instead of creating a sub binary, the match context is kept.
The compiler can only do this optimization if it knows that the match context will not be shared. If it were shared, the functional properties (also called referential transparency) of Erlang would break.
Appending to a binary or bitstring is specially optimized by the runtime system:

Bin0 = <<0>>,
Bin = <<Bin0/binary,1,2,3>>
As the runtime system handles the optimization (instead of the compiler), there are very few circumstances in which the optimization does not work.
To explain how it works, let us examine the following code line by line:

Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
{Bin4,Bin3}                      %% 6
Line 1 assigns a heap binary to Bin0. Line 2 is an append operation: as Bin0 has not been involved in an append operation, a new refc binary is created and the contents of Bin0 is copied into it. The binary object gets extra space allocated for growing. Lines 3 and 4 append to the result of the latest append operation, so the new bytes are simply stored in the unused space at the end of the binary object. Line 5 is different: the result is appended to Bin1, not to the latest result in Bin3. Clearly, the runtime system cannot write byte 17 into the existing binary object, because that would change the value of Bin3. The runtime system sees that Bin1 is the result from a previous (not the latest) append operation, so it copies the contents of Bin1 to a new binary, reserves extra storage for it, and proceeds as usual.
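Running the fragment confirms the expected values (the module and function names here are ours, for illustration only):

```erlang
-module(append_demo).
-export([run/0]).

%% The append walkthrough from the text, returned as a tuple so the
%% final values of Bin4 and Bin3 can be inspected.
run() ->
    Bin0 = <<0>>,
    Bin1 = <<Bin0/binary,1,2,3>>,
    Bin2 = <<Bin1/binary,4,5,6>>,
    Bin3 = <<Bin2/binary,7,8,9>>,
    Bin4 = <<Bin1/binary,17>>,    %% forces a copy of Bin1's data
    {Bin4,Bin3}.
```

run() returns {<<0,1,2,3,17>>, <<0,1,2,3,4,5,6,7,8,9>>}: Bin3 keeps its value even though Bin4 was built by appending to Bin1.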
The optimization of the binary append operation requires that there is a single ProcBin and a single reference to the ProcBin for the binary. The reason is that the binary object can be moved (reallocated) during an append operation, and when that happens, the pointer in the ProcBin must be updated. If there were more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.
Therefore, certain operations on a binary mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.
When appending to a binary as follows, only the binary returned from the latest append operation will support further cheap append operations:
Bin = <<Bin0/binary,...>>
In the code fragment in the beginning of this section, appending to Bin will be fast, because Bin is always the result of the latest append operation; appending again to an earlier result such as Bin0 would force the binary data to be copied.
If a binary is sent as a message to a process or port, the binary will be shrunk and any further append operation will copy the binary data into a new binary. For example, in the following code fragment, Bin1 will be copied in the third line:

Bin1 = <<Bin0/binary,...>>,
PortOrPid ! Bin1,
Bin = <<Bin1/binary,...>>  %% Bin1 will be COPIED
The same happens if you insert a binary into an Ets table or send it to a port using erlang:port_command/2.
Matching a binary will also cause it to shrink, and the next append operation will copy the binary data:

Bin1 = <<Bin0/binary,...>>,
<<X,Y,Z,T/binary>> = Bin1,
Bin = <<Bin1/binary,...>>  %% Bin1 will be COPIED

The reason is that a match context contains a direct pointer to the binary data.
If a process simply keeps binaries (either in "loop data" or in the process dictionary), the garbage collector can eventually shrink the binaries. If only one such binary is kept, it will not be shrunk. If the process later appends to a binary that has been shrunk, the binary object will be reallocated to make room for the data to be appended.
Let us revisit the example in the beginning of the previous section:

DO

my_binary_to_list(<<H,T/binary>>) ->
    [H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
The first time my_binary_to_list/1 is called, a match context is created. The match context points to the first byte of the binary. 1 byte is matched out, and the match context is updated to point to the second byte in the binary.

At this point it would make sense to create a sub binary, but in this particular example the compiler sees that a function will soon be called (in this case, my_binary_to_list/1 itself) that immediately creates a new match context and discards the sub binary.

Therefore, my_binary_to_list/1 calls itself with the match context instead of with a sub binary. The instruction that initializes the matching operation basically does nothing when it sees that it was passed a match context instead of a binary.
When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next garbage collection, as there is no longer any reference to it).
To summarize, my_binary_to_list/1 only needs to create one match context and no sub binaries.
Notice that the match context in my_binary_to_list/1 was discarded when the entire binary had been traversed. What happens if the iteration stops before it has reached the end of the binary? Will the optimization still work?

after_zero(<<0,T/binary>>) ->
    T;
after_zero(<<_,T/binary>>) ->
    after_zero(T);
after_zero(<<>>) ->
    <<>>.
Yes, it will. The compiler will remove the building of the sub binary in the second clause:

after_zero(<<_,T/binary>>) ->
    after_zero(T);
...
But it will generate code that builds a sub binary in the first clause:

after_zero(<<0,T/binary>>) ->
    T;
...
Therefore, after_zero/1 builds one match context and one sub binary (assuming it is called with a binary that contains a zero byte).
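For concreteness, here is the function from the text wrapped in a module (the module name is ours) together with its observable behavior:

```erlang
-module(after_zero_demo).
-export([after_zero/1]).

%% Function from the text: returns everything after the first zero byte.
after_zero(<<0,T/binary>>) ->
    T;
after_zero(<<_,T/binary>>) ->
    after_zero(T);
after_zero(<<>>) ->
    <<>>.
```

For example, `after_zero(<<1,2,0,3,4>>)` returns `<<3,4>>` (the sub binary built in the first clause), and `after_zero(<<1,2>>)` returns `<<>>` because no zero byte is found.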
Code like the following will also be optimized:

all_but_zeroes_to_list(Buffer, Acc, 0) ->
    {lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).
The compiler removes the building of sub binaries in the second and third clauses, and it adds an instruction to the first clause that converts Buffer from a match context to a sub binary (or does nothing if Buffer already is a binary).
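A usage sketch of the function above (the module wrapper is ours): it consumes a fixed number of bytes, drops the zeroes among them, and returns the unconsumed rest of the buffer.

```erlang
-module(abz_demo).
-export([all_but_zeroes_to_list/3]).

%% Function from the text: consumes Remaining bytes from the buffer,
%% collecting the non-zero bytes, and returns the rest unconsumed.
all_but_zeroes_to_list(Buffer, Acc, 0) ->
    {lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).
```

For example, `all_but_zeroes_to_list(<<1,0,2,0,3,9,9>>, [], 5)` returns `{[1,2,3],<<9,9>>}`.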
Before you begin to think that the compiler can optimize any binary patterns, the following function cannot be optimized by the compiler (currently, at least):
non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
It was mentioned earlier that the compiler can only delay creation of sub binaries if it knows that the binary will not be shared. In this case, the compiler cannot know.
It is soon shown how to rewrite non_opt_eq/2 so that the delayed sub binary optimization can be used.
Use the bin_opt_info option to have the compiler print a lot of information about binary optimizations. It can be given to the compiler directly:

erlc +bin_opt_info Mod.erl

or passed through an environment variable:

export ERL_COMPILER_OPTIONS=bin_opt_info

Notice that the bin_opt_info option is not meant to be a permanent option added to your Makefiles, because not all messages that it generates can be eliminated. Therefore, passing the option through the environment is in most cases the most practical approach.
To make it clearer exactly what code the warnings refer to, the warnings in the following examples are inserted as comments after the clause they refer to, for example:
after_zero(<<0,T/binary>>) ->
%% NOT OPTIMIZED: sub binary is used or returned
T;
after_zero(<<_,T/binary>>) ->
%% OPTIMIZED: creation of sub binary delayed
after_zero(T);
after_zero(<<>>) ->
<<>>.
The warning for the first clause says that the creation of a sub binary cannot be delayed, because it will be returned. The warning for the second clause says that a sub binary will not be created (yet).
Let us revisit the earlier example of the code that could not be optimized and find out why:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
%% NOT OPTIMIZED: called function non_opt_eq/2 does not
%% begin with a suitable binary matching instruction
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
The compiler emitted two warnings. The INFO warning refers to the function non_opt_eq/2 as a callee, indicating that any function that calls non_opt_eq/2 cannot make use of the delayed sub binary optimization. There is also a suggestion to change the argument order. The second warning (which happens to be on the same line) refers to the construction of the sub binary itself.
Soon another example will show the difference between the INFO and NOT OPTIMIZED warnings somewhat more clearly, but let us first follow the suggestion to change the argument order:

opt_eq(<<H,T1/binary>>, [H|T2]) ->
%% OPTIMIZED: creation of sub binary delayed
opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
false;
opt_eq(<<>>, []) ->
true.
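As a usage sketch (the module wrapper and name are ours), the rewritten function compares a binary with a list of bytes:

```erlang
-module(opt_eq_demo).
-export([opt_eq/2]).

%% The rewritten comparison from the text: the binary is now the first
%% argument, so the delayed sub binary optimization applies.
opt_eq(<<H,T1/binary>>, [H|T2]) ->
    opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
    false;
opt_eq(<<>>, []) ->
    true.
```

For example, `opt_eq(<<"abc">>, "abc")` returns `true` and `opt_eq(<<"abc">>, "abd")` returns `false` (the clauses assume both arguments have the same length, as in the original).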
The compiler gives a warning for the following code fragment:
match_body([0|_], <<H,_/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
done;
...
The warning means that if there is a call to match_body/2 (from another clause in match_body/2 or from another function), the delayed sub binary optimization will not be possible. More warnings will occur for any place where a sub binary is matched out at the end of and passed as the second argument to match_body/2, for example:
match_head(List, <<_:10,Data/binary>>) ->
%% NOT OPTIMIZED: called function match_body/2 does not
%% begin with a suitable binary matching instruction
match_body(List, Data).
The compiler figures out if a variable is unused. The same code is generated for each of the following functions:
count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.

count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
count2(<<>>, Count) -> Count.

count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
count3(<<>>, Count) -> Count.
In each iteration, the first 8 bits in the binary will be skipped, not matched out.
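A usage sketch of the first variant (the module wrapper is ours): the function simply counts the bytes of the binary.

```erlang
-module(count_demo).
-export([count1/2]).

%% count1 from the text: skips 8 bits per iteration and counts them.
count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.
```

For example, `count1(<<1,2,3>>, 0)` returns `3`.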
Binary handling was significantly improved in R12B. Because code that was efficient in R11B might not be efficient in R12B, and vice versa, earlier revisions of this Efficiency Guide contained some information about binary handling in R11B.