This module contains functions for manipulating byte-oriented binaries. Although the majority of functions could be implemented using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume less memory (or both) than a counterpart written in pure Erlang.
The module is implemented according to EEP (Erlang Enhancement Proposal) 31.
The library handles byte-oriented data. Bitstrings that are not
binaries (that is, do not contain whole octets of bits) will result in a
cp()
- Opaque data type representing a compiled search pattern. Guaranteed to be a tuple()
to allow programs to distinguish it from non-precompiled search patterns.
part() = {Start,Length}
Start = int()
Length = int()
- A representation of a part (or range) in a binary. Start is a
zero-based offset into a binary() and Length is the length of
that part. As input to functions in this module, a reverse
part specification is allowed, constructed with a negative
Length, so that the part of the binary begins at Start +
Length and is -Length long. This is useful for referencing the
last N bytes of a binary as {size(Binary), -N}. The functions
in this module always return part() values with a positive Length.
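The arithmetic behind a reverse part specification can be sketched as a small helper. This is only an illustration: the module name part_spec_demo and the function normalize/1 are hypothetical and are not part of the binary module.

```erlang
-module(part_spec_demo).
-export([normalize/1]).

%% Rewrites a reverse part specification (negative Length) into the
%% equivalent forward one: the part begins at Start + Length and is
%% -Length long. Forward specifications are returned unchanged.
normalize({Start, Length}) when is_integer(Start), is_integer(Length), Length < 0 ->
    {Start + Length, -Length};
normalize({Start, Length}) when is_integer(Start), is_integer(Length) ->
    {Start, Length}.
```

For a six-byte binary, part_spec_demo:normalize({6,-3}) gives {3,3}, so both specifications select the same three trailing bytes.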
Returns the byte at position
The same as
Converts
1> binary:bin_to_list(<<"erlang">>,{1,3}).
"rla"
%% or [114,108,97] in list notation.
If
The same as
Builds an internal structure representing a compilation of a
search-pattern, later to be used in the
When a list of binaries is given, it denotes a set of
alternative binaries to search for. That is, if
The list of binaries used for search alternatives shall be flat and proper.
If
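A minimal sketch of reusing one compiled pattern for several searches follows. The module cp_demo, the function first_separators/1 and the choice of separators are assumptions made for illustration only.

```erlang
-module(cp_demo).
-export([first_separators/1]).

%% Compiles a set of alternative separators once and then searches
%% every subject binary with the same compiled pattern.
%% Longer-lived code would typically keep the compiled pattern around
%% (for example in a process state) instead of rebuilding it per call.
first_separators(Subjects) when is_list(Subjects) ->
    Pattern = binary:compile_pattern([<<",">>, <<";">>]),
    [binary:match(Subject, Pattern) || Subject <- Subjects].
```

Calling cp_demo:first_separators([<<"a,b">>, <<"ab;c">>, <<"abc">>]) returns [{1,1},{2,1},nomatch].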
The same as
Creates a binary with the content of
This function will always create a new binary, even if
By deliberately copying a single binary to avoid referencing a larger binary, one might, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed. Sharing binary data is usually good. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate copying might be a good idea.
If
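As a hedged illustration of copy/2, the sketch below builds zero padding by duplicating a one-byte binary. The module copy_demo and the function pad_right/2 are hypothetical names introduced for this example.

```erlang
-module(copy_demo).
-export([pad_right/2]).

%% Pads Bin on the right with zero bytes until it is Size bytes long.
%% The padding is produced by duplicating <<0>> with binary:copy/2.
%% Binaries that are already long enough are returned unchanged.
pad_right(Bin, Size) when is_binary(Bin), is_integer(Size), byte_size(Bin) < Size ->
    Padding = binary:copy(<<0>>, Size - byte_size(Bin)),
    <<Bin/binary, Padding/binary>>;
pad_right(Bin, Size) when is_binary(Bin), is_integer(Size) ->
    Bin.
```

For example, copy_demo:pad_right(<<1,2>>, 4) returns <<1,2,0,0>>.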
The same as
Converts the binary digit representation, in big or little
endian, of a positive integer in
Example:
1> binary:decode_unsigned(<<169,138,199>>,big).
11111111
The same as
Converts a positive integer to the smallest possible binary digit representation, either big or little endian.
Example:
1> binary:encode_unsigned(11111111,big).
<<169,138,199>>
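The two conversions are inverses of each other for a given endianness. The sketch below checks that property; the module unsigned_demo and the function roundtrip/2 are hypothetical and exist only for illustration.

```erlang
-module(unsigned_demo).
-export([roundtrip/2]).

%% Encodes N with the given endianness (big or little), decodes the
%% result again, and asserts that the original integer comes back.
%% Returns the encoded binary.
roundtrip(N, Endianness) when is_integer(N), N >= 0 ->
    Bin = binary:encode_unsigned(N, Endianness),
    N = binary:decode_unsigned(Bin, Endianness),
    Bin.
```

With the integer used above, unsigned_demo:roundtrip(11111111, little) returns <<199,138,169>>, the byte-reversed form of the big-endian result.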
Returns the first byte of the binary
Returns the last byte of the binary
Works exactly as
Returns the length of the longest common prefix of the
binaries in the list
1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]).
2
2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]).
0
If
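Because the function returns a length rather than the prefix itself, a caller that needs the actual bytes can combine it with part/3. The following is a sketch under that assumption; prefix_demo and common_prefix/1 are hypothetical names.

```erlang
-module(prefix_demo).
-export([common_prefix/1]).

%% Returns the longest common prefix of a non-empty list of binaries
%% as a binary, by cutting the computed length off the first element.
common_prefix([First | _] = Binaries) when is_binary(First) ->
    Length = binary:longest_common_prefix(Binaries),
    binary:part(First, 0, Length).
```

For the first example above, prefix_demo:common_prefix([<<"erlang">>,<<"ergonomy">>]) returns <<"er">>.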
Returns the length of the longest common suffix of the
binaries in the list
1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]).
3
2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]).
0
If
The same as
Searches for the first occurrence of
The function will return
1> binary:match(<<"abcde">>, [<<"bcde">>,<<"cd">>],[]).
{1,4}
Even though
Summary of the options:
Only the given part is searched. Return values still have
offsets from the beginning of
If none of the strings in
For a description of
If
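The effect of the scope option can be illustrated with a thin wrapper. The module match_scope_demo and the function find_in/3 are assumptions made for this sketch only.

```erlang
-module(match_scope_demo).
-export([find_in/3]).

%% Searches for Pattern only inside the given part of Subject.
%% A returned {Pos, Length} is still relative to the beginning of
%% Subject, not to the beginning of the searched part.
find_in(Subject, Pattern, {_Start, _Length} = Part) when is_binary(Subject) ->
    binary:match(Subject, Pattern, [{scope, Part}]).
```

Here match_scope_demo:find_in(<<"abcde">>, <<"cd">>, {2,3}) returns {2,2}, while restricting the search to {0,3} returns nomatch, because <<"cd">> does not fit inside the first three bytes.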
The same as
Works like match, but the
The first and longest match is preferred to a shorter one, which is illustrated by the following example:
1> binary:matches(<<"abcde">>,
[<<"bcde">>,<<"bc">>,<<"de">>],[]).
[{1,4}]
The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">> (which would have given rise to one more match, <<"de">>). This corresponds to the behavior of POSIX regular expressions (and programs like awk), but is not consistent with alternative matches in re (and Perl), where instead lexical ordering in the search pattern selects which string matches.
If none of the strings in pattern is found, an empty list is returned.
For a description of
If
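Since every match is returned as one list element, counting occurrences reduces to taking the length of the result. The sketch below assumes a hypothetical module matches_demo with a function count/2.

```erlang
-module(matches_demo).
-export([count/2]).

%% Counts the non-overlapping occurrences of Pattern in Subject.
%% An empty result list (nothing found) yields 0.
count(Subject, Pattern) when is_binary(Subject) ->
    length(binary:matches(Subject, Pattern)).
```

For example, matches_demo:count(<<"banana">>, <<"a">>) returns 3 and matches_demo:count(<<"banana">>, <<"x">>) returns 0.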
Extracts the part of the binary
Negative length can be used to extract bytes at the end of a binary:
1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
2> binary:part(Bin,{byte_size(Bin), -5}).
<<6,7,8,9,10>>
If
The same as
If a binary references a larger binary (often described as
being a sub-binary), it can be useful to get the size of the
actual referenced binary. This function can be used in a program
to trigger the use of
Example:
store(Binary, GBSet) ->
    NewBin =
        case binary:referenced_byte_size(Binary) of
            Large when Large > 2 * byte_size(Binary) ->
                binary:copy(Binary);
            _ ->
                Binary
        end,
    gb_sets:insert(NewBin, GBSet).
In this example, we chose to copy the binary content before
inserting it in the
Binary sharing occurs whenever binaries are taken apart. This is
the fundamental reason why binaries are fast: decomposition can
always be done with O(1) complexity. In rare circumstances this
data sharing is undesirable, however, which is why this
function together with
Example of binary sharing:
1> A = binary:copy(<<1>>,100).
<<1,1,1,1,1 ...
2> byte_size(A).
100
3> binary:referenced_byte_size(A).
100
4> <<_:10/binary,B:10/binary,_/binary>> = A.
<<1,1,1,1,1 ...
5> byte_size(B).
10
6> binary:referenced_byte_size(B).
100
Binary data is shared among processes. If another process still references the larger binary, copying the part this process uses only consumes more memory and will not free up the larger binary for garbage collection. Use intrusive functions of this kind with extreme care, and only if a real problem is detected.
The same as
Constructs a new binary by replacing the parts in
If the matching sub-part of
1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]).
<<"a[b]cde">>
2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
[global,{insert_replaced,1}]).
<<"a[b]c[d]e">>
3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
[global,{insert_replaced,[1,1]}]).
<<"a[bb]c[dd]e">>
4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,
[global,{insert_replaced,[1,2]}]).
<<"a[b-b]c[d-d]e">>
If any position given in
The options
For a description of
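The examples above all use insert_replaced. As a plainer sketch, the following contrasts replacing only the first occurrence with replacing all of them through the global option. The module replace_demo and its functions first/1 and all/1 are hypothetical.

```erlang
-module(replace_demo).
-export([first/1, all/1]).

%% Replaces only the first comma with a semicolon (no options given).
first(Bin) when is_binary(Bin) ->
    binary:replace(Bin, <<",">>, <<";">>, []).

%% Replaces every comma with a semicolon by passing the global option.
all(Bin) when is_binary(Bin) ->
    binary:replace(Bin, <<",">>, <<";">>, [global]).
```

With <<"a,b,c">> as input, replace_demo:first/1 returns <<"a;b,c">> and replace_demo:all/1 returns <<"a;b;c">>.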
The same as
Splits Subject into a list of binaries based on Pattern. If the option global is not given, only the first occurrence of Pattern in Subject will give rise to a split.
The parts of Pattern actually found in Subject are not included in the result.
Example:
1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
[<<1,255,4>>, <<2,3>>]
2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
[<<0,1>>,<<4>>,<<9>>]
Summary of options:
Works as in
Removes trailing empty parts of the result (as does trim in
Repeats the split until the
Example of the difference between a scope and taking the binary apart before splitting:
1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]).
[<<"ban">>,<<"na">>]
2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]).
[<<"n">>,<<"n">>]
The return type is always a list of binaries that are all
referencing
For a description of
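A minimal sketch combining the global and trim options follows, assuming comma-separated input. The module split_demo and the function fields/1 are hypothetical names used only here.

```erlang
-module(split_demo).
-export([fields/1]).

%% Splits a comma-separated line into all of its fields.
%% global repeats the split over the whole binary; trim removes
%% empty parts at the end of the result but keeps interior ones.
fields(Line) when is_binary(Line) ->
    binary:split(Line, <<",">>, [global, trim]).
```

For example, split_demo:fields(<<"a,b,,">>) returns [<<"a">>,<<"b">>], whereas split_demo:fields(<<"a,,b">>) keeps the interior empty field and returns [<<"a">>,<<>>,<<"b">>].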