1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
|
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">
<chapter>
<header>
<copyright>
<year>2007</year>
<year>2014</year>
<holder>Ericsson AB, All Rights Reserved</holder>
</copyright>
<legalnotice>
The contents of this file are subject to the Erlang Public License,
Version 1.1, (the "License"); you may not use this file except in
compliance with the License. You should have received a copy of the
Erlang Public License along with this software. If not, it can be
retrieved online at http://www.erlang.org/.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
the License for the specific language governing rights and limitations
under the License.
The Initial Developer of the Original Code is Ericsson AB.
</legalnotice>
<title>Constructing and Matching Binaries</title>
<prepared>Bjorn Gustavsson</prepared>
<docno></docno>
<date>2007-10-12</date>
<rev></rev>
<file>binaryhandling.xml</file>
</header>
<p>In R12B, the most natural way to construct and match binaries is
significantly faster than in earlier releases.</p>
<p>To construct a binary, you can simply write as follows:</p>
<p><em>DO</em> (in R12B) / <em>REALLY DO NOT</em> (in earlier releases)</p>
<code type="erl"><![CDATA[
my_list_to_binary(List) ->
my_list_to_binary(List, <<>>).
my_list_to_binary([H|T], Acc) ->
my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
Acc.]]></code>
<p>In releases before R12B, <c>Acc</c> is copied in every iteration.
In R12B, <c>Acc</c> is copied only in the first iteration and extra
space is allocated at the end of the copied binary. In the next iteration,
<c>H</c> is written into the extra space. When the extra space runs out,
the binary is reallocated with more extra space. The extra space allocated
(or reallocated) is twice the size of the
existing binary data, or 256, whichever is larger.</p>
<p>The most natural way to match binaries is now the fastest:</p>
<p><em>DO</em> (in R12B)</p>
<code type="erl"><![CDATA[
my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].]]></code>
<section>
<title>How Binaries are Implemented</title>
<p>Internally, binaries and bitstrings are implemented in the same way.
In this section, they are called <em>binaries</em> because that is what
they are called in the emulator source code.</p>
<p>Four types of binary objects are available internally:</p>
<list type="bulleted">
<item><p>Two are containers for binary data and are called:</p>
<list type="bulleted">
<item><em>Refc binaries</em> (short for
<em>reference-counted binaries</em>)</item>
<item><em>Heap binaries</em></item>
</list></item>
<item><p>Two are merely references to a part of a binary and
are called:</p>
<list type="bulleted">
<item><em>sub binaries</em></item>
<item><em>match contexts</em></item>
</list></item>
</list>
<section>
<marker id="refc_binary"></marker>
<title>Refc Binaries</title>
<p>Refc binaries consist of two parts:</p>
<list type="bulleted">
<item>An object stored on the process heap, called a
<em>ProcBin</em></item>
<item>The binary object itself, stored outside all process
heaps</item>
</list>
<p>The binary object can be referenced by any number of ProcBins from any
number of processes. The object contains a reference counter to keep track
of the number of references, so that it can be removed when the last
reference disappears.</p>
<p>All ProcBin objects in a process are part of a linked list, so that
the garbage collector can keep track of them and decrement the reference
counters in the binary when a ProcBin disappears.</p>
</section>
<section>
<marker id="heap_binary"></marker>
<title>Heap Binaries</title>
<p>Heap binaries are small binaries, up to 64 bytes, and are stored
directly on the process heap. They are copied when the process is
garbage-collected and when they are sent as a message. They do not
require any special handling by the garbage collector.</p>
</section>
<section>
<title>Sub Binaries</title>
<p>The reference objects <em>sub binaries</em> and
<em>match contexts</em> can reference part of
a refc binary or heap binary.</p>
<p><marker id="sub_binary"></marker>A <em>sub binary</em>
is created by <c>split_binary/2</c> and when
a binary is matched out in a binary pattern. A sub binary is a reference
into a part of another binary (refc or heap binary, but never into another
sub binary). Therefore, matching out a binary is relatively cheap because
the actual binary data is never copied.</p>
</section>
<section>
<title>Match Context</title>
<marker id="match_context"></marker>
<p>A <em>match context</em> is similar to a sub binary, but is
optimized for binary matching. For example, it contains a direct
pointer to the binary data. For each field that is matched out of
a binary, the position in the match context is incremented.</p>
<p>In R11B, a match context was only used during a binary matching
operation.</p>
<p>In R12B, the compiler tries to avoid generating code that
creates a sub binary, only to shortly afterwards create a new match
context and discard the sub binary. Instead of creating a sub binary,
the match context is kept.</p>
<p>The compiler can only do this optimization if it knows
that the match context will not be shared. If it would be shared, the
functional properties (also called referential transparency) of Erlang
would break.</p>
</section>
</section>
<section>
<title>Constructing Binaries</title>
<p>In R12B, appending to a binary or bitstring
is specially optimized by the <em>runtime system</em>:</p>
<code type="erl"><![CDATA[
<<Binary/binary, ...>>
<<Binary/bitstring, ...>>]]></code>
<p>As the runtime system handles the optimization (instead of
the compiler), there are very few circumstances in which the optimization
does not work.</p>
<p>To explain how it works, let us examine the following code line
by line:</p>
<code type="erl"><![CDATA[
Bin0 = <<0>>, %% 1
Bin1 = <<Bin0/binary,1,2,3>>, %% 2
Bin2 = <<Bin1/binary,4,5,6>>, %% 3
Bin3 = <<Bin2/binary,7,8,9>>, %% 4
Bin4 = <<Bin1/binary,17>>, %% 5 !!!
{Bin4,Bin3} %% 6]]></code>
<list type="bulleted">
<item>Line 1 (marked with the <c>%% 1</c> comment), assigns
a <seealso marker="#heap_binary">heap binary</seealso> to
the <c>Bin0</c> variable.</item>
<item>Line 2 is an append operation. As <c>Bin0</c>
has not been involved in an append operation,
a new <seealso marker="#refc_binary">refc binary</seealso>
is created and the contents of <c>Bin0</c> is copied
into it. The <em>ProcBin</em> part of the refc binary has
its size set to the size of the data stored in the binary, while
the binary object has extra space allocated.
The size of the binary object is either twice the
size of <c>Bin0</c> or 256, whichever is larger. In this case
it is 256.</item>
<item>Line 3 is more interesting.
<c>Bin1</c> <em>has</em> been used in an append operation,
and it has 255 bytes of unused storage at the end, so the 3 new
bytes are stored there.</item>
<item>Line 4. The same applies here. There are 252 bytes left,
so there is no problem storing another 3 bytes.</item>
<item>Line 5. Here, something <em>interesting</em> happens. Notice
that the result is not appended to the previous result in <c>Bin3</c>,
but to <c>Bin1</c>. It is expected that <c>Bin4</c> will be assigned
the value <c><<0,1,2,3,17>></c>. It is also expected that
<c>Bin3</c> will retain its value
(<c><<0,1,2,3,4,5,6,7,8,9>></c>).
Clearly, the runtime system cannot write byte <c>17</c> into the binary,
because that would change the value of <c>Bin3</c> to
<c><<0,1,2,3,4,17,6,7,8,9>></c>.</item>
</list>
<p>The runtime system sees that <c>Bin1</c> is the result
from a previous append operation (not from the latest append operation),
so it <em>copies</em> the contents of <c>Bin1</c> to a new binary,
reserve extra storage, and so on. (Here is not explained how the
runtime system can know that it is not allowed to write into <c>Bin1</c>;
it is left as an exercise to the curious reader to figure out how it is
done by reading the emulator sources, primarily <c>erl_bits.c</c>.)</p>
<section>
<title>Circumstances That Force Copying</title>
<p>The optimization of the binary append operation requires that
there is a <em>single</em> ProcBin and a <em>single reference</em> to the
ProcBin for the binary. The reason is that the binary object can be
moved (reallocated) during an append operation, and when that happens,
the pointer in the ProcBin must be updated. If there would be more than
one ProcBin pointing to the binary object, it would not be possible to
find and update all of them.</p>
<p>Therefore, certain operations on a binary mark it so that
any future append operation will be forced to copy the binary.
In most cases, the binary object will be shrunk at the same time
to reclaim the extra space allocated for growing.</p>
<p>When appending to a binary as follows, only the binary returned
from the latest append operation will support further cheap append
operations:</p>
<code type="erl"><![CDATA[
Bin = <<Bin0,...>>]]></code>
<p>In the code fragment in the beginning of this section,
appending to <c>Bin</c> will be cheap, while appending to <c>Bin0</c>
will force the creation of a new binary and copying of the contents
of <c>Bin0</c>.</p>
<p>If a binary is sent as a message to a process or port, the binary
will be shrunk and any further append operation will copy the binary
data into a new binary. For example, in the following code fragment
<c>Bin1</c> will be copied in the third line:</p>
<code type="erl"><![CDATA[
Bin1 = <<Bin0,...>>,
PortOrPid ! Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED
]]></code>
<p>The same happens if you insert a binary into an Ets
table, send it to a port using <c>erlang:port_command/2</c>, or
pass it to
<seealso marker="erts:erl_nif#enif_inspect_binary">enif_inspect_binary</seealso>
in a NIF.</p>
<p>Matching a binary will also cause it to shrink and the next append
operation will copy the binary data:</p>
<code type="erl"><![CDATA[
Bin1 = <<Bin0,...>>,
<<X,Y,Z,T/binary>> = Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED
]]></code>
<p>The reason is that a
<seealso marker="#match_context">match context</seealso>
contains a direct pointer to the binary data.</p>
<p>If a process simply keeps binaries (either in "loop data" or in the
process
dictionary), the garbage collector can eventually shrink the binaries.
If only one such binary is kept, it will not be shrunk. If the process
later appends to a binary that has been shrunk, the binary object will
be reallocated to make place for the data to be appended.</p>
</section>
</section>
<section>
<title>Matching Binaries</title>
<p>Let us revisit the example in the beginning of the previous section:</p>
<p><em>DO</em> (in R12B)</p>
<code type="erl"><![CDATA[
my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].]]></code>
<p>The first time <c>my_binary_to_list/1</c> is called,
a <seealso marker="#match_context">match context</seealso>
is created. The match context points to the first
byte of the binary. 1 byte is matched out and the match context
is updated to point to the second byte in the binary.</p>
<p>In R11B, at this point a
<seealso marker="#sub_binary">sub binary</seealso>
would be created. In R12B,
the compiler sees that there is no point in creating a sub binary,
because there will soon be a call to a function (in this case,
to <c>my_binary_to_list/1</c> itself) that immediately will
create a new match context and discard the sub binary.</p>
<p>Therefore, in R12B, <c>my_binary_to_list/1</c> calls itself
with the match context instead of with a sub binary. The instruction
that initializes the matching operation basically does nothing
when it sees that it was passed a match context instead of a binary.</p>
<p>When the end of the binary is reached and the second clause matches,
the match context will simply be discarded (removed in the next
garbage collection, as there is no longer any reference to it).</p>
<p>To summarize, <c>my_binary_to_list/1</c> in R12B only needs to create
<em>one</em> match context and no sub binaries. In R11B, if the binary
contains <em>N</em> bytes, <em>N+1</em> match contexts and <em>N</em>
sub binaries are created.</p>
<p>In R11B, the fastest way to match binaries is as follows:</p>
<p><em>DO NOT</em> (in R12B)</p>
<code type="erl"><![CDATA[
my_complicated_binary_to_list(Bin) ->
my_complicated_binary_to_list(Bin, 0).
my_complicated_binary_to_list(Bin, Skip) ->
case Bin of
<<_:Skip/binary,Byte,_/binary>> ->
[Byte|my_complicated_binary_to_list(Bin, Skip+1)];
<<_:Skip/binary>> ->
[]
end.]]></code>
<p>This function cleverly avoids building sub binaries, but it cannot
avoid building a match context in each recursion step.
Therefore, in both R11B and R12B,
<c>my_complicated_binary_to_list/1</c> builds <em>N+1</em> match
contexts. (In a future Erlang/OTP release, the compiler might be able
to generate code that reuses the match context.)</p>
<p>Returning to <c>my_binary_to_list/1</c>, notice that the match context
was discarded when the entire binary had been traversed. What happens if
the iteration stops before it has reached the end of the binary? Will
the optimization still work?</p>
<code type="erl"><![CDATA[
after_zero(<<0,T/binary>>) ->
T;
after_zero(<<_,T/binary>>) ->
after_zero(T);
after_zero(<<>>) ->
<<>>.
]]></code>
<p>Yes, it will. The compiler will remove the building of the sub binary in
the second clause:</p>
<code type="erl"><![CDATA[
...
after_zero(<<_,T/binary>>) ->
after_zero(T);
...]]></code>
<p>But it will generate code that builds a sub binary in the first clause:</p>
<code type="erl"><![CDATA[
after_zero(<<0,T/binary>>) ->
T;
...]]></code>
<p>Therefore, <c>after_zero/1</c> builds one match context and one sub binary
(assuming it is passed a binary that contains a zero byte).</p>
<p>Code like the following will also be optimized:</p>
<code type="erl"><![CDATA[
all_but_zeroes_to_list(Buffer, Acc, 0) ->
{lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).]]></code>
<p>The compiler removes building of sub binaries in the second and third
clauses, and it adds an instruction to the first clause that converts
<c>Buffer</c> from a match context to a sub binary (or do nothing if
<c>Buffer</c> is a binary already).</p>
<p>Before you begin to think that the compiler can optimize any binary
patterns, the following function cannot be optimized by the compiler
(currently, at least):</p>
<code type="erl"><![CDATA[
non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.]]></code>
<p>It was mentioned earlier that the compiler can only delay creation of
sub binaries if it knows that the binary will not be shared. In this case,
the compiler cannot know.</p>
<p>Soon it is shown how to rewrite <c>non_opt_eq/2</c> so that the delayed
sub binary optimization can be applied, and more importantly, it is shown
how you can find out whether your code can be optimized.</p>
<section>
<title>Option bin_opt_info</title>
<p>Use the <c>bin_opt_info</c> option to have the compiler print a lot of
information about binary optimizations. It can be given either to the
compiler or <c>erlc</c>:</p>
<code type="erl"><![CDATA[
erlc +bin_opt_info Mod.erl]]></code>
<p>or passed through an environment variable:</p>
<code type="erl"><![CDATA[
export ERL_COMPILER_OPTIONS=bin_opt_info]]></code>
<p>Notice that the <c>bin_opt_info</c> is not meant to be a permanent
option added to your <c>Makefile</c>s, because all messages that it
generates cannot be eliminated. Therefore, passing the option through
the environment is in most cases the most practical approach.</p>
<p>The warnings look as follows:</p>
<code type="erl"><![CDATA[
./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed]]></code>
<p>To make it clearer exactly what code the warnings refer to, the
warnings in the following examples are inserted as comments
after the clause they refer to, for example:</p>
<code type="erl"><![CDATA[
after_zero(<<0,T/binary>>) ->
%% NOT OPTIMIZED: sub binary is used or returned
T;
after_zero(<<_,T/binary>>) ->
%% OPTIMIZED: creation of sub binary delayed
after_zero(T);
after_zero(<<>>) ->
<<>>.]]></code>
<p>The warning for the first clause says that the creation of a sub
binary cannot be delayed, because it will be returned.
The warning for the second clause says that a sub binary will not be
created (yet).</p>
<p>Let us revisit the earlier example of the code that could not
be optimized and find out why:</p>
<code type="erl"><![CDATA[
non_opt_eq([H|T1], <<H,T2/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
%% NOT OPTIMIZED: called function non_opt_eq/2 does not
%% begin with a suitable binary matching instruction
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.]]></code>
<p>The compiler emitted two warnings. The <c>INFO</c> warning refers
to the function <c>non_opt_eq/2</c> as a callee, indicating that any
function that call <c>non_opt_eq/2</c> cannot make delayed sub binary
optimization. There is also a suggestion to change argument order.
The second warning (that happens to refer to the same line) refers to
the construction of the sub binary itself.</p>
<p>Soon another example will show the difference between the
<c>INFO</c> and <c>NOT OPTIMIZED</c> warnings somewhat clearer, but
let us first follow the suggestion to change argument order:</p>
<code type="erl"><![CDATA[
opt_eq(<<H,T1/binary>>, [H|T2]) ->
%% OPTIMIZED: creation of sub binary delayed
opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
false;
opt_eq(<<>>, []) ->
true.]]></code>
<p>The compiler gives a warning for the following code fragment:</p>
<code type="erl"><![CDATA[
match_body([0|_], <<H,_/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
done;
...]]></code>
<p>The warning means that <em>if</em> there is a call to <c>match_body/2</c>
(from another clause in <c>match_body/2</c> or another function), the
delayed sub binary optimization will not be possible. More warnings will
occur for any place where a sub binary is matched out at the end of and
passed as the second argument to <c>match_body/2</c>, for example:</p>
<code type="erl"><![CDATA[
match_head(List, <<_:10,Data/binary>>) ->
%% NOT OPTIMIZED: called function match_body/2 does not
%% begin with a suitable binary matching instruction
match_body(List, Data).]]></code>
</section>
<section>
<title>Unused Variables</title>
<p>The compiler figures out if a variable is unused. The same
code is generated for each of the following functions:</p>
<code type="erl"><![CDATA[
count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.
count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
count2(<<>>, Count) -> Count.
count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
count3(<<>>, Count) -> Count.]]></code>
<p>In each iteration, the first 8 bits in the binary will be skipped,
not matched out.</p>
</section>
</section>
</chapter>
|