Age | Commit message (Collapse) | Author |
|
beam_record would make an unsafe optimization for the
not_used_p/4 function added to beam_utils_SUITE in this
commit. The bug is in beam_utils, which would falsely
report that {x,4} was unused when it in fact was used.
The bug was in the function not_used/1. The purpose of
not_used/1 is to return a 'not_used' result unless the
actual result is 'used'. Unfortunately it was not
implemented in that way. It would let a 'transparent'
result slip through, which the caller in this case would
convert to 'killed' (because the register was killed on
all other paths).
Reported-by: Richard Carlsson
|
|
When a generator in a list comprehension was given some
other term than a list, the wrong line could be pointed
out in the exception. Here is an example:
bad_generator() ->
[I || %%This line would be pointed out.
I <- not_a_list].
https://bugs.erlang.org/browse/ERL-572
|
|
A literal negative size in binary construction would cause a crash.
|
|
Refactor and fix minor bugs in beam_type
|
|
|
|
Every catch or try/catch must use a lower Y register number than any
enclosing catch or try/catch. That will ensure that when the stack
is scanned when an exception occurs, the innermost try/catch tag is
found first.
|
|
Eliminate get_list/3 internally in the compiler
|
|
Fix incorrect handling of floating point instructions
|
|
* maint:
Fix incorrect type interference of integer ranges
Conflicts:
lib/compiler/src/beam_type.erl
|
|
1a029efd1ad47f started to run the beam_block pass a second time.
Since it is run after introduction of the optimized floating point
instructions, it must handle those instructions correctly.
In particular, it must be careful when hoisting allocation
instructions. For example, the following code:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{fmove,{fr,2},{x,0}}.
{allocate_zero,1,4}.
must not be rewritten to:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{allocate_zero,1,4}.
{fmove,{fr,2},{x,0}}.
because beam_validator will not consider it safe. (The code may
actually be safe depending on what the code between the two allocation
instructions do.)
https://bugs.erlang.org/browse/ERL-555
|
|
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use it internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
|
|
misc_SUITE:integer_encoding/1 was written to make sure
that big integers were encoding correctly in a reasonable
amount of time. Now that beam_asm will encode big integers
as literals, we can reduce the scope of integer_encode/1.
That will make it significantly faster, especially when
cover is running.
|
|
Do local common sub expression elimination (CSE)
|
|
Optimize away unnecessary test_unit instructions that verify that
binaries are byte-aligned. In a tight loop, eliminating an
instruction can have a small but measurable improvement of the
execution time.
|
|
Extend an existing optimization in beam_dead to avoid
creating a match context when matching an empty binary.
|
|
Eliminate repeated evaluation of guard BIFs and building of cons cells
in blocks. This optimization is applicable in more places than might be
expected, because code generation for binaries and record can generate
common sub expressions not visible in the original source code.
For example, consider this function:
make_binary(Term) ->
Bin = term_to_binary(Term),
Size = byte_size(Bin),
<<Size:32,Bin/binary>>.
The compiler inserts a call to byte_size/2 to calculate the size of
the binary being built:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}. %Present in original code.
{line,...}.
{gc_bif,byte_size,{f,0},2,[{x,0}],{x,2}}. %Inserted by compiler.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Common sub expression elimination (CSE) eliminates the second call to
byte_size/2:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}.
{move,{x,1},{x,2}}.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Note: A possible future optimization would be to include binary
construction instructions in blocks. If that is done, the
{move,{x,1},{x,2}} instruction could also be eliminated.
|
|
Make sure that there is the correct number of put/1 instructions
following put_tuple/2. Also make it illegal to reference the
register for the tuple being built in a put/1 instruction.
That is, beam_validator will now issue a diagnostice for the the
following code:
{put_tuple,1,{x,0}}.
{put,{x,0}}.
|
|
Consider the following function:
function({function,Name,Arity,CLabel,Is0}, Lc0) ->
try
%% Optimize the code for the function.
catch
Class:Error:Stack ->
io:format("Function: ~w/~w\n", [Name,Arity]),
erlang:raise(Class, Error, Stack)
end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|
|
Argument order can prevent the delayed sub binary creation.
Here is an example directly from the Efficiency Guide:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
When compiling with the bin_opt_info option, there will be a
suggestion to change the argument order.
It turns out sys_core_bsm can itself change the order, not the
order of the arguments of themselves, but the order in which
the arguments are matched. Here is how it can be rewritten in
pseudo Core Erlang code:
non_opt_eq(Arg1, Arg2) ->
case < Arg2,Arg1 > of
< <<H1,T2/binary>>, [H2|T1] > when H1 =:= H2 ->
non_opt_eq(T1, T2);
< <<_,T2/binary>ffff>, [_|T1] > ->
false;
< <<>>, [] >> ->
true
end.
When rewritten like this, the bs_start_match2 instruction will be
the first instruction in the function and it will be possible to
store the match context in the same register as the binary
({x,1} in this case) and to delay the creation of sub binaries.
The switching of matching order also enables many other simplifications
in sys_core_bsm, since there is no longer any need to pass the position
of the pattern as an argument.
We will update the Efficiency Guide in a separate branch before the
release of OTP 21.
|
|
|
|
Add some tests cases written when attempting some new optimizations
that turned out to be unsafe.
|
|
Consider a 'case' that exports variables and whose return
value is ignored:
foo(N) ->
case N of
1 ->
Res = one;
2 ->
Res = two
end,
{ok,Res}.
That code will be translated to the following Core Erlang code:
'foo'/1 =
fun (_@c0) ->
let <_@c5,Res> =
case _@c0 of
<1> when 'true' ->
<'one','one'>
<2> when 'true' ->
<'two','two'>
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The exported variables has been rewritten to explicit return
values. Note that the original return value from the 'case' is bound to
the variable _@c5, which is unused.
The corresponding BEAM assembly code looks like this:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,1}}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,1}}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,2}.
{put_tuple,2,{x,0}}.
{put,{atom,ok}}.
{put,{x,1}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
Because of the test_heap instruction following label 5, the assignment
to {x,0} cannot be optimized away by the passes that optimize BEAM assembly
code.
Refactor the optimizations of 'let' in sys_core_fold to eliminate the
unused variable. Thus:
'foo'/1 =
fun (_@c0) ->
let <Res> =
case _@c0 of
<1> when 'true' ->
'one'
<2> when 'true' ->
'two'
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The resulting BEAM code will look like:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,1}.
{put_tuple,2,{x,1}}.
{put,{atom,ok}}.
{put,{x,0}}.
{move,{x,1},{x,0}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
|
|
|
|
The type optimizations for is_record and test_arity checked whether
the arity was equal to the size stored in the type information,
which is incorrect since said size is the *minimum* size of the
tuple (as determined by previous instructions) and not its exact
size.
A future patch to the 'master' branch will restore these
optimizations in a safe manner.
|
|
Delay creation of stack frames
|
|
|
|
|
|
01835845579e9 fixed some problems, but introduced a bug where
is_not_used/3 would report that a register was not used when it
in fact was.
|
|
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
|
|
|
|
We used to not care about the number of values returned from the
'after infinity' clause in a receive (because it could never be
executed). It is time to start caring because this will cause problem
when we will soon start to do some more aggressive optimizizations.
|
|
|
|
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allow us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
Expr
catch C:E ->
Stk = erlang:get_stacktrace(),
.
.
.
In the new syntax, it would look like:
try
Expr
catch
C:E:Stk ->
.
.
.
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
Expr
catch E ->
.
.
.
is a shorthand for:
try
Expr
catch throw:E ->
.
.
.
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
Expr
catch throw:E:Stk ->
.
.
.
|
|
|
|
f9a323d10a9f5d added consistent operand order for equality
comparisons. As a result, beam_dead:turn_op/1 is no longer covered.
We must keep the uncovered lines in beam_dead to ensure that
beam_dead can handle BEAM assembly code from another source than
v3_codegen that might not follow the operand order convention.
The only way to cover the lines is to use BEAM assembly in
the test case.
|
|
|
|
The uncovered line was added in 6753bbcc3fdb0.
|
|
Conflicts:
lib/compiler/src/beam_listing.erl
|
|
* lukas/compiler/add_to_dis/OTP-14784:
compiler: Add +to_dis option that dumps loaded asm
|
|
|
|
* maint:
Recognize 'deterministic' when given in a -compile() attribute
Conflicts:
lib/compiler/src/beam_asm.erl
|
|
* bjorn/cuddle-with-tests:
Skip compile_SUITE:pre_load_check/1 when code is native-compiled
|
|
The compiler option 'deterministic' was only recognized when given
as an option to the compiler, not when it was specified in a
-compile() attribute in the source file.
https://bugs.erlang.org/browse/ERL-498
|
|
The v3_life pass does not do enough to be worth being its own
pass. Essentially it does two things:
* Calculates life-time information starting from the annotations
that v3_kernel provides. That part can be moved into v3_codegen.
* Rewrites the Kernel Erlang records to similar plain tuples
(for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}).
That rewriting is not needed and can be eliminated.
|
|
Tracing won't work on modules that are native-compiled.
|
|
* maint:
Fix incorrect internal consistency failure for binary matching code
|