Age | Commit message (Collapse) | Author |
|
Remove the now unused beam_utils:is_not_used/3 and
beam_utils:is_killed/3 functions and friends.
Starting out as simple functions a long time ago, those functions have
grown and grown to support more optimizations. The number of bugs
found and fixed in beam_utils has also grown over time.
|
|
Minor cleanups and bug fixes of the compiler
|
|
This has been superseded by bs_get_tail/3. Note that it is NOT
removed from the emulator or beam_disasm, as old modules are still
legal.
|
|
The only caller of bif_to_test/3 is beam_ssa_codegen.
|
|
|
|
Continuing the simplification of beam_flatten, move the optimization
that eliminates a test_heap instruction following a binary construction
by incorporating the allocation of the heap space into the bs_init*
instruction itself.
This change does not change the generated code in any way.
Also remove beam_utils:combine_heap_needs/2, because beam_flatten
was the last user of it.
|
|
v3_codegen is replaced by three new passes:
* beam_kernel_to_ssa which translates the Kernel Erlang format
to a new SSA-based intermediate format.
* beam_ssa_pre_codegen which prepares the SSA-based format
for code generation, including register allocation. Registers
are allocated using the linear scan algorithm.
* beam_ssa_codegen which generates BEAM assembly code from the
SSA-based format.
It easier and more effective to optimize the SSA-based format before X
and Y registers have been assigned. The current optimization passes
constantly have to make sure no "holes" in the X register assignments
are created (that is, that no X register becomes undefined that an
allocation instruction depends on).
This commit also introduces the following optimizations:
* Replacing of tuple matching of records with the is_tagged_tuple
instruction. (Replacing beam_record.)
* Sinking of get_tuple_element instructions to just before the first
use of the extracted values. As well as potentially avoiding
extracting tuple elements when they are not actually used on all
executions paths, this optimization could also reduce the number
values that will need to be stored in Y registers. (Similar to
beam_reorder, but more effective.)
* Live optimizations, removing the definition of a variable that is
not subsequently used (provided that the operation has no side
effects), as well strength reduction of binary matching by replacing
the extraction of value from a binary with a skip instruction. (Used
to be done by beam_block, beam_utils, and v3_codegen.)
* Removal of redundant bs_restore2 instructions. (Formerly done
by beam_bs.)
* Type-based optimizations across branches. More effective than
the old beam_type pass that only did type-based optimizations in
basic blocks.
* Optimization of floating point instructions. (Formerly done
by beam_type.)
* Optimization of receive statements to introduce recv_mark and
recv_set instructions. More effective with far fewer restrictions
on what instructions are allowed between creating the reference
and entering the receive statement.
* Common subexpression elimination. (Formerly done by beam_block.)
|
|
As a preparation for replacing v3_codegen with a new code generator,
remove unsafe optimization passes. Especially the older compiler
passes have implicit assumptions about how the code is generated.
Remove the optimizations in beam_block (keep the code that creates
blocks) because they are unsafe. beam_block also calls
beam_utils:live_opt/1, which is unsafe.
Remove beam_type because it calls beam_utils:live_opt/1, and also
because it recalculates the number of heaps words and number of live
registers in allocation instructions, thus potentially hiding bugs in
other passes.
Remove beam_receive because it is unsafe.
Remove beam_record because it is the only remaining user
of beam_utils:anno_defs/1.
Remove beam_reorder because it makes much more sense to run it
as an early SSA-based optimization pass.
Remove the now unused functions in beam_utils:
anno_def/1
delete_annos/1
is_killed_block/2
live_opt/1
usage/3
Note that the following test cases will fail because of the
removed optimizations:
compile_SUITE:optimized_guards/1
compile_SUITE:bc_options/1
receive_SUITE:ref_opt/1
|
|
|
|
This will enable more optimizations.
|
|
The compiler would crash when compiling a function with two
receive statements.
https://bugs.erlang.org/browse/ERL-703
|
|
maint-21
* bjorn/compiler/fix-beam_jump-crash/ERL-660/OTP-15166:
Eliminate a crash in the beam_jump pass
|
|
https://bugs.erlang.org/browse/ERL-660
|
|
The compiler would crash when compiling code such as:
serialize(#{tag := value, id := Id, domain := Domain}) ->
[case Id of
nil ->
error(id({required, id}));
_ ->
<<10, 1:16/signed, Id:16/signed>>
end,
case Domain of
nil ->
error(id({required, domain}));
_ ->
<<8, 2:16/signed, Domain:32/signed>>
end].
The crash would look like this:
Function: serialize/1
t.erl: internal error in block2;
crash reason: {badmatch,false}
in function beam_utils:live_opt/4 (beam_utils.erl, line 861)
in call from beam_utils:live_opt/1 (beam_utils.erl, line 285)
in call from beam_block:function/2 (beam_block.erl, line 47)
in call from beam_block:'-module/2-lc$^0/1-0-'/2 (beam_block.erl, line 33)
in call from beam_block:'-module/2-lc$^0/1-0-'/2 (beam_block.erl, line 33)
in call from beam_block:module/2 (beam_block.erl, line 33)
in call from compile:block2/2 (compile.erl, line 1358)
in call from compile:'-internal_comp/5-anonymous-1-'/3 (compile.erl, line 349)
The reason for the crash is an assertion failure caused by a previous
unsafe optimization. Here is the code before the unsafe optimization:
.
.
.
{bs_init2,{f,0},7,0,0,{field_flags,[]},{x,1}}.
{bs_put_string,3,{string,[8,0,2]}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[signed,big]},{y,1}}.
{move,{x,1},{x,0}}.
{test_heap,4,1}.
.
.
.
beam_block:move_allocate/1 moved up the test_heap/2 instruction past the
move/2 instruction, adjusting the number of live registers at the same
time:
.
.
.
{bs_init2,{f,0},7,0,0,{field_flags,[]},{x,1}}.
%% Only x1 is live now.
{bs_put_string,3,{string,[8,0,2]}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[signed,big]},{y,1}}.
{test_heap,4,2}. %Unsafe. x0 is dead.
{move,{x,1},{x,0}}.
.
.
.
This optimization is unsafe because the bs_init2 instruction killed
x0.
The bug is in beam_utils:anno_defs/1, which adds annotations indicating
the registers that are defined at the beginning of each block. The
annotation before the move/2 instruction incorrectly indicated that
x0 was live.
https://bugs.erlang.org/browse/ERL-650
https://github.com/elixir-lang/elixir/issues/7782
|
|
|
|
beam_utils:is_killed/3 could incorrectly indicate that a
register was killed.
The previous fix is 5da6b91ecab6c.
|
|
beam_record would make an unsafe optimization for the
not_used_p/4 function added to beam_utils_SUITE in this
commit. The bug is in beam_utils, which would falsely
report that {x,4} was unused when it in fact was used.
The bug was in the function not_used/1. The purpose of
not_used/1 is to return a 'not_used' result unless the
actual result is 'used'. Unfortunately it was not
implemented in that way. It would let a 'transparent'
result slip through, which the caller in this case would
convert to 'killed' (because the register was killed on
all other paths).
Reported-by: Richard Carlsson
|
|
The missing support for renumbering labels in recv_mark
and recv_set did not seem to cause any problems, probably because
the insructions are introduced late and their labels would keep
their numbers. But it there will definitely be a problem if the
recv_mark and recv_set instructions would be introduced much earlier.
|
|
live_opt_block/4 could overestimate the number of live
registers for a GC BIF and trigger an assertion. This does not
seem to be a problem when generating code using v3_codegen,
but only when using a new experimental code generator, therefore
there is no need include this correction in maint.
|
|
is_killed/3 and is_killed_at/3 could return 'true' even if the
register was referenced by an allocation instruction. Somehow, that
does not seem to have caused any problems yet.
|
|
The more aggressive optimizations of 'allocate_zero' introduced
in cb6fc15c35c7e could produce unsafe code such as the following:
{allocate,0,1}.
{bif,element,{f,0},[{integer,1},{x,0}],{x,0}}.
The code is not safe because if element/2 fails, the runtime
system may scan the stack and find garbage that looks like a
catch tag, and would most probably crash.
Fix the problem by making beam_utils:is_killed/3 be more conservative
when asked whether a Y register will be killed.
Also fix an unsafe move upwards of an allocation instruction
in beam_block.
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use it internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
|
|
Consider the following function:
function({function,Name,Arity,CLabel,Is0}, Lc0) ->
try
%% Optimize the code for the function.
catch
Class:Error:Stack ->
io:format("Function: ~w/~w\n", [Name,Arity]),
erlang:raise(Class, Error, Stack)
end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|
|
Run beam_block a second time
|
|
beam_utils:live_opt/1 is currently only run early (from
beam_block). Prepare it to be run after beam_split when
instructions with failure labels have been taken out of
blocks.
While we are it, also improve check_liveness/3. That will
improve the optimizations in beam_record (replacing tuple
matching instructions with an is_tagged_tuple instruction).
|
|
Since the select_val instruction never transfer directly to the next
instruction, the incoming live registers should be ignored. This
bug have not caused any problems yet, but it will in the future
if we are to run the liveness optimizations again after
the optimizations in beam_dead and beam_jump.
|
|
The annotations in the optimizing passes currently looks like this:
{'%live',NumRegistersUsed,RegistersUsedBitmap}
{'%def',RegistersDefinedBitmap}
(NumRegistersUsed is no longer used.)
When I attempted to extend some optimizations, I found that I had to
add additional clauses to tolerate/handle both types of
annotations. That problem would only get worse if any more annotations
are added in the future.
To simplify annotation handling, this commit wraps both types of
annotations in a {'%anno',_} tuple:
{'%anno',{used,RegistersUsedBitmap}}
{'%anno',{def,RegistersDefinedBitmap}}
The '%live' annotation has been renamed to 'used' to make it somewhat
clearer what it means, and the unused NumRegistersUsed part of the
old annotation has been removed.
Alternatives considered: My first attempt was to wrap the annotation
in a 'set' tuple so that there would only be 'set' tuples in a block.
For example:
{set,[],[],{anno,{live,RegistersUsedBitmap}}}
It was not as convenient as expected. Annotations often need to be
handled specially from other instructions in a block. When they are
wrapped in a 'set' tuple, they can very easily be handled incorrectly
or passed on to the next pass. That causes subtle errors or worse
code, and it can be difficult to debug.
Therefore, my conclusion is that annotations should be distinct from
other instructions, to make it obvious when one have missed to handle
an annotation.
|
|
In 21dd6e55877832, beam_utils:combine_heap_needs/2 stopped
wrapping an allocation list in an {alloc,...} tuple. That was
not noticed because the faulty heap need created in beam_block
was discarded by beam_type.
|
|
beam_utils:is_killed/3 could incorrectly indicate that a
register was killed, when in fact it was referenced by
an instruction that did a GC.
|
|
Handle a few more instructions in beam_utils. That will allow
beam_reorder to reorder more instructions, delaying get_tuple_element
instructions and reducing register shuffling in receive clauses.
|
|
Delay creation of stack frames
|
|
To avoid having to call both is_killed/3 and is_not_used/3,
add usage/3 to answer both questions in one call.
|
|
Add beam_utils:anno_defs/1 which will add an annotation to the
beginning of each block indicating which X registers that are
defined. Having that information can improve some optimizations.
|
|
|
|
There are four uncovered lines in combine_heap_needs/2 and
combine_alloc_lists/2. There is no way to reach starting from
Erlang source code using the standard compiler. However, they
can be reached starting from BEAM assembly code, so we don't
want to remove them.
We could add a test case that covers the lines using assembly
code, but an easier solution is to rewrite the code in a more
generic way using sofs so that the code can be covered with
existing test cases.
|
|
01835845579e9 fixed some problems, but introduced a bug where
is_not_used/3 would report that a register was not used when it
in fact was.
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allow us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
Expr
catch C:E ->
Stk = erlang:get_stacktrace(),
.
.
.
In the new syntax, it would look like:
try
Expr
catch
C:E:Stk ->
.
.
.
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
Expr
catch E ->
.
.
.
is a shorthand for:
try
Expr
catch throw:E ->
.
.
.
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
Expr
catch throw:E:Stk ->
.
.
.
|
|
beam_utils:bif_to_test/3 is supposed to never put a literal
operand as the first operand in is_eq_exact or is_ne_exact,
but 'nil' was not recognized as a literal.
|
|
If a try/catch is used to ignore the potential exception caused by some
expression with a side effect such as:
try
side_effect()
catch _Class:_Reason ->
ok
end,
.
.
.
the generated code will be wasteful:
try YReg TryLabel
%% call side_effect() here
try_end Yreg
jump GoodLabel
TryLabel:
try_case YReg
%% try_case has set up registers as follows:
%% x(0) -> error | exit | throw
%% x(1) -> reason
%% x(2) -> raw stack trace data
GoodLabel:
%% This code does not use any x registers.
There will be two code paths that both end up at GoodLabel, and
the try_case instruction will set up registers that are never used.
We can do better by replacing try_case with try_end to obtain this
code:
try YReg TryLabel
%% call side_effect() here
try_end Yreg
jump GoodLabel
TryLabel:
try_end YReg
GoodLabel:
The jump optimizer (beam_jump) can further optimize this code
like this:
try YReg TryLabel
%% call side_effect() here
TryLabel:
try_end YReg
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
|
|
'john/compiler/fail-labels-in-blocks-otp-18/ERIERL-48/OTP-14522' into maint
* john/compiler/fail-labels-in-blocks-otp-18/ERIERL-48/OTP-14522:
compiler: Fix live regs update on allocate in validator
Take fail labels into account when determining liveness in block ops
Conflicts:
lib/compiler/src/beam_utils.erl
|
|
|
|
|
|
|
|
The beam_type may pass move and recalculates test_heap instructions.
The number of live registers are not always the lowest. Minimize the
number of registers by running beam_utils:live_opt/1 one more time.
|
|
The guard optimizations in v3_kernel has removed the need for
beam_bool.
|
|
* maint:
Update primary bootstrap
beam_block: Avoid unsafe inclusion of get_map_elements in blocks
|
|
c2035ebb8b restricted the get_map_elements instruction so that it
could only occur at the beginning of a block. It turns out that
including it anywhere in a block is unsafe.
Therefore, never put get_map_elements instruction in blocks.
(Also remove the beam_utils:join_even/2 function since it is no
longer used.)
ERL-266
|
|
When beam_utils was first written, it did not have the functions
for testing whether a register was not used. Those were added
later, in sort of a hacky way.
Also, is_killed*() and is_not_used*() for Y registers would
return the same answer. Fix that to make the API more consistent
(an Y register can only be killed by a deallocate/1 instruction).
We will need to change beam_trim to call beam_utils:is_not_used/3
instead of beam_utils:is_killed/3.
|