Age | Commit message (Collapse) | Author |
|
'bjorng_ghub/bjorn/compiler/fix-beam_ssa_dead-crash/ERL-956/OTP-15848' into maint-22
* bjorng_ghub/bjorn/compiler/fix-beam_ssa_dead-crash/ERL-956/OTP-15848:
Eliminate crash in the beam_ssa_dead compiler pass
|
|
into maint-22
* bjorn/compiler/fix-unloadable-code-patch/ERL-955/OTP-15846:
Fix loading of Core Erlang code for extracting a map element
|
|
* bjorn/compiler/fix-beam_except/ERL-954/OTP-15839:
Fix compiler crash in beam_except
|
|
* john/compiler/list_append_type/OTP-15841:
compiler: Fix broken type for erlang:'++'/2
|
|
* bjorn/compiler/fix-receive-patch/ERL-950/OTP-15832:
Eliminate compiler crash when compiling complex receive statements
|
|
* maint:
Eliminate crash in the beam_ssa_dead compiler pass
|
|
bjorng/bjorn/compiler/fix-beam_ssa_dead-crash/ERL-956/OTP-15848
Eliminate crash in the beam_ssa_dead compiler pass
|
|
|
|
|
|
A repeated test could be optimized away. Example:
bar(A) ->
if is_bitstring(A) ->
if is_binary(A) ->
binary;
true ->
bitstring
end;
true ->
other
end.
In the example, the `is_binary/1` test would be optimized away,
basically turning the example into:
bar(A) ->
if is_bitstring(A) ->
bitstring;
true ->
other
end.
Thanks user Marcus Kruse in the Elixir forum for noticing this bug.
|
|
The compiler could crash in the beam_ssa_dead pass while compiling
complex nested `case` expressions. See the added test case for an
example and explanation.
https://bugs.erlang.org/browse/ERL-956
|
|
* maint:
Fix loading of Core Erlang code for extracting a map element
Fix unsafe optimizations where guard tests could be removed
|
|
into maint
* bjorn/compiler/fix-unloadable-code-patch/ERL-955/OTP-15846:
Fix loading of Core Erlang code for extracting a map element
|
|
* bjorn/compiler/fix-beam_ssa_dead-patch/OTP-15845:
Fix unsafe optimizations where guard tests could be removed
|
|
* maint:
Fix compiler crash in beam_except
|
|
* bjorn/compiler/fix-beam_except/ERL-954/OTP-15839:
Fix compiler crash in beam_except
|
|
The compiler would crash in `beam_except` while compiling this
function:
bar(Req) ->
ok = case Req of
"POST" -> {error, <<"BAD METHOD ", Req/binary>>, Req};
_ -> ok
end.
https://bugs.erlang.org/browse/ERL-954
|
|
|
|
The following Core Erlang code could not be loaded:
'f'/1 = fun (_1) ->
case <_1> of
<~{'foo':='foo'}~> when 'true' ->
_1
end
Loading would fail with the following message:
beam/beam_load.c(2314): Error loading function example:f/1: op i_get_map_element_hash p x a u x:
no specific operation found
https://bugs.erlang.org/browse/ERL-955
|
|
A repeated test could be optimized away. Example:
bar(A) ->
if is_bitstring(A) ->
if is_binary(A) ->
binary;
true ->
bitstring
end;
true ->
other
end.
In the example, the `is_binary/1` test would be optimized away,
basically turning the example into:
bar(A) ->
if is_bitstring(A) ->
bitstring;
true ->
other
end.
Thanks user Marcus Kruse in the Elixir forum for noticing this bug.
|
|
* maint:
Eliminate compiler crash when compiling complex receive statements
|
|
* bjorn/compiler/fix-receive-patch/ERL-950/OTP-15832:
Eliminate compiler crash when compiling complex receive statements
|
|
* maint:
Fix non-terminating compilation
Fix compiler crash when funs were matched
|
|
* bjorn/compiler/fix-freeze/ERL-948/OTP-15828:
Fix non-terminating compilation
|
|
Improve optimization of redundant tests
|
|
Make the swap instruction known to the compiler
|
|
BEAM has had a `swap` instruction for several releases, but it was not
known to the compiler. The loader would translate a sequence of three
`move` instructions to the `swap` instructions, but only when it was
possible to determine that it would be safe.
By making `swap` known to the compiler, it can be applied in more
situations since it is easier for the compiler than for the loader
to ensure that the usage is safe, and the loader shenanigans can be
eliminated.
|
|
Certain complex receive statements would result in an internal
compiler failure. That would happen when the compiler would fail
to find the common exit block following a receive. See the added
test case for an example.
https://bugs.erlang.org/browse/ERL-950
|
|
The `beam_ssa_dead` pass is supposed to eliminate tests that are
determined to be redundant based on the outcome of a previous test.
For example, in the following example that repeats a guard test,
the second clause can never be executed:
foo(A) when A >= 42 -> one;
foo(A) when A >= 42 -> two;
foo(_) -> three.
`beam_ssa_dead` should have eliminated the second clause, but
didn't:
{test,is_ge,{f,5},[{x,0},{integer,42}]}.
{move,{atom,one},{x,0}}.
return.
{label,5}.
{test,is_ge,{f,6},[{x,0},{integer,42}]}.
{move,{atom,two},{x,0}}.
return.
{label,6}.
{move,{atom,three},{x,0}}.
return.
Correct the optimization of four different combinations of relational
operations that were too conservate. To ensure the correctness of the
optimization, also add an exahaustive test of all combinations of
relational operations with one variable and one literal. (Also remove
the weak and now redundant coverage tests in `beam_ssa_SUITE`.) With
this correction, the following code will be generated for the example:
{test,is_ge,{f,5},[{x,0},{integer,42}]}.
{move,{atom,one},{x,0}}.
return.
{label,5}.
{move,{atom,three},{x,0}}.
return.
Thanks to Dániel Szoboszlay (@dszoboszlay), whose talk at Code BEAM
STO 2019 made me aware of this missed opportunity for optimization.
|
|
The compiler would not terminate while compiling the following code:
foo(<<N:32>>, Tuple, NewValue) ->
_ = element(N, Tuple),
setelement(N, Tuple, NewValue).
The type analysis pass would attempt to construct a huge list when
attempting analyse the type of `Tuple` after the call to
`setelement/3`.
https://bugs.erlang.org/browse/ERL-948
|
|
The beam_except pass rewrites certain calls `erlang:error/{1,2}` to
use specialized instructions for common exceptions such as
`{badmatch,Term}`.
Move this optimization to `beam_ssa_pre_codegen` and `beam_ssa_codegen`.
The main reason for this change is that optimization passes operating
on SSA code are easier to maintain than optimization passes working
on BEAM code.
|
|
Code such as the following would crash the compiler in OTP 22:
[some_atom = fun some_function/1]
The reason is that the fun would be copied (used both in the match
operation and as a value in the list), and the copy of the fun would
create two wrapper functions with the same name for calling
some_function/1. In OTP 21, the duplicate functions happened not to
cause any harm (one of the wrappers functions would be unused and
ultimately be removed by beam_clean). In OTP 22, the new beam_ssa_type
pass would be confused by the multiple definitions of the wrapper
function.
|
|
* john/compiler/fix-missing-match-reposition/ERL-923:
compiler: Propagate match context position on fail path
|
|
|
|
|
|
* bjorn/compiler/cuddle-with-tests:
Verify the highest opcode for the r21 test suites
Add test_lib:highest_opcode/1
sys_core_fold: Simplify case_expand_var/2
beam_validator: Remove uncovered lines in lists_mod_return_type/3
Cover return type determination of lists functions
|
|
Validation could fail when a function that never returned was used
in a try block (see attached test case). It's possible to solve
this without disabling the optimization as the generated code is
sound, but I'm not comfortable making such a large change this
close to the OTP 22 release.
|
|
|
|
|
|
|
|
|
|
Optimize tail-recursive calls of BIFs
OTP-15674
|
|
BEAM currently does not call BIFs at the end of a function in a
tail-recursive way. That is, when calling a BIF at the end of a
function, the BIF is first called, and then the stack frame
is deallocated, and then control is transferred to the caller.
If there is no stack frame when a BIF is called in the tail position,
the loader will emit a sequence of three instructions: first an
instruction that allocates a stack frame and saves the continuation
pointer (`allocate`), then an instruction that calls the BIF
(`call_bif`), and lastly an instruction that deallocates the stack
frame and returns to the caller (`deallocate_return`).
The old compiler would essentially allocate a stack frame for each
clause in a function, so it would not be that common that a BIF was
called in the tail position when there was no stack frame, so the
three-instruction sequence was deemed acceptable.
The new compiler only allocates stack frames when truly needed, so
the three-instruction BIF call sequence has become much more common.
This commit introduces a new `call_bif_only` instruction so that only
one instruction will be needed when calling a BIF in the tail position
when there is no stack frame. This instruction is also used when there
is a stack frame to make it possible to deallocate the stack frame
**before** calling the BIF, which may make a subsequent garbage
collection at the end of the BIF call cheaper (copying less garbage).
The one downside of this change is that the function that called the
BIF will not be included in the stack backtrace (similar to how a
tail-recursive call to an Erlang function will not be included in the
backtrace).
That was the quick summary of the commit. Here comes a detailed look
at how BIF calls are translated by the loader. The first example is a
function that calls `setelement/3` in the tail position:
update_no_stackframe(X) ->
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_no_stackframe, 1, 12}.
{label,11}.
{line,[...]}.
{func_info,{atom,t},{atom,update_no_stackframe},1}.
{label,12}.
{move,{x,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
Because there is no stack frame, the `call_ext_only` instruction will
be used to call `setelement/3`:
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
The loader will transform this instruction to a three-instruction
sequence:
0000000020BD8130: allocate_tt 0 3
0000000020BD8138: call_bif_e erlang:setelement/3
0000000020BD8148: deallocate_return_Q 0
Using the `call_bif_only` instruction introduced in this commit,
only one instruction is needed:
000000005DC377F0: call_bif_only_e erlang:setelement/3
`call_bif_only` calls the BIF and returns to the caller.
Now let's look at a function that already has a stack frame when
`setelement/3` is called:
update_with_stackframe(X) ->
foobar(X),
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_with_stackframe, 1, 14}.
{label,13}.
{line,[...]}.
{func_info,{atom,t},{atom,update_with_stackframe},1}.
{label,14}.
{allocate,1,1}.
{move,{x,0},{y,0}}.
{line,[...]}.
{call,1,{f,16}}.
{move,{y,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Since there is a stack frame, the `call_ext_last` instruction will be used
to deallocate the stack frame and call the function:
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Before this commit, the loader would translate this instruction to:
0000000020BD81B8: call_bif_e erlang:setelement/3
0000000020BD81C8: deallocate_return_Q 1
That is, the BIF is called before deallocating the stack frame and returning
to the calling function.
After this commit, the loader will translate the `call_ext_last` like this:
000000005DC37868: deallocate_Q 1
000000005DC37870: call_bif_only_e erlang:setelement/3
There are still two instructions, but now the stack frame will be
deallocated before calling the BIF, which could make the potential
garbage collection after the BIF call slightly more efficient (copying
less garbage).
We could have introduced a `call_bif_last` instruction, but the code
for calling a BIF is relatively large and there does not seem be a
practical way to share the code between `call_bif` and `call_bif_only`
(since the difference is at the end, after the BIF call). Therefore,
we did not want to clone the BIF calling code yet another time to
make a `call_bif_last` instruction.
|
|
For reasons better explained in the source code, ssa_opt_float
skips optimizing inside guards but it failed to do so
consistently; while the pass never processed guard blocks, it was
still possible to erroneously defer error checking to a guard
block, crashing the compiler once it realized its state was
invalid.
|
|
Type subtraction never resulted in the 'none' type, even when it
was obvious that it should. Once that was fixed it became apparent
that inequality checks also fell into the same subtraction trap
that the type pass warned about in a comment.
This then led to another funny problem with select_val, consider
the following code:
{bif,'>=',{f,0},[{x,0},{integer,1}],{x,0}}.
{select_val,{x,0},{f,70},{list,[{atom,false},{f,69},
{atom,true},{f,68}]}}.
The validator knows that '>=' can only return a boolean, so once it
has subtracted 'false' and 'true' it killed the state because all
all valid branches had been taken, so validation would crash once
it tried to branch off the fail label.
|
|
The current type conflict resolution works well for the example
case in the comment, but doesn't handle branched code properly,
consider the following:
{label,2}.
{test,is_tagged_tuple,{f,ignored},[{x,0},3,{atom,r}]}.
{allocate_zero,2,1}.
{move,{x,0},{y,0}}.
%% {y,0} is known to be {r, _, _} now.
{get_tuple_element,{x,0},2,{x,0}}.
{'try',{y,1},{f,3}}.
%% ... snip ...
{jump,{f,5}}.
{label,3}.
{try_case,{y,1}}.
%% {x,0} is the error class (an atom), {x,1} is the error term.
{test,is_eq_exact,{f,ignored},[{x,0},{y,0}]}.
%% ... since tuples and atoms can't meet, the type of {y,0} is
%% now {atom,[]} because the current code assumes the type
%% we're updating with.
{move,{x,1},{x,0}}.
{jump,{f,5}}.
{label,5}.
%% ... joining tuple (block 2) and atom (block 3) means 'term',
%% so the get_tuple_element instruction fails to validate
%% despite this being unrechable from block 3.
{test_heap,3,1}.
{get_tuple_element,{y,0},1,{x,1}}.
{put_tuple2,{x,0},{list,[{x,1},{x,0}]}}.
{deallocate,2}.
return.
This commit kills the state on type conflicts, making unreachable
instructions truly unreachable.
|
|
Building terms with fragile contents is okay because the GC is
disabled during loop_rec, and the resulting term won't be reachable
from the root set afterwards.
ERL-862
|
|
This is a rather subtle but important distinction. While tracking
types on a per-register basis is fairly effective, it forces us to
track which registers alias each other, and makes it tricky to infer
types over large blocks of code as instruction arguments may have
been clobbered between definition and inference.
Tracking types on a per-value basis makes us immune to these
problems.
|
|
Consider the following code:
bme(Int) ->
TagInt = Int band 2#111,
Tag = case TagInt of
0 -> a; 1 -> b; 2 -> c; 3 -> d;
4 -> e; 5 -> f; 6 -> g; 7 -> h
end,
case Tag of
g -> expects_g(TagInt, Tag);
h -> expects_h(TagInt, Tag);
_ -> Tag = id(Tag), ok
end.
expects_g(6, Atom) -> Atom = id(g), ok.
expects_h(7, Atom) -> Atom = id(h), ok.
The type optimization pass would recognize that TagInt can only be
[0 .. 7], so the first 'case' would select_val over [0 .. 6] and swap
out the fail label with the block for 7.
A later optimization would merge this block with 'expects_h' in the
second case, as the latter is only reachable from the former.
... but this broke down when the move elimination optimization didn't
take the fail label of the first select_val into account. This caused it
believe that the only way to reach 'expects_h' was through the second
case when 'Tag' =:= 'h', which made it remove the move instruction
added in the first case, passing garbage to expects_h/2.
|
|
`sys_core_fold` has an optimization of repeated pattern matching.
For example, when a record is matched the first time, the pattern
is remembered. When the same record is matched again, the matching
does not need to be repeated, but the variables bound in the first
matching can be re-used.
It turns out that that there is a name capture problem when the old
inliner is used. The old inliner is used when explicitly inling
certain functions, and by the compiler test suites for testing the
compiler.
The name capture problem could be eliminated by more aggressive
variable renaming when inlining.
But, fortunately, given the new SSA passes, this optimization is no
longer as essential as it used to be. Removing the optimization
turns out to be mostly benefical, leading to a smaller stack
frame in many cases.
Also remove the optimizations of `element/2`, `is_record/3`, and
`setelement/3` from `sys_core_fold`. Because matched patterns are no
longer remembered, those optimizations can very rarely be applied any
more. (Those same optimizations are already done in `beam_ssa_type`.)
|