Age | Commit message (Collapse) | Author |
|
Validation could fail when a function that never returned was used
in a try block (see attached test case). It's possible to solve
this without disabling the optimization as the generated code is
sound, but I'm not comfortable making such a large change this
close to the OTP 22 release.
|
|
* bjorn/hipe-compilation/OTP-15596:
HiPE: Don't fail the compilation for unimplemented instructions
|
|
|
|
|
|
* john/compiler/fix-eq-type-infererence-in-validator/ERL-886:
beam_validator: Infer types on both sides of '=:='
|
|
The script would hang if no local hex had been installed previously.
|
|
|
|
Optimize tail-recursive calls of BIFs
OTP-15674
|
|
* maint:
Updated OTP version
Prepare release
|
|
|
|
BEAM currently does not call BIFs at the end of a function in a
tail-recursive way. That is, when calling a BIF at the end of a
function, the BIF is first called, and then the stack frame
is deallocated, and then control is transferred to the caller.
If there is no stack frame when a BIF is called in the tail position,
the loader will emit a sequence of three instructions: first an
instruction that allocates a stack frame and saves the continuation
pointer (`allocate`), then an instruction that calls the BIF
(`call_bif`), and lastly an instruction that deallocates the stack
frame and returns to the caller (`deallocate_return`).
The old compiler would essentially allocate a stack frame for each
clause in a function, so it would not be that common that a BIF was
called in the tail position when there was no stack frame, so the
three-instruction sequence was deemed acceptable.
The new compiler only allocates stack frames when truly needed, so
the three-instruction BIF call sequence has become much more common.
This commit introduces a new `call_bif_only` instruction so that only
one instruction will be needed when calling a BIF in the tail position
when there is no stack frame. This instruction is also used when there
is a stack frame to make it possible to deallocate the stack frame
**before** calling the BIF, which may make a subsequent garbage
collection at the end of the BIF call cheaper (copying less garbage).
The one downside of this change is that the function that called the
BIF will not be included in the stack backtrace (similar to how a
tail-recursive call to an Erlang function will not be included in the
backtrace).
That was the quick summary of the commit. Here comes a detailed look
at how BIF calls are translated by the loader. The first example is a
function that calls `setelement/3` in the tail position:
update_no_stackframe(X) ->
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_no_stackframe, 1, 12}.
{label,11}.
{line,[...]}.
{func_info,{atom,t},{atom,update_no_stackframe},1}.
{label,12}.
{move,{x,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
Because there is no stack frame, the `call_ext_only` instruction will
be used to call `setelement/3`:
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
The loader will transform this instruction to a three-instruction
sequence:
0000000020BD8130: allocate_tt 0 3
0000000020BD8138: call_bif_e erlang:setelement/3
0000000020BD8148: deallocate_return_Q 0
Using the `call_bif_only` instruction introduced in this commit,
only one instruction is needed:
000000005DC377F0: call_bif_only_e erlang:setelement/3
`call_bif_only` calls the BIF and returns to the caller.
Now let's look at a function that already has a stack frame when
`setelement/3` is called:
update_with_stackframe(X) ->
foobar(X),
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_with_stackframe, 1, 14}.
{label,13}.
{line,[...]}.
{func_info,{atom,t},{atom,update_with_stackframe},1}.
{label,14}.
{allocate,1,1}.
{move,{x,0},{y,0}}.
{line,[...]}.
{call,1,{f,16}}.
{move,{y,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Since there is a stack frame, the `call_ext_last` instruction will be used
to deallocate the stack frame and call the function:
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Before this commit, the loader would translate this instruction to:
0000000020BD81B8: call_bif_e erlang:setelement/3
0000000020BD81C8: deallocate_return_Q 1
That is, the BIF is called before deallocating the stack frame and returning
to the calling function.
After this commit, the loader will translate the `call_ext_last` like this:
000000005DC37868: deallocate_Q 1
000000005DC37870: call_bif_only_e erlang:setelement/3
There are still two instructions, but now the stack frame will be
deallocated before calling the BIF, which could make the potential
garbage collection after the BIF call slightly more efficient (copying
less garbage).
We could have introduced a `call_bif_last` instruction, but the code
for calling a BIF is relatively large and there does not seem be a
practical way to share the code between `call_bif` and `call_bif_only`
(since the difference is at the end, after the BIF call). Therefore,
we did not want to clone the BIF calling code yet another time to
make a `call_bif_last` instruction.
|
|
For reasons better explained in the source code, ssa_opt_float
skips optimizing inside guards but it failed to do so
consistently; while the pass never processed guard blocks, it was
still possible to erroneously defer error checking to a guard
block, crashing the compiler once it realized its state was
invalid.
|
|
This ensures that unreachable branches are properly ignored on
repeated checks (although tuple type subtraction isn't complete
yet).
|
|
Type subtraction never resulted in the 'none' type, even when it
was obvious that it should. Once that was fixed it became apparent
that inequality checks also fell into the same subtraction trap
that the type pass warned about in a comment.
This then led to another funny problem with select_val, consider
the following code:
{bif,'>=',{f,0},[{x,0},{integer,1}],{x,0}}.
{select_val,{x,0},{f,70},{list,[{atom,false},{f,69},
{atom,true},{f,68}]}}.
The validator knows that '>=' can only return a boolean, so once it
has subtracted 'false' and 'true' it killed the state because all
all valid branches had been taken, so validation would crash once
it tried to branch off the fail label.
|
|
|
|
|
|
|
|
The current type conflict resolution works well for the example
case in the comment, but doesn't handle branched code properly,
consider the following:
{label,2}.
{test,is_tagged_tuple,{f,ignored},[{x,0},3,{atom,r}]}.
{allocate_zero,2,1}.
{move,{x,0},{y,0}}.
%% {y,0} is known to be {r, _, _} now.
{get_tuple_element,{x,0},2,{x,0}}.
{'try',{y,1},{f,3}}.
%% ... snip ...
{jump,{f,5}}.
{label,3}.
{try_case,{y,1}}.
%% {x,0} is the error class (an atom), {x,1} is the error term.
{test,is_eq_exact,{f,ignored},[{x,0},{y,0}]}.
%% ... since tuples and atoms can't meet, the type of {y,0} is
%% now {atom,[]} because the current code assumes the type
%% we're updating with.
{move,{x,1},{x,0}}.
{jump,{f,5}}.
{label,5}.
%% ... joining tuple (block 2) and atom (block 3) means 'term',
%% so the get_tuple_element instruction fails to validate
%% despite this being unrechable from block 3.
{test_heap,3,1}.
{get_tuple_element,{y,0},1,{x,1}}.
{put_tuple2,{x,0},{list,[{x,1},{x,0}]}}.
{deallocate,2}.
return.
This commit kills the state on type conflicts, making unreachable
instructions truly unreachable.
|
|
While complex_test made certain branching instructions a lot easier
to read, we're still using `branch_state` for many others which is
hard to read and makes it impossible to "abort" branches on type
conflicts.
This commit replaces nearly all uses of `branch_state` with a
general branching mechanism, improving readability and paving the
way for proper type conflict resolution.
|
|
The element type can not be extracted before the tuple type has
been updated.
|
|
|
|
Move size=all binary clause pruning to v3_kernel
|
|
Tune BEAM instructions for the new compiler (part 1)
|
|
Optimize the beam_ssa_dead sub pass
|
|
The advantage of moving it up is that it reduces the
size of the code emitted by v3_kernel, speeding
v3_kernel itself and beam_kernel_to_ssa pass.
|
|
Prior to this patch, v3_kernel would do multiple
passes on the clauses to group them. This commit
unrolls those passes, making v3_kernel up to 10%
faster in those cases.
|
|
|
|
This is cleaner and slightly faster.
|
|
The general complexity of the shortcut sub pass of `beam_ssa_dead` is
quadratic, but those optimizations will reduce the constant factor
somewhat.
|
|
Refactor the code to avoid putting any variable from a skippable block
into the set of unset variables. Keeping the set of unset variables as
small as possible will make beam_ssa_dead almost twice as fast when
compiling lib/unicode/tokenizer.ex in elixir.
|
|
Consider this code:
foo(X) ->
case X of
{ok,A} -> A;
error -> X
end.
The `is_tagged_tuple` instruction would not be used
because not all instructions in the tuple matching
sequence had the same failure label:
function t:foo(_0) {
0:
@ssa_bool:7 = bif:is_tuple _0
br @ssa_bool:7, label 8, label 4
8:
@ssa_arity = bif:tuple_size _0
@ssa_bool:9 = bif:'=:=' @ssa_arity, literal 2
br @ssa_bool:9, label 6, label 3
6:
_4 = get_tuple_element _0, literal 0
@ssa_bool = bif:'=:=' _4, literal ok
br @ssa_bool, label 5, label 3
5:
_3 = get_tuple_element _0, literal 1
ret _3
4:
@ssa_bool:11 = bif:'=:=' _0, literal error
br @ssa_bool:11, label 10, label 3
10:
ret _0
3:
_2 = put_tuple literal case_clause, _0
%% t.erl:5
@ssa_ret:12 = call remote (literal erlang):(literal error)/1, _2
ret @ssa_ret:12
}
Enhance the ssa_opt_record optimization to use
`is_tagged_tuple` even if all failure labels are not the
same:
function t:foo(_0) {
0:
@ssa_bool:7 = bif:is_tuple _0
br @ssa_bool:7, label 8, label 4
8:
@ssa_bool:9 = is_tagged_tuple _0, literal 2, literal ok
br @ssa_bool:9, label 6, label 3
6:
_3 = get_tuple_element _0, literal 1
ret _3
4:
@ssa_bool:11 = bif:'=:=' _0, literal error
br @ssa_bool:11, label 10, label 3
10:
ret _0
3:
_2 = put_tuple literal case_clause, _0
%% t.erl:5
@ssa_ret:12 = call remote (literal erlang):(literal error)/1, _2
ret @ssa_ret:12
}
The tuple test will be repeated, but since four instructions
are replaced by two instructions, the code will still be faster
and smaller.
|
|
|
|
|
|
We used to cheat by checking if it were possible to meet the Given
and Required types, which caught the most common problems but
potentially let tuple element conflicts pass through.
This was a compromise to let the thing "work" while we were
refactoring the validator, but we can be a lot stricter now that
its type tracking capabilities approach those of the type
optimization pass.
|
|
Building terms with fragile contents is okay because the GC is
disabled during loop_rec, and the resulting term won't be reachable
from the root set afterwards.
ERL-862
|
|
|
|
This is possible now that we track types on a per-value basis, and
no longer need to care whether the source tuple's register has been
clobbered by the time we infer the type.
|
|
|
|
This is a rather subtle but important distinction. While tracking
types on a per-register basis is fairly effective, it forces us to
track which registers alias each other, and makes it tricky to infer
types over large blocks of code as instruction arguments may have
been clobbered between definition and inference.
Tracking types on a per-value basis makes us immune to these
problems.
|
|
|
|
I have no idea how this escaped us for so long...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|