Age | Commit message (Collapse) | Author |
|
`sys_core_fold` has an optimization of repeated pattern matching.
For example, when a record is matched the first time, the pattern
is remembered. When the same record is matched again, the matching
does not need to be repeated, but the variables bound in the first
matching can be re-used.
It turns out that that there is a name capture problem when the old
inliner is used. The old inliner is used when explicitly inling
certain functions, and by the compiler test suites for testing the
compiler.
The name capture problem could be eliminated by more aggressive
variable renaming when inlining.
But, fortunately, given the new SSA passes, this optimization is no
longer as essential as it used to be. Removing the optimization
turns out to be mostly benefical, leading to a smaller stack
frame in many cases.
Also remove the optimizations of `element/2`, `is_record/3`, and
`setelement/3` from `sys_core_fold`. Because matched patterns are no
longer remembered, those optimizations can very rarely be applied any
more. (Those same optimizations are already done in `beam_ssa_type`.)
|
|
|
|
|
|
The expansion of record field updates, when more than one field is
updated, but not a majority of the fields, will create a sequence of
calls to `erlang:setelement(Index, Value, Tuple)` where Tuple in the
first call is the original record tuple, and in the subsequent calls
Tuple is the result of the previous call. Furthermore, all Index
values are constant positive integers, and the first call to
`setelement` will have the greatest index. Thus all the following
calls do not actually need to test at run-time whether Tuple has type
tuple, nor that the index is within the tuple bounds.
Since OTP R7, the `sys_core_dsetel` pass, run as the very last Core
Erlang pass, has optimized this sequence of `setelement` calls to use
a special destructive version of `setelement` (called
`set_tuple_element`) for all but the very first `setelement` in the
sequence.
It turns out that the presence of the `set_tuple_element` in SSA code
is awkward and can prevent or complicate type analysis and aggressive
optimizations.
Therefore, this commit removes the `sys_core_dsetel` pass and
reimplements it for SSA code. The optimization will be done in the
`beam_ssa_pre_codegen` pass (that is, just before code generation and
after running all other SSA code optimization passes).
In most cases, the resulting BEAM code is identical to previous
code. For a few modules, the BEAM code is actually slightly better,
with smaller stack frames.
|
|
The sole purpose of inline_SUITE:coverage/1 is to ensure
that all lines are covered in sys_core_inline. Do that
in a cheaper way.
|
|
This test case does not test anything unique that is not
tested by other test cases.
|
|
This makes sure that the SSA optimizations are not essential and
may help to cover more code in beam_ssa_pre_codegen and
beam_ssa_codegen.
|
|
|
|
|
|
|
|
With the recent update to cover to use the `counters` modules,
there is no longer any reason to limit the number of parallel
processes when running `cover`.
|
|
This change saves about about a minute when running the
cover-compiled compiler tests.
|
|
Prior to 294d66a295f6c2101fe3c2da630979ad4e736c08 there wasn't much
point to keeping track of tuple element types; they were only known
when we had inserted or extracted values from a tuple, and in
neither case was it likely that we'd extract the same values again.
It makes a lot more sense to do so now that type optimizations are
applied across functions; if we return a tuple it's very likely
that its elements will be extracted soon after, and knowing their
types lets us eliminate more type checks.
Co-authored-by: Björn Gustavsson <[email protected]>
|
|
A long time ago there was a good idea to run compiled code in a
slave node. Nowadays, not so much.
|
|
test_lib:is_cloned_mod(inline_SUITE) would return true.
|
|
|
|
With the new SSA code passes, the optimized_guard/1 test case has
become really bad at finding unnecessary `and` and `or` instructions.
|
|
Fix internal consistency failure caused by beam_except
|
|
Fix a bug where the number of live registers in a `bs_get_tail`
instruction was too low.
Consider this example:
-export([bs_get_tail/2]).
bs_get_tail(Bin, Config) ->
bs_get_tail_1(Bin, 0, 0, Config).
bs_get_tail_1(<<_:32, Rest/binary>>, Z1, Z2, F1) ->
{Rest,Z1,Z2,F1}.
`beam_validator` would emit the following diagnostics:
t: function bs_get_tail_1/4+2:
Internal consistency check failed - please report this bug.
Instruction: {func_info,{atom,t},{atom,bs_get_tail_1},4}
Error: {uninitialized_reg,{x,3}}:
Here is the part of the code that generates the `function_clause`
exception before the optimization:
{test_heap,6,4}.
{put_list,{x,3},nil,{x,2}}.
{put_list,{integer,0},{x,2},{x,2}}.
{put_list,{integer,0},{x,2},{x,2}}.
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},3}. %3 live registers.
{test_heap,2,3}.
{put_list,{x,0},{x,2},{x,1}}.
{move,{atom,function_clause},{x,0}}.
{line,[{location,"t.erl",8}]}.
{call_ext_only,2,{extfunc,erlang,error,2}}.
The `bs_get_tail` instruction expects that 3 registers will be live
at this point. `beam_except` rewrites the code like this:
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},3}. %Still 3. Too low.
{move,{integer,0},{x,1}}.
{move,{integer,0},{x,2}}.
{jump,{f,3}}.
Now the number of live registers in `bs_get_tail` is too low,
because the `{x,3}` register will become undefined.
This commit adds code to update the number of live registers
in the `bs_get_tail` instruction, producing this code:
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},4}. %Adjusted to 4.
{move,{integer,0},{x,1}}.
{move,{integer,0},{x,2}}.
{jump,{f,3}}.
|
|
* maint:
Eliminate bogus warning when using tuple calls
|
|
There would be a bogus warning when compiling the following
function with the `tuple_calls` option:
dispatch(X) ->
(list_to_atom("prefix_" ++ atom_to_list(suffix))):doit(X).
The warning would look like this:
no_file: this expression will fail with a 'badarg' exception
https://bugs.erlang.org/browse/ERL-838
|
|
There is an optimization for reducing the number of instructions needed
to generate a `function_clause`. After the latest improvements of the
type optimization pass, that optimization is not always applied.
Here is an example:
-export([foo/3]).
foo(X, Y, Z) ->
bar(a, X, Y, Z).
bar(a, X, Y, Z) when is_tuple(X) ->
{X,Y,Z}.
Note that the compiler internally adds a clause to each function to
generate a `function_clause` exception. Thus:
bar(a, X, Y, Z) when is_tuple(X) ->
{X,Y,Z};
bar(A1, A2, A3, A4) ->
erlang:error(function_clause, [A1,A2,A3,A4]).
Optimizations will rewrite the code basically like this:
bar(_, X, Y, Z) when is_tuple(X) ->
{X,Y,Z};
bar(_, A2, A3, A4) ->
erlang:error(function_clause, [a,A2,A3,A4]).
Note the `a` as the first element of the list of arguments. It
will prevent the optimization of the `function_clause` exception.
The BEAM code for `bar/4` looks like this:
{function, bar, 4, 4}.
{label,3}.
{line,[{location,"t.erl",8}]}.
{func_info,{atom,t},{atom,bar},4}.
{label,4}.
{'%',{type_info,{x,0},{atom,a}}}.
{test,is_tuple,{f,5},[{x,1}]}.
{test_heap,4,4}.
{put_tuple2,{x,0},{list,[{x,1},{x,2},{x,3}]}}.
return.
{label,5}.
{test_heap,8,4}.
{put_list,{x,3},nil,{x,0}}.
{put_list,{x,2},{x,0},{x,0}}.
{put_list,{x,1},{x,0},{x,0}}.
{put_list,{atom,a},{x,0},{x,1}}.
{move,{atom,function_clause},{x,0}}.
{line,[{location,"t.erl",8}]}.
{call_ext,2,{extfunc,erlang,error,2}}.
The code after label 5 is the clause that generates the
`function_clause` exception.
This commit generalizes the optimization so that it can be applied for
this function:
{function, bar, 4, 4}.
{label,3}.
{line,[{location,"t.erl",8}]}.
{func_info,{atom,t},{atom,bar},4}.
{label,4}.
{'%',{type_info,{x,0},{atom,a}}}.
{test,is_tuple,{f,5},[{x,1}]}.
{test_heap,4,4}.
{put_tuple2,{x,0},{list,[{x,1},{x,2},{x,3}]}}.
return.
{label,5}.
{move,{atom,a},{x,0}}.
{jump,{f,3}}.
For this particular function, it would be safe to omit the
`move` instruction before the `{jump,{f,3}}` instruction, but
it would not be safe in general to omit `move` instructions.
|
|
To improve compilation times, beam_ssa_type keeps track of variables
that are only used once and don't keep types for those variables. As
currently implemented, it turns to be unsafe. Change it to only keep
track of variables that are only used in the terminator of the block
they are defined in.
https://bugs.erlang.org/browse/ERL-840
|
|
The fix in f9ea85611faca82c7494449ddb8bcb1ef1d194cb didn't consider
that the tested register could be aliased.
|
|
This commit lets the type optimization pass work across functions,
tracking return and argument types to eliminate redundant tests.
|
|
This serves as a base for the upcoming module-level type
optimization, but may come in handy for other passes like
beam_ssa_funs and beam_ssa_bsm that have their own ad-hoc
implementations.
|
|
A switch is equivalent to a series of '=:=', so we have to subtract
each value individually from the type. Subtracting a join risks
removing too much type information, and managed to narrow "number"
into "float" in the attached test case.
|
|
|
|
* bjorn/compiler/beam_validator/ERL-832:
Make the beam_validator smarter (again)
|
|
Needed because of the optimizations in 48f20bd589fa69.
https://bugs.erlang.org/browse/ERL-832
|
|
Do the optimizations of bs_put* instructions in beam_ssa_opt
and remove the beam_bs pass. This can lead to a slight improvement
of compilation times.
|
|
* maint:
beam_type: Eliminate compiler crash when arithmetic expression fails
Conflicts:
lib/compiler/src/beam_type.erl
|
|
The compiler would crash when compiling code such as:
(A / B) band 16#ff
The type for the expression would be 'none', but beam_type:verified_type/1
did not handle 'none'.
https://bugs.erlang.org/browse/ERL-829
|
|
Introduce subtraction of types to allow some redundant tests to be
eliminated.
Consider this function:
foo(L) when is_list(L) ->
case L of
[_|_] -> non_empty;
[] -> empty
end.
After entering the body of the function, L is known to be either
a cons cell or nil (otherwise the is_list/1 guard would have failed).
If the L is not a cons cell, it must be nil. Therefore, the test
for nil in the second clause of the case can be eliminated.
Here is the SSA code with some additonal comments for the function
before the optimization:
function t:foo(_0) {
0:
@ssa_bool = bif:is_list _0
br @ssa_bool, label 4, label 3
4:
%% _0 is now a list (cons or nil).
@ssa_bool:8 = is_nonempty_list _0
br @ssa_bool:8, label 9, label 7
9:
ret literal non_empty
7:
%% _0 is not a cons (or we wouldn't be here).
%% Subtracting cons from the previously known type list
%% gives that _0 must be nil.
@ssa_bool:10 = bif:'=:=' _0, literal []
br @ssa_bool:10, label 11, label 6
11:
ret literal empty
6:
_6 = put_tuple literal case_clause, _0
%% t.erl:5
@ssa_ret:12 = call remote (literal erlang):(literal error)/1, _6
ret @ssa_ret:12
3:
_9 = put_list _0, literal []
%% t.erl:4
@ssa_ret:13 = call remote (literal erlang):(literal error)/2, literal function_clause, _9
ret @ssa_ret:13
}
Type subtraction gives us that _0 must be nil in block 7, allowing us to
remove the comparison of _0 with nil. The code for the function can be simplified
to:
function t:foo(_0) {
0:
@ssa_bool = bif:is_list _0
br @ssa_bool, label 4, label 3
4:
@ssa_bool:8 = is_nonempty_list _0
br @ssa_bool:8, label 9, label 11
9:
ret literal non_empty
11:
ret literal empty
3:
_9 = put_list _0, literal []
%% t.erl:4
@ssa_ret:13 = call remote (literal erlang):(literal error)/2, literal function_clause, _9
ret @ssa_ret:13
}
|
|
* maint:
Remove unsafe optimization for delaying creation of stackframe
Conflicts:
lib/compiler/src/v3_codegen.erl
|
|
b89044a800c4 introduced an optimization that tries to delay creation
of stack frames. It turns out that this optimization is not always
safe. (See the new test case for an example.)
Since the code generator is completely rewritten in the `master`
branch for the upcoming OTP 22 release, it does not make sense trying
to mend this optimization. It is better to remove it. Out of a sample
of about 1000 modules in OTP, about 50 of them are changed as a result
of removing this optimization.
The compiler in OTP 22 will do the same optimization in a cleaner,
safer, and more effective way.
https://bugs.erlang.org/browse/ERL-807
|
|
* bjorn/cuddle-with-tests:
Cover code in beam_ssa_opt
Cover code in beam_ssa_dead
Cover code in beam_trim
Eliminate warnings for unused variables
regressions_SUITE: Fix exports
map_SUITE: Test for mixed map creation
map_SUITE: Fix indentation
|
|
* maint:
Fix unsafe optimization of stack trace building
|
|
The `sys_core_fold` pass of the compiler would optimize
away the building of the stacktrace in code such as:
try
...
catch
C:R:Stk ->
erlang:raise(C, {R,Stk}, Stk)
end
That optimization is unsafe and would cause a crash in a later compiler
pass.
|
|
The following function:
is_two_tuple(Arg) ->
case is_tuple(Arg) of
false -> false;
true -> tuple_size(Arg) == 2
end.
would cause an internal consistency failure:
Internal consistency check failed - please report this bug.
Instruction: {bif,tuple_size,{f,0},[{x,0}],{z,0}}
Error: {invalid_store,{z,0},{integer,[]}}:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The previous optimizations caused some code in beam_jump to
become uncovered. Add tests to cover more code. Also remove
a clause in beam_jump:opt/3 that does not seem possible to
cover anymore (this is safe, because the clause was an
optimization).
|
|
Some lines in beam_peep were no longer covered when the sharing optimization
was added to beam_ssa_opt. Also remove some code from beam_peep that no
longer seems possible to cover.
|
|
Share code for semantically equivalent blocks referred to to by `br`
and `switch` instructions.
A similar optimization is done in `beam_jump`, but doing it here as
well is beneficial as it may enable other optimizations. Also, if
there are many semantically equivalent clauses, this optimization can
substanstially decrease compilation times.
|