Age | Commit message (Collapse) | Author |
|
* john/compiler/cuddle-validator:
compiler: Allow disabling SSA passes in -compile() directives
beam_validator: Infer types from result of all type test BIFs
beam_validator: Infer BIF argument types
beam_validator: type_test in BIFs that only fail on invalid types
beam_validator: Simplify complex branches
beam_validator: Explain why verify_get_map wipes dst registers
beam_validator: fconv means we have a number
beam_validator: Simplify update_ne/eq_types
beam_validator: Remove pointless fragility propagation
beam_validator: Misc cosmetic refactoring
|
|
This fixes compiling against older OTP versions with the +r?VSN
options, which often expand to disabling certain SSA passes to
avoid new instructions.
|
|
The compiler will usually optimize these into test instructions,
but they'll nevertheless pop up in some cases.
|
|
If we know that a BIF will badarg unless its arguments have certain
types, we can infer that we have at least those types on success.
Note that we can't do this in the general case as the BIF could
fail for reasons other than bad arguments.
|
|
eb0b8da6e816 started to use a binary instead of a string in
bs_match_string instructions.
Remove a clause that attempts to handle the old form of
bs_match_string from old .S files. This is pointless, because an old
.S file is likely to contain a bs_context_to_binary instruction as
well. The bs_context_to_binary is no longer recognized by
`beam_validator`, so those old .S files will not work anyway.
|
|
* bjorn/compiler/beam_ssa_opt:
Make sure that beam_ssa_opt optimizes all functions
|
|
* maint:
Add persistent_term:get(Key, DefaultValue)
Make dialyzer faster for left-associative andalso/orelse expressions
|
|
The function `get_call_order_po/2` did not always include all
functions in a module. The method it used was to first find
all leave functions (functions not calling any other function),
and then find all others functions that called any of the leave
functions either directly or indirectly.
Functions that did not call a leave function (directly or indirectly)
would not be included in the list of functions to optimize.
Reimplement `get_call_order_po/2` to use the standard algorithm
for constructing a list of nodes in reverse postorder and then
reverse that list.
|
|
https://bugs.erlang.org/browse/ERL-851
|
|
|
|
beam_ssa_type: Propagate the 'none' type from calls
|
|
Consider this pseudo code:
f(...) ->
Val = case Expr of
... ->
... ;
... ->
... ;
... ->
my_abort(something_went_wrong)
end,
%% Here follows code that uses Val.
.
.
.
my_abort(Reason) ->
throw({error,Reason}).
The first two clauses in the case will probably provide some
information about the type of the variable `Var`, information
that would be useful for optimizing the code that follows the
case.
However, the third clause would ruin everything. The call
to `my_abort/1` could return anything, and thus `Val` could
also have any type.
294d66a295f6 introduced module-level type analysis, which will in
general keep track of the return type of a local function
call. However, it does not improve the optimization for this specific
function. When a function never returns, that is, when its type is
`none`, it does not propagate the `none` type, but instead pretends
that the return type is `any`.
This commit extends the handling of functions that don't return to
properly handle the `none` type. Any instructions that directly
follows the function that does not return will be discarded, and the
call will be rewritten to a tail-recursive call.
For this specific example, it means that the type for `Val` deduced
from the first two clauses will be retained and can be used for
optimizing the code after the case.
|
|
|
|
Branches where the state is altered on both success and failure
are hard to follow and require juggling the current state, so this
commit adds a helper function to make it easier.
|
|
|
|
|
|
|
|
set_aliased_type already preserves fragility when updating the type
of a fragile register.
|
|
|
|
Optimize ssa_opt_sink for huge functions
|
|
The ssa_opt_sink optimization of beam_ssa_opt could get very slow
for certain huge functions. 9a190cae9bd7 partly addressed this issue
by terminating the optimization early if there happened to be no
get_tuple_element instructions at all in the function.
This commit addresses the issue more directly by making the dominator
calculation in beam_ssa:dominators/1 more efficient. The same
algorithm as before is used, but it is implemented in a more efficient
way based on the ideas in "A Simple, Fast Dominance Algorithm"
(http://www.hipersoft.rice.edu/grads/publications/dom14.pdf).
As well as being more efficient, the new implementation also gives
an explicit representation of the dominator tree, which makes it
possible to simplify and optimize the ssa_opt_sink optimization.
|
|
Prior to 294d66a295f6c2101fe3c2da630979ad4e736c08 there wasn't much
point to keeping track of tuple element types; they were only known
when we had inserted or extracted values from a tuple, and in
neither case was it likely that we'd extract the same values again.
It makes a lot more sense to do so now that type optimizations are
applied across functions; if we return a tuple it's very likely
that its elements will be extracted soon after, and knowing their
types lets us eliminate more type checks.
Co-authored-by: Björn Gustavsson <[email protected]>
|
|
|
|
|
|
|
|
Reduce compilation times
|
|
There could be a warning with a `no_file` atom instead of filename
and line number.
|
|
Two helper functions in beam_ssa_opt and beam_ssa_dead
are body-recursive for no good reason.
While at it, add some clarifying comments to the functions.
|
|
beam_ssa:def_used/2 is used by beam_ssa_pre_codegen when reserving
Y registers.
Do the following optimizations:
* Use an ordset instead of a gb_set. When the only operation performed
on a set is union/2, an ordset will usually be faster, especially when
the result is an ordset.
* Use a cerl_set instead of a gb_set for the set of all possible
predecessors. cerl_sets is usually faster than gb_sets.
|
|
Avoiding calls usually reduces the size of the stack frame and reduces
register shuffling.
|
|
|
|
Save a little time by using gb_sets:delete/2 instead of
gb_sets:delete_any/2 when the key is known to be in the set.
|
|
beam_ssa_dead could be very slow if there were many blocks
connected with unconditional branches (for example, if a block
had contained many `call` instructions and been split by
ssa_opt_split_blocks).
It turns out that `comb_get_sw/3` does an unnecessary (and perhaps
incorrect) recursive call to itself when the terminator for the
block is an unconditional branch. Removing the recursive call does
not disable any optimizations, but will be much faster if there
are many blocks connected with unconditional branches.
Reported-by: Michał Muskała
|
|
Compilation will be much faster if there are many blocks, but no
get_tuple_element instructions.
Reported-by: Michał Muskała
|
|
Fix internal consistency failure caused by beam_except
|
|
Fix a bug where the number of live registers in a `bs_get_tail`
instruction was too low.
Consider this example:
-export([bs_get_tail/2]).
bs_get_tail(Bin, Config) ->
bs_get_tail_1(Bin, 0, 0, Config).
bs_get_tail_1(<<_:32, Rest/binary>>, Z1, Z2, F1) ->
{Rest,Z1,Z2,F1}.
`beam_validator` would emit the following diagnostics:
t: function bs_get_tail_1/4+2:
Internal consistency check failed - please report this bug.
Instruction: {func_info,{atom,t},{atom,bs_get_tail_1},4}
Error: {uninitialized_reg,{x,3}}:
Here is the part of the code that generates the `function_clause`
exception before the optimization:
{test_heap,6,4}.
{put_list,{x,3},nil,{x,2}}.
{put_list,{integer,0},{x,2},{x,2}}.
{put_list,{integer,0},{x,2},{x,2}}.
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},3}. %3 live registers.
{test_heap,2,3}.
{put_list,{x,0},{x,2},{x,1}}.
{move,{atom,function_clause},{x,0}}.
{line,[{location,"t.erl",8}]}.
{call_ext_only,2,{extfunc,erlang,error,2}}.
The `bs_get_tail` instruction expects that 3 registers will be live
at this point. `beam_except` rewrites the code like this:
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},3}. %Still 3. Too low.
{move,{integer,0},{x,1}}.
{move,{integer,0},{x,2}}.
{jump,{f,3}}.
Now the number of live registers in `bs_get_tail` is too low,
because the `{x,3}` register will become undefined.
This commit adds code to update the number of live registers
in the `bs_get_tail` instruction, producing this code:
{bs_set_position,{x,1},{x,0}}.
{bs_get_tail,{x,1},{x,0},4}. %Adjusted to 4.
{move,{integer,0},{x,1}}.
{move,{integer,0},{x,2}}.
{jump,{f,3}}.
|
|
* maint:
Eliminate bogus warning when using tuple calls
|
|
There would be a bogus warning when compiling the following
function with the `tuple_calls` option:
dispatch(X) ->
(list_to_atom("prefix_" ++ atom_to_list(suffix))):doit(X).
The warning would look like this:
no_file: this expression will fail with a 'badarg' exception
https://bugs.erlang.org/browse/ERL-838
|
|
Enhance optimization of function_clause exceptions
|
|
There is an optimization for reducing the number of instructions needed
to generate a `function_clause`. After the latest improvements of the
type optimization pass, that optimization is not always applied.
Here is an example:
-export([foo/3]).
foo(X, Y, Z) ->
bar(a, X, Y, Z).
bar(a, X, Y, Z) when is_tuple(X) ->
{X,Y,Z}.
Note that the compiler internally adds a clause to each function to
generate a `function_clause` exception. Thus:
bar(a, X, Y, Z) when is_tuple(X) ->
{X,Y,Z};
bar(A1, A2, A3, A4) ->
erlang:error(function_clause, [A1,A2,A3,A4]).
Optimizations will rewrite the code basically like this:
bar(_, X, Y, Z) when is_tuple(X) ->
{X,Y,Z};
bar(_, A2, A3, A4) ->
erlang:error(function_clause, [a,A2,A3,A4]).
Note the `a` as the first element of the list of arguments. It
will prevent the optimization of the `function_clause` exception.
The BEAM code for `bar/4` looks like this:
{function, bar, 4, 4}.
{label,3}.
{line,[{location,"t.erl",8}]}.
{func_info,{atom,t},{atom,bar},4}.
{label,4}.
{'%',{type_info,{x,0},{atom,a}}}.
{test,is_tuple,{f,5},[{x,1}]}.
{test_heap,4,4}.
{put_tuple2,{x,0},{list,[{x,1},{x,2},{x,3}]}}.
return.
{label,5}.
{test_heap,8,4}.
{put_list,{x,3},nil,{x,0}}.
{put_list,{x,2},{x,0},{x,0}}.
{put_list,{x,1},{x,0},{x,0}}.
{put_list,{atom,a},{x,0},{x,1}}.
{move,{atom,function_clause},{x,0}}.
{line,[{location,"t.erl",8}]}.
{call_ext,2,{extfunc,erlang,error,2}}.
The code after label 5 is the clause that generates the
`function_clause` exception.
This commit generalizes the optimization so that it can be applied for
this function:
{function, bar, 4, 4}.
{label,3}.
{line,[{location,"t.erl",8}]}.
{func_info,{atom,t},{atom,bar},4}.
{label,4}.
{'%',{type_info,{x,0},{atom,a}}}.
{test,is_tuple,{f,5},[{x,1}]}.
{test_heap,4,4}.
{put_tuple2,{x,0},{list,[{x,1},{x,2},{x,3}]}}.
return.
{label,5}.
{move,{atom,a},{x,0}}.
{jump,{f,3}}.
For this particular function, it would be safe to omit the
`move` instruction before the `{jump,{f,3}}` instruction, but
it would not be safe in general to omit `move` instructions.
|
|
* john/compiler/refactor-validator-type-mgmt:
beam_validator: Add explicit assertions for fragile terms
beam_validator: Refactor type management
|
|
Speed up the compiler when compiling the idna package
|
|
Fix problems compiling Scalaris
|
|
We haven't seen any related bugs so far, but all instructions that
place a term in another ought to reject fragile inputs. It can't
hurt to check.
|
|
Our current type management (based on set_type_reg etc) is rather
error-prone, often requiring special cases on a per-instruction
basis. This commit replaces nearly all ad-hoc mechanisms with more
general abstractions:
* assign - Moves a term.
* create_term - Creates a new term.
* extract_term - Extracts a term from another, maintaining
fragility as required.
* update_type - Adds more type information about a register.
* type_test - Helper function for type tests that subtracts on
failure and meets on success.
|
|
The translation from `error(function_clause, Args)` to a jump
to the `func_info` label is not safe if there is a stack frame.
|
|
Compilation of code similar the following would be very slow:
uts46_map(CP) when 0 =< CP, CP =< 44 -> '3';
uts46_map(CP) when 45 =< CP, CP =< 46 -> 'V';
uts46_map(CP) when 48 =< CP, CP =< 57 -> 'V';
%% More than 2500 similar lines follows.
.
.
.
The code is from from:
https://github.com/benoitc/erlang-idna/blob/3eb54ccbfa6fb917c0f4ca9197da337ad888ffe0/src/idna_mapping.erl#L6780
By using information about skippable blocks, the beam_ssa_dead
pass can be sped up to compile idna_mapping.erl about 10 times faster.
|
|
It is never possible to merge a block ending in a switch with the
next block, so it is not necessary to call `beam_ssa:successors/1` in
that case. Avoiding the call slightly improves compilation speeds
for switches with many branches.
|
|
To improve compilation times, beam_ssa_type keeps track of variables
that are only used once and don't keep types for those variables. As
currently implemented, it turns to be unsafe. Change it to only keep
track of variables that are only used in the terminator of the block
they are defined in.
https://bugs.erlang.org/browse/ERL-840
|
|
If compilation failed, the name of the current function *and*
all previously compiled functions would be printed because
phase/4 was not tail-recursive.
https://bugs.erlang.org/browse/ERL-840
|