Age | Commit message (Collapse) | Author |
|
sys_core_fold could do unsafe transformations on the
code from the old inliner (invoked using the compiler
option `{inline,[{F/A}]}` to request inlining of specific
functions).
To explain the bug, let's first look at an example that
sys_core_fold handles correctly. Consider this code:
'foo'/2 =
fun (Arg1,Arg2) ->
let <B> = Arg2
in let <A,B> = <B,Arg1>
in {A,B}
In this example, the lets can be completely eliminated,
since the arguments for the lets are variables (as opposed
to expressions). Since the variable B is rebound in the
inner let, `sys_core_fold` must take special care when
doing the substitutions.
Here is the correct result:
'foo'/2 =
fun (Arg1, Arg2) ->
{Arg2,Arg1}
Consider a slight modifictation of the example:
'bar'/2 =
fun (Arg1,Arg2) ->
let <B> = [Arg2]
in let <A,B> = <B,[Arg1]>
in {A,B}
Here some of the arguments for the lets are expressions, so
the lets must be kept. sys_core_fold does not handle this
example correctly:
'bar'/2 =
fun (Arg1,Arg2) ->
let <B> = [Arg2]
in let <B> = [Arg1]
in {B,B}
In the inner let, the variable A has been eliminated and
replaced with the variable B in the body (the first B in
the tuple). Since the B in the outer let is never used,
the outer let will be eliminated, giving:
'bar'/2 =
fun (Arg1,Arg2) ->
let <B> = [Arg1]
in {B,B}
To handle this example correctly, sys_core_fold must
rename the variable B in the inner let like this to
avoid capturing B:
'bar'/2 =
fun (Arg1,Arg2) ->
let <B> = [Arg2]
in let <NewName> = [Arg1]
in {B,NewName}
(Note: The `v3_kernel` pass alreday handles those examples correctly
in case `sys_core_fold` has been disabled.)
|
|
|
|
Introduce is_map_key/2 guard BIF
OTP-15037
|
|
This complements the `map_get/2` guard BIF introduced in #1784.
Rationale.
`map_get/2` allows accessing map fields in guards, but it might be
problematic in more complex guard expressions, for example:
foo(X) when map_get(a, X) =:= 1 or is_list(X) -> ...
The `is_list/1` part of the guard could never succeed since the
`map_get/2` guard would fail the whole guard expression. In this
situation, this could be solved by using `;` instead of `or` to separate
the guards, but it is not possible in every case.
To solve this situation, this PR proposes a `is_map_key/2` guard that
allows to check if a map has key inside a guard before trying to access
that key. When combined with `is_map/1` this allows to construct a
purely boolean guard expression testing a value of a key in a map.
Implementation.
Given the use case motivating the introduction of this function, the PR
contains compiler optimisations that produce optimial code for the
following guard expression:
foo(X) when is_map(X) and is_map_key(a, X) and map_get(a, X) =:= 1 -> ok;
foo(_) -> error.
Given all three tests share the failure label, the `is_map_key/2` and
`is_map/2` tests are optimised away.
As with `map_get/2` the `is_map_key/2` BIF is allowed in match specs.
|
|
When an exception is handled, the stack will be scanned. Therefore
all Y registers must be initialized.
|
|
Rewrite a call of a literal external fun to a direct call
OTP-15044
|
|
Rewrite calls such as:
(fun erlang:abs/1)(-42)
to:
erlang:abs(-42)
While we are at it, also add rewriting of apply/2 with a fixed
number of arguments to a direct call of the fun. For example:
apply(F, [A,B])
would be rewritten to:
F(A, B)
https://bugs.erlang.org/browse/ERL-614
|
|
sys_core_fold would crash when attempting to optimize this code:
t() when (#{})#{}->
c.
|
|
* 'map-get-bif' of git://github.com/michalmuskala/otp:
Introduce map_get guard-safe function
OTP-15037
|
|
Rationale
Today all compound data types except for maps can be deconstructed in guards.
For tuples we have `element/2` and for lists `hd/1` and `tl/1`. Maps are
completely opaque to guards. This means matching on maps can't be
abstracted into macros, which is often done with repetitive guards. It
also means that maps have to be always selected whole from ETS tables,
even when only one field would be enough, which creates a potential
efficiency issue.
This PR introduces an `erlang:map_get/2` guard-safe function that allows
extracting a map field in guard. An alternative to this function would be
to introduce the syntax for extracting a value from a map that was planned
in the original EEP: `Map#{Key}`.
Even outside of guards, since this function is a guard-BIF it is more
efficient than using `maps:get/2` (since it does not need to set up the
stack), and more convenient from pattern matching on the map (compare:
`#{key := Value} = Map, Value` to `map_get(key, Map)`).
Performance considerations
A common concern against adding this function is the notion that "guards
have to be fast" and ideally execute in constant time. While there are
some counterexamples (`length/1`), what is more important is the fact
that adding those functions does not change in any way the time
complexity of pattern matching - it's already possible to match on map
fields today directly in patterns - adding this ability to guards will
niether slow down or speed up the execution, it will only make certain
programs more convenient to write.
This first version is very naive and does not perform any optimizations.
|
|
Keys in map patterns are input variables, not pattern variables.
|
|
Waiting messages for a process may be stored in a queue
outside of any heap or heap fragment belonging to the process.
This is an optimization added in a recent major release to
avoid garbage collection messages again and again if there
is a long message queue.
Until such message has been matched and accepted by
the remove_message/0 instruction, the message must not be
included in the root set for a garbage collection, as that
would corrupt the message. The loop_rec/2 instruction explicitly
turns off garbage collection of the process as long messages
are being matched.
However, if the compiler were to put references to a message
outside of the heap in an Y register (on the stack) and there
happened to be a GC when the process had been scheduled out,
the message would be corrupted and the runtime system would
crash sooner or later.
To ensure that doesn't happen, teach beam_validator to check
for references on the stack to messages outside of the heap.
|
|
beam_record would make an unsafe optimization for the
not_used_p/4 function added to beam_utils_SUITE in this
commit. The bug is in beam_utils, which would falsely
report that {x,4} was unused when it in fact was used.
The bug was in the function not_used/1. The purpose of
not_used/1 is to return a 'not_used' result unless the
actual result is 'used'. Unfortunately it was not
implemented in that way. It would let a 'transparent'
result slip through, which the caller in this case would
convert to 'killed' (because the register was killed on
all other paths).
Reported-by: Richard Carlsson
|
|
When a generator in a list comprehension was given some
other term than a list, the wrong line could be pointed
out in the exception. Here is an example:
bad_generator() ->
[I || %%This line would be pointed out.
I <- not_a_list].
https://bugs.erlang.org/browse/ERL-572
|
|
A literal negative size in binary construction would cause a crash.
|
|
Refactor and fix minor bugs in beam_type
|
|
|
|
Every catch or try/catch must use a lower Y register number than any
enclosing catch or try/catch. That will ensure that when the stack
is scanned when an exception occurs, the innermost try/catch tag is
found first.
|
|
Eliminate get_list/3 internally in the compiler
|
|
Fix incorrect handling of floating point instructions
|
|
* maint:
Fix incorrect type interference of integer ranges
Conflicts:
lib/compiler/src/beam_type.erl
|
|
1a029efd1ad47f started to run the beam_block pass a second time.
Since it is run after introduction of the optimized floating point
instructions, it must handle those instructions correctly.
In particular, it must be careful when hoisting allocation
instructions. For example, the following code:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{fmove,{fr,2},{x,0}}.
{allocate_zero,1,4}.
must not be rewritten to:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{allocate_zero,1,4}.
{fmove,{fr,2},{x,0}}.
because beam_validator will not consider it safe. (The code may
actually be safe depending on what the code between the two allocation
instructions do.)
https://bugs.erlang.org/browse/ERL-555
|
|
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use it internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
|
|
misc_SUITE:integer_encoding/1 was written to make sure
that big integers were encoding correctly in a reasonable
amount of time. Now that beam_asm will encode big integers
as literals, we can reduce the scope of integer_encode/1.
That will make it significantly faster, especially when
cover is running.
|
|
Do local common sub expression elimination (CSE)
|
|
Optimize away unnecessary test_unit instructions that verify that
binaries are byte-aligned. In a tight loop, eliminating an
instruction can have a small but measurable improvement of the
execution time.
|
|
Extend an existing optimization in beam_dead to avoid
creating a match context when matching an empty binary.
|
|
Eliminate repeated evaluation of guard BIFs and building of cons cells
in blocks. This optimization is applicable in more places than might be
expected, because code generation for binaries and record can generate
common sub expressions not visible in the original source code.
For example, consider this function:
make_binary(Term) ->
Bin = term_to_binary(Term),
Size = byte_size(Bin),
<<Size:32,Bin/binary>>.
The compiler inserts a call to byte_size/2 to calculate the size of
the binary being built:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}. %Present in original code.
{line,...}.
{gc_bif,byte_size,{f,0},2,[{x,0}],{x,2}}. %Inserted by compiler.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Common sub expression elimination (CSE) eliminates the second call to
byte_size/2:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}.
{move,{x,1},{x,2}}.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Note: A possible future optimization would be to include binary
construction instructions in blocks. If that is done, the
{move,{x,1},{x,2}} instruction could also be eliminated.
|
|
Make sure that there is the correct number of put/1 instructions
following put_tuple/2. Also make it illegal to reference the
register for the tuple being built in a put/1 instruction.
That is, beam_validator will now issue a diagnostice for the the
following code:
{put_tuple,1,{x,0}}.
{put,{x,0}}.
|
|
Consider the following function:
function({function,Name,Arity,CLabel,Is0}, Lc0) ->
try
%% Optimize the code for the function.
catch
Class:Error:Stack ->
io:format("Function: ~w/~w\n", [Name,Arity]),
erlang:raise(Class, Error, Stack)
end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|
|
Argument order can prevent the delayed sub binary creation.
Here is an example directly from the Efficiency Guide:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
When compiling with the bin_opt_info option, there will be a
suggestion to change the argument order.
It turns out sys_core_bsm can itself change the order, not the
order of the arguments of themselves, but the order in which
the arguments are matched. Here is how it can be rewritten in
pseudo Core Erlang code:
non_opt_eq(Arg1, Arg2) ->
case < Arg2,Arg1 > of
< <<H1,T2/binary>>, [H2|T1] > when H1 =:= H2 ->
non_opt_eq(T1, T2);
< <<_,T2/binary>ffff>, [_|T1] > ->
false;
< <<>>, [] >> ->
true
end.
When rewritten like this, the bs_start_match2 instruction will be
the first instruction in the function and it will be possible to
store the match context in the same register as the binary
({x,1} in this case) and to delay the creation of sub binaries.
The switching of matching order also enables many other simplifications
in sys_core_bsm, since there is no longer any need to pass the position
of the pattern as an argument.
We will update the Efficiency Guide in a separate branch before the
release of OTP 21.
|
|
|
|
Add some tests cases written when attempting some new optimizations
that turned out to be unsafe.
|
|
Consider a 'case' that exports variables and whose return
value is ignored:
foo(N) ->
case N of
1 ->
Res = one;
2 ->
Res = two
end,
{ok,Res}.
That code will be translated to the following Core Erlang code:
'foo'/1 =
fun (_@c0) ->
let <_@c5,Res> =
case _@c0 of
<1> when 'true' ->
<'one','one'>
<2> when 'true' ->
<'two','two'>
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The exported variables has been rewritten to explicit return
values. Note that the original return value from the 'case' is bound to
the variable _@c5, which is unused.
The corresponding BEAM assembly code looks like this:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,1}}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,1}}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,2}.
{put_tuple,2,{x,0}}.
{put,{atom,ok}}.
{put,{x,1}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
Because of the test_heap instruction following label 5, the assignment
to {x,0} cannot be optimized away by the passes that optimize BEAM assembly
code.
Refactor the optimizations of 'let' in sys_core_fold to eliminate the
unused variable. Thus:
'foo'/1 =
fun (_@c0) ->
let <Res> =
case _@c0 of
<1> when 'true' ->
'one'
<2> when 'true' ->
'two'
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The resulting BEAM code will look like:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,1}.
{put_tuple,2,{x,1}}.
{put,{atom,ok}}.
{put,{x,0}}.
{move,{x,1},{x,0}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
|
|
|
|
The type optimizations for is_record and test_arity checked whether
the arity was equal to the size stored in the type information,
which is incorrect since said size is the *minimum* size of the
tuple (as determined by previous instructions) and not its exact
size.
A future patch to the 'master' branch will restore these
optimizations in a safe manner.
|
|
Delay creation of stack frames
|
|
|
|
|
|
01835845579e9 fixed some problems, but introduced a bug where
is_not_used/3 would report that a register was not used when it
in fact was.
|
|
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
|
|
|
|
We used to not care about the number of values returned from the
'after infinity' clause in a receive (because it could never be
executed). It is time to start caring because this will cause problem
when we will soon start to do some more aggressive optimizizations.
|
|
|
|
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allow us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
Expr
catch C:E ->
Stk = erlang:get_stacktrace(),
.
.
.
In the new syntax, it would look like:
try
Expr
catch
C:E:Stk ->
.
.
.
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
Expr
catch E ->
.
.
.
is a shorthand for:
try
Expr
catch throw:E ->
.
.
.
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
Expr
catch throw:E:Stk ->
.
.
.
|
|
|