|
This is an enhancement of the optimization added in 2e5d6201bb044,
where we tried to avoid forcing a stack frame for functions
that don't really need one.
That optimization would not suppress the stack frame for this
function:
    f(A) ->
        Res = case A of
                  a -> x;
                  b -> y
              end,
        {ok,Res}.
The reason is that internally the compiler would rewrite
the code to something like this:
    f(A) ->
        Res = case A of
                  a -> x;
                  b -> y;
                  Other -> error({case_clause,Other})
              end,
        {ok,Res}.
The call to error/1 would force creation of a stack frame,
even though it is not really needed because error/1 causes
an exception.
Handle calls to exit BIFs specially to allow suppressing the
stack frame.
|
|
Delay creation of stack frames
|
|
* bjorn/compiler/coverage:
beam_utils: Refactor combine_alloc_lists() to cover more lines
map_SUITE: Cover beam_utils:bif_to_test/3
beam_disasm: Remove support for obsolete instructions
guard_SUITE: Test is_bitstring/1 and is_map/1 on literals
|
|
v3_codegen currently wraps a stack frame around each clause in
a function (unless the clause is simple without any 'case' or
other complex constructions).
Consider this function:
    f({a,X}) ->
        A = abs(X),
        case A of
            0 ->
                {result,"0"};
            _ ->
                {result,integer_to_list(A)}
        end;
    f(_) ->
        error.
The first clause needs a stack frame because there is a function
call to integer_to_list/1 not in the tail position. v3_codegen
currently wraps the entire first clause in a stack frame.
We can delay the creation of the stack frame, and create a
stack frame in each arm of the 'case' (if needed):
    f({a,X}) ->
        A = abs(X),
        case A of
            0 ->
                %% Don't create a stack frame here.
                {result,"0"};
            _ ->
                %% Create a stack frame here.
                {result,integer_to_list(A)}
        end;
    f(_) ->
        error.
There are pros and cons of this approach.
The cons are that the code size may increase if there are many
'case' clauses and each needs its own stack frame. The allocation
instructions may also interfere with other optimizations, but
the new optimizations introduced in previous commits will mitigate
most of those issues.
The pros are the following:
* For some clauses in a 'case', there is no need to create any
stack frame at all.
* Often when moving an allocation instruction into a 'case' clause,
the slightly cheaper 'allocate' instruction can be used instead
of 'allocate_zero'. There is also the possibility that the
allocate instruction can be combined with a 'test_heap'
instruction.
* Each stack frame for each arm of the 'case' will have exactly as
many slots as needed.
|
|
When rewriting tuple matching of the first element of a tuple to an
is_tagged_tuple instruction, the get_tuple_element instruction that
fetches the tag will be left unless the register that is fetched is
subsequently killed.
We can do better than that. If the register is referenced in an
allocating instruction, but its value is never actually used, we
can do one of two things: if the value is known to be defined earlier
(using annotations added by beam_utils:anno_defs/1) the instruction
can be removed altogether; if not, it can be replaced with a
'move nil TagRegister' instruction.
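As a rough illustration, this is the kind of record match that is
compiled to an is_tagged_tuple test (the record name and fields are
made up):
    -record(rec, {count = 0}).

    inc(#rec{count=N}=R) ->
        %% The #rec{} test in the clause head compiles to an
        %% is_tagged_tuple test on the tuple's first element.
        R#rec{count=N+1};
    inc(_) ->
        error(badarg).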
|
|
Use annotations added by beam_utils:anno_defs/1 to move more
allocations upwards in the instruction stream. That in turn
allows us to optimize away more 'move' instructions.
|
|
To avoid having to call both is_killed/3 and is_not_used/3,
add usage/3 to answer both questions in one call.
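A minimal sketch of a caller, assuming usage/3 returns one of the
atoms 'killed', 'not_used' and 'used' (can_reuse_register/3 is a
made-up name):
    %% Hypothetical caller: a single usage/3 call replaces separate
    %% is_killed/3 and is_not_used/3 queries.
    can_reuse_register(R, Is, D) ->
        case beam_utils:usage(R, Is, D) of
            killed -> true;      %% the value in R is dead here
            not_used -> true;    %% R may be referenced, but its value is unused
            used -> false        %% the value in R is still needed
        end.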
|
|
Add beam_utils:anno_defs/1 which will add an annotation to the
beginning of each block indicating which X registers are
defined. Having that information can improve some optimizations.
|
|
|
|
Add -MMD option to erlc
OTP-14830
|
|
There are four uncovered lines in combine_heap_needs/2 and
combine_alloc_lists/2. There is no way to reach them starting from
Erlang source code using the standard compiler. However, they
can be reached starting from BEAM assembly code, so we don't
want to remove them.
We could add a test case that covers the lines using assembly
code, but an easier solution is to rewrite the code in a more
generic way using sofs so that the code can be covered with
existing test cases.
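A rough sketch of the generic sofs-based approach (not the actual
beam_utils code), summing the amount for each tag so that every
combination of tags goes through the same path:
    combine_alloc_lists(Al1, Al2) ->
        Rel = sofs:relation(Al1 ++ Al2),
        Fam = sofs:relation_to_family(Rel),
        [{Tag,lists:sum(Ns)} || {Tag,Ns} <- sofs:to_external(Fam)].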
|
|
|
|
* bjorn/compiler/use-stacktrace-syntax:
Use the new syntax for retrieving stack traces
|
|
01835845579e9 fixed some problems, but introduced a bug where
is_not_used/3 would report that a register was not used when it
in fact was.
|
|
|
|
758712d6294 changed the need_heap/2 function so that it stopped
using its second argument.
Remove the second argument from need_heap(), and update all callers
to similarly remove unused arguments.
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
It turns out that we don't need to keep track of locked
variables, because the locked variables are always the same
variables that will be alive after a #k_guard_break{}.
|
|
Remove handling of #k_match{} in bsm_rename_ctx/4.
It can never be reached because bsm_rename_ctx/4 will never recurse
into a block that is not in the scope of a #k_protected{}, and
in a #k_protected{}, #k_match{} is not allowed.
|
|
|
|
Put guard_cg_list/6 directly after guard_cg/5.
|
|
The function guard_cg/5 handles constructs found within
the records #k_guard_clause{} and #k_protected{}.
Since #k_guard_clause{} can only contain a #k_protected{},
and #k_protected{} in turn cannot contain a #cg_block{},
the clause for handling #cg_block{} in guard_cg/5 is never
executed and can be removed.
|
|
The variable being added will already be there (added by v3_kernel).
|
|
|
|
When converting a comparison BIF (such as '=:=') to a test
instruction, run the other optimizations on the result.
When trying to combine is_eq_exact tests, handle the case where
is_eq_exact is followed by a jump instead of a label, so that a test
that has been newly converted from a BIF is also handled.
Taken together, those changes will coalesce more is_eq_exact
instructions into select_val instructions.
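A hypothetical source function that could benefit: the '=:=' guard
tests become is_eq_exact instructions, which can then be coalesced
into a single select_val:
    classify(X) when X =:= a -> 1;
    classify(X) when X =:= b -> 2;
    classify(X) when X =:= c -> 3;
    classify(_) -> 0.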
|
|
A 'case' or 'if' that does not occur last in a function clause will
always force a stack frame. The reasoning behind this is that in most
uses of 'case' there will be a function call from within the
'case'. When there is a function call, the stack frame is needed both
to save the continuation pointer and to save any X registers that will
need to survive the call.
When there is no function call from a 'case', the resulting stack
frame is annoying. There will be register shuffling, and the existence
of the stack frame may thwart many optimizations (for example, in
beam_dead).
Therefore, add an extra pass to v3_codegen to avoid creating a
stack frame when not needed.
https://bugs.erlang.org/browse/ERL-514
|
|
The compile option makedep_side_effect, erlc -MMD, instructs
the compiler to emit dependencies and continue to compile
as normal.
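A hedged usage sketch, equivalent to 'erlc -MMD foo.erl' (the file
name is made up):
    %% Emits the dependency file as a side effect and continues with
    %% normal code generation.
    {ok,_Module} = compile:file("foo.erl", [makedep_side_effect]).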
|
|
v3_kernel could generate a #k_break{} with only one variable, even if
the preceding code and succeeding code expected more than one value. It
happened to work anyway because the value returned from the break was
not actually used.
|
|
We used to not care about the number of values returned from the
'after infinity' clause in a receive (because it could never be
executed). It is time to start caring because this will cause problems
when we soon start doing more aggressive optimizations.
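For reference, this is the construct in question (the function and
message shape are made up); the 'after infinity' body can never run,
but the values it produces must still be consistent with the other
clauses:
    wait(Ref) ->
        receive
            {Ref,Msg} -> {ok,Msg}
        after infinity ->
            unreachable
        end.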
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allows us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
    1> catch abs(lists:seq(1, 1000)).
    {'EXIT',{badarg,[{erlang,abs,
                         [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
                         []},
                     {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
                     {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
                     {shell,exprs,7,[{file,"shell.erl"},{line,687}]},
                     {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
                     {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
    2> erlang:get_stacktrace().
    [{erlang,abs,
         [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
           23,24|...]],
         []},
     {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
     {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
     {shell,exprs,7,[{file,"shell.erl"},{line,687}]},
     {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
     {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
    3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
    try
        Expr
    catch C:E ->
        Stk = erlang:get_stacktrace(),
        .
        .
        .
In the new syntax, it would look like:
    try
        Expr
    catch
        C:E:Stk ->
            .
            .
            .
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
    try
        Expr
    catch E ->
        .
        .
        .
is a shorthand for:
    try
        Expr
    catch throw:E ->
        .
        .
        .
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
    try
        Expr
    catch throw:E:Stk ->
        .
        .
        .
|
|
X register 0 used to be mapped to a hardware register, and therefore
faster than the other registers. Because of that, the compiler
tried to use x(0) as much as possible as a temporary register.
That was changed a few releases ago. X register 0 is now placed
in the array of all X registers and has no special speed
advantage compared to the other registers.
Remove the code in the compiler that attempts to use x(0) as
much as possible. As a result, the following type of instruction
will be much less frequent:
{put_list,Src,{x,0},{x,0}}
Instead, the following type of instruction will be more frequent:
{put_list,Src,{x,X},{x,X}}
(Where X is an arbitrary X register.)
Update the runtime system to specialize that kind of put_list
instruction.
|
|
|
|
The bs_context_to_binary instruction only allows a register operand.
v3_codegen has a test to ensure that the operand is a register.
That test is no longer necessary. There used to be a possibility
that optimizations in sys_core_fold and the inliner could change
the operand for bs_context_to_binary to a binary literal. Since
09112806c15a81b that can no longer happen, because no more
optimizations are run after the introduction of the
bs_context_to_binary instruction.
|
|
This clause seems to have been introduced in cac51274eb9a.
|
|
The clause that converted an iolist to a binary was never
executed.
Note that chunk/2 is called for all chunks in the
{extra_chunks,Chunks} option. This change enforces that the
contents of each chunk is a binary (as documented).
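A hedged usage example of the documented option (the file name, chunk
id and contents are made up; the chunk id is a four-byte binary and
the contents must also be a binary):
    %% Beam is the module binary including the extra "ExCk" chunk.
    {ok,_Mod,Beam} =
        compile:file("foo.erl",
                     [binary,{extra_chunks,[{<<"ExCk">>,<<"metadata">>}]}]).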
|
|
|
|
f9a323d10a9f5d added consistent operand order for equality
comparisons. As a result, beam_dead:turn_op/1 is no longer covered.
We must keep the uncovered lines in beam_dead to ensure that
beam_dead can handle BEAM assembly code from a source other than
v3_codegen that might not follow the operand order convention.
The only way to cover the lines is to use BEAM assembly in
the test case.
|
|
beam_utils:bif_to_test/3 is supposed to never put a literal
operand as the first operand in is_eq_exact or is_ne_exact,
but 'nil' was not recognized as a literal.
|
|
|
|
The uncovered line was added in 6753bbcc3fdb0.
|
|
Place move S x0 instructions at the end of blocks
|
|
The loader has a lot of fused instructions that include move S x0.
Placing them at the end of blocks makes it possible to take advantage
of this optimization more frequently.
|