Age | Commit message (Collapse) | Author |
|
Consider the following function:
function({function,Name,Arity,CLabel,Is0}, Lc0) ->
try
%% Optimize the code for the function.
catch
Class:Error:Stack ->
io:format("Function: ~w/~w\n", [Name,Arity]),
erlang:raise(Class, Error, Stack)
end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|
|
This is an enhancement of the optimization added in 2e5d6201bb044,
where we tried to avoid forcing a stack frame for functions
that don't really need them.
That optimization would not suppress the stack frame for this
function:
f(A) ->
Res = case A of
a -> x;
b -> y
end,
{ok,Res}.
The reason is that internally the compiler would rewrite
the code to something like this:
f(A) ->
Res = case A of
a -> x;
b -> y;
Other -> error({case_clause,Other})
end,
{ok,Res}.
The call to error/1 would force creation of a stack frame,
even though it is not really needed because error/1 causes
an exception.
Handle calls to exit BIFs specially to allow suppressing the
stack frame.
|
|
Delay creation of stack frames
|
|
v3_codegen currently wraps a stack frame around each clause in
a function (unless the clause is simple without any 'case' or
other complex constructions).
Consider this function:
f({a,X}) ->
A = abs(X),
case A of
0 ->
{result,"0"};
_ ->
{result,integer_to_list(A)}
end;
f(_) ->
error.
The first clause needs a stack frame because there is a function
call to integer_to_list/1 not in the tail position. v3_codegen
currently wraps the entire first clause in stack frame.
We can delay the creation of the stack frame, and create a
stack frame in each arm of the 'case' (if needed):
f({a,X}) ->
A = abs(X),
case A of
0 ->
%% Don't create a stack frame here.
{result,"0"};
_ ->
%% Create a stack frame here.
{result,integer_to_list(A)}
end;
f(_) ->
error.
There are pros and cons of this approach.
The cons are that the code size may increase if there are many
'case' clauses and each needs its own stack frame. The allocation
instructions may also interfere with other optimizations, but
the new optimizations introduced in previous commits will mitigate
most of those issues.
The pros are the following:
* For some clauses in a 'case', there is no need to create any
stack frame at all.
* Often when moving an allocation instruction into a 'case' clause,
the slightly cheaper 'allocate' instruction can be used instead
of 'allocate_zero'. There is also the possibility that the
allocate instruction can be be combined with a 'test_heap'
instruction.
* Each stack frame for each arm of the 'case' will have exactly as
many slots as needed.
|
|
758712d6294 changed the need_heap/2 function so that it stopped
using its second argument.
Remove the second argument from need_heap(), and update all callers
to similarly remove unused arguments.
|
|
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
It turns out that we don't need to keep track of locked
variables, because the locked variables are always the same
variables that will be alive after a #k_guard_break{}.
|
|
Remove handling of #k_match{} in bsm_rename_ctx/4.
It can never be reached because bsm_rename_ctx/4 will never recurse
into a block that is not in the scope of a #k_protected{}, and
in a #k_protected{}, #k_match{} is not allowed.
|
|
Put guard_cg_list/6 directly after guard_cg/5.
|
|
The function guard_cg/5 handles constructs found within
the records #k_guard_clause{] and #k_protected{}.
Since #k_guard_clause{} can only contain a #k_protected{},
and #k_protected{} in turn cannot contain a #cg_block{},
the clause for handling #cg_block{} in guard_cg/5 is never
executed and can be removed.
|
|
The variable being added will already be there (added by v3_kernel).
|
|
A 'case' or 'if' that does not occur last in a function clause will
always force a stack frame. The reasoning behind this is that in most
uses of 'case' there will be a function call from within the
'case'. When there is a function call, the stack frame is needed both
to save the continuation pointer and to save any X registers that will
need to survive the call.
When there is no function call from a 'case', the resulting stack
frame is annoying. There will be register shuffling, and the existence
of the stack frame may thwart many optimizations (for example, in
beam_dead).
Therefore, add an extra pass to v3_codegen to avoid creating a
stack frame when not needed.
https://bugs.erlang.org/browse/ERL-514
|
|
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allow us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
Expr
catch C:E ->
Stk = erlang:get_stacktrace(),
.
.
.
In the new syntax, it would look like:
try
Expr
catch
C:E:Stk ->
.
.
.
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
Expr
catch E ->
.
.
.
is a shorthand for:
try
Expr
catch throw:E ->
.
.
.
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
Expr
catch throw:E:Stk ->
.
.
.
|
|
X register 0 used to be mapped to a hardware register, and therefore
faster than the other registers. Because of that, the compiler
tried to use x(0) as much as possible as a temporary register.
That was changed a few releases ago. X register 0 is now placed
in the array of all X registers and has no special speed
advantage compared to the other registers.
Remove the code in the compiler that attempts to use x(0) as
much as possible. As a result, the following type of instruction
will be much less frequent:
{put_list,Src,{x,0},{x,0}}
Instead, the following type of instruction will be more frequent:
{put_list,Src,{x,X},{x,X}}
(Where X is an arbitrary X register.)
Update the runtime system to specialize that kind of put_list
instruction.
|
|
|
|
The bs_context_to_binary instruction only allows a register operand.
v3_codegen has a test to ensure that the operand is a register.
That test is no longer necessary. There used to be a possibility
that optimizations in sys_core_fold and the inliner could change
the operand for bs_context_to_binary to a binary literal. Since
09112806c15a81b that can no longer happen, because no more
optimizations are run after the introduction of the
bs_context_to_binary instruction.
|
|
The v3_life pass does not do enough to be worth being its own
pass. Essentially it does two things:
* Calculates life-time information starting from the annotations
that v3_kernel provides. That part can be moved into v3_codegen.
* Rewrites the Kernel Erlang records to similar plain tuples
(for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}).
That rewriting is not needed and can be eliminated.
|
|
If a type only has one clause and if the pattern is literal,
the matching can be done more efficiently by directly comparing
with the literal.
Example:
find(String, "") -> String;
find(String, <<>>) -> String;
find(String, SearchPattern) ->
.
.
.
Without this optimization, the relevant part of the code would look
this:
{test,bs_start_match2,{f,3},2,[{x,1},0],{x,2}}.
{test,bs_test_tail2,{f,4},[{x,2},0]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
That is, if {x,1} is a binary, a match context will be built to
test whether {x,1} is an empty binary.
With the optimization, the code will look this:
{test,is_eq_exact,{f,3},[{x,1},{literal,<<>>}]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
|
|
Optimise size calculation for binary construction
OTP-14654
|
|
It turns out it was extremely common to have a following sequence:
move 0 D1
byte_size _ D2
bs_add D1 D2 D
Which is equivalent to just:
byte_size _ D
Similarly another sequence:
move S D1
byte_size _ D2
bs_add D1 D2 D
Can be optimised into:
byte_size _ D2
bs_add S D2 D
Both of those optimisations work with byte_size and bit_size instructions.
|
|
The compiler could sometimes emit unnecessary 'move'
instructions in the code for binary matching, for
example for this function:
escape(<<Byte, Rest/bits>>, Pos) when Byte >= 127 ->
escape(Rest, Pos + 1);
escape(<<Byte, Rest/bits>>, Pos) ->
escape(Rest, Pos + Byte);
escape(<<_Rest/bits>>, Pos) ->
Pos.
The generated code would look like this:
{function, escape, 2, 2}.
{label,1}.
{line,[{location,"t.erl",17}]}.
{func_info,{atom,t},{atom,escape},2}.
{label,2}.
{test,bs_start_match2,{f,1},2,[{x,0},0],{x,0}}.
{test,bs_get_integer2,
{f,4},
2,
[{x,0},
{integer,8},
1,
{field_flags,[{anno,[17,{file,"t.erl"}]},unsigned,big]}],
{x,2}}.
{'%',{bin_opt,[17,{file,"t.erl"}]}}.
{move,{x,0},{x,3}}. %% UNECESSARY!
{test,is_ge,{f,3},[{x,2},{integer,127}]}.
{line,[{location,"t.erl",18}]}.
{gc_bif,'+',{f,0},4,[{x,1},{integer,1}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,3}.
{line,[{location,"t.erl",20}]}.
{gc_bif,'+',{f,0},4,[{x,1},{x,2}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,4}.
{move,{x,1},{x,0}}.
return.
The redundant 'move' instructions have been marked.
To avoid the 'move' instructions, we can extend the existing
function is_context_unused/1 in v3_codegen. If v3_codegen can
know that the match context will not be used again, it can reuse
the register for the match context and avoid the extra 'move'
instructions.
https://bugs.erlang.org/browse/ERL-444
|
|
|
|
The compiler produces poor code for complex guard expressions with andalso/orelse.
Here is an example from the filename module:
-define(IS_DRIVELETTER(Letter),(((Letter >= $A) andalso (Letter =< $Z)) orelse
((Letter >= $a) andalso (Letter =< $z)))).
skip_prefix(Name, false) ->
Name;
skip_prefix([L, DrvSep|Name], DrvSep) when ?IS_DRIVELETTER(L) ->
Name;
skip_prefix(Name, _) ->
Name.
beam_bool fails to simplify the code for the guard, leaving several 'bif'
instructions:
{function, skip_prefix, 2, 49}.
{label,48}.
{line,[{location,"filename.erl",187}]}.
{func_info,{atom,filename},{atom,skip_prefix},2}.
{label,49}.
{test,is_ne_exact,{f,52},[{x,1},{atom,false}]}.
{test,is_nonempty_list,{f,52},[{x,0}]}.
{get_list,{x,0},{x,2},{x,3}}.
{test,is_nonempty_list,{f,52},[{x,3}]}.
{get_list,{x,3},{x,4},{x,5}}.
{bif,'=:=',{f,52},[{x,1},{x,4}],{x,6}}.
{test,is_ge,{f,50},[{x,2},{integer,65}]}.
{bif,'=<',{f,52},[{x,2},{integer,90}],{x,7}}.
{test,is_eq_exact,{f,51},[{x,7},{atom,false}]}.
{test,is_ge,{f,50},[{x,2},{integer,97}]}.
{bif,'=<',{f,52},[{x,2},{integer,122}],{x,7}}.
{jump,{f,51}}.
{label,50}.
{move,{atom,false},{x,7}}.
{label,51}.
{bif,'=:=',{f,52},[{x,7},{atom,true}],{x,7}}.
{test,is_eq_exact,{f,52},[{x,6},{atom,true}]}.
{test,is_eq_exact,{f,52},[{x,7},{atom,true}]}.
{move,{x,5},{x,0}}.
return.
{label,52}.
return.
We can add optimizations of guard tests to v3_kernel to achive a better result:
{function, skip_prefix, 2, 49}.
{label,48}.
{line,[{location,"filename.erl",187}]}.
{func_info,{atom,filename},{atom,skip_prefix},2}.
{label,49}.
{test,is_ne_exact,{f,51},[{x,1},{atom,false}]}.
{test,is_nonempty_list,{f,51},[{x,0}]}.
{get_list,{x,0},{x,2},{x,3}}.
{test,is_nonempty_list,{f,51},[{x,3}]}.
{get_list,{x,3},{x,4},{x,5}}.
{test,is_eq_exact,{f,51},[{x,1},{x,4}]}.
{test,is_ge,{f,51},[{x,2},{integer,65}]}.
{test,is_lt,{f,50},[{integer,90},{x,2}]}.
{test,is_ge,{f,51},[{x,2},{integer,97}]}.
{test,is_ge,{f,51},[{integer,122},{x,2}]}.
{label,50}.
{move,{x,5},{x,0}}.
return.
{label,51}.
return.
Looking at the STDLIB application, there were 112 lines of BIF calls in guards
that beam_bool failed to convert to test instructions. This commit eliminates
all those BIF calls.
Here is how I counted the instructions:
$ PATH=$ERL_TOP/bin:$PATH erlc -I ../include -I ../../kernel/include -S *.erl
$ grep "bif,'[=<>]" *.S | grep -v f,0
dets.S: {bif,'=:=',{f,547},[{x,4},{atom,read_write}],{x,4}}.
dets.S: {bif,'=:=',{f,547},[{x,5},{atom,saved}],{x,5}}.
dets.S: {bif,'=:=',{f,589},[{x,5},{atom,read}],{x,5}}.
.
.
.
$ grep "bif,'[=<>]" *.S | grep -v f,0 | wc
112 224 6765
$
|
|
Do a simpler translation of internal BIFs.
While we are it, also remove the dummy values of Index and Uniq
from the make_fun internal operation.
|
|
beam_validator would complain that x(1) is uninitialized
in a test_heap instruction when attempting to compile
the following code with sys_core_fold turned off:
foo(M) when not (M#{true := 0}); [M] ->
ok.
Simplified, the generated BEAM assembly code looked like
this:
test is_map BadMap x(0)
put_map_exact Fail x(0) => x(1) ...
jump BooleanStuff
BadMap:
move ok => x(1)
jump Fail
BooleanStuff:
...
move Boolean => x(2)
jump Build
Fail:
move false => x(2)
Build:
test_heap 2 3 %% x(0), x(1), x(2) must be live.
...
That is, if put_map_exact failed, control would transfer
to the label Fail without initializing x(1).
Fix that by making sure that x(1) is initilized even if
put_map_exact fails:
test is_map BadMap x(0)
put_map_exact BadLbl x(0) => x(1) ...
jump OkLbl
BadLbl:
move ok => x(1)
jump Fail
OkLbl:
jump BooleanStuff
BadMap:
move ok => x(1)
jump Fail
BooleanStuff:
...
move Boolean => x(2)
jump Build
Fail:
move false => x(2)
Build:
test_heap 2 3 %% x(0), x(1), x(2) must be live.
...
Note that this situation is rare, and that other optimization passes
(beam_dead and beam_jump in particular) will clean up this mess.
|
|
Generate code that not only is safe, but can easily be seen by
beam_validator to be safe.
|
|
Sometimes v3_codegen would generate unsafe code when there was
a call to error/1 in a guard.
|
|
|
|
* maint:
Eliminate crash in v3_codegen
|
|
The following code would crash v3_codegen:
order(From) ->
catch
if
From#{[] => sufficient} ->
saint
end.
Before explaining the crash, first some background on the stack
frame and the Y registers.
Certain instructions, most notably the 'call' instructions, clobber
all X registers. Before any such instruction, all X registers that
have values that will be used after the call must be saved to Y
registers (i.e. to the stack frame). adjust_stack/4 will be called
when X registers must be saved.
There is also another situation when X registers must be saved, namely
within a 'catch' if we are about to execute any instruction that may
cause an exception. Examples of such instructions are some guard BIFs
(such as length/1) and construction of binaries or maps. Within a
'catch', X registers must be be saved because if an exception is
thrown and catched all X registers will be destroyed. The same
adjust_stack/4 function will be called for those instructions, but
only if they occur within a 'catch'.
There is actually one more complication. If there is code in
a guard within a catch, the X registers should not be saved, because
the code in a guard never clobbers any X registers that were alive
before the guard code was entered. v3_codegen is written with the
implicit assumption that code in guards never cause anything
to be saved to Y registers.
The code for building maps and binaries would incorrectly save X
registers within a guard inside a 'catch'.
For construction of binaries, that would mean that a useless but
harmelss 'move' instruction was generated.
But for construction of maps, the saving of the Y register would not
be harmless. There would be a crash when attempting to merge #sr{}
records. #sr{} records keeps track of the contents of X and Y
registers. When two separate code paths are joined (e.g. at the end of
'case' statement), the register descriptors must be reconciled.
Basically, the register descriptors for both paths must be identical.
The #sr{} record for one path must not claim that {y,0} contains
a certain value, while another path claims that {y,0} is dead.
Thus, the crash occurs in sr_merge/2 when failing to reconcile the
Y registers.
To fix this bug this bug we will introduce a new function called
maybe_adjust_stack/5. It will save X registers on the stack only
if the code is inside a catch but not inside a guard. We will
change all existing code to use this new function when appropriate.
Reported-by: Thomas Arts
|
|
Consider the following function:
f(Bin, Bool) ->
case Bin of
<<Val:16/binary,_/binary>> when Bool ->
Val
end.
Simplified, the generated code looks like:
bs_start_match2 Fail Live Bin => Bin
bs_get_integer2 Fail Live Bin size=Sz unit=1 => Val
bs_skip_bits2 Fail Bin size=all unit=8
is_eq_exact Fail Bool true
The code generator will replace the bs_skip_bits2 instruction with
a bs_test_unit instruction if it can be clearly seen that the
context register will not be used again. In this case, it is not
obvious without looking at the code at the Fail label.
However, it turns out that bs_test_unit instruction is always
safe beacuse of the way v3_kernel compiles pattern matching.
It doesn't matter whether the match context will be used again.
If it will be used again, the position in it will *not* be used.
Instead, a bs_restore2 instruction will restore one of the saved
instructions.
|
|
|
|
|
|
Small speed increase for large files.
|
|
|
|
Commit b76588fb5a introduced an optimization of the compile time of
huge functions with many bs_match_string instructions. The
optimization is done in two passes. The first pass coalesces adjacent
bs_match_string instructions. To avoid copying bitstrings multiple
times, the bitstrings in the instructions are combined in to a (deep)
list. The second pass goes through all instructions in the function
and combines the list of bitstrings to a single bitstring in all
bs_match_string instructions.
The second pass (fix_bs_match_string) is run on all instructions in
each function, even if there are no bs_match_instructions in the
function. While fix_bs_match_string is not a bottleneck (it is a
linear pass), its execution time is noticeable when profiling some
modules.
Move the execution of the second pass to the select_binary()
function so that it will only be executed for instructions that
do binary matching. Also take the opportunity to optimize away
uses of bs_restore2 that occour directly after a bs_save2. That
optimimization is currently done in beam_block, but it can be
done essentially for free in the same pass that fixes up
bs_match_string instructions.
|
|
Profiling shows that the execution time for "turning" y registers
is noticeable for some modules (e.g. S1AP-PDU-Contents from the
asn1 test suite). We can reduce the impact on running time by
special-casing important instructions. In particular, there is
no need to look for y registers in the list argument for a
select_val instruction.
|
|
It is no longer necessary to sort the keys, since the loader
does the sorting.
|
|
For unclear reasons, there are two functions in v3_life that are
almost identical: literal/2 and literal2/2. literal/2 is used
for expressions and literal2/2 for patterns.
It turns out that literal2/2 can do everything that literal/2 can
do, except that it transforms maps differently.
If we adjust v3_codegen to accept the same format of maps in
expressions and patterns, we can rename literal2/2 to literal/2
and use it for expressions and patterns.
|
|
v3_codegen puts the compilation in the process dictionary, but
never uses them.
|
|
The put_map_assoc and put_map_exact instructions in the run-time
system will support that the target register is the same as one of
the source registers. Teach the code generator to take advantage
of that.
The disadvantages of not reusing register when possible is that the
garbage collector may retain dead terms longer than necessary.
|
|
Need heap for maps is zero and fall through is also zero.
|
|
|
|
Removed dead code.
|
|
|
|
Multiple 'nil' cannot happen on instruction level.
Map keys are always unique with literals.
|
|
Literal nil values aren't tagged tuple but the bare atom nil.
The function lists:sort/2 expects the passed function to return true if the first
element is less than or equal to the second, not strictly less than. The original
base clause is changed accordingly.
Reported-by: Ulf Norell
|
|
* Combine multiple get values with one instruction
* Combine multiple check keys with one instruction
|