Age | Commit message (Collapse) | Author |
|
A 'case' or 'if' that does not occur last in a function clause will
always force a stack frame. The reasoning behind this is that in most
uses of 'case' there will be a function call from within the
'case'. When there is a function call, the stack frame is needed both
to save the continuation pointer and to save any X registers that will
need to survive the call.
When there is no function call from a 'case', the resulting stack
frame is annoying. There will be register shuffling, and the existence
of the stack frame may thwart many optimizations (for example, in
beam_dead).
Therefore, add an extra pass to v3_codegen to avoid creating a
stack frame when not needed.
https://bugs.erlang.org/browse/ERL-514
|
|
|
|
X register 0 used to be mapped to a hardware register, and therefore
faster than the other registers. Because of that, the compiler
tried to use x(0) as much as possible as a temporary register.
That was changed a few releases ago. X register 0 is now placed
in the array of all X registers and has no special speed
advantage compared to the other registers.
Remove the code in the compiler that attempts to use x(0) as
much as possible. As a result, the following type of instruction
will be much less frequent:
{put_list,Src,{x,0},{x,0}}
Instead, the following type of instruction will be more frequent:
{put_list,Src,{x,X},{x,X}}
(Where X is an arbitrary X register.)
Update the runtime system to specialize that kind of put_list
instruction.
|
|
|
|
The bs_context_to_binary instruction only allows a register operand.
v3_codegen has a test to ensure that the operand is a register.
That test is no longer necessary. There used to be a possibility
that optimizations in sys_core_fold and the inliner could change
the operand for bs_context_to_binary to a binary literal. Since
09112806c15a81b that can no longer happen, because no more
optimizations are run after the introduction of the
bs_context_to_binary instruction.
|
|
The v3_life pass does not do enough to be worth being its own
pass. Essentially it does two things:
* Calculates life-time information starting from the annotations
that v3_kernel provides. That part can be moved into v3_codegen.
* Rewrites the Kernel Erlang records to similar plain tuples
(for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}).
That rewriting is not needed and can be eliminated.
|
|
If a type only has one clause and if the pattern is literal,
the matching can be done more efficiently by directly comparing
with the literal.
Example:
find(String, "") -> String;
find(String, <<>>) -> String;
find(String, SearchPattern) ->
.
.
.
Without this optimization, the relevant part of the code would look
this:
{test,bs_start_match2,{f,3},2,[{x,1},0],{x,2}}.
{test,bs_test_tail2,{f,4},[{x,2},0]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
That is, if {x,1} is a binary, a match context will be built to
test whether {x,1} is an empty binary.
With the optimization, the code will look this:
{test,is_eq_exact,{f,3},[{x,1},{literal,<<>>}]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
|
|
Optimise size calculation for binary construction
OTP-14654
|
|
It turns out it was extremely common to have a following sequence:
move 0 D1
byte_size _ D2
bs_add D1 D2 D
Which is equivalent to just:
byte_size _ D
Similarly another sequence:
move S D1
byte_size _ D2
bs_add D1 D2 D
Can be optimised into:
byte_size _ D2
bs_add S D2 D
Both of those optimisations work with byte_size and bit_size instructions.
|
|
The compiler could sometimes emit unnecessary 'move'
instructions in the code for binary matching, for
example for this function:
escape(<<Byte, Rest/bits>>, Pos) when Byte >= 127 ->
escape(Rest, Pos + 1);
escape(<<Byte, Rest/bits>>, Pos) ->
escape(Rest, Pos + Byte);
escape(<<_Rest/bits>>, Pos) ->
Pos.
The generated code would look like this:
{function, escape, 2, 2}.
{label,1}.
{line,[{location,"t.erl",17}]}.
{func_info,{atom,t},{atom,escape},2}.
{label,2}.
{test,bs_start_match2,{f,1},2,[{x,0},0],{x,0}}.
{test,bs_get_integer2,
{f,4},
2,
[{x,0},
{integer,8},
1,
{field_flags,[{anno,[17,{file,"t.erl"}]},unsigned,big]}],
{x,2}}.
{'%',{bin_opt,[17,{file,"t.erl"}]}}.
{move,{x,0},{x,3}}. %% UNECESSARY!
{test,is_ge,{f,3},[{x,2},{integer,127}]}.
{line,[{location,"t.erl",18}]}.
{gc_bif,'+',{f,0},4,[{x,1},{integer,1}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,3}.
{line,[{location,"t.erl",20}]}.
{gc_bif,'+',{f,0},4,[{x,1},{x,2}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,4}.
{move,{x,1},{x,0}}.
return.
The redundant 'move' instructions have been marked.
To avoid the 'move' instructions, we can extend the existing
function is_context_unused/1 in v3_codegen. If v3_codegen can
know that the match context will not be used again, it can reuse
the register for the match context and avoid the extra 'move'
instructions.
https://bugs.erlang.org/browse/ERL-444
|
|
|
|
The compiler produces poor code for complex guard expressions with andalso/orelse.
Here is an example from the filename module:
-define(IS_DRIVELETTER(Letter),(((Letter >= $A) andalso (Letter =< $Z)) orelse
((Letter >= $a) andalso (Letter =< $z)))).
skip_prefix(Name, false) ->
Name;
skip_prefix([L, DrvSep|Name], DrvSep) when ?IS_DRIVELETTER(L) ->
Name;
skip_prefix(Name, _) ->
Name.
beam_bool fails to simplify the code for the guard, leaving several 'bif'
instructions:
{function, skip_prefix, 2, 49}.
{label,48}.
{line,[{location,"filename.erl",187}]}.
{func_info,{atom,filename},{atom,skip_prefix},2}.
{label,49}.
{test,is_ne_exact,{f,52},[{x,1},{atom,false}]}.
{test,is_nonempty_list,{f,52},[{x,0}]}.
{get_list,{x,0},{x,2},{x,3}}.
{test,is_nonempty_list,{f,52},[{x,3}]}.
{get_list,{x,3},{x,4},{x,5}}.
{bif,'=:=',{f,52},[{x,1},{x,4}],{x,6}}.
{test,is_ge,{f,50},[{x,2},{integer,65}]}.
{bif,'=<',{f,52},[{x,2},{integer,90}],{x,7}}.
{test,is_eq_exact,{f,51},[{x,7},{atom,false}]}.
{test,is_ge,{f,50},[{x,2},{integer,97}]}.
{bif,'=<',{f,52},[{x,2},{integer,122}],{x,7}}.
{jump,{f,51}}.
{label,50}.
{move,{atom,false},{x,7}}.
{label,51}.
{bif,'=:=',{f,52},[{x,7},{atom,true}],{x,7}}.
{test,is_eq_exact,{f,52},[{x,6},{atom,true}]}.
{test,is_eq_exact,{f,52},[{x,7},{atom,true}]}.
{move,{x,5},{x,0}}.
return.
{label,52}.
return.
We can add optimizations of guard tests to v3_kernel to achive a better result:
{function, skip_prefix, 2, 49}.
{label,48}.
{line,[{location,"filename.erl",187}]}.
{func_info,{atom,filename},{atom,skip_prefix},2}.
{label,49}.
{test,is_ne_exact,{f,51},[{x,1},{atom,false}]}.
{test,is_nonempty_list,{f,51},[{x,0}]}.
{get_list,{x,0},{x,2},{x,3}}.
{test,is_nonempty_list,{f,51},[{x,3}]}.
{get_list,{x,3},{x,4},{x,5}}.
{test,is_eq_exact,{f,51},[{x,1},{x,4}]}.
{test,is_ge,{f,51},[{x,2},{integer,65}]}.
{test,is_lt,{f,50},[{integer,90},{x,2}]}.
{test,is_ge,{f,51},[{x,2},{integer,97}]}.
{test,is_ge,{f,51},[{integer,122},{x,2}]}.
{label,50}.
{move,{x,5},{x,0}}.
return.
{label,51}.
return.
Looking at the STDLIB application, there were 112 lines of BIF calls in guards
that beam_bool failed to convert to test instructions. This commit eliminates
all those BIF calls.
Here is how I counted the instructions:
$ PATH=$ERL_TOP/bin:$PATH erlc -I ../include -I ../../kernel/include -S *.erl
$ grep "bif,'[=<>]" *.S | grep -v f,0
dets.S: {bif,'=:=',{f,547},[{x,4},{atom,read_write}],{x,4}}.
dets.S: {bif,'=:=',{f,547},[{x,5},{atom,saved}],{x,5}}.
dets.S: {bif,'=:=',{f,589},[{x,5},{atom,read}],{x,5}}.
.
.
.
$ grep "bif,'[=<>]" *.S | grep -v f,0 | wc
112 224 6765
$
|
|
Do a simpler translation of internal BIFs.
While we are it, also remove the dummy values of Index and Uniq
from the make_fun internal operation.
|
|
beam_validator would complain that x(1) is uninitialized
in a test_heap instruction when attempting to compile
the following code with sys_core_fold turned off:
foo(M) when not (M#{true := 0}); [M] ->
ok.
Simplified, the generated BEAM assembly code looked like
this:
test is_map BadMap x(0)
put_map_exact Fail x(0) => x(1) ...
jump BooleanStuff
BadMap:
move ok => x(1)
jump Fail
BooleanStuff:
...
move Boolean => x(2)
jump Build
Fail:
move false => x(2)
Build:
test_heap 2 3 %% x(0), x(1), x(2) must be live.
...
That is, if put_map_exact failed, control would transfer
to the label Fail without initializing x(1).
Fix that by making sure that x(1) is initilized even if
put_map_exact fails:
test is_map BadMap x(0)
put_map_exact BadLbl x(0) => x(1) ...
jump OkLbl
BadLbl:
move ok => x(1)
jump Fail
OkLbl:
jump BooleanStuff
BadMap:
move ok => x(1)
jump Fail
BooleanStuff:
...
move Boolean => x(2)
jump Build
Fail:
move false => x(2)
Build:
test_heap 2 3 %% x(0), x(1), x(2) must be live.
...
Note that this situation is rare, and that other optimization passes
(beam_dead and beam_jump in particular) will clean up this mess.
|
|
Generate code that not only is safe, but can easily be seen by
beam_validator to be safe.
|
|
Sometimes v3_codegen would generate unsafe code when there was
a call to error/1 in a guard.
|
|
|
|
* maint:
Eliminate crash in v3_codegen
|
|
The following code would crash v3_codegen:
order(From) ->
catch
if
From#{[] => sufficient} ->
saint
end.
Before explaining the crash, first some background on the stack
frame and the Y registers.
Certain instructions, most notably the 'call' instructions, clobber
all X registers. Before any such instruction, all X registers that
have values that will be used after the call must be saved to Y
registers (i.e. to the stack frame). adjust_stack/4 will be called
when X registers must be saved.
There is also another situation when X registers must be saved, namely
within a 'catch' if we are about to execute any instruction that may
cause an exception. Examples of such instructions are some guard BIFs
(such as length/1) and construction of binaries or maps. Within a
'catch', X registers must be be saved because if an exception is
thrown and catched all X registers will be destroyed. The same
adjust_stack/4 function will be called for those instructions, but
only if they occur within a 'catch'.
There is actually one more complication. If there is code in
a guard within a catch, the X registers should not be saved, because
the code in a guard never clobbers any X registers that were alive
before the guard code was entered. v3_codegen is written with the
implicit assumption that code in guards never cause anything
to be saved to Y registers.
The code for building maps and binaries would incorrectly save X
registers within a guard inside a 'catch'.
For construction of binaries, that would mean that a useless but
harmelss 'move' instruction was generated.
But for construction of maps, the saving of the Y register would not
be harmless. There would be a crash when attempting to merge #sr{}
records. #sr{} records keeps track of the contents of X and Y
registers. When two separate code paths are joined (e.g. at the end of
'case' statement), the register descriptors must be reconciled.
Basically, the register descriptors for both paths must be identical.
The #sr{} record for one path must not claim that {y,0} contains
a certain value, while another path claims that {y,0} is dead.
Thus, the crash occurs in sr_merge/2 when failing to reconcile the
Y registers.
To fix this bug this bug we will introduce a new function called
maybe_adjust_stack/5. It will save X registers on the stack only
if the code is inside a catch but not inside a guard. We will
change all existing code to use this new function when appropriate.
Reported-by: Thomas Arts
|
|
Consider the following function:
f(Bin, Bool) ->
case Bin of
<<Val:16/binary,_/binary>> when Bool ->
Val
end.
Simplified, the generated code looks like:
bs_start_match2 Fail Live Bin => Bin
bs_get_integer2 Fail Live Bin size=Sz unit=1 => Val
bs_skip_bits2 Fail Bin size=all unit=8
is_eq_exact Fail Bool true
The code generator will replace the bs_skip_bits2 instruction with
a bs_test_unit instruction if it can be clearly seen that the
context register will not be used again. In this case, it is not
obvious without looking at the code at the Fail label.
However, it turns out that bs_test_unit instruction is always
safe beacuse of the way v3_kernel compiles pattern matching.
It doesn't matter whether the match context will be used again.
If it will be used again, the position in it will *not* be used.
Instead, a bs_restore2 instruction will restore one of the saved
instructions.
|
|
|
|
|
|
Small speed increase for large files.
|
|
|
|
Commit b76588fb5a introduced an optimization of the compile time of
huge functions with many bs_match_string instructions. The
optimization is done in two passes. The first pass coalesces adjacent
bs_match_string instructions. To avoid copying bitstrings multiple
times, the bitstrings in the instructions are combined in to a (deep)
list. The second pass goes through all instructions in the function
and combines the list of bitstrings to a single bitstring in all
bs_match_string instructions.
The second pass (fix_bs_match_string) is run on all instructions in
each function, even if there are no bs_match_instructions in the
function. While fix_bs_match_string is not a bottleneck (it is a
linear pass), its execution time is noticeable when profiling some
modules.
Move the execution of the second pass to the select_binary()
function so that it will only be executed for instructions that
do binary matching. Also take the opportunity to optimize away
uses of bs_restore2 that occour directly after a bs_save2. That
optimimization is currently done in beam_block, but it can be
done essentially for free in the same pass that fixes up
bs_match_string instructions.
|
|
Profiling shows that the execution time for "turning" y registers
is noticeable for some modules (e.g. S1AP-PDU-Contents from the
asn1 test suite). We can reduce the impact on running time by
special-casing important instructions. In particular, there is
no need to look for y registers in the list argument for a
select_val instruction.
|
|
It is no longer necessary to sort the keys, since the loader
does the sorting.
|
|
For unclear reasons, there are two functions in v3_life that are
almost identical: literal/2 and literal2/2. literal/2 is used
for expressions and literal2/2 for patterns.
It turns out that literal2/2 can do everything that literal/2 can
do, except that it transforms maps differently.
If we adjust v3_codegen to accept the same format of maps in
expressions and patterns, we can rename literal2/2 to literal/2
and use it for expressions and patterns.
|
|
v3_codegen puts the compilation in the process dictionary, but
never uses them.
|
|
The put_map_assoc and put_map_exact instructions in the run-time
system will support that the target register is the same as one of
the source registers. Teach the code generator to take advantage
of that.
The disadvantages of not reusing register when possible is that the
garbage collector may retain dead terms longer than necessary.
|
|
Need heap for maps is zero and fall through is also zero.
|
|
|
|
Removed dead code.
|
|
|
|
Multiple 'nil' cannot happen on instruction level.
Map keys are always unique with literals.
|
|
Literal nil values aren't tagged tuple but the bare atom nil.
The function lists:sort/2 expects the passed function to return true if the first
element is less than or equal to the second, not strictly less than. The original
base clause is changed accordingly.
Reported-by: Ulf Norell
|
|
* Combine multiple get values with one instruction
* Combine multiple check keys with one instruction
|
|
This fixes an error on multiple updates optimization for map pairs.
The error was introduced with moving to term order in Maps.
This also fixes an error where register life time was lost for values
and could result in erroneuos values being emitted in for map pairs.
Simplified v3_codegen by moving multiple update optimizations to v3_kernel.
|
|
|
|
|
|
|
|
All pairs in a Map needs to be in strict ascending key order.
|
|
|
|
Can now handle {list [reg()]} elements in instructions.
|
|
To make it possible to build the entire OTP system, also define
dummys for the instructions in ops.tab.
|
|
Reported-by: Stanislav Seletskiy
|
|
* maint:
Fix compiler crash for binary matching and a complicated guard
|
|
The compiler would crash when attempting to compile a function
head that did binary matching and had a complex expression using
'andalso' and 'not'.
Noticed-by: José Valim
|
|
In modules with huge functions with many bs_match_string
instructions, we can speed up the compilation by combining
adjacent bs_match_strings instruction in v3_codegen (as opposed
to in beam_block where we used to do it).
For instance, on my computer the v3_codegen became more than
twice as fast when compiling the re_testoutput1_split_test module
in the STDLIB test suites.
|
|
Add some comments to make the code generation for binary
construction somewhat clearer. Also get rid of the cg_bo_newregs/2
function that serves no useful purpose. (It was probably
intended to undo the effect of cg_live/2, but note that the
value passed to cg_bo_newregs/2 has not passed through cg_live/2.)
|