Age | Commit message (Collapse) | Author |
|
A 'case' or 'if' that does not occur last in a function clause will
always force a stack frame. The reasoning behind this is that in most
uses of 'case' there will be a function call from within the
'case'. When there is a function call, the stack frame is needed both
to save the continuation pointer and to save any X registers that will
need to survive the call.
When there is no function call from a 'case', the resulting stack
frame is annoying. There will be register shuffling, and the existence
of the stack frame may thwart many optimizations (for example, in
beam_dead).
Therefore, add an extra pass to v3_codegen to avoid creating a
stack frame when not needed.
https://bugs.erlang.org/browse/ERL-514
|
|
v3_kernel could generate a #k_break{} with only one variable, even if
the preceding code and succeding code expected more than one value. It
happened to work anyway because the value returned from the break was
not actually used.
|
|
We used to not care about the number of values returned from the
'after infinity' clause in a receive (because it could never be
executed). It is time to start caring because this will cause problem
when we will soon start to do some more aggressive optimizizations.
|
|
|
|
X register 0 used to be mapped to a hardware register, and therefore
faster than the other registers. Because of that, the compiler
tried to use x(0) as much as possible as a temporary register.
That was changed a few releases ago. X register 0 is now placed
in the array of all X registers and has no special speed
advantage compared to the other registers.
Remove the code in the compiler that attempts to use x(0) as
much as possible. As a result, the following type of instruction
will be much less frequent:
{put_list,Src,{x,0},{x,0}}
Instead, the following type of instruction will be more frequent:
{put_list,Src,{x,X},{x,X}}
(Where X is an arbitrary X register.)
Update the runtime system to specialize that kind of put_list
instruction.
|
|
|
|
The bs_context_to_binary instruction only allows a register operand.
v3_codegen has a test to ensure that the operand is a register.
That test is no longer necessary. There used to be a possibility
that optimizations in sys_core_fold and the inliner could change
the operand for bs_context_to_binary to a binary literal. Since
09112806c15a81b that can no longer happen, because no more
optimizations are run after the introduction of the
bs_context_to_binary instruction.
|
|
This clause seems to have been introduced in cac51274eb9a.
|
|
The clause that converted an iolist to a binary was never
executed.
Note that chunk/2 is called for all chunks in the
{extra_chunks,Chunks} option. This change will enforce that the
contents of each chunk must be a binary (as documented).
|
|
beam_utils:bif_to_test/3 is supposed to never put a literal
operand as the first operand in is_eq_exact or is_ne_exact,
but 'nil' was not recognized as a literal.
|
|
Place move S x0 instructions at the end of blocks
|
|
The loader has a lot of fused instructions that include move S x0.
Placing them at the end of blocks makes it possible to take advantage
of this optimization more frequently.
|
|
d8d07a7593d811 that added the to_dis option to the compiler no longer
works when merged to master. That is because of 79f28cfd8df1b7
that removed some unused code in erts_debug.
Fix this by adding a new function erts_debug:dis_to_file/2 and
use it from compile module where we have access to the filename
(in beam_listing we only have an opened file).
|
|
Conflicts:
lib/compiler/src/beam_listing.erl
|
|
* lukas/compiler/add_to_dis/OTP-14784:
compiler: Add +to_dis option that dumps loaded asm
|
|
|
|
* maint:
Recognize 'deterministic' when given in a -compile() attribute
Conflicts:
lib/compiler/src/beam_asm.erl
|
|
The compiler option 'deterministic' was only recognized when given
as an option to the compiler, not when it was specified in a
-compile() attribute in the source file.
https://bugs.erlang.org/browse/ERL-498
|
|
Add some internal documentation about cerl_clauses
|
|
The v3_life pass does not do enough to be worth being its own
pass. Essentially it does two things:
* Calculates life-time information starting from the annotations
that v3_kernel provides. That part can be moved into v3_codegen.
* Rewrites the Kernel Erlang records to similar plain tuples
(for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}).
That rewriting is not needed and can be eliminated.
|
|
I recently tried to add some additional optimizations for
matching of maps, but found out that the inliner will need
some updates to be able to handle those optimizations.
Add lib/compiler/internal_doc/cerl-notes.md to document what
I've learned.
|
|
If a type only has one clause and if the pattern is literal,
the matching can be done more efficiently by directly comparing
with the literal.
Example:
find(String, "") -> String;
find(String, <<>>) -> String;
find(String, SearchPattern) ->
.
.
.
Without this optimization, the relevant part of the code would look
this:
{test,bs_start_match2,{f,3},2,[{x,1},0],{x,2}}.
{test,bs_test_tail2,{f,4},[{x,2},0]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
That is, if {x,1} is a binary, a match context will be built to
test whether {x,1} is an empty binary.
With the optimization, the code will look this:
{test,is_eq_exact,{f,3},[{x,1},{literal,<<>>}]}.
return.
{label,3}.
{test,is_nil,{f,4},[{x,1}]}.
return.
{label,4}.
.
.
.
|
|
Rewrite a catch expression like this:
catch side_effect(),
...
to:
try
side_effect()
catch
_:_ ->
ok
end,
...
A try/catch is more efficient since no stack trace will be built
when an exception occurs.
|
|
Improve the receive optimization to be able to handle code
such as this:
Ref = make_ref(),
try
side_effect()
catch _:_ ->
ok
end,
receive
%% Clauses that all match Ref.
.
.
.
end
|
|
If a try/catch is used to ignore the potential exception caused by some
expression with a side effect such as:
try
side_effect()
catch _Class:_Reason ->
ok
end,
.
.
.
the generated code will be wasteful:
try YReg TryLabel
%% call side_effect() here
try_end Yreg
jump GoodLabel
TryLabel:
try_case YReg
%% try_case has set up registers as follows:
%% x(0) -> error | exit | throw
%% x(1) -> reason
%% x(2) -> raw stack trace data
GoodLabel:
%% This code does not use any x registers.
There will be two code paths that both end up at GoodLabel, and
the try_case instruction will set up registers that are never used.
We can do better by replacing try_case with try_end to obtain this
code:
try YReg TryLabel
%% call side_effect() here
try_end Yreg
jump GoodLabel
TryLabel:
try_end YReg
GoodLabel:
The jump optimizer (beam_jump) can further optimize this code
like this:
try YReg TryLabel
%% call side_effect() here
TryLabel:
try_end YReg
|
|
Optimise size calculation for binary construction
OTP-14654
|
|
* maint:
Fix incorrect internal consistency failure for binary matching code
|
|
4c31fd0b9665 made the merging of match contexts stricter;
in fact, a little bit too strict.
Two match contexts with different number of slots would
be downgraded to the 'term' type. The correct way is to
keep the match context but set the number of slots to the
lowest number of slots of the two match contexts.
https://bugs.erlang.org/browse/ERL-490
|
|
* siri/string-new-api: (28 commits)
hipe (test): Do not use deprecated functions in string(3)
dialyzer (test): Do not use deprecated functions in string(3)
eunit (test): Do not use deprecated functions in string(3)
system (test): Do not use deprecated functions in string(3)
system (test): Do not use deprecated functions in string(3)
mnesia (test): Do not use deprecated functions in string(3)
Deprecate old string functions
observer: Do not use deprecated functions in string(3)
common_test: Do not use deprecated functions in string(3)
eldap: Do not use deprecated functions in string(3)
et: Do not use deprecated functions in string(3)
os_mon: Do not use deprecated functions in string(3)
debugger: Do not use deprecated functions in string(3)
runtime_tools: Do not use deprecated functions in string(3)
asn1: Do not use deprecated functions in string(3)
compiler: Do not use deprecated functions in string(3)
sasl: Do not use deprecated functions in string(3)
reltool: Do not use deprecated functions in string(3)
kernel: Do not use deprecated functions in string(3)
hipe: Do not use deprecated functions in string(3)
...
Conflicts:
lib/eunit/src/eunit_lib.erl
lib/observer/src/crashdump_viewer.erl
lib/reltool/src/reltool_target.erl
|
|
|
|
Add compile_info option to compile
OTP-14615
|
|
This allows compilers built on top of the compile module
to attach external compilation metadata to the compile_info
chunk.
For example, Erlang uses this chunk to store the compiler
version. Elixir and LFE may augment this by also adding
their own compiler versions, which can be useful when
debugging.
The deterministic option does not affect the user supplied
compile_info. It is therefore the responsibility of external
compilers to guarantee any added information does not violate
the determinsitic option, if such option is supported.
Finally, this code moves the building of the compile_info
options to the compile module instead of beam_asm, moving
all of the option mangling code to a single place.
|
|
Optimise equality comparisons
|
|
It turns out it was extremely common to have a following sequence:
move 0 D1
byte_size _ D2
bs_add D1 D2 D
Which is equivalent to just:
byte_size _ D
Similarly another sequence:
move S D1
byte_size _ D2
bs_add D1 D2 D
Can be optimised into:
byte_size _ D2
bs_add S D2 D
Both of those optimisations work with byte_size and bit_size instructions.
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
The compiler could sometimes emit unnecessary 'move'
instructions in the code for binary matching, for
example for this function:
escape(<<Byte, Rest/bits>>, Pos) when Byte >= 127 ->
escape(Rest, Pos + 1);
escape(<<Byte, Rest/bits>>, Pos) ->
escape(Rest, Pos + Byte);
escape(<<_Rest/bits>>, Pos) ->
Pos.
The generated code would look like this:
{function, escape, 2, 2}.
{label,1}.
{line,[{location,"t.erl",17}]}.
{func_info,{atom,t},{atom,escape},2}.
{label,2}.
{test,bs_start_match2,{f,1},2,[{x,0},0],{x,0}}.
{test,bs_get_integer2,
{f,4},
2,
[{x,0},
{integer,8},
1,
{field_flags,[{anno,[17,{file,"t.erl"}]},unsigned,big]}],
{x,2}}.
{'%',{bin_opt,[17,{file,"t.erl"}]}}.
{move,{x,0},{x,3}}. %% UNECESSARY!
{test,is_ge,{f,3},[{x,2},{integer,127}]}.
{line,[{location,"t.erl",18}]}.
{gc_bif,'+',{f,0},4,[{x,1},{integer,1}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,3}.
{line,[{location,"t.erl",20}]}.
{gc_bif,'+',{f,0},4,[{x,1},{x,2}],{x,1}}.
{move,{x,3},{x,0}}. %% UNECESSARY!
{call_only,2,{f,2}}.
{label,4}.
{move,{x,1},{x,0}}.
return.
The redundant 'move' instructions have been marked.
To avoid the 'move' instructions, we can extend the existing
function is_context_unused/1 in v3_codegen. If v3_codegen can
know that the match context will not be used again, it can reuse
the register for the match context and avoid the extra 'move'
instructions.
https://bugs.erlang.org/browse/ERL-444
|
|
* maint:
Make handling of match contexts stricter
|
|
Enhance optimisations in beam_peep
|
|
beam_validator could fail issue a diagnostic when a register
that was supposed to be a match context was not guaranteed to
be a match context.
The bug was in merging of types. Merging of a match context with
another term would result in a match context. That is wrong. Merging
should produce a more general type, not a narrower type. Also, the
valid slots in two match contexts should be combined with 'band', not
'bor'.
|
|
When cleaning selects, it might happen we're left with only one pair. In such
case convert to a regular test + jump.
|
|
|
|
'john/compiler/fail-labels-in-blocks-otp-18/ERIERL-48/OTP-14522' into maint
* john/compiler/fail-labels-in-blocks-otp-18/ERIERL-48/OTP-14522:
compiler: Fix live regs update on allocate in validator
Take fail labels into account when determining liveness in block ops
Conflicts:
lib/compiler/src/beam_utils.erl
|
|
The state without pruned registers was passed on to test_heap
causing the validator to belive registers that aren't live
actually are live.
|
|
bjorng/bjorn/compiler/improve-case-opt/ERL-452/OTP-14525
Generalize optimization of "one-armed" cases
|
|
Even though, it's not possible to have fall-throughs when entering the otp
pass, it can produce them itself and we're running the pass until fixpoint.
|
|
This makes other optimisations more efficient since we have less labels overall.
|
|
It can happen we have the following situation:
{test,is_tuple,Fail,[R1]}
{test,test_arity,Fail,[R1,N1]}
{get_tuple_element,R1,N2,R2}
{test,is_eq_exaqct,Fail,[R2,Atom]}
{jump,Fail}
Previously, the optimisation would eliminate the last is_eq_exact test, but
we can do more. If the register R2 is not used in Fail, we can eliminate the
get_tuple_element instruction as well as all the preceding tests. Ultimately,
the whole sequence can be replaced by:
{jump,Fail}
|
|
This is especially useful after inlining a function with a case.
Today the compiler would most probably be able to unify all the leafs of the
case during the sharing optimisation, but it would fail to unify the pattern
matching itself.
Naively running the optimisation multiple times wouldn't be able to find the
common code either, because it would differ in jump/fail targets of various
instructions.
To remedy this, after doing each sharing pass we traverse the code backwards
when reversing and update all the jump targets with the new targets that were
discovered during the unification pass. This allows running the optimisation
until fixpoint and makes sure all sharing opportunities will be discovered.
This optimisation also helps with the Elixir's `with/else` construct.
|
|
|
|
* maint:
sys_core_fold: Fix unsafe optimization of non-variable apply
Correct type specification in ssl:prf/5
|