Age | Commit message | Author |
|
Most of the optimizations in beam_dead have been superseded
by the optimizations in beam_ssa_dead.
The forward/1 pass of beam_dead has been moved to beam_jump.
The beam_split pass splits blocks that contain instructions with
non-zero labels. Because there are no optimizations left that optimize
instructions within blocks, beam_block never needs to put such
instructions into blocks in the first place. beam_split also moved
'move' instructions out of blocks to help beam_dead. That is no longer
necessary since beam_dead no longer exists.
|
|
Those optimizations are unsafe if beam_dead has been run before.
|
|
The 'move' instruction can be eliminated in code such as:
{test,is_eq_exact,{f,42},[{x,0},{atom,value}]}.
{move,{atom,value},{x,0}}.
Move that optimization from beam_dead to beam_a. The optimization
will be simpler because the 'move' instruction has not yet
been moved into a block. Getting rid of 'move' earlier will
also save work for later passes.
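A source-level shape that can give rise to such a sequence (an
assumed example, not taken from the original message):

%% Matching the clause head emits an is_eq_exact test; after it
%% succeeds, {x,0} already holds the atom 'value', so the 'move'
%% that loads the return value is redundant.
f(value) -> value.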
Also move from beam_dead to beam_a the optimization that eliminates
instructions such as:
{test,is_eq_exact,{f,42},[{x,0},{x,0}]}.
|
|
v3_codegen is replaced by three new passes:
* beam_kernel_to_ssa which translates the Kernel Erlang format
to a new SSA-based intermediate format.
* beam_ssa_pre_codegen which prepares the SSA-based format
for code generation, including register allocation. Registers
are allocated using the linear scan algorithm.
* beam_ssa_codegen which generates BEAM assembly code from the
SSA-based format.
It is easier and more effective to optimize the SSA-based format before X
and Y registers have been assigned. The current optimization passes
constantly have to make sure no "holes" in the X register assignments
are created (that is, that no X register becomes undefined that an
allocation instruction depends on).
This commit also introduces the following optimizations:
* Replacement of tuple matching of records with the is_tagged_tuple
instruction. (Replaces beam_record; see the sketch after this list.)
* Sinking of get_tuple_element instructions to just before the first
use of the extracted values. As well as potentially avoiding the
extraction of tuple elements that are not actually used on all
execution paths, this optimization can also reduce the number of
values that need to be stored in Y registers. (Similar to
beam_reorder, but more effective.)
* Live optimizations, removing the definition of a variable that is
not subsequently used (provided that the operation has no side
effects), as well as strength reduction of binary matching by replacing
the extraction of values from a binary with a skip instruction. (Used
to be done by beam_block, beam_utils, and v3_codegen.)
* Removal of redundant bs_restore2 instructions. (Formerly done
by beam_bs.)
* Type-based optimizations across branches. More effective than
the old beam_type pass that only did type-based optimizations in
basic blocks.
* Optimization of floating point instructions. (Formerly done
by beam_type.)
* Optimization of receive statements to introduce recv_mark and
recv_set instructions. More effective with far fewer restrictions
on what instructions are allowed between creating the reference
and entering the receive statement.
* Common subexpression elimination. (Formerly done by beam_block.)
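As an illustration of the first point above (a sketch; the exact
operands can vary between OTP versions):

%% Hypothetical record definition and match.
-record(rec, {a,b}).
f(#rec{a=A}) -> A.

The clause head can be tested with a single instruction:

{test,is_tagged_tuple,{f,Fail},[{x,0},3,{atom,rec}]}.

instead of separate is_tuple, test_arity, and is_eq_exact tests.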
|
|
This optimization can be better done in the SSA format before
code generation.
|
|
|
|
|
|
This complements the `map_get/2` guard BIF introduced in #1784.
Rationale.
`map_get/2` allows accessing map fields in guards, but it might be
problematic in more complex guard expressions, for example:
foo(X) when (map_get(a, X) =:= 1) or is_list(X) -> ...
The `is_list/1` part of the guard could never succeed since the
`map_get/2` guard would fail the whole guard expression. This could
be solved by using `;` instead of `or` to separate the guards, but
that is not possible in every case.
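For instance, this particular guard could be rewritten as (an
illustrative rework, not part of the original message):

foo(X) when map_get(a, X) =:= 1; is_list(X) -> ...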
To solve this situation, this PR proposes an `is_map_key/2` guard BIF
that allows checking whether a map has a given key inside a guard
before trying to access that key. When combined with `is_map/1`, this
allows constructing a purely boolean guard expression that tests the
value of a key in a map.
Implementation.
Given the use case motivating the introduction of this function, the PR
contains compiler optimisations that produce optimal code for the
following guard expression:
foo(X) when is_map(X) and is_map_key(a, X) and (map_get(a, X) =:= 1) -> ok;
foo(_) -> error.
Given that all three tests share the failure label, the `is_map_key/2`
and `is_map/1` tests are optimised away.
As with `map_get/2`, the `is_map_key/2` BIF is allowed in match specs.
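A match spec sketch (assumed usage; the table Tab and the {Key, Map}
object shape are hypothetical):

%% Select the maps that contain the key 'a' from a table whose
%% objects are {Key, Map} pairs.
ets:select(Tab, [{{'$1','$2'}, [{is_map_key, a, '$2'}], ['$2']}]).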
|
|
Extend an existing optimization in beam_dead to avoid
creating a match context when matching an empty binary.
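One shape of code the optimization applies to (an assumed example):

%% Matching the empty binary can be compiled to a comparison with
%% the literal <<>>, so no match context needs to be created.
f(<<>>) -> ok;
f(_) -> error.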
|
|
|
|
When converting a comparison BIF (such as '=:=') to a test
instruction, run the other optimizations on the result.
When trying to combine is_eq_exact tests, handle the case where
is_eq_exact is followed by a jump instead of a label, as happens for
a test that has just been converted from a BIF.
Taken together, those changes will coalesce more is_eq_exact
instructions into select_val instructions.
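A source-level example that can benefit (illustrative, not from the
original message):

%% Each guard is first compiled to an '=:=' BIF call; once converted
%% to is_eq_exact tests, the chain can be coalesced into a single
%% select_val dispatching on {x,0}.
f(X) when X =:= a -> 1;
f(X) when X =:= b -> 2;
f(X) when X =:= c -> 3.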
|
|
|
|
In practice, this optimization will only apply to contrived guards
that are almost never used in real applications. The only reason we
add this optimization is to help approach the goal of zero tolerance
for 'bif' instructions being used instead of 'test' instructions in guards.
|
|
A 'bif' or 'gc_bif' instruction is redundant if its failure label is
the same as the target of a 'jump' instruction immediately following it.
There is no need to test for liveness of the destination register,
because the code at the failure label cannot safely assume that
the destination register is initialized. See the comments in the
code for further details.
In practice, this optimization will only apply to contrived guards
that are almost never used in real applications. The only reason we
add this optimization is to help approach the goal of zero tolerance
for 'bif' instructions being used instead of 'test' instructions in guards.
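Schematically (hypothetical operands):

%% Both the failure case and the fall-through continue at Fail,
%% and the code at Fail may not assume {x,1} is initialized.
{gc_bif,element,{f,Fail},2,[{integer,1},{x,0}],{x,1}}.
{jump,{f,Fail}}.

The gc_bif instruction can therefore be removed, leaving only the jump.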
|
|
105c5b0071 was reverted in dd1162846e because clauses that were
supposed to match would not match. (See 8b83bc0b.)
Reintroduce the optimization, but make sure that we only shortcut
bs_context_to_binary instructions and not bs_start_match2 instructions.
|
|
This reverts commit 105c5b0071056dc062797e58772e098d2a3a4627.
|
|
If the Core Erlang optimization were turned off (using no_copt),
the optimization passes for Beam assembly could generate unsafe
code that did not initialize all Y registers before (for example)
a call instruction.
To fix this, beam_dead should not attempt to remove stores to Y
registers. That is not safe if there is an exception-generating
instruction inside a try...catch block.
|
|
|
|
When the bit syntax is used to match a single binary literal, the bit
syntax instructions will be replaced with a comparison to a binary
literal. The only problem is that the bs_context_to_binary instruction
will not be eliminated.
Example:
f(<<"string">>) ->
ok.
This function would be translated to:
{function, f, 1, 2}.
{label,1}.
{line,...}.
{func_info,...}.
{label,2}.
{test,is_eq_exact,{f,3},[{x,0},{literal,<<"string">>}]}.
{move,{atom,ok},{x,0}}.
return.
{label,3}.
{bs_context_to_binary,{x,0}}.
{jump,{f,1}}.
The bs_context_to_binary instruction serves no useful purpose,
since {x,0} can never be a match context. With the instruction
eliminated, the resulting code is:
{function, f, 1, 2}.
{label,1}.
{line,...}.
{func_info,...}.
{label,2}.
{test,is_eq_exact,{f,1},[{x,0},{literal,<<"string">>}]}.
{move,{atom,ok},{x,0}}.
return.
|
|
In a select_val instruction, values associated with a label
which is the same as the failure label can be removed. We
already do this optimization in beam_clean, but it is better to
do this sort of optimization before the beam_jump pass.
Also rewrite a select_val instruction with a single value to an
is_eq_exact instruction followed by a jump instruction.
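Schematically, for the single-value case (hypothetical operands):

%% Hypothetical labels and value.
{select_val,{x,0},{f,Fail},{list,[{atom,a},{f,Lbl}]}}.

becomes:

{test,is_eq_exact,{f,Fail},[{x,0},{atom,a}]}.
{jump,{f,Lbl}}.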
|
|
We can rewrite more instances of select_val to is_boolean because
it is not necessary that a particular label follows the select_val.
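For example (a sketch with hypothetical operands), a select_val whose
only values are the two booleans with the same target label:

%% Hypothetical labels.
{select_val,{x,0},{f,Fail},{list,[{atom,true},{f,L},{atom,false},{f,L}]}}.

can be rewritten to:

{test,is_boolean,{f,Fail},[{x,0}]}.
{jump,{f,L}}.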
|
|
|
|
|
|
When matching a binary literal as in:
<<"abc">> = Bin
the compiler will produce a sequence of three instructions
(some details in the instructions removed for simplicity):
bs_start_match2 Fail BinReg CtxtReg
bs_match_string Fail CtxtReg "abc"
bs_test_tail2 Fail CtxtReg 0
The sequence can be replaced with:
is_eq_exact Fail BinReg "abc"
|
|
The actual bs_match_string instruction has four operands:
bs_match_string {f,Lbl} Ctxt NumBits {string,ListOfBytes}
However, v3_codegen emits a more compact representation where
the bits to match are packaged in a bitstring:
bs_match_string {f,Lbl} Ctxt Bitstring
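For example (illustrative), matching the three bytes of "abc":

bs_match_string {f,Lbl} Ctxt <<"abc">>

corresponds to the final representation:

bs_match_string {f,Lbl} Ctxt 24 {string,[97,98,99]}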
Currently, beam_clean:clean_labels/1 will rewrite the compact
representation to the final representation. That is unfortunate
since clean_labels/1 is called by beam_dead, which means that
the less compact representation will be introduced long before
it is actually needed by beam_asm. It will also complicate any
optimizations that we might want to do.
Move the rewriting of bs_match_string from beam_clean:clean_labels/1
to the beam_z pass, which is the last pass executed before
beam_validator and beam_asm.
|
|
|
|
|
|
Optimize away 'not' in sys_core_fold instead of in beam_block
and beam_dead, as we can do a better job in sys_core_fold.
I modified the test suite temporarily to never turn off Core Erlang
optimizations and looked at the coverage. With the new optimizations
active in sys_core_fold, the code in beam_block and beam_dead did not
find a single 'not' that it could optimize. That proves that the new
optimization is at least as good as the old one. Manually, I could
also verify that the new optimization would optimize some variations
of 'not' that the old one would not handle.
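One shape of code the new optimization covers (an assumed example,
not taken from the original message):

%% The negated comparison can be rewritten during Core Erlang
%% optimization so that no explicit 'not' remains for the later
%% passes to handle.
f(A, B) when not (A =:= B) -> different;
f(_, _) -> equal.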
|
|
|
|
|
|
While we are at it, clean up the comments and rearrange the code for
clarity. Also add a test to cover the last uncovered line in
beam_dead.erl.
|
|
Better optimizations with less code.
|
|
The BEAM compiler translates code such as:
is_hex_digit(D) when $0 =< D, D =< $9 -> true;
is_hex_digit(D) when $a =< D, D =< $z -> true;
is_hex_digit(D) when $A =< D, D =< $Z -> true;
is_hex_digit(_) -> false.
to something like this:
L0: test is_ge L1 {x,0} 48
test is_ge L1 57 {x,0}
move true {x,0}
return
L1: test is_ge L2 {x,0} 97
test is_ge L2 122 {x,0}
move true {x,0}
return
L2: test is_ge L3 {x,0} 65
test is_ge L3 90 {x,0}
move true {x,0}
return
L3: move false {x,0}
return
We can see that tests will be repeated even if they cannot possibly
succeed. For instance, if we pass in {x,0} equal to 32, the first
test that {x,0} is greater than or equal to 48 at L0 will fail.
Control will transfer to L1, where it will be tested whether
{x,0} is greater than or equal to 97. That test will fail and
control will pass to L2, where again the test will fail.
The compiler can do better by short-circuiting repeating tests:
L0: test is_ge L3 {x,0} 48
test is_ge L1 57 {x,0}
move true {x,0}
return
L1: test is_ge L2 {x,0} 97
test is_ge L3 122 {x,0}
move true {x,0}
return
L2: test is_ge L3 {x,0} 65
test is_ge L3 90 {x,0}
move true {x,0}
return
L3: move false {x,0}
return
|
|
|
|
Eliminate some code bloat.
|
|
The bs_match_string instruction is used to speed up matching of
binary literals. For example, given this source code:
foo1(<<1,2,3>>) -> ok.
The matching part of the code will look like:
{test,bs_start_match2,{f,1},1,[{x,0},0],{x,0}}.
{test,bs_match_string,{f,3},[{x,0},24,{string,[1,2,3]}]}.
{test,bs_test_tail2,{f,3},[{x,0},0]}.
Nice. However, if we do a simple change to the source code:
foo2(<<1,2,3>>) -> ok;
foo2(<<>>) -> error.
the resulting matching code will look like (slightly simplified):
{test,bs_start_match2,{f,4},1,[{x,0},0],{x,0}}.
{test,bs_get_integer2,{f,7},1,[{x,0},{integer,8},1,Flags],{x,1}}.
{test,is_eq_exact,{f,8},[{x,1},{integer,1}]}.
{test,bs_match_string,{f,6},[{x,0},16,{string,[2,3]}]}.
{test,bs_test_tail2,{f,6},[{x,0},0]}.
{move,{atom,ok},{x,0}}.
return.
{label,6}.
{bs_restore2,{x,0},{atom,start}}.
{label,7}.
{test,bs_test_tail2,{f,8},[{x,0},0]}.
That is, matching of the first byte is not combined into the
bs_match_string instruction that follows.
Fix this problem by allowing a bs_match_string instruction to be
used if all clauses will match either the same integer literal or
the empty binary.
|
|
To facilitate debugging of compiler bugs, teach the compiler the
'no_dead' option. Since the beam_dead pass used to do the necessary
splitting of basic blocks to expose all labels, we must move that
splitting into a separate pass that is always run.
|
|
|
|
|
|
There are never any empty blocks when beam_dead is invoked.
Even if there were, they would be removed a little bit later in
forward/4.
|
|
In the optimization of binary matching, it seems that two clauses
can never be reached. Removing the clauses is safe, since that
would only mean that an opportunity for an optimization is lost.
|
|
Because the code generator (v3_codegen) would not include the
same value more than once in a select_val/3 instruction, and
because a label can only be referenced by one select_val/3
instruction, there is no way that the correct value could already
be in the gb_tree. (Even if it could happen, this change is
safe because only an opportunity for an optimization would be
missed; incorrect code would not be generated.)
|
|
Since the optimizations in forward/4 already depend on some
assumptions about how code is generated anyway, document the
assumptions in a comment and remove the uncoverable code.
|
|
* bg/compiler:
beam_peep: Remove optimization already done by beam_dead
beam_dead: Combine is_eq_exact instructions into select_val instructions
Evaluate is_record/3 at compile-time using type information
Evaluate element/2 at compile-time using type information
erl_expand_records: Replace is_record() with matching
OTP-8668 bg/compiler
The compiler optimizes record operations better.
|
|
Combine a sequence of chained is_eq_exact instructions into
a select_val instruction.
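Schematically (hypothetical labels and operands):

{test,is_eq_exact,{f,2},[{x,0},{atom,a}]}.
{jump,{f,10}}.
{label,2}.
{test,is_eq_exact,{f,3},[{x,0},{atom,b}]}.
{jump,{f,20}}.

becomes:

{select_val,{x,0},{f,3},{list,[{atom,a},{f,10},{atom,b},{f,20}]}}.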
|
|
|