|
beam_block has an optimization that is only safe when it is applied
immediately after code generation. That is pointed out in a comment:
NOTE: Moving allocation instructions is only safe because it is done
immediately after code generation so that we KNOW that if {x,X} is
initialized, all x registers with lower numbers are also initialized.
That assumption may not be true after other optimizations, such as
the beam_utils:live_opt/1 optimization.
The new beam_reorder pass added in OTP 19 runs before beam_block.
Therefore, the optimization is potentially unsafe. The optimization
is also unsafe if compilation is started from assembly code in a
.S file.
Rewrite the optimization to make it safe. See the newly added comment
for details.
ERL-202
|
|
Somewhat simplified, beam_block would rewrite the target for
the first instruction in this code sequence:
move x(0) => y(1)
gc_bif '+' 1 x(0) => y(0)
move y(1) => x(1)
move nil => x(0)
call 2 local_function/2
The resulting code would be:
move x(0) => x(1) %% Changed target.
gc_bif '+' 1 x(0) => y(0)
move x(1) => y(1) %% Operands swapped (see 02d6135813).
move nil => x(0)
call 2 local_function/2
The resulting code is not safe because x(1) will be killed
by the gc_bif instruction.
7a47b20c3a cleaned up move optimizations and would reject the
optimization if the target was an X register and an allocating
instruction was found. To avoid this bug, the optimization must be
rejected even if the target is a Y register.
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
Remove the unreachable instructions after a 'raise' instruction
(e.g. a 'jump', 'deallocate', or 'return') to decrease code size.
|
|
|
|
Consider this code:
%% Start of block
get_tuple_element Tuple 0 Element
get_map_elements Fail Map [Key => Dest]
.
.
.
move Element UltimateDest
%% End of block
Fail:
%% Code that uses Element.
beam_block (more precisely, opt_tuple_element/1) would
incorrectly transform the code to this:
%% Start of block
get_map_elements Fail Map [Key => Dest]
.
.
.
get_tuple_element Tuple 0 UltimateDest
%% End of block
Fail:
%% Code that uses Element.
That is, the code at label Fail would use register Element,
which is either uninitialized or contains the wrong value.
We could fix this problem by always keeping label information
at hand when optimizing blocks so that we could check the code
at the failure label for get_map_elements. That would require
changes to beam_block and beam_utils. We might consider doing
that in the future if it turns out to be worth it.
For now, I have decided that I want to keep the simplicity of blocks
(allowing them to be optimized without keeping label information).
That could be achieved by not including get_map_elements in
blocks. Another way, which I have chosen, is to only allow
get_map_elements as the first instruction in the block.
For background on the bug: c288ab8 introduced the beam_reorder pass
and 5f431276 introduced opt_tuple_element() in beam_block.
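For illustration, here is an invented source fragment (names made up, not
taken from the original report) of the kind that can give rise to such a
block: the tuple element is extracted first, a map key is then matched,
and the failure branch of the map match still needs Element:
f({Element, _}, Map) ->
    case Map of
        #{key := Dest} -> {ok, Dest, Element};
        #{} -> {error, Element}
    end.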
|
|
There is an optimization in beam_block to simplify a select_val
on a known boolean value. We can implement this optimization
in a cleaner way in beam_type and it will also be applicable
in more situations. (When I added the optimization to beam_type
without removing the optimization from beam_block, the optimization
was applied 66 times.)
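An invented example of the kind of code this applies to: the result of the
comparison is known to be a boolean, so the case over it does not need a
general select_val:
f(X) ->
    Bool = X > 0,
    case Bool of
        true -> positive;
        false -> not_positive
    end.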
|
|
In the future we might want to add more bit syntax optimizations,
but beam_block is already sufficiently complicated. Therefore, move
the bit syntax optimizations out of beam_block into a separate
compiler pass called beam_bs.
|
|
d0784035ab fixed a problem with register corruption. Because of
that, opt_moves/2 will never be asked to optimize instructions with
more than two destination registers. Therefore, to regain full
coverage of beam_block, remove the final clause in opt_moves/2.
|
|
|
|
Instruction get_map_elements might destroy target registers when the fail-label is taken.
Only seen for patterns with two, and only two, target registers.
Specifically: we copy one register, and then jump.
foo(A,#{a := V1, b := V2}) -> ...
foo(A,#{b := V}) -> ...
call foo(value, #{a=>whops, c=>42}).
corresponding assembler:
{test,is_map,{f,5},[{x,1}]}.
{get_map_elements,{f,7},{x,1},{list,[{atom,a},{x,1},{atom,b},{x,2}]}}.
%% if 'a' exists but not 'b', {x,1} is overwritten and the jump to {f,7} is taken
{move,{integer,1},{x,0}}.
{call_only,3,{f,10}}.
{label,7}.
{get_map_elements,{f,8},{x,1},{list,[{atom,b},{x,2}]}}.
%% {x,1} (src) is read with a corrupt value
{move,{x,0},{x,1}}.
{move,{integer,2},{x,0}}.
{call_only,3,{f,10}}.
The fix is to not apply the 'opt_moves' pass to the get_map_elements
instruction in the case of two or more destinations.
Reported-by: Valery Tikhonov
|
|
In 45f469ca0890, the BEAM loader started to use x(1023) as scratch
register for some instructions. Therefore we should not allow
x(1023) to be used in code emitted by the compiler.
|
|
Put 'try' instructions inside block to improve the optimization
of allocation instructions. Currently, the compiler only looks
at initialization of y registers inside blocks when determining
which y registers will be "naturally" initialized.
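A small sketch of why this helps (labels and registers invented): a 'try'
instruction initializes its y register, so when it is inside the block the
optimization can see that the slot needs no separate initialization:
{allocate,2,1}.
{'try',{y,0},{f,10}}.
{move,{x,0},{y,1}}.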
|
|
|
|
Here is an example of a move instruction that could not be optimized
away because the {x,2} register was not killed:
get_tuple_element Reg Pos {x,2}
.
.
.
move {x,2} {y,0}
put_list {x,2} nil Dst
We can do the optimization if we replace all occurrences of the {x,2}
register as a source with {y,0}:
get_tuple_element Reg Pos {y,0}
.
.
.
put_list {y,0} nil Dst
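An invented source fragment in this spirit (the local function g/1 is
assumed to exist elsewhere): Element is used to build a list and must also
survive the call, so it is copied to a y register:
f(Tuple) ->
    {_, Element} = Tuple,
    g([Element]),
    Element.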
|
|
The 'move' optimization was relatively clean until GC BIFs
were introduced. Instead of re-thinking the implementation,
the existing code was fixed and patched.
The current code unsuccessfully attempts to eliminate 'move'
instructions across GC BIF and allocation instructions. We can
simplify the code if we give up as soon as we encounter any
instruction that allocates.
|
|
opt_alloc/1 makes a redundant call to opt/1. It is redundant because
the opt/1 function has already been applied to the instruction
sequence prior to calling opt_alloc/1.
|
|
|
|
Commit b76588fb5a introduced an optimization of the compile time of
huge functions with many bs_match_string instructions. The
optimization is done in two passes. The first pass coalesces adjacent
bs_match_string instructions. To avoid copying bitstrings multiple
times, the bitstrings in the instructions are combined into a (deep)
list. The second pass goes through all instructions in the function
and combines the list of bitstrings to a single bitstring in all
bs_match_string instructions.
The second pass (fix_bs_match_string) is run on all instructions in
each function, even if there are no bs_match_string instructions in the
function. While fix_bs_match_string is not a bottleneck (it is a
linear pass), its execution time is noticeable when profiling some
modules.
Move the execution of the second pass to the select_binary()
function so that it will only be executed for instructions that
do binary matching. Also take the opportunity to optimize away
uses of bs_restore2 that occur directly after a bs_save2. That
optimization is currently done in beam_block, but it can be
done essentially for free in the same pass that fixes up
bs_match_string instructions.
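The bs_restore2 elision refers to sequences of this kind (context register
and save index invented); the restore re-reads a position that the
preceding instruction has just saved, so it can simply be dropped:
{bs_save2,{x,0},1}.
{bs_restore2,{x,0},1}.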
|
|
Optimize away 'not' in sys_core_fold instead of in beam_block
and beam_dead, as we can do a better job in sys_core_fold.
I modified the test suite temporarily to never turn off Core Erlang
optimizations and looked at the coverage. With the new optimizations
active in sys_core_fold, the code in beam_block and beam_dead did not
find a single 'not' that it could optimize. That proves that the new
optimization is at least as good as the old one. Manually, I could
also verify that the new optimization would optimize some variations
of 'not' that the old one would not handle.
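A made-up example of code affected by the new optimization: roughly
speaking, sys_core_fold can remove the 'not' by swapping the branches of
the surrounding case:
f(X) ->
    case not (X > 0) of
        true -> not_positive;
        false -> positive
    end.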
|
|
As a preparation for fixing a bug, introduce a complete register
map in the '%live' annotations.
|
|
* beam_utils:joineven/1 -> beam_utils:join_even/1
* beam_utils:spliteven/1 -> beam_utils:split_even/1
|
|
|
|
* Combine multiple get values into one instruction (sketched below)
* Combine multiple check keys into one instruction
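Roughly, and using the same assembler notation as elsewhere in this log
(labels and registers invented), two lookups on the same map with the same
fail label can be expressed as a single instruction:
{get_map_elements,{f,5},{x,0},{list,[{atom,a},{x,1}]}}.
{get_map_elements,{f,5},{x,0},{list,[{atom,b},{x,2}]}}.
becomes
{get_map_elements,{f,5},{x,0},{list,[{atom,a},{x,1},{atom,b},{x,2}]}}.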
|
|
The instruction get_map_elements has a fail label, so you may not
use the instruction within an allocate/deallocate block.
|
|
|
|
To make it possible to build the entire OTP system, also define
dummies for the instructions in ops.tab.
|
|
Any init instruction following an allocate is put in the Inits list of the
corresponding alloc tuple.
|
|
If an allocate_zero instruction is fed to beam_block and the beam_type
pass is not used afterwards (e.g. with erlc +no_topt), the 'no_opt' atom
will be rejected by beam_flatten.
|
|
The compiler shouldn't crash when fed an already-optimised BEAM assembly file.
|
|
|
|
Eliminate some code bloat.
|
|
Seven bs_put_* instructions can be combined into one generic bs_put
instruction to avoid some code bloat. That will also improve some
optimizations (such as beam_trim) that did not handle all bs_put*
variants.
|
|
Since we always want to remove unused labels directly after code
generation (whether we'll run the optimization passes or not),
we can simplify the code by doing it in beam_a.
|
|
In modules with huge functions with many bs_match_string
instructions, we can speed up the compilation by combining
adjacent bs_match_string instructions in v3_codegen (as opposed
to in beam_block where we used to do it).
For instance, on my computer the v3_codegen became more than
twice as fast when compiling the re_testoutput1_split_test module
in the STDLIB test suites.
|
|
|
|
|
|
Moving of allocation instructions upwards in the instruction
stream (in order to enable further optimizations) in beam_block
is implemented with the assumption that if a register {x,X}
contains a valid term, then all other x registers with lower
numbers than X also contain valid terms. That assumption is
true after code generation.
The beam_utils:live_opt/1 optimization, however, may invalidate
that assumption. For instance, if a receive statement exports a
variable that is used, but the return value of the receive statement
is not used, then {x,1} but not {x,0} contains a valid term at the
end of the receive statement. If the receive statement is
followed by
{bif,self,{f,0},[],{x,0}}.
{test_heap,NumberOfWords,2}.
moving the allocation upwards will produce
{test_heap,NumberOfWords,2}.
{bif,self,{f,0},[],{x,0}}.
which will cause the beam_validator pass to scream loudly that
{x,0} is not live at the test_heap instruction.
Fix the problem by doing the optimizations in reverse order.
Reported-by: Jim Engquist
|
|
Since the introduction of literals in R12B, empty tuples
are literals. Thus the put_tuple/2 instruction is always
followed by at least one put/1 instruction. Therefore
the alloc_may_pass/1 function in beam_block no longer needs
a clause for "put_tuple", because the clause for "put"
will always be reached first (since the instruction stream
is scanned in reverse execution order).
Note that if the compiler would generate a "put_tuple 0 Dst"
instruction for some unfathomable reason, we should still be safe,
because the run-time system will now refuse to load a module
containing such an instruction.
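For reference, the instruction pair in question looks like this (registers
and values invented); put_tuple/2 gives the arity and destination, and each
following put/1 fills in one element:
{put_tuple,2,{x,2}}.
{put,{x,0}}.
{put,{atom,b}}.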
|
|
* bg/compiler-remove-r11-support:
compiler: Don't support the no_binaries option
erts: Don't support the put_string/3 instruction
compiler: Don't support the no_constant_pool option
compiler: Don't support the r11 option
test_server: Don't support communication with R11 nodes
binary_SUITE: Don't test bit-level binary roundtrips with R11 nodes
erts: Test compatibility of funs with R12 instead of R11
OTP-8531 bg/compiler-remove-r11-support
|
|
The no_constant_pool option was implied by the r11 option. It turns
off the usage of the constant (literal) pool, so that BEAM
instructions that use constants can be loaded in an R11 system.
Since the r11 option has been removed, there is no need to
retain the no_constant_pool option.
|
|
|