Age | Commit message (Collapse) | Author |
|
|
|
|
|
Most of the optimizations in beam_dead have been superseded
by the optimizations in beam_ssa_dead.
The forward/1 pass of beam_dead has been moved to beam_jump.
The beam_split pass splits blocks that contain instructions with
non-zero labels. Because there are no optimizations left that optimize
instructions within blocks, beam_block never needs to put such
instructions into blocks in the first place. beam_split also moved
'move' instructions out block to help beam_dead. That is no longer
necessary since beam_dead no longer exists.
|
|
This optimization working on the SSA format will replace
the similar optimization in beam_dead. See the comment
for an explanation of what the new optimization does.
|
|
v3_codegen is replaced by three new passes:
* beam_kernel_to_ssa which translates the Kernel Erlang format
to a new SSA-based intermediate format.
* beam_ssa_pre_codegen which prepares the SSA-based format
for code generation, including register allocation. Registers
are allocated using the linear scan algorithm.
* beam_ssa_codegen which generates BEAM assembly code from the
SSA-based format.
It easier and more effective to optimize the SSA-based format before X
and Y registers have been assigned. The current optimization passes
constantly have to make sure no "holes" in the X register assignments
are created (that is, that no X register becomes undefined that an
allocation instruction depends on).
This commit also introduces the following optimizations:
* Replacing of tuple matching of records with the is_tagged_tuple
instruction. (Replacing beam_record.)
* Sinking of get_tuple_element instructions to just before the first
use of the extracted values. As well as potentially avoiding
extracting tuple elements when they are not actually used on all
executions paths, this optimization could also reduce the number
values that will need to be stored in Y registers. (Similar to
beam_reorder, but more effective.)
* Live optimizations, removing the definition of a variable that is
not subsequently used (provided that the operation has no side
effects), as well strength reduction of binary matching by replacing
the extraction of value from a binary with a skip instruction. (Used
to be done by beam_block, beam_utils, and v3_codegen.)
* Removal of redundant bs_restore2 instructions. (Formerly done
by beam_bs.)
* Type-based optimizations across branches. More effective than
the old beam_type pass that only did type-based optimizations in
basic blocks.
* Optimization of floating point instructions. (Formerly done
by beam_type.)
* Optimization of receive statements to introduce recv_mark and
recv_set instructions. More effective with far fewer restrictions
on what instructions are allowed between creating the reference
and entering the receive statement.
* Common subexpression elimination. (Formerly done by beam_block.)
|
|
Call test_lib:recompile/1 from init_per_suite/1 instead of
from all/0. That makes it easy to find the log from the
compilation in the log file for the init_per_suite/1 test
case.
|
|
|
|
f9a323d10a9f5d added consistent operand order for equality
comparisons. As a result, beam_dead:turn_op/1 is no longer covered.
We must keep the uncovered lines in beam_dead to ensure that
beam_dead can handle BEAM assembly code from another source than
v3_codegen that might not follow the operand order convention.
The only way to cover the lines is to use BEAM assembly in
the test case.
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
|
|
The guard optimizations in v3_kernel has removed the need for
beam_bool.
|
|
|
|
Funs must not be created in guards. The instruction for creating
a fun clobbers all X registers, which is a bad thing to do in
a guard.
|
|
Sometimes v3_codegen would generate unsafe code when there was
a call to error/1 in a guard.
|
|
It's time that we have a dedicated test suite for beam_bool.
|
|
|
|
|
|
|
|
Those clause are obsolete and never used by common_test.
|
|
|
|
?config is ugly and not recommended. Use proplists:get_value/2
instead.
|
|
As a first step to removing the test_server application as
as its own separate application, change the inclusion of
test_server.hrl to an inclusion of ct.hrl and remove the
inclusion of test_server_line.hrl.
|
|
The following code would crash v3_codegen:
order(From) ->
catch
if
From#{[] => sufficient} ->
saint
end.
Before explaining the crash, first some background on the stack
frame and the Y registers.
Certain instructions, most notably the 'call' instructions, clobber
all X registers. Before any such instruction, all X registers that
have values that will be used after the call must be saved to Y
registers (i.e. to the stack frame). adjust_stack/4 will be called
when X registers must be saved.
There is also another situation when X registers must be saved, namely
within a 'catch' if we are about to execute any instruction that may
cause an exception. Examples of such instructions are some guard BIFs
(such as length/1) and construction of binaries or maps. Within a
'catch', X registers must be be saved because if an exception is
thrown and catched all X registers will be destroyed. The same
adjust_stack/4 function will be called for those instructions, but
only if they occur within a 'catch'.
There is actually one more complication. If there is code in
a guard within a catch, the X registers should not be saved, because
the code in a guard never clobbers any X registers that were alive
before the guard code was entered. v3_codegen is written with the
implicit assumption that code in guards never cause anything
to be saved to Y registers.
The code for building maps and binaries would incorrectly save X
registers within a guard inside a 'catch'.
For construction of binaries, that would mean that a useless but
harmelss 'move' instruction was generated.
But for construction of maps, the saving of the Y register would not
be harmless. There would be a crash when attempting to merge #sr{}
records. #sr{} records keeps track of the contents of X and Y
registers. When two separate code paths are joined (e.g. at the end of
'case' statement), the register descriptors must be reconciled.
Basically, the register descriptors for both paths must be identical.
The #sr{} record for one path must not claim that {y,0} contains
a certain value, while another path claims that {y,0} is dead.
Thus, the crash occurs in sr_merge/2 when failing to reconcile the
Y registers.
To fix this bug this bug we will introduce a new function called
maybe_adjust_stack/5. It will save X registers on the stack only
if the code is inside a catch but not inside a guard. We will
change all existing code to use this new function when appropriate.
Reported-by: Thomas Arts
|
|
beam_bool would make the following code unsafe (which would be
reported by beam_validator):
scotland(Echo) ->
found(case Echo of
Echo when true; Echo, Echo, Echo ->
Echo;
echo ->
[]
end,
Echo = placed).
found(_, _) -> million.
Basically, beam_bool would see that the 'case' would always return
the value of Echo. Thus:
scotland(Echo) ->
found(Echo, Echo = placed).
The only problem is that beam_bool would also remove a 'move'
instruction that would save Echo to the stack. Here is the
assembly code for part of the function:
{allocate_zero,1,1}.
{move,{x,0},{y,0}}. %% Save Echo on stack.
{bif,'=:=',{f,7},[{x,0},{atom,true}],{x,1}}.
{bif,'=:=',{f,7},[{x,0},{atom,true}],{x,2}}.
{bif,'=:=',{f,7},[{x,0},{atom,true}],{x,3}}.
{bif,'and',{f,7},[{x,2},{x,3}],{x,2}}.
{bif,'and',{f,7},[{x,1},{x,2}],{x,1}}.
{jump,{f,8}}.
{label,7}.
{move,{atom,false},{x,1}}.
{label,8}.
{bif,'or',{f,6},[{atom,true},{x,1}],{x,1}}.
{test,is_eq_exact,{f,6},[{x,1},{atom,true}]}. %% Jump never taken.
{jump,{f,5}}.
{label,6}.
{test,is_eq_exact,{f,9},[{x,0},{atom,echo}]}.
{move,nil,{x,0}}.
{jump,{f,5}}.
{label,9}.
{test_heap,3,0}.
{put_tuple,2,{x,0}}.
{put,{atom,case_clause}}.
{put,{y,0}}.
{line,[{location,"t.erl",5}]}.
{call_ext,1,{extfunc,erlang,error,1}}.
{jump,{f,5}}.
{label,5}.
{test,is_eq_exact,{f,12},[{atom,placed},{y,0}]}.
beam_bool would see that the is_eq_exact test at label 8 would
always succeed. It could therefore remove most of the code before
the jump to label 5. Unfortunately it also removed the essential
move of Echo to the stack:
{allocate_zero,1,1}.
%% Instruction incorrectly removed: {move,{x,0},{y,0}}.
{jump,{f,5}}.
{label,5}.
{test,is_eq_exact,{f,12},[{atom,placed},{y,0}]}.
The root cause of the problem is that the 'move' instruction is
included in the block of 'bif' instructions before label 8.
Normally the 'move' instruction would not have been discarded,
but because the left operand to the 'or' BIF is 'true', the
entire block with 'bif' instructions are dropped.
As far as I can see, there is no gain by including 'move'
instructions in the first place. There is no way that better
code will be produced. In fact, the entire optimization can
be given up if 'move' instructions are found in the block.
Thus we can fix this bug by never including any 'move' instructions
in the block of 'bif' instructions. We can also remove all the
code that deals with 'move' instructions within blocks.
Reported-by: Thomas Arts
|
|
|
|
|
|
The .core or .S files that are compiled in the test cases
may lack module_info/0,1 functions, which will cause problems if
we (for example) try to run eprof later. To avoid that problem,
unload each module directly after testing it.
|
|
When calculating the number of live registers for allocation
instruction, it is not always safe to start with the number of live
registers at the start of the block. We will need to use the register
map to know whether there are any holes (dead registers) that are not
subsequently filled.
If the allocation instruction already has a number of live registers
calculated, there is nothing to be gained by raising it.
|
|
I have spent too much time lately waiting for 'cover' to finish,
so now its time to optimize the running time of the tests suite
in coverage mode.
Basically, when 'cover' is running, the test suites would not
run any tests in parallel. The reason is that using too many
parallel processes when running 'cover' would be slower than
running them sequentially. But those measurements were made
several years ago, and many improvements have been made to
improve the parallelism of the run-time system.
Experimenting with the test_lib:p_run/2 function, I found that
increasing the number of parallel processes would speed up the
self_compile tests cases in compilation_SUITE. The difference
between using 3 processes or 4 processes was slight, though,
so it seems that we should not use more than 4 processes when
running 'cover'.
We don't want to change test_lib:parallel/0, because there is
no way to limit the number of test cases that will be run in
parallel by common_test. However, there as test suites (such as
andor_SUITE) that don't invoke the compiler at run-time. We can
run the cases in such test suites in parallel even if 'cover'
is running.
|
|
* egil/fix-maps-compiler-coverage/OTP-12425:
compiler: Rename util function to adhere to name policy
compiler: Remove get_map_elements label check in blocks
compiler: Remove unnecassary guard for get_map_elements
compiler: Remove dead code in beam_flatten
compiler: Increase Maps code coverage
|
|
* maint:
Update primary bootstrap
beam_bool: Correct live calculation for GC BIFs
beam_bool: Correct indentation for try...catch
sys_core_fold: Correct optimization of 'case'
Conflicts:
bootstrap/bin/start.boot
bootstrap/bin/start_clean.boot
bootstrap/lib/compiler/ebin/beam_asm.beam
bootstrap/lib/stdlib/ebin/io_lib_pretty.beam
|
|
When optimizing boolean expressions, it is not always possible
to find a number of live registers for a GC BIF that both preserves
all source registers that will be tested and at the same time
does not include registers that are not initialized.
As currently implemented, we have incomplete information about
the register calculated from the free variables. Some registers
are marked as "reserved". Reserved registers means that we don't
know anything about them; they may or may not be initialized.
As a conservative correction (suitable for a maintenance release), we
will abort the optimization if we find any reserved registers when
calculating the number of live registers. We will not attempt to
improve the information about the registers in this commit.
By examining the coverage when running the existing compiler test
suite we find that the optimization is aborted 15 times (before
adding any new test cases). To put that in perspective, the
optimization is successfully applied 4927 times, and aborted for
other reasons 547 times.
Reported-by: Ulf Norell
Reported-by: Anthony Ramine
|
|
This commit covers 'split_block_label_used' in the beam_bool pass for Maps.
|
|
The BEAM compiler translates code such as:
is_hex_digit(D) when $0 =< D, D =< $9 -> true;
is_hex_digit(D) when $a =< D, D =< $z -> true;
is_hex_digit(D) when $A =< D, D =< $Z -> true;
is_hex_digit(_) -> false.
to something like this:
L0: test is_ge L1 {x,0} 48
test is_ge L1 57 {x,0}
move true {x,0}
return.
L1: test is_ge L2 {x,0} 97
test is_ge L2 122 {x,0}
move true {x,0}
return
L2: test is_ge L3 {x,0} 65
test is_ge L3 90 {x,0}
move true {x,0}
return
L3: move false {x,0}
return
We can see that tests will be repeated even if they cannot possibly
succeed. For instance, if we pass in {x,0} equal to 32, the first
test that {x,0} is greater than or equal to 48 at L0 will fail.
The control will transfer to L1, where it will be tested whether
{x,0} is greater than 97. That test will fail and control
will pass to L2, where again the test will fail.
The compiler can do better by short-circuiting repeating tests:
L0: test is_ge L3 {x,0} 48
test is_ge L1 57 {x,0}
move true {x,0}
return.
L1: test is_ge L2 {x,0} 97
test is_ge L3 122 {x,0}
move true {x,0}
return
L2: test is_ge L3 {x,0} 65
test is_ge L3 90 {x,0}
move true {x,0}
return
L3: move false {x,0}
return
|
|
* nox/compiler/beam_bool-bad-protected:
Properly detect nonboolean protected expressions in beam_bool
|
|
Silly code such as the following could make the compiler crash:
f() when erlang:float(self()); true -> ok.
Reported-by: Ulf Norell
|
|
Calls to erlang:is_function/2 where the second is not a literal nonnegative
integer can crash at runtime and thus can't be marked as safe.
|
|
|
|
|
|
Run testcases in parallel will make the test suite run slightly
faster. Another reason for this change is that we want more testing
of parallel testcase support in common_test.
|
|
|
|
In 3d0f4a3085f11389e5b22d10f96f0cbf08c9337f (an update to conform
with common_test), in all test_lib:recompile(?MODULE) calls, ?MODULE
was changed to the actual name of the module. That would cause
test_lib:recompile/1 to compile the module with the incorrect
compiler options in cloned modules such as record_no_opt_SUITE,
causing worse coverage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Added only a few testcases in compiler:error_SUITE and guard_SUITE
The new behaviour of warnings and errors when overriding autoimported BIF's:
Bifs that were autoimported before R14 are dangerous because old code
using them and overriding them in exports can start behaving
differently. For newly added autoimports this can't happen to the new
code that wants to (or dont want to) use them, why only warnings are
added for the BIFs autoimported after the compilator change. Errors
are issued only for code that could have worked in one way in R13 and
now will behave in a different way.
If overriding autoimport with local function:
- if explicit -compile directive supresses autoimport
-> no message
else
- if called from inside module
- if pre R14 autoimported bif
-> error
else
-> warning
else
-> no message
If overriding autoimport with import directive
- if explicit -compile directive supresses autoimport
-> no message
else (regardless of actual usage)
- if pre R14 autoimported bif
-> error
else
-> warning
Calls of local functions or imports overriding autoimported functions
(either post R14 or by using explicit -compile supressions of
autoimport) always goes to the local function or the imported.
The compileation errors are added to not let code like this silently
and disastrously change its semantic (probably to an infinite loop)
between R13 and R14:
----------
-module(m).
-export([length/1]).
length(X) ->
...
Y = length(Z),
....
----------
The user has to select if he/she wants to call length in 'erlang' explicitly
or if the overriding semantics is desired, in which case the -compile
directive has to be used.
-compile({no_auto_import,[F/A]}). Is added to allow to override the
autoimports so that code gets unanbiguous. The directive will remove
an autoimport even if there is no local function or import overriding,
because any other behaviour would be inconsistent and confusing.
record_info and module_info can never be overridden.
|
|
Add the gc_bif's to the VM.
Add infrastructure for gc_bif's (guard bifs that can gc) with two and.
three arguments in VM (loader and VM).
Add compiler support for gc_bif with three arguments.
Add compiler (and interpreter) support for new guard BIFs.
Add testcases for new guard BIFs in compiler and emulator.
|