Age | Commit message | Author |
|
Optimise beam_jump
|
|
|
|
This is an alternative to #1832.
The optimisation relies on special-casing the common pattern of
"renaming" a label by a direct jump to another label. The change makes
beam_jump recognise a couple more opportunities for optimisation.
The optimisation additionally avoids superfluous list concatenations by
only flattening the accumulator at the very end.
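As an illustration (a hypothetical listing, not taken from the patch),
the "renaming" pattern is a label whose only code is a direct jump to
another label:
{label,1}
{jump,{f,2}}
Every reference to {f,1} elsewhere in the function can then be
rewritten to {f,2}, after which label 1 and its jump can be dropped.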
|
|
This is especially useful after inlining a function with a case.
Today the compiler would most probably be able to unify all the leaves of the
case during the sharing optimisation, but it would fail to unify the pattern
matching itself.
Naively running the optimisation multiple times wouldn't find the common code
either, because it would differ in the jump/fail targets of various
instructions.
To remedy this, after each sharing pass we traverse the code backwards while
reversing it and update all the jump targets with the new targets that were
discovered during the unification pass. This allows running the optimisation
until a fixpoint is reached and makes sure all sharing opportunities will be
discovered.
This optimisation also helps with Elixir's `with/else` construct.
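A hypothetical BEAM-level sketch of the idea (labels and registers are
made up for illustration):
{label,1}
{test,is_tuple,{f,3},[{x,0}]}
...
{label,2}
{test,is_tuple,{f,4},[{x,0}]}
...
{label,3}
{move,{atom,error},{x,0}}
return
{label,4}
{move,{atom,error},{x,0}}
return
The first sharing pass unifies labels 3 and 4, and the backward
traversal then rewrites {f,4} to {f,3} in the second test. On the next
iteration the two test sequences are identical as well and can also be
shared.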
|
|
We have found cases where compilation drastically slows down
due to this commit. We are working on a minimal case and plan
to bring this patch back once we can work out the performance
issues.
This reverts commit f7c9383f4c3d4b6819b5ba4d54c7093df806fe4a.
|
|
This clause seems to have been introduced in cac51274eb9a.
|
|
Even though it's not possible to have fall-throughs when entering the opt
pass, the pass can produce them itself, and we run it until a fixpoint is
reached.
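One way this can happen (a hypothetical listing): the pass removes a
jump whose target label immediately follows it,
{move,{x,1},{x,0}}
{jump,{f,2}}
{label,2}
and after that removal the move instruction falls through into label 2,
so fall-throughs have to be eliminated again before the next iteration.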
|
|
This makes other optimisations more efficient since we have fewer labels overall.
|
|
It can happen that we have the following situation:
{test,is_tuple,Fail,[R1]}
{test,test_arity,Fail,[R1,N1]}
{get_tuple_element,R1,N2,R2}
{test,is_eq_exact,Fail,[R2,Atom]}
{jump,Fail}
Previously, the optimisation would eliminate the last is_eq_exact test, but
we can do more. If the register R2 is not used in Fail, we can eliminate the
get_tuple_element instruction as well as all the preceding tests. Ultimately,
the whole sequence can be replaced by:
{jump,Fail}
|
|
This is especially useful after inlining a function with a case.
Today the compiler would most probably be able to unify all the leaves of the
case during the sharing optimisation, but it would fail to unify the pattern
matching itself.
Naively running the optimisation multiple times wouldn't find the common code
either, because it would differ in the jump/fail targets of various
instructions.
To remedy this, after each sharing pass we traverse the code backwards while
reversing it and update all the jump targets with the new targets that were
discovered during the unification pass. This allows running the optimisation
until a fixpoint is reached and makes sure all sharing opportunities will be
discovered.
This optimisation also helps with Elixir's `with/else` construct.
|
|
|
|
The guard optimizations in v3_kernel have removed the need for
beam_bool.
|
|
Since the beam_a pass is always run and has removed any
unused labels, there can never be a label as the very last
instruction in a function.
|
|
eliminate_fallthroughs/2 has special code to handle two labels next to
each other, but that does not seem ever to happen, and it left one line
uncovered in is_label/1. Since inserting an extra jump between two
labels would not cause any real problems, remove the extra handling of
two consecutive labels.
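For illustration (with made-up labels), the removed special case dealt
with sequences like:
{label,1}
{label,2}
Without it, eliminate_fallthroughs/2 simply treats label 2 like any
other label and inserts a jump in front of it:
{label,1}
{jump,{f,2}}
{label,2}
which is harmless, since the jump only targets the very next
instruction.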
|
|
It is not safe to share code between 'catch' blocks.
|
|
In complicated code with many indirect jumps to the func_info label,
a label could get lost.
|
|
|
|
Put 'try' instructions inside blocks to improve the optimization
of allocation instructions. Currently, the compiler only looks
at the initialization of y registers inside blocks when determining
which y registers will be "naturally" initialized.
|
|
|
|
Before beam_split, the get_map_elements instruction is still inside
blocks, but the helper function in beam_jump did not reflect this.
Reported-by: Quviq twitter account
|
|
|
|
|
|
The use of lists:dropwhile/2 is noticeable in the eprof results.
|
|
* bjorn/compiler/beam_jump-share:
beam_jump: Don't jump into the middle of a 'try'
|
|
José Valim noticed that code such as:
match(1) -> 1;
match(2) -> 2;
match(3) -> 3;
...
match(1000) -> 1000.
would compile very slowly. The culprit is opt/3 in beam_jump.
What happens is that opt/3 will rewrite this code:
select_val ...
label 1
jump 1000
label 2
jump 1000
...
label 999
jump 1000
label 1000
return
very slowly to this code:
select_val ...
label 1
label 2
...
label 999
label 1000
return
The reason for the slowness is that when opt/3 sees this
sequence:
label 1
jump 1000
...
it will remove the label (storing it in a dictionary),
and pick up the previously processed instruction from
the accumulator:
select_val ...
jump 1000
label 2
jump 1000
...
That is done in order to process all labels before the
jump and also to get rid of the jump instruction if the
previous instruction is an "unreachable after" instruction
(one that never transfers control to the instruction that
follows it). In this case, re-processing the sequence will
remove the now unreachable jump instruction:
select_val ...
label 2
jump 1000
...
The problem is that re-processing the select_val instruction is
expensive. The instruction has a list of 1000 labels, all of which
will be added (again) to the set of referenced labels. The
select_val instruction will be re-processed again and again
until all labels and jumps have been gobbled up.
In the original version of beam_jump, opt/3 was not called
repeatedly until a fixpoint was found, but was expected to do
all its optimizations in one pass. The fixpoint iteration was
added later.
Since we now have the fixpoint iteration, there is no need
to do everything in a single pass. When we encounter a jump, we will
collect all previously seen labels and put them into the dictionary,
and then we will move on.
As a further optimization, we will look for sequences like this:
jump X
label ...
jump X
and replace them with:
label ...
jump X
In the example above, that will avoid 1000 updates of the dictionary.
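A minimal, runnable Erlang sketch of that rewrite (the module and
function names are invented; the real beam_jump code is more general):
-module(jump_collapse).
-export([collapse/1]).
%% Collapse {jump,X}, {label,L}, {jump,X} into {label,L}, {jump,X},
%% so consecutive jumps to the same label are folded without
%% re-processing earlier instructions.
collapse(Is) ->
    collapse(Is, []).
collapse([{jump,X},{label,_}=Lbl,{jump,X}|Is], Acc) ->
    collapse([Lbl,{jump,X}|Is], Acc);
collapse([I|Is], Acc) ->
    collapse(Is, [I|Acc]);
collapse([], Acc) ->
    lists:reverse(Acc).
Applied to the example above, the chain of jump instructions collapses
into a single jump just before label 1000.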
After applying this optimization, compilation of the example
above, but with 10000 clauses, went from roughly 55 s to 0.1 s.
Reported-by: José Valim
|
|
The code sharing optimization could produce a jump into the
middle of a 'try' block. beam_validator would reject the code.
Reported-by: Ulf Norell
|
|
The get_map_elements instruction has been removed from all blocks by the
mandatory beam_split pass and thus only needs handling by the outer loop.
|
|
* nox/maps-beam_jump-put_map:
Properly collect labels in put_map instructions in beam_jump
|
|
Reported-by: Ulf Norell
|
|
Reported-by: Ulf Norell
|
|
* Combine multiple get values into one instruction
* Combine multiple check keys into one instruction (illustrated below)
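For example (a hedged BEAM-level sketch with made-up registers and
labels), two separate lookups on the same map:
{get_map_elements,Fail,Map,{list,[{atom,a},{x,1}]}}
{get_map_elements,Fail,Map,{list,[{atom,b},{x,2}]}}
can be expressed as a single instruction:
{get_map_elements,Fail,Map,{list,[{atom,a},{x,1},{atom,b},{x,2}]}}
and, likewise, several key checks can be combined into one
has_map_fields instruction.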
|
|
|
|
To make it possible to build the entire OTP system, also define
dummies for the instructions in ops.tab.
|
|
This makes applying the pass a second time a no-op.
|
|
|
|
Generate slightly smaller and faster code.
|
|
Somewhat reduce the code bloat by eliminating special cases.
|
|
Eliminate some code bloat.
|
|
Rewrite the five binary creation instructions to a bs_init
instruction, in order to somewhat reduce code bloat.
|
|
We can remove some code bloat by handling the special instructions
as BIF instructions in the optimization passes. Also note that
bs_utf*_size was not handled by beam_utils:check_liveness/3
(meaning the conservative answer instead of the correct answer
would be returned).
|
|
Seven bs_put_* instructions can be combined into one generic bs_put
instruction to avoid some code bloat. That will also improve some
optimizations (such as beam_trim) that did not handle all bs_put*
variants.
|
|
Since we always want to remove unused labels directly after code
generation (whether we'll run the optimization passes or not),
we can simplify the code by doing it in beam_a.
|
|
beam_jump moves short code sequences ending in an instruction that causes
an exception to the end of the function, in the hope that a jump around
the moved block can be replaced with a fallthrough. Therefore, moving
a block that is entered via a fallthrough defeats the purpose of the
optimization.
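Schematically (a hypothetical listing), the optimization targets code
like this:
{jump,{f,2}}
{label,1}
{move,{atom,badarg},{x,0}}
{call_ext,1,{extfunc,erlang,error,1}}
{label,2}
Moving the block at label 1 to the end of the function leaves
{jump,{f,2}} immediately before {label,2}, so the jump can be removed
and execution falls through. If the block at label 1 were instead
entered via a fallthrough, a jump to its new location would have to be
inserted where the fallthrough was, and nothing would be gained.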
Also add two more test cases for the beam_receive module to ensure that
all lines are still covered.
|
|
A test instruction with the same target label as a jump immediately
following it was supposed to be removed, but it was kept anyway.
Fix that optimization, but also make sure that the test instruction
is kept if it may have side effects (such as a bit syntax matching
instruction).
While at it, make the code cleaner by breaking it up into two clauses,
and don't remove the jump instruction if it is redundant (removal of
redundant jump instructions is already handled in another place).
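A hypothetical example of the sequence in question:
{test,is_eq_exact,Fail,[{x,0},{atom,a}]}
{jump,Fail}
Both outcomes of the test end up at Fail, so the test can be dropped
and only {jump,Fail} kept, unless the test may have side effects, as
noted above.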
|
|
|
|
|
|
* bg/compiler-remove-r11-support:
compiler: Don't support the no_binaries option
erts: Don't support the put_string/3 instruction
compiler: Don't support the no_constant_pool option
compiler: Don't support the r11 option
test_server: Don't support communication with R11 nodes
binary_SUITE: Don't test bit-level binary roundtrips with R11 nodes
erts: Test compatibility of funs with R12 instead of R11
OTP-8531 bg/compiler-remove-r11-support
|
|
The no_constant_pool option was implied by the r11 option. It turns
off the usage of the constant (literal) pool, so that BEAM
instructions that use constants can be loaded in an R11 system.
Since the r11 option has been removed, there is no need to
retain the no_constant_pool option.
|
|
|