Age | Commit message (Collapse) | Author |
|
eb0b8da6e816 started to use a binary instead of a string in
bs_match_string instructions.
Remove a clause that attempts to handle the old form of
bs_match_string from old .S files. This is pointless, because an old
.S file is likely to contain a bs_context_to_binary instruction as
well. The bs_context_to_binary is no longer recognized by
`beam_validator`, so those old .S files will not work anyway.
|
|
There are two instructions that take string operands:
{bs_put_string,Fail,NumberOfBytes,{string,String}}
{bs_match_string,Fail,Register,NumberOfBits,{string,String}}
In the canonical BEAM code that is passed to beam_asm, string String
is currently represented as a list. (The string in bs_match_string is
a bitstring before the beam_z compiler pass.) That is wasteful,
because there will be unnecessary conversions between lists and
binaries.
Change the representation of String to be a binary.
Furthermore, bs_put_string is an optimization of a bs_put_binary
instruction with a literal binary operand. Currently, the
bs_put_string instruction is introduced in beam_bs. Delay the
introduction of bs_put_string to the beam_z pass. That will simplify
optimizations and allow us to do the optimization currently done
in beam_bs in a SSA pass in a future commit.
|
|
Moving away this optimization makes beam_block do one thing
and one thing only -- creating blocks.
|
|
The 'move' instruction can be eliminated in code such as:
{test,is_eq_exact,{f,42},[{x,0},{atom,value}]}.
{move,{atom,value},{x,0}}.
Move that optimization from beam_dead to beam_a. The optimization
will be simpler because the 'move' instruction has not yet
been moved into a block. Getting rid of 'move' earlier will
also save work for later passes.
Also move the optimization that eliminates instructions such
as from beam_dead to beam_a:
{test_is_eq_exact,{f,42},[{x,0},{x,0}]}.
|
|
v3_codegen is replaced by three new passes:
* beam_kernel_to_ssa which translates the Kernel Erlang format
to a new SSA-based intermediate format.
* beam_ssa_pre_codegen which prepares the SSA-based format
for code generation, including register allocation. Registers
are allocated using the linear scan algorithm.
* beam_ssa_codegen which generates BEAM assembly code from the
SSA-based format.
It easier and more effective to optimize the SSA-based format before X
and Y registers have been assigned. The current optimization passes
constantly have to make sure no "holes" in the X register assignments
are created (that is, that no X register becomes undefined that an
allocation instruction depends on).
This commit also introduces the following optimizations:
* Replacing of tuple matching of records with the is_tagged_tuple
instruction. (Replacing beam_record.)
* Sinking of get_tuple_element instructions to just before the first
use of the extracted values. As well as potentially avoiding
extracting tuple elements when they are not actually used on all
executions paths, this optimization could also reduce the number
values that will need to be stored in Y registers. (Similar to
beam_reorder, but more effective.)
* Live optimizations, removing the definition of a variable that is
not subsequently used (provided that the operation has no side
effects), as well strength reduction of binary matching by replacing
the extraction of value from a binary with a skip instruction. (Used
to be done by beam_block, beam_utils, and v3_codegen.)
* Removal of redundant bs_restore2 instructions. (Formerly done
by beam_bs.)
* Type-based optimizations across branches. More effective than
the old beam_type pass that only did type-based optimizations in
basic blocks.
* Optimization of floating point instructions. (Formerly done
by beam_type.)
* Optimization of receive statements to introduce recv_mark and
recv_set instructions. More effective with far fewer restrictions
on what instructions are allowed between creating the reference
and entering the receive statement.
* Common subexpression elimination. (Formerly done by beam_block.)
|
|
|
|
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use it internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
|
|
The annotations in the optimizing passes currently looks like this:
{'%live',NumRegistersUsed,RegistersUsedBitmap}
{'%def',RegistersDefinedBitmap}
(NumRegistersUsed is no longer used.)
When I attempted to extend some optimizations, I found that I had to
add additional clauses to tolerate/handle both types of
annotations. That problem would only get worse if any more annotations
are added in the future.
To simplify annotation handling, this commit wraps both types of
annotations in a {'%anno',_} tuple:
{'%anno',{used,RegistersUsedBitmap}}
{'%anno',{def,RegistersDefinedBitmap}}
The '%live' annotation has been renamed to 'used' to make it somewhat
clearer what it means, and the unused NumRegistersUsed part of the
old annotation has been removed.
Alternatives considered: My first attempt was to wrap the annotation
in a 'set' tuple so that there would only be 'set' tuples in a block.
For example:
{set,[],[],{anno,{live,RegistersUsedBitmap}}}
It was not as convenient as expected. Annotations often need to be
handled specially from other instructions in a block. When they are
wrapped in a 'set' tuple, they can very easily be handled incorrectly
or passed on to the next pass. That causes subtle errors or worse
code, and it can be difficult to debug.
Therefore, my conclusion is that annotations should be distinct from
other instructions, to make it obvious when one have missed to handle
an annotation.
|
|
|
|
|
|
|
|
|
|
The actual bs_match_string instruction has four operands:
bs_match_string {f,Lbl} Ctxt NumBits {string,ListOfBytes}
However, v3_codegen emits a more compact representation where
the bits to match are packaged in a bitstring:
bs_match_string {f,Lbl} Ctxt Bitstring
Currently, beam_clean:clean_labels/1 will rewrite the compact
representation to the final representation. That is unfortunate
since clean_labels/1 is called by beam_dead, which means that
the less compact representation will be introduced long before
it is actually needed by beam_asm. It will also complicate any
optimizations that we might want to do.
Move the rewriting of bs_match_string from beam_clean:clean_labels/1
to the beam_z pass, which is the last pass executed before
beam_validator and beam_asm.
|
|
As a preparation for fixing a bug, introduce a complete register
map in the '%live' annotations.
|
|
* Combine multiple get values with one instruction
* Combine multiple check keys with one instruction
|
|
|
|
|
|
The clause was formerly commented-out because at this point in the code,
no bs_put_string instruction has been generated yet when compiling from
Erlang.
If an Erlang module is compiled to BEAM assembly and the result contains
a bs_put_string instruction, the output can't be compiled to binary
anymore and the compiler crashes with the following error:
$ erlc prs.S
Function: compress/1
prs.S:none: internal error in beam_block;
crash reason: {{case_clause,
{'EXIT',
{function_clause,
[{beam_utils,live_opt,
[[{bs_put_string,1,{string,[0]}},
{bs_init,
{f,0},
{bs_append,0,8,{field_flags,[]}},
0,
[{integer,8},{x,0}],
{x,1}},
{label,2}],
2,
{1,{1,1,nil,nil}},
[{block,
[{'%live',2},
{set,[{x,0}],[{x,1}],move},
{'%live',1}]},
return]],
[{file,"beam_utils.erl"},{line,639}]},
{beam_utils,live_opt,1,
[{file,"beam_utils.erl"},{line,205}]},
{beam_block,function,2,
[{file,"beam_block.erl"},{line,38}]},
{lists,mapfoldl,3,
[{file,"lists.erl"},{line,1329}]},
{beam_block,module,2,
[{file,"beam_block.erl"},{line,29}]},
{compile,'-select_passes/2-anonymous-2-',2,
[{file,"compile.erl"},{line,476}]},
{compile,'-internal_comp/4-anonymous-1-',2,
[{file,"compile.erl"},{line,276}]},
{compile,fold_comp,3,
[{file,"compile.erl"},{line,294}]}]}}},
[{compile,'-select_passes/2-anonymous-2-',2,
[{file,"compile.erl"},{line,476}]},
{compile,'-internal_comp/4-anonymous-1-',2,
[{file,"compile.erl"},{line,276}]},
{compile,fold_comp,3,[{file,"compile.erl"},{line,294}]},
{compile,internal_comp,4,[{file,"compile.erl"},{line,278}]},
{compile,'-do_compile/2-anonymous-0-',2,
[{file,"compile.erl"},{line,152}]}]}
|
|
Somewhat reduce the code bloat by eliminating special cases.
|
|
Somewhat reduce code bloat.
|
|
Eliminate some code bloat.
|
|
Rewrite the five binary creation instructions to a bs_init
instruction, in order to somewhat reduce code bloat.
|
|
We can remove some code bloat by handling the special instructions
as BIF instructions in the optimization passes. Also note that
bs_utf*_size was not handled by beam_utils:check_liveness/3
(meaning the conservative answer instead of the correct answer
would be returned).
|
|
Seven bs_put_* instructions can be combined into one generic bs_put
instruction to avoid some code bloat. That will also improve some
optimizations (such as beam_trim) that did not handle all bs_put*
variants.
|
|
Since we always want to remove unused labels directly after code
generation (whether we'll run the optimization passes or not),
we can simplify the code by doing it in beam_a.
|
|
Introduce the mandary beam_a pass that will be run directly after code
generation, and the mandatory beam_z pass that will be run just before
beam_asm. Since these passes surround the optimizations, beam_a can
(for example) do instruction renaming to simplify the optimization
passes and beam_z can undo those renamings.
|