Age | Commit message | Author |
|
Fix unsafe optimization of record test
|
|
beam_record would make an unsafe optimization for the
not_used_p/4 function added to beam_utils_SUITE in this
commit. The bug is in beam_utils, which would falsely
report that {x,4} was unused when it in fact was used.
The bug was in the function not_used/1. The purpose of
not_used/1 is to return a 'not_used' result unless the
actual result is 'used'. Unfortunately it was not
implemented in that way. It would let a 'transparent'
result slip through, which the caller in this case would
convert to 'killed' (because the register was killed on
all other paths).
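The intended contract of not_used/1 can be sketched as follows. This is
only an illustration, not the actual beam_utils code: the module name,
the bare-atom result values, and buggy_not_used/1 are assumptions made
for the example.

```erlang
-module(not_used_sketch).
-export([not_used/1, buggy_not_used/1]).

%% Contract described above: every result except 'used' becomes 'not_used'.
not_used(used) -> used;
not_used(_Other) -> not_used.

%% Hypothetical reconstruction of the faulty behaviour: only 'killed' is
%% mapped to 'not_used', so 'transparent' slips through to the caller.
buggy_not_used(killed) -> not_used;
buggy_not_used(Other) -> Other.
```

With the first definition a 'transparent' result can no longer reach the
caller, so the caller cannot convert it to 'killed'.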
Reported-by: Richard Carlsson
|
|
Compile external fun expressions to literals
OTP-15003
|
|
bjorng/bjorn/compiler/fix-atom-leak/ERL-563/OTP-14968
Stop the compiler from overflowing the atom table
|
|
Expressions of the form fun M:F/A, where all elements are literals, are
also treated as literals. Since they have a consistent representation and
don't depend on the code currently loaded in the VM, this is safe.
This can provide significant performance improvements in code using such
functions extensively - a full function call to erlang:make_fun/3 is
replaced by a single move instruction and no register shuffling or
saving registers to stack is necessary. Additionally, compound data
types that contain such external functions as elements can be treated as
literals too.
The commit also changes the representation of external funs to be a
valid Erlang syntax and adds support for literal external funs to core
Erlang.
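As an illustration of the kind of code that may benefit, the sketch below
stores external funs in a map. The module and function names are
hypothetical, and whether a given compiler version actually emits the map
as a single literal is an assumption that depends on the version used.

```erlang
-module(extfun_sketch).
-export([handlers/0, apply_handler/2]).

%% Every element is a literal (atoms and fun M:F/A with literal M, F, A),
%% so the whole map is a candidate for being treated as one literal.
handlers() ->
    #{reverse => fun lists:reverse/1,
      length => fun erlang:length/1}.

%% Look up an external fun by name and call it.
apply_handler(Name, Arg) ->
    F = maps:get(Name, handlers()),
    F(Arg).
```

Calling extfun_sketch:apply_handler(reverse, [1,2,3]) looks up
fun lists:reverse/1 and applies it to the list.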
|
|
Use integer variable names instead of atoms in v3_core, sys_core_fold,
and v3_kernel to avoid overflowing the atom table.
It is a deliberate design decision to calculate the first free integer
variable name (in sys_core_fold and v3_kernel) instead of somehow
passing it from one pass to another. I don't want that kind of
dependency between compiler passes. Also note that the next free
variable name is not easily available after running the inliner.
|
|
|
|
The way variables created by make_template() are used, it is necessary
that the names are unique in the entire function. This has not
happened to cause any problems in the past because all other compiler
passes created atom variable names, not integer variable names. If
other passes start to create integer variable names, this bug is
exposed.
|
|
During compilation, the bs_save2 and bs_restore2 instructions contain a match
context reference. That reference is the variable name that holds the match context.
beam_clean assumes that the reference is always an atom, which is not a safe assumption
since integers are legal variable names in Core Erlang.
|
|
When a generator in a list comprehension was given some
other term than a list, the wrong line could be pointed
out in the exception. Here is an example:
    bad_generator() ->
        [I || %%This line would be pointed out.
              I <- not_a_list].
https://bugs.erlang.org/browse/ERL-572
|
|
A literal negative size in binary construction would cause a crash.
|
|
The missing support for renumbering labels in recv_mark
and recv_set did not seem to cause any problems, probably because
the instructions are introduced late and their labels would keep
their numbers. But there will definitely be a problem if the
recv_mark and recv_set instructions are introduced much earlier.
|
|
For unclear reasons, v3_kernel attempts to guarantee that #k_try{}
always has at least one return value, even if it will never be
used. I said "attempts", because the handler block that is executed
when an exception is caught does not have the same guarantee. That
means that if an exception is thrown, the return value will not
actually be set.
In practice, however, this is not a problem for the existing code
generator (v3_codegen). The generated code will still be safe.
If we are to rewrite the code generator to generate an SSA-based
intermediate format, this inconsistency *will* cause problems
when creating phi nodes.
While at it, also remove an unnecessary creation of new variables
in the generation of #k_try_enter{}.
|
|
|
|
Conflicts:
OTP_VERSION
|
|
* hasse/dialyzer/extra-range/OTP-14970:
ssl: Correct some specs
os_mon: Correct a spec
Fix broken spec in beam_asm
Dialyzer should not throw away spec information because of overspec
|
|
|
|
|
|
Refactor and fix minor bugs in beam_type
|
|
Fix beam_utils bugs that could cause problems in the future
|
|
Make sure to kill all dependencies when a register is killed. For example,
in the following code, when the type information for {x,0} is killed in
the last instruction, there will still be type information for {x,1} referring
to {x,0}:
{get_tuple_element,{x,0},0,{x,1}}.
{test,is_eq_exact,{f,5},[{x,1},{atom,tag}]}.
{get_tuple_element,{x,0},1,{x,2}}.
{get_tuple_element,{x,0},2,{x,0}}.
This does not seem to have caused any problems in the past, but it may
cause problems in the future with a register allocator that reuses registers
more aggressively.
|
|
|
|
The function tdb_update/2 is problematic. It does not distinguish
between assigning a register with a new value and updating information
about a register that is used as a source in a test instruction.
That was not a problem in practice when there were very few types,
but bugs started to be noticed as more types were added. (For example,
when a register was overwritten with a new value, the type for the
old value stored in the same register could linger in some cases.)
Introduce separate functions tdb_store/3 and tdb_meet/3 for assigning
a new value to a register and for updating type information for a
register referenced as a source, respectively. Also tighten
verification of the types that get stored into the type database.
|
|
In the following code:
{get_tuple_element,{x,0},0,{x,1}}.
{put_tuple,2,{x,1}}.
{put,{atom,badmap}}.
{put,{x,0}}.
{move,{x,1},{x,0}}.
beam_block would move the get_tuple_element/3 instruction and eliminate the
move/2 instruction:
{put_tuple,2,{x,1}}.
{put,{atom,badmap}}.
{put,{x,0}}.
{get_tuple_element,{x,0},0,{x,0}}.
That is not correct, since the result of the tuple building in {x,1} is
now ignored.
|
|
live_opt_block/4 could overestimate the number of live
registers for a GC BIF and trigger an assertion. This does not
seem to be a problem when generating code using v3_codegen,
but only when using a new experimental code generator. Therefore,
there is no need to include this correction in maint.
|
|
is_killed/3 and is_killed_at/3 could return 'true' even if the
register was referenced by an allocation instruction. Somehow, that
does not seem to have caused any problems yet.
|
|
Every catch or try/catch must use a lower Y register number than any
enclosing catch or try/catch. That will ensure that when the stack
is scanned when an exception occurs, the innermost try/catch tag is
found first.
|
|
|
|
1a029efd1ad47f started to run the beam_block pass a second time,
but it did not attempt to combine adjacent blocks.
Combining adjacent blocks leads to many more opportunities for
optimizations.
After doing some diffing in generated code, it turns out that
there is no benefit for beam_split to split out line instructions
from blocks. It seems that the only reason it was done was to
slightly simplify the implementation of the no_line_info option
in beam_clean.
|
|
As a preparation for combining blocks before running beam_block
for the second time, disable CSE for floating point operations
because it will generate invalid code.
|
|
* maint:
Check that the stack is initialized when an exception may occur
|
|
The more aggressive optimizations of 'allocate_zero' introduced
in cb6fc15c35c7e could produce unsafe code such as the following:
{allocate,0,1}.
{bif,element,{f,0},[{integer,1},{x,0}],{x,0}}.
The code is not safe because if element/2 fails, the runtime
system may scan the stack and find garbage that looks like a
catch tag, and would most probably crash.
Fix the problem by making beam_utils:is_killed/3 be more conservative
when asked whether a Y register will be killed.
Also fix an unsafe move upwards of an allocation instruction
in beam_block.
|
|
Strengthen beam_validator to check that the stack is initialized
when an instruction with an {f,0} operand is executed.
For example, the following code sequence:
{allocate,0,1}.
{bif,element,{f,0},[{integer,1},{x,0}],{x,0}}.
should not be accepted because the stack may be scanned if
element/2 fails. That could cause a crash or other undefined
behavior if garbage on the stack looks like a catch tag.
|
|
Eliminate get_list/3 internally in the compiler
|
|
Fix incorrect handling of floating point instructions
|
|
* maint:
Fix incorrect type inference of integer ranges
Conflicts:
lib/compiler/src/beam_type.erl
|
|
1a029efd1ad47f started to run the beam_block pass a second time.
Since it is run after introduction of the optimized floating point
instructions, it must handle those instructions correctly.
In particular, it must be careful when hoisting allocation
instructions. For example, the following code:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{fmove,{fr,2},{x,0}}.
{allocate_zero,1,4}.
must not be rewritten to:
{test_heap,{alloc,[{words,0},{floats,1}]},5}.
.
.
.
{allocate_zero,1,4}.
{fmove,{fr,2},{x,0}}.
because beam_validator will not consider it safe. (The code may
actually be safe depending on what the code between the two allocation
instructions does.)
https://bugs.erlang.org/browse/ERL-555
|
|
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use them internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
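The translation back to get_list can be sketched as a small peephole pass.
This is a simplified illustration, not the code added by this commit: the
module name, the function combine/1, and the restriction to adjacent pairs
with distinct registers are assumptions made for the example.

```erlang
-module(get_list_sketch).
-export([combine/1]).

%% Fuse an adjacent get_hd/get_tl pair that reads the same source register.
%% The guards conservatively skip pairs where the first instruction
%% overwrites the source, or where both results go to the same register.
combine([{get_hd,Src,Hd},{get_tl,Src,Tl}|Is])
  when Hd =/= Src, Hd =/= Tl ->
    [{get_list,Src,Hd,Tl}|combine(Is)];
combine([{get_tl,Src,Tl},{get_hd,Src,Hd}|Is])
  when Tl =/= Src, Hd =/= Tl ->
    [{get_list,Src,Hd,Tl}|combine(Is)];
combine([I|Is]) ->
    [I|combine(Is)];
combine([]) ->
    [].
```

A lone get_hd or get_tl is left untouched, so code that uses only the head
or only the tail keeps the single-result instruction.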
|
|
misc_SUITE:integer_encoding/1 was written to make sure
that big integers were encoded correctly in a reasonable
amount of time. Now that beam_asm will encode big integers
as literals, we can reduce the scope of integer_encoding/1.
That will make it significantly faster, especially when
cover is running.
|
|
Numbers that clearly are not smalls can be encoded as
literals. Conservatively, we assume that integers whose
absolute value is greater than 1 bsl 128 are bignums and
that they can be encoded as literals.
Literals are slightly easier for the loader to handle than
huge integers.
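The conservative test described above can be sketched as a predicate. The
module and function names are hypothetical; only the 1 bsl 128 threshold
comes from the text.

```erlang
-module(bignum_sketch).
-export([encode_as_literal/1]).

%% True when the absolute value exceeds 1 bsl 128, that is, when the
%% integer is assumed to be a bignum that can be encoded as a literal.
encode_as_literal(N) when is_integer(N) ->
    abs(N) > (1 bsl 128).
```

Integers at or below the threshold are not claimed to be smalls; they are
simply left to the ordinary integer encoding.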
|
|
Do local common sub expression elimination (CSE)
|
|
Optimize away unnecessary test_unit instructions that verify that
binaries are byte-aligned. In a tight loop, eliminating an
instruction can yield a small but measurable improvement in
execution time.
|
|
Separate the simplification of instructions from updating of the
type data base.
|
|
Extend an existing optimization in beam_dead to avoid
creating a match context when matching an empty binary.
|
|
Eliminate repeated evaluation of guard BIFs and building of cons cells
in blocks. This optimization is applicable in more places than might be
expected, because code generation for binaries and records can generate
common sub expressions not visible in the original source code.
For example, consider this function:
    make_binary(Term) ->
        Bin = term_to_binary(Term),
        Size = byte_size(Bin),
        <<Size:32,Bin/binary>>.
The compiler inserts a call to byte_size/1 to calculate the size of
the binary being built:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}. %Present in original code.
{line,...}.
{gc_bif,byte_size,{f,0},2,[{x,0}],{x,2}}. %Inserted by compiler.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Common sub expression elimination (CSE) eliminates the second call to
byte_size/1:
{function, make_binary, 1, 2}.
{label,1}.
{line,...}.
{func_info,{atom,t},{atom,make_binary},1}.
{label,2}.
{allocate,0,1}.
{line,...}.
{call_ext,1,{extfunc,erlang,term_to_binary,1}}.
{line,...}.
{gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}.
{move,{x,1},{x,2}}.
{bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}.
{bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}.
{bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}.
{bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}.
{move,{x,2},{x,0}}.
{deallocate,0}.
return.
Note: A possible future optimization would be to include binary
construction instructions in blocks. If that is done, the
{move,{x,1},{x,2}} instruction could also be eliminated.
|
|
The following sequence in a block:
{move,{x,1},{x,2}}.
{move,{x,2},{x,2}}.
would be incorrectly rewritten to:
{move,{x,2},{x,2}}.
(Which in turn would be optimized away a little bit later.)
|
|
When attempting to eliminate the move/2 instruction in the following
code:
{bif,self,{f,0},[],{x,0}}.
{move,{x,0},{x,1}}.
.
.
.
{put_tuple,2,{x,1}}.
{put,{atom,ok}}.
{put,{x,0}}.
beam_block would produce the following unsafe code:
{bif,self,{f,0},[],{x,1}}.
.
.
.
{put_tuple,2,{x,1}}.
{put,{atom,ok}}.
{put,{x,1}}.
It is unsafe because the tuple is self-referential.
The following code:
{put_list,{y,6},nil,{x,4}}.
{move,{x,4},{x,5}}.
{put_list,{y,1},{x,5},{x,5}}.
.
.
.
{put_tuple,2,{x,6}}.
{put,{x,4}}.
{put,{x,5}}.
would be incorrectly transformed to:
{put_list,{y,6},nil,{x,5}}.
{put_list,{y,1},{x,5},{x,5}}.
.
.
.
{put_tuple,2,{x,6}}.
{put,{x,5}}.
{put,{x,5}}.
(Both elements in the built tuple get the same value.)
|
|
Make sure that there is the correct number of put/1 instructions
following put_tuple/2. Also make it illegal to reference the
register for the tuple being built in a put/1 instruction.
That is, beam_validator will now issue a diagnostic for the
following code:
{put_tuple,1,{x,0}}.
{put,{x,0}}.
|
|
Consider the following function:
    function({function,Name,Arity,CLabel,Is0}, Lc0) ->
        try
            %% Optimize the code for the function.
        catch
            Class:Error:Stack ->
                io:format("Function: ~w/~w\n", [Name,Arity]),
                erlang:raise(Class, Error, Stack)
        end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|