Age | Commit message (Collapse) | Author |
|
* egil/maps/hamt/OTP-12585: (113 commits)
erts: Fix bug in ESTACK and WSTACK
kernel: Add spec for erts_debug:map_info/1
mnesia: Update mnesia tests to reflect new ETS hash
erts: Ensure maps uses _rel functions in halfword
erts: Do not treat errors as fatal in erl_printf_term
erts: Update preloaded erts_internal.beam
erts: Add map decomposition wrappers
erts: Ensure halfword has correct temp-heap for maps
hipe: Handle separate hashmap tag correctly
erts: Fix map bug in dec_term for 32-bit debug VM
stdlib: Update qlc tests to reflect new ETS hash
stdlib: Remove obsolete hashmap references in io_lib
erts: Enhance maps ordering tests
hipe: Fix maps sort order testcase
erts: Remove unused variable in crashdump creation
erts: Fix typo in copy_struct for halfword emulator
erts: Restrict GCC intrinsics by compiler version
erts: Fix windows bug in hashmap_info
erts: Fix typo in make_hash2 for 32-bit arch
Fix beam_load assert
...
Conflicts:
erts/emulator/beam/bif.tab
|
|
where key 1 is less than key 1.0
|
|
For unclear reasons, there are two functions in v3_life that are
almost identical: literal/2 and literal2/2. literal/2 is used
for expressions and literal2/2 for patterns.
It turns out that literal2/2 can do everything that literal/2 can
do, except that it transforms maps differently.
If we adjust v3_codegen to accept the same format of maps in
expressions and patterns, we can rename literal2/2 to literal/2
and use it for expressions and patterns.
|
|
v3_codegen puts the compilation in the process dictionary, but
never uses them.
|
|
Inlining the core_parse module is slow (the inline pass alone
takes more than 6 seconds on my computer) and has no benefit.
|
|
Duplicated variables as aliases in patterns, such as:
f({_,_}=Dup=Dup) -> ...
will work, but produce sub-optimal code similar to:
f({_,_}=Dup=NewVar) when Dup =:= NewVar -> ...
with one extra guard test for each duplicated variable.
Rewrite pat_alias/2 to eliminate all duplicated variables. While
we are at it, also simplify handling of tuples, conses, and literals
by using the data functions in the cerl module.
|
|
|
|
|
|
|
|
initialized_regs/2 did not handle allocating instructions;
instead treating them as any other 'set' instruction.
The consequences could be one or both of the following:
Going past the allocating instruction (looking at more instructions)
would mean that initialized_regs/2 could return registers that were
not actually initialized. That could mean that MustBeKilled in
ensure_opt_safe/6 could contain too few registers, and that the code
that followed tried to use an uninitialized register. The
beam_validator should have detected that problem.
Not taking account the number of live registers in the
allocating instruction could mean that some registers
were not found to be initialized, which could mean that
MustBeKilled would contain too many registers. That would
mean a missed optimization.
|
|
For a long time, there has been an optimization for:
{V1,V2,...} = case Expr of Pat -> ... {Val1,Val2,...}; ... end
that avoids building the tuples. The construct looks like this
in Core Erlang:
let <V> = case X of
Pattern -> {Y,Z}
end
in case V of
{A,B} -> A+B
end
The current optimization will try to replace the second 'case'
with a 'let':
let <A,B> = case X of
Pattern -> <Y,Z>
end
in A+B
Simple variations of the construct would prevent the optimizations;
for example this one:
let <V> = case X of
Pattern -> {'ok',Val}
end
in case V of
{ok,Val} -> Val
end
The problem is that the optimization tries to do too much. By
making the optimization do less and have it depend on other
optimizations to finish the job, it will become more powerful.
Thus we can rewrite the code like this:
let <V1,V2> = case X of
Pattern -> <'ok',Val>
end
in let <V> = {V1,V2}
in case V of
{ok,Val} -> Val
end
Note that the second case is unchanged. The other optimizations in the
sys_core_fold module will optimize the second 'case' and eliminate the
building of the tuple.
|
|
Optimize away 'not' in sys_core_fold instead of in beam_block
and beam_dead, as we can do a better job in sys_core_fold.
I modified the test suite temporarily to never turn off Core Erlang
modifications and looked at the coverage. With the new optimizations
active in sys_core_fold, the code in beam_block and beam_dead did not
find a single 'not' that it could optimize. That proves that the new
optimization is at least as good as the old one. Manually, I could
also verify that the new optimization would optimize some variations
of 'not' that the old one would not handle.
|
|
More aggressive optimizations that we plan to introduce could cause
spurious compiler warnings.
|
|
The 'try' ... 'catch' is problematic. Firstly, if no optimization
is possible, an exception will always be thrown. Secondly, bugs
in the code will go unnoticed.
|
|
'=:=' is a cheaper operation than '==', so we should always
use '=:=' if the result will be the same as if '==' were used.
|
|
|
|
Make sure that we take extract all possible type information when
optimizing a 'let' construct.
Since the stronger optimization may generate false warnings, we also
need to take special care to suppress false warnings.
|
|
The put_map_assoc and put_map_exact instructions in the run-time
system will support that the target register is the same as one of
the source registers. Teach the code generator to take advantage
of that.
The disadvantages of not reusing register when possible is that the
garbage collector may retain dead terms longer than necessary.
|
|
|
|
get_ianno/1 would retrieve either a bare annotation or an
annotation wrapped in an #a{} record. In both cases, it would
return a wrapped annotation.
We can replace the calls to get_ianno/1 with calls to get_anno/1,
because the argument is always an #iclause{} and all iclause records
are always initialized with a wrapped annotation.
|
|
If we have a sequence of put_map_* instructions operating on the
same map, it will be more efficient if we can have one is_map/2
instruction before put_map_* instructions, so that each put_map_*
does not need to test whether the argument is a map.
|
|
When calculating the number of live registers for allocation
instruction, it is not always safe to start with the number of live
registers at the start of the block. We will need to use the register
map to know whether there are any holes (dead registers) that are not
subsequently filled.
If the allocation instruction already has a number of live registers
calculated, there is nothing to be gained by raising it.
|
|
As a preparation for fixing a bug, introduce a complete register
map in the '%live' annotations.
|
|
|
|
|
|
* bjorn/compiler/beam_jump-share:
beam_jump: Don't jump into the middle of a 'try'
|
|
* bjorn/compiler/sys_core_fold:
sys_core_fold: Fix non-tail-recursive list comprehensions
|
|
* bjorn/compiler/beam_jump:
beam_jump: Eliminate pathologically slow compilation
|
|
* bjorn/compiler/beam_validator:
beam_validator: Exit immediately on crashes
beam_validator: Remove the file/1 and files/1 functions
beam_validator: Remove support for all other unsupported instructions
beam_validator: Remove support for unsupported bit syntax instructions
|
|
649d6e73 simplified opt_simple_let_2/6 a little bit too much,
so that some list comprehensions in effect context were not
properly tail-recursive.
|
|
José Valim noticed that code such as:
match(1) -> 1;
match(2) -> 2;
match(3) -> 3;
...
match(1000) -> 1000.
would compile very slowly. The culprit is opt/3 in beam_jump.
What happens is that opt/3 will rewrite this code:
select_val ...
label 1
jump 1000
label 2
jump 1000
...
label 999
jump 1000
label 1000
return
very slowly to this code:
select_val ...
label 1
label 2
...
label 999
label 1000
return
The reason for the slowness is that when opt/3 sees this
sequence:
label 1
jump 1000
...
it will remove the label (storing it in a dictionary),
and pick up the previously processed instruction from
the accumulator:
select_val ...
jump 1000
label 2
jump 1000
...
That is done in order to process all labels before the
jump and also to get rid of the jump instruction if the
previous instruction is an "unreachable after". In this
case, re-processing the sequence will remove the now
unreachable jump instruction:
select_val ...
label 2
jump 1000
...
The problem is that re-processing the select_val instruction is
expensive. The instruction has a list of 1000 labels, all of which
will be added (again) to the set of referenced labels. The
select_val instruction will be re-processed again and again
until all labels and jumps have been gobbled up.
In the original version of beam_jump, opt/3 was not called
repeatedly until a fixpoint was found, but was expected to do
all its optimizations in one pass. The fixpoint iteration was
added later.
Since we now have the fixpoint iteration, there is no need
to do everything in a single pass. When we encounter a jump, we will
collect all previously seen labels and put them into the dictionary,
and then we will move on.
As a further optimization, we will look for sequences like this:
jump X
label ...
jump X
and replace them with:
label ...
jump X
In the example above, that will avoid 1000 updates of the dictionary.
After applying this optimization, compilation of the
pattern went from roughly 55 s to 0.1 s for the example
above but with 10000 clauses.
Reported-by: José Valim
|
|
The code sharing optimization could produce a jump into the
middle of a 'try' block. beam_validator would reject the code.
Reported-by: Ulf Norell
|
|
The beam_validator catches all exceptions and collect them.
It makes more sense to don't catch 'error' and 'exit' exceptions,
but to just print out the name of the current function and pass
on the exception just as all other compilation passes do. Those
kind of exceptions are the symptoms of the kind of severe but
easily catched bugs that occur during development.
|
|
Before the beam_validator was added as compiler pass, it was a
standalone module that could analyse existing .beam files and .S
files.
Even though beam_validator has been part of the compiler for many
releases, it still supports the analysis of .beam and .S files.
To reduce the code bloat and to improve coverage of beam_validator,
remove the file/1 and files/1 functions and all associated help
functions. We'll need to update the test suite, since some of the
checked in .S files have errors that beam_validator ignores, but
that will not be accepted when running them throught the compiler
using the 'from_asm' option. In particular, we will need to export
all functions that should be validated (since the beam_clean pass
will remove any function that is not possible to call).
|
|
|
|
|
|
The assert_strict_literal_termorder/1 function is used to validate the
get_map_elements and has_map_fields instructions. In neither case is
it useful to allow an empty lists of fields, so we should no longer
allow an empty list.
The mmap/2 function is cute, but it is used in only one place, so it
is much simpler to write a special-purpose function to extract the
keys from the list of map pairs.
|
|
The has_map_fields test was not recognized in is_pure_test/1,
because beam_a has rewritten the {list,_} part of instruction.
|
|
|
|
|
|
|
|
There is no need to always introduce a new variable to hold a map.
Maps are novars (constructs that don't export variables).
|
|
In cd1eaf0116190, opt_simple_let_2/6 was updated to do the same
optimizations in 'value' and 'effect' context.
Coalesce the clauses for 'value' and 'effect' context to one to
make it clear that they do the same thing.
|
|
The code for inlining high-order functions from the lists module
is quite annoying when you try to navigate the sys_core_fold
module. Break out the code into its own module.
|
|
Those functions allow us to clean up some more code.
|
|
Introduce access functions to hide the low-level details of how
type information is implemented.
|
|
|
|
Essentially, core_lib:literal_value/1 became useless when literals
were introduced in R12. Since we always create #c_literal{} records
whenever possible, literal_value/1 would *only* succeed when it was
passed a #c_literal{} argument.
|
|
We are about to deprecate core_lib:get_anno/1 and core_lib:set_anno/2,
so we should stop using them in the compiler.
|
|
Attributes must be literals. Since 1fcdcd50, both core_parse and
v3_core guarantees all Core Erlang terms that may be represented as
literals in fact are represented as literals.
Therefore, we no longer need to call core_lib:is_literal/1, but
can test for a #c_literal{} directly.
|