Age | Commit message (Collapse) | Author |
|
Most x86 passes were either linearise(pass(to_cfg(Code))) or trivially
rewritable to process a CFG. This saves a great deal of time and memory
churn when compiling large programs.
Now, there will only ever be a single Linear->CFG conversion, just after
lowering from RTL, and only ever a single CFG->Linear conversion, just
before the finalise pass. Both of these now happen in hipe_x86_main.
|
|
These options would not do anything, because they would not suppress the
'o2' in ?COMPILE_DEFAULTS. Such behaviour is added to expand_options/2.
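The suppression described above can be sketched roughly as follows. This is a Python illustration only, not the Erlang code of expand_options/2: the level contents (`regalloc_fast`, `opt_cheap`, and so on) are hypothetical placeholders, and the real function may be structured differently.

```python
# Hypothetical optimisation levels; the flag names are placeholders,
# not the real HiPE option sets.
LEVELS = {
    "o0": ["regalloc_fast"],
    "o1": ["regalloc_fast", "opt_cheap"],
    "o2": ["regalloc_good", "opt_cheap", "opt_costly"],
}
COMPILE_DEFAULTS = ["o2"]


def expand_options(user_opts, defaults=COMPILE_DEFAULTS):
    """Expand level options into flags, letting an explicit level win."""
    # If the user named a level, drop any level found in the defaults,
    # so that e.g. 'o1' is not silently overridden by the default 'o2'.
    if any(opt in LEVELS for opt in user_opts):
        defaults = [opt for opt in defaults if opt not in LEVELS]
    expanded = []
    for opt in list(user_opts) + list(defaults):
        # Non-level options expand to themselves.
        for flag in LEVELS.get(opt, [opt]):
            if flag not in expanded:
                expanded.append(flag)
    return expanded
```

Without the suppression step, `expand_options(["o1"])` would also pull in every flag from the default 'o2', which is the "would not do anything" behaviour the message describes.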
|
|
Now that x86 is no longer broken with these optimisation levels, we add
them to the test suite to ensure they do not break again.
Bump timeout to 6min since tests are run twice as many times.
The option set of o1 was changed to all optimisations that run fast on
both big and small programs, incurring only a slight compile time
increase compared to the old set, but with a, presumably, significant
improvement to speed of compiled code.
Change o0 register allocator to linear_scan.
|
|
Immediate arguments to get_word_integer/4 would lead to bad but
unreachable RTL being generated. We omit its generation by testing for
immediates and performing the logic at compile time.
|
|
The x86 backend crashed if certain RTL optimisations were omitted,
preventing it from being usable at lower optimisation levels.
|
|
There is little point offering LSRA for x86 if we're still going to call
hipe_graph_coloring_regalloc for the floats. In particular, all
allocators except LSRA allocate an N^2 interference matrix, making them
unusable for really large functions.
|
|
|
|
Register allocation could transform something like
fmove u32, d99
to
fmove $rdx, 0x20($rsp)
which is an invalid instruction.
|
|
Since the link register/return address is restored before stack
arguments are stored to the frame, we must not use it to store a stack
argument. We do that by adding it to the registers clobbered by
pseudo_tailcall_prepare.
|
|
The problem was caused by shift-by-immediate-zero, which wraps to
immediate-32 with some shiftops. TODO: Someplace should be modified to
crash when these are generated so debugging further instances of this
gets easier in the future.
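The message does not name the architecture. Assuming an ARM (A32) style encoding, where a 5-bit immediate field of zero is decoded as a shift by 32 for some shift operations, the wrap-around and the kind of guard the TODO asks for could look like the sketch below. Both function names are hypothetical and this is Python, not the backend's Erlang.

```python
def decode_imm_shift(shift_op, imm5):
    """Effective (operation, amount) for a 5-bit immediate, A32-style.

    Assumption: ARM A32 rules, where an encoded zero means 32 for
    lsr/asr and means rrx (rotate through carry by 1) for ror.
    """
    if not 0 <= imm5 < 32:
        raise ValueError("imm5 must fit in 5 bits")
    if shift_op == "lsl":
        return ("lsl", imm5)
    if shift_op in ("lsr", "asr"):
        return (shift_op, 32 if imm5 == 0 else imm5)
    if shift_op == "ror":
        return ("rrx", 1) if imm5 == 0 else ("ror", imm5)
    raise ValueError("unknown shift operation: %r" % (shift_op,))


def check_shift(shift_op, amount):
    """Hypothetical guard: refuse shifts that would be mis-encoded."""
    if shift_op in ("lsr", "asr", "ror") and amount == 0:
        raise ValueError("%s by immediate zero cannot be encoded" % shift_op)
    return (shift_op, amount)
```

A check of this kind at instruction construction time would turn silently wrong code into an immediate crash at the point where the bad shift is created.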
|
|
|
|
|
|
|
|
The 'array' module is highly optimised for the hipe_vectors use-case,
and seems to perform slightly better than the gb_trees implementation.
Also, we remove the completely unnecessary hipe_vectors.hrl header.
|
|
|
|
|
|
Slightly improves performance.
|
|
|
|
Also, remove unused field 'counter' from #state{}.
|
|
|
|
Profiling showed that hipe_sdi spent most of its time in updateParents,
discarding nodes that were already deleted. By introducing a delete
operation to the segment trees, we can pay this cost only once, when
deleting the node from the graph.
Instead of keeping the ranges around, we recompute the range of the node
when we delete it, since this can be done in constant time, without any
memory allocation.
Although segment trees are not designed to be modified once built,
implementing a delete operation turned out to be a simple matter of
repeating insertion, but deleting the index from, instead of consing it
on, the appropriate nodes' values (segment lists).
This optimisation drastically sped up hipe_sdi to the point of no longer
being the bottleneck in the Assembly stage.
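As a rough illustration of the idea (in Python, not the Erlang of hipe_segment_trees, whose actual representation may differ), the sketch below stores interval indices on the O(lg n) nodes that exactly cover each interval. Deletion repeats the same walk as insertion and removes the index instead of adding it; the caller supplies the range again, mirroring the "recompute the range" approach described above. The class and method names are assumptions.

```python
class SegmentTree:
    """Stabbing-query segment tree over integer points 0..size-1."""

    def __init__(self, size):
        self.n = 1
        while self.n < size:
            self.n *= 2
        # values[i] is the segment list of node i (root is node 1).
        self.values = [[] for _ in range(2 * self.n)]

    def _cover(self, lo, hi):
        """Yield the O(lg n) nodes that exactly cover the closed range [lo, hi]."""
        left = lo + self.n
        right = hi + self.n + 1  # half-open upper bound
        while left < right:
            if left & 1:
                yield left
                left += 1
            if right & 1:
                right -= 1
                yield right
            left //= 2
            right //= 2

    def insert(self, index, lo, hi):
        for node in self._cover(lo, hi):
            self.values[node].append(index)

    def delete(self, index, lo, hi):
        # Same walk as insert, but the index is removed from each list.
        for node in self._cover(lo, hi):
            self.values[node].remove(index)

    def stab(self, point):
        """Return the indices of all stored intervals containing point."""
        found = []
        node = point + self.n
        while node >= 1:
            found.extend(self.values[node])
            node //= 2
        return found
```

For example, after `t = SegmentTree(8)`, `t.insert(0, 1, 5)` and `t.insert(1, 3, 7)`, a call to `t.stab(4)` reports both intervals; after `t.delete(0, 1, 5)` it reports only interval 1, so later queries no longer pay for the deleted entry.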
|
|
This speeds up parentsOfChild/2 from O(n) to O(lg n + k).
A new module misc/hipe_segment_trees.erl is introduced.
|
|
hipe_icode_bincomp:find_bs_get_integer/3 was quadratic for no good reason. By observing
that NewSuccs and Rest are always disjoint, we can see that the worklist
does not need to be a set. Furthermore, by replacing the ordset Visited
with a map, we reduce complexity to (a very low) O(n lg n).
On cuter_binlib, this change reduced the time for hipe_icode_bincomp
from 60s to .25s. Using a gb_set for Visited gives .5s, and a sets:set
1s.
We apply the same optimisation to hipe_icode_range.
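The shape of the change can be sketched as a generic worklist traversal. This is a Python illustration under assumptions, not the code of find_bs_get_integer/3: the function name and the `succs` mapping are hypothetical, and Python's hash-based set stands in for the Erlang map used for Visited. Because a node is marked visited at the moment it is pushed, the worklist never holds duplicates and a plain list suffices.

```python
def reachable(succs, start):
    """Return the set of nodes reachable from start.

    succs maps each node to an iterable of its successors.
    """
    visited = {start}
    worklist = [start]  # plain list; duplicates are ruled out below
    while worklist:
        node = worklist.pop()
        for succ in succs.get(node, ()):
            # Marking on push keeps new successors disjoint from the
            # rest of the worklist, so no set operations are needed.
            if succ not in visited:
                visited.add(succ)
                worklist.append(succ)
    return visited
```

Each node is pushed and popped at most once, so the cost is dominated by the membership tests on the visited structure rather than by maintaining the worklist.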
|
|
t_map/3 previously required callers to perform this normalisation, but
as t_from_form/5 would sometimes fail to do so, this requirement is
relaxed.
Bug (ERL-177) reported and shrunk by Luke Imhoff.
|
|
|
|
The Core Erlang pattern matching compiler was written long ago, at
a time when binaries and bitstrings did not really exist in Erlang.
This patch, taken from the code of CutEr where it has been used for more
than a year now, extends the transformation for pattern matching
compilation to also include binaries and bitstrings.
Some code that was found to be erroneous and caused errors when compiling
the transformed code to native code was also taken out while at it.
Thanks to @aggelgian for most of the changes in the code.
|
|
* hasse/dialyzer/improve_from_form/OTP-13547:
Update primary bootstrap
stdlib: Correct types and specs
dialyzer: Minor adjustments
dialyzer: Suppress unmatched_return for send/2
dialyzer: Improve the translation of forms to types
dialyzer: Use a cache when translating forms to types
dialyzer: Prepare erl_types:t_from_form() for a cache
dialyzer: Optimize erl_types:t_from_form()
dialyzer: Correct types
syntax_tools: Correct types
erts: Correct character repr in doc of the abstract format
stdlib: Correct types and specs
|
|
It is possible that '...' is added later (OTP 20.0), but for now we
are not sure of all details.
|
|
Spend less of the limited resources on recursive types.
|
|
|
|
No change of functionality.
|
|
When the translation from forms to types exceeds some limit, it is
faster to try small depths first.
|
|
|
|
This reverts commit e020f75c10410a6943cd055bfa072a2641eab7da.
|
|
|
|
|
|
|
|
|
|
|
|
* Rewrite matching statements in ?when_option macro to form that silences
dialyzer's unmatched_return warnings
* Treat compiler warnings as errors when compiling files in main
|
|
|
|
|
|
|
|
|
|
|
|
|
|
and correct the name of another, erroneously spelt, option in the process.
|
|
|
|
|
|
Removed in f9cb80861f169743 when the implementation was changed from C to
Erlang, but it seems they are needed to keep the dialyzer tests happy.
Also improved bif_SUITE:shadow_comments to include all exported functions
in module erlang, not just the "snifs".
...which detected that apply/2 was missing a Shadowed comment as well.
|