Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
by preventing it from doing GC, which generated code relies on.
|
|
HiPE has had metadata for gc safety on it's temporaries for a while, but
it has never been enforced or even checked, so naturally several
gc-safety violations has slipped through.
A new pass, hipe_rtl_verify_gcsafe verifies gcsafety on optimised RTL
and is used when running the testsuite, and can be manually enabled with
+{hipe,[verify_gcsafe]}.
|
|
|
|
by introducing new primop 'is_unicode'
with no exception (ab)use and no GC.
Replaces bs_validate_unicode which is kept for backward compat for now.
|
|
* maint:
erts, stdlib: Fix xmllint warning
Update runtime deps to depend on new stdlib functionality
|
|
~tw and new string functions are new since OTP-20 (stdlib-3.4)
|
|
Should probably be left for the HiPE team to fix
|
|
|
|
hipe_range_split is a complex live range splitter, more sophisticated
thatn hipe_restore_reuse, but still targeted specifically at temporaries
forced onto stack by being live over call instructions.
hipe_range_split partitions the control flow graph at call instructions,
like hipe_regalloc_prepass. Splitting decisions are made on a per
partition and per temporary basis.
There are three different ways in which hipe_range_split may choose to
split a temporary in a program partition:
* Mode1: Spill the temp before calls, and restore it after them
* Mode2: Spill the temp after definitions, restore it after calls
* Mode3: Spill the temp after definitions, restore it before uses
To pick which of these should be used for each temp×partiton pair,
hipe_range_split uses a cost function. The cost is simply the sum of the
cost of all expected stack accesses, and the cost for an individual
stack access is based on the probability weight of the basic block that
it resides in. This biases the range splitter so that it attempts moving
stack accesses from a functions hot path to the cold path.
hipe_bb_weights is used to compute the probability weights.
mode3 is effectively the same as what hipe_restore_reuse does. Because
of this, hipe_restore_reuse reuses the analysis pass of
hipe_restore_reuse in order to compute the minimal needed set of spills
and restores. The reason mode3 was introduced to hipe_range_split rather
than simply composing it with hipe_restore_reuse (by running both) is
that such a composition resulted in poor register allocation results due
to insufficiently strong move coalescing in the register allocator.
The cost function heuristic has a couple of tuning knobs:
* {range_split_min_gain, Gain} (default: 1.1, range: [0.0, inf))
The minimum proportional improvement that the cost of all stack
accesses to a temp must display in order for that temp to be split.
* {range_split_mode1_fudge, Factor} (default: 1.1, range: [0.0, inf))
Costs for mode1 are multiplied by this factor in order to discourage
it when it provides marginal benefits. The justification is that
mode1 causes temps to be live for longest, thus leading to higher
register pressure.
* {range_split_weight_power, Factor} (default: 2, range: (0.0, inf))
Adjusts how much effect the basic block weights have on the cost of a
stack access. A stack access in a block with weight 1.0 has cost 1.0,
a stack access in a block with weight 0.01 has cost 1/Factor.
Additionally, the option range_split_weights chooses whether the basic
block weights are used at all.
In the case that the input is very big, hipe_range_split automatically
falls back to hipe_restore_reuse only in order to keep compile times
under control. Note that this is not only because of hipe_range_split
being slow, but also due to the resulting program being slow to register
allocate, and is not as partitionable by hipe_regalloc_prepass.
hipe_restore_reuse, on the other hand, does not affect the programs
partitionability.
The hipe_range_split pass is controlled by a new option ra_range_split.
ra_range_split is added to o2, and ra_restore_reuse is disabled in o2.
|
|
hipe_bb_weights computes basic block weights by using the branch
probability predictions as the coefficients in a linear equation system.
This linear equation system is then solved using Gauss-Jordan
Elimination.
The equation system representation is picked to be efficient with highly
sparse data. During triangelisation, the remaining equations are
dynamically reordered in order to prevent the equations from growing in
the common case, preserving the benefit of the sparse equation
representation.
In the case that the input is very big, hipe_bb_weights automatically
falls back to a rough approximation in order to keep compile times under
control.
|
|
hipe_restore_reuse is a simplistic range splitter that splits temps that
are forced onto the stack by being live over call instructions. In
particular, it attempts to avoid cases where there are several accesses
to such stack allocated temps in straight-line code, uninterrupted by
any calls. In order to achieve this it splits temps between just before
the first access(es) and just after the last access(es) in such
straight-line code groups.
The hipe_restore_reuse pass is controlled by a new option
ra_restore_reuse.
ra_restore_reuse is added to o1.
|
|
|
|
|
|
|
|
* richarl/fix-license-headers/PR-788:
Make use of the Header feature in yecc
Remove Emacs timestamp markers
Remove obsolete CVS keyword markup
Update obsolete author e-mails
Replace broken example URL
Add missing entries to COPYRIGHT file
Correct copyright on stdlib files
Add license headers to hipe files in kernel
Correct copyright info on typer files
Correct copyright and license on dialyzer files
Correct copyright on remaining hipe files
Correct copyright info on hipe cerl files
Correct copyright info on cerl-related files
Correct the copyright info in docgen_edoc_xml_cb
Update Syntax Tools license headers
Remove some obsolete files
Update EDoc license headers to dual Apache2/LGPL
Update EUnit license headers to dual Apache2/LGPL
|
|
|
|
|
|
Print the MFA/module being compiled, and pretty-print the backtrace with
lib:format_stacktrace/4.
Additionally, make the error_msg/2 macro in hipe.hrl respect the
HIPE_LOGGING define, since messages produced by this macro just before
runtime shutdown were sometimes lost (since code_server:error_msg/2 is
asynchronous).
|
|
* rickard/time-unit/OTP-13831:
Replace usage of deprecated time units
|
|
ra_partitioned significantly speeds up register allocation of larger
functions without affecting allocation quality negatively. This is the
final change needed to make o1 suitable for compiling really large
functions without choking.
|
|
These will not only be useful for hipe_regalloc_prepass, but also, after
the introduction of a mk_move/2 (or similar) callback, for the purpose
of range splitting.
Since the substitution needed to case over all the instructions, a new
module, hipe_ppc_subst, was introduced to the ppc backend.
|
|
These will not only be useful for hipe_regalloc_prepass, but also, after
the introduction of a mk_move/2 (or similar) callback, for the purpose
of range splitting.
Since the substitution needed to case over all the instructions, a new
module, hipe_sparc_subst, was introduced to the sparc backend.
|
|
These will not only be useful for hipe_regalloc_prepass, but also, after
the introduction of a mk_move/2 (or similar) callback, for the purpose
of range splitting.
Since the substitution needed to case over all the instructions, a new
module, hipe_arm_subst, was introduced to the arm backend.
|
|
These will not only be useful for hipe_regalloc_prepass, but also, after
the introduction of a mk_move/2 (or similar) callback, for the purpose
of range splitting.
Since the substitution needed to case over all the instructions, a new
module, hipe_x86_subst, was introduced to the x86 backend.
Due to differences in the 'jtab' field of a #jmp_switch{} between x86
and amd64, it regrettably needed to be duplicated to hipe_amd64_subst.
|
|
|
|
hipe_regalloc_prepass speeds up register allocation by spilling any temp
that is live over a call (which clobbers all register).
In order to detect these, a new function was added to the target
interface; defines_all_alloc/1, that takes an instruction and returns a
boolean.
|
|
These options would not do anything, because they would not supress the
'o2' in ?COMPILE_DEFAULTS. Such behaviour is added to expand_options/2.
|
|
Now that x86 is no longer broken with these optimisation levels, we add
them to the test suite to ensure they do not break again.
Bump timeout to 6min since tests are run twice as many times.
The option set of o1 was changed to all optimisations that run fast on
both big and small programs, incurring only a slight compile time
increase compared to the old set, but with a, presumably, significant
improvement to speed of compiled code.
Change o0 register allocator to linear_scan.
|
|
There is little point offering LSRA for x86 if we're still going to call
hipe_graph_coloring_regalloc for the floats. In particular, all
allocators except LSRA allocates an N^2 interference matrix, making them
unusable for really large functions.
|
|
|
|
This speeds up parentsOfChild/2 from O(n) to O(lg n + k).
A new module misc/hipe_segment_trees.erl is introduced.
|
|
* Rewrite matching statements in ?when_option macro to form that silences
dialyzer's unmatched_return warnings
* Treat compiler warnings as errors when compiling files in main
|
|
|
|
|
|
|
|
* Implemented removal of maps:is_key/2 calls of which the result is
known in a new pass during the typed phase, called
hipe_icode_call_elim.
* Added the option icode_call_elim that enables the
hipe_icode_call_elim pass, and made it default for o2.
|
|
A bug in LLVM miscompiles x86 functions that have floats are spilled to
stack. We work around it by disabling (inlined) floats when using llvm
on x86. Once a LLVM version in which the bug is fixed is released, we
can make the workaround conditional depending on the version.
|
|
|
|
|
|
The bulk of the changes concerns cleanups and code refactorings concerning
record constructions that assigned 'undefined' to record fields whose type
did not contain this value. See commit 8ce35b2.
While at it, some new type definitions were introduced and type names were
used instead of record type notation. Minor code cleaups were also done.
|
|
Main problem:
A faulty HIPE_LITERAL_CRC was not detected by the loader.
Strangeness #1:
Dialyzer should ask the hipe compiler about the target checksum,
not an internal bif.
Strangeness #2:
The HIPE_SYSTEM_CRC checksum was based on the HIPE_LITERALS_CRC
checksum.
Solution:
New HIPE_ERTS_CHECKSUM which is an bxor of the two (now independent)
HIPE_LITERALS_CRC and HIPE_SYSTEM_CRC.
HIPE_LITERALS_CRC represents values that are assumed to stay constant
for different VM configurations of the same arch, and are therefor
hard coded into the hipe compiler.
HIPE_SYSTEM_CRC represents values that may differ between VM variants.
By default the hipe compiler asks the running VM for this checksum,
in order to create beam files for the same running VM.
The hipe compiler can be configured (with "make XCOMP=yes ...") to
create beam files for another VM variant, in which case HIPE_SYSTEM_CRC
is also hard coded.
ToDo:
Treat all erts properties the same. Either ask the running VM or hard
coded into hipe (if XCOMP=yes). This will simplify and reduce the risk
of dangerous mismatches. One concern might be the added overhead
from more frequent calls to hipe_bifs:get_rts_param.
|
|
Instead ask running VM for the value of THE_NON_VALUE,
which is different between opt and debug VM.
Same hipe compiler can now compile for both opt and debug VM.
|
|
OTP-12845
* bruce/change-license:
fix errors caused by changed line numbers
Change license text to APLv2
|
|
|
|
A recent rewrite of some code in this file (commit 355f4b5)
exposed some dialyzer warnings of some code which is unreachable.
Indeed, checking whether one executes on an unsupported architecture
when expanding the 'o2' and 'o3' hipe compiler options is unnecessary
because that check is performed in the expansion of the 'o1' option
anyway. While at it, simplified the code a bit not to have a very
long case clause.
|
|
|