Age | Commit message | Author |
|
By changing mask_and_compare from and,sub to sub,and, x86 can use a
3-address LEA immediate add, saving a mov. The RISC backends should see
no change in sequence length.
We make test_(heap_|sub)binary use mask_and_compare so they will benefit
too.
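The reordering is safe as long as the compared value has no bits outside the mask. A minimal Python sketch of that equivalence, assuming 64-bit wrapping arithmetic; the function names are illustrative and are not taken from hipe_tagscheme:

```python
WORD_MASK = (1 << 64) - 1  # assumed 64-bit machine word


def and_then_sub(x, mask, value):
    """Old order: mask first, then subtract; a match leaves zero."""
    return (((x & mask) - value) & WORD_MASK) == 0


def sub_then_and(x, mask, value):
    """New order: subtract first (an immediate add of -value), then mask."""
    return (((x - value) & WORD_MASK) & mask) == 0


def orders_agree(mask, value, bits=8):
    """Exhaustively compare both orders over every `bits`-wide input."""
    assert value & ~mask == 0, "value must lie within the mask"
    return all(
        and_then_sub(x, mask, value) == sub_then_and(x, mask, value)
        for x in range(1 << bits)
    )
```

The subtraction in the new order does not depend on the masked result, which is what lets a backend fold it into a single three-address instruction.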
|
|
The addsub sequence was suboptimal when one of the arguments was
immediate, because it became an immediate alu followed by an immediate
alub, and the optimisers would not combine them due to the risk of
altering the branch. However, in this case we know that such a rewrite
is safe, and do it directly in hipe_tagscheme:fixnum_addsub/5 instead.
|
|
This makes the fast case a fallthrough and the slow case a branch,
hopefully improving cache locality.
|
|
|
|
With the introduction of immediate adds encoded as 'LEA' on x86, it is
now possible to do a fixnum add in two instructions and one branch by
commuting the addition and reusing the result register as a temporary,
which makes the 'alub' a 2-address add, saving a move instruction.
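The commuted sequence can be modelled in Python as follows. This is a sketch only: the 4-bit tag with pattern 0xF and the 64-bit word size are assumptions made for illustration, and the helper names are hypothetical.

```python
TAG_BITS = 4   # assumed number of tag bits on a fixnum
TAG = 0xF      # assumed fixnum tag pattern in the low bits
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def tag(n):
    """Encode an integer as a tagged fixnum word."""
    return (n << TAG_BITS) | TAG


def untag(t):
    """Decode a tagged fixnum word back to an integer."""
    return t >> TAG_BITS


def fixnum_add(a, b):
    """Add two tagged fixnums; returns (tagged result, overflowed)."""
    res = a - TAG   # immediate add of -TAG (LEA-style); only clears the tag bits
    res = res + b   # 2-address add: res is both an operand and the destination
    overflow = not (INT64_MIN <= res <= INT64_MAX)  # the single overflow branch
    return res, overflow
```

Because the first step only clears tag bits that are known to be set, it cannot overflow, so only the second add needs a branch.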
|
|
It seems that most 3-address adds of temps can be move coalesced.
Therefore, we limit the behaviour added by 1567585dda8 to only affect
immediate adds.
Also, add conversion of immediate mov+sub to lea.
|
|
Although LEA is useful for three-address form adds, sometimes it is used
where a normal add would have sufficed (due to the addition being the
last use of one of the operands; but RTL lowering does not know that as
it does not have liveness information). As a workaround, we convert LEA
back to ADD when the destination is the same as one of the operands.
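As a rough illustration of this workaround, the peephole rule can be sketched in Python. The tuple-based instruction representation is hypothetical and is not the HiPE x86 instruction format:

```python
def lea_to_add(insn):
    """Rewrite ('lea', dst, a, b), meaning dst = a + b, into a 2-address
    ('add', dst, other), meaning dst += other, when dst is one of the operands.
    Any other instruction is returned unchanged."""
    if insn[0] != "lea":
        return insn
    _, dst, a, b = insn
    if dst == a:
        return ("add", dst, b)
    if dst == b:
        return ("add", dst, a)
    return insn  # genuinely three-address: keep the LEA
```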
|
|
|
|
|
|
|
|
|
|
branch and alub overlap in their use cases, but the backends rely on
knowing that the result is unused in their lowering of branch. By
extending alub so that the destination is optional, it can fully replace
branch.
This simplifies rtl by reducing code duplication and the number of
instructions.
Also, in the x86 and arm backends, we can now use 'test' and
{'tst','mvn','teq'} to lower some alubs without destinations. This is
particularly good for x86, as sequences such as 'is_boxed' type tests
now get shorter (both from not needing a mov to copy the variable, and
from the fact that 'testb' encodes shorter than 'andq').
|
|
|
|
|
|
# Conflicts:
# lib/hipe/llvm/hipe_rtl_to_llvm.erl
|
|
* margnus1/hipe_llvm39_bugs/PR-1237:
hipe_rtl_to_llvm: Constants for bits per byte/word
hipe_llvm: Work around LLVM 3.9 sdesc bug
hipe_llvm: Fix incorrect atom alignment assumption
|
|
The constant ?WORD_WIDTH is renamed ?BITS_IN_WORD, and a new constant
?BITS_IN_BYTE is introduced.
Additionally, a bug in a currently unused case clause of
llvm_type_from_size/1 is fixed (the size of a word was hardcoded to 64
bits).
|
|
As of LLVM 3.9, the x86-call-frame-opt pass in LLVM's X86 backend causes
the stack descriptors to contain incorrect (or even negative) frame
sizes or root slot offsets.
This might cause LLVM-compiled modules to be rejected during loading
with a badarg exception in hipe_bifs:enter_sdesc/1 (which additionally
prints a "hipe_bifs_enter_sdesc_1: bad sdesc!" message to stderr), or it
might cause corruption or segmentation faults when walking stacks (e.g.
during GC) containing frames compiled with ErLLVM.
As a workaround, we pass the -no-x86-call-frame-opt flag to llc when
the version is at least 3.9.
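The version gate can be sketched in Python like this. The function name and the (major, minor) tuple representation are assumptions for illustration; only the flag and the 3.9 threshold come from the commit message:

```python
def llc_workaround_flags(llvm_version):
    """Return the extra llc flags needed for a given (major, minor) LLVM version."""
    if llvm_version >= (3, 9):
        # Disable the pass that produces incorrect stack descriptors.
        return ["-no-x86-call-frame-opt"]
    return []
```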
|
|
ErLLVM was declaring atoms in the following manner:
@atom_ok = external constant i64
; Used inside a function like this
%var = ptrtoint i64* @atom_ok to i64
However, doing so makes LLVM think that `atom_ok` is 8-byte aligned,
since it refers to an i64 value. This resulted in LLVM occasionally
incorrectly optimising away type tests on atoms, causing incorrect
behaviour or even segfaults. One such case is in
bs_match_compiler:coverage_apply/2, in which an is_boxed test on a
literal atom was optimised away, causing the code to try and load the
"header" of an atom. This problem reproduces with LLVM versions 3.7
through 3.9.
By declaring atoms as i8 (byte) constants instead, LLVM no longer makes
these alignment assumptions, and the bug is fixed.
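A small Python model of why the declared type matters. It assumes, for illustration only, that type tests inspect the low bits of a term (for example through a 2-bit mask of 0x3) and that an optimiser treats the low bits of an aligned address as known zeros; the function names are hypothetical:

```python
def known_zero_bits(align_bytes):
    """Mask of low address bits an optimiser may assume are zero for a
    power-of-two alignment."""
    return align_bytes - 1


def mask_test_is_foldable(mask, align_bytes):
    """True if (address & mask) is a known constant (zero) under the
    alignment assumption, so a test on it can be optimised away."""
    return (mask & ~known_zero_bits(align_bytes)) == 0
```

With an 8-byte alignment, as implied by i64, a test on the two low bits is foldable; with a 1-byte alignment, as implied by i8, no low bits are known and the test must be kept.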
|
|
|
|
|
|
This fixes a HiPE bug reported on erlang-questions on 2/11/2016.
The BEAM to ICode translation of the bs_match_string instruction,
written long ago for binaries (i.e., with byte-sized strings), tried
to do a `clever' translation of even bit-sized strings using a HiPE
primop that took a `Size' argument expressed in *bytes*.
ICode is not really the place to do such a thing, and moreover there
is really no reason for the HiPE primop not to take a Size argument
expressed in *bits* instead. This commit changes the `Size' argument
to be in bits, postpones the translation of the bs_match_string primop
until RTL and does a proper translation using bit-sized quantities there.
Fixed in a pair-programming/debugging session with @margnus1.
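To illustrate matching with a size expressed in bits, here is a minimal Python sketch. It is not the HiPE RTL translation; the function and its parameters are hypothetical, and it assumes the literal is stored left-aligned in whole bytes:

```python
def match_string_bits(data, offset_bits, literal, size_bits):
    """Check whether `size_bits` bits of `data`, starting at `offset_bits`,
    equal the first `size_bits` bits of `literal` (both are bytes objects)."""
    total_bits = len(data) * 8
    if offset_bits + size_bits > total_bits:
        return False  # not enough bits left to match
    field_mask = (1 << size_bits) - 1
    # Extract the requested bit field from the big-endian bit stream.
    shift = total_bits - offset_bits - size_bits
    got = (int.from_bytes(data, "big") >> shift) & field_mask
    # Take the leading size_bits bits of the left-aligned literal.
    want = int.from_bytes(literal, "big") >> (len(literal) * 8 - size_bits)
    return got == want
```

A size in bytes cannot describe the 12-bit match `match_string_bits(b"\xAB\xCD", 0, b"\xAB\xC0", 12)`, which is the kind of case a bit-sized argument handles directly.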
|
|
|
|
Seems to work either way, it just seems more correct
as all other 'bif' calls are 'not_remote'.
|
|
|
|
|
|
|
|
|
|
Conflicts:
erts/emulator/beam/beam_bif_load.c
erts/emulator/beam/beam_load.c
and added macro DBG_TRACE_MFA_P in beam_load.h
|
|
to not be transformed to local calls.
code_SUITE:upgrade still fails if run with compile option
{hipe,to_llvm}
|
|
Did not work with purge and was made worse by the new purge strategy.
Yielded terrible performance when a fun thing is created *before* the
fun code is loaded, such as when receiving a not yet loaded fun
from another node. The cached 'native_address' in ErlFunThing
will not be updated, leading to a mode switch and error_handler
being called for every call to the fun from native code.
|
|
by introducing hipe_bifs:commit_patch_load/1
that creates the HipeModule.
|
|
|
|
Just like the BEAM loader state (as returned by
erlang:prepare_loading/2), the HiPE loader state is contained in a magic
binary.
Eventually, we will separate HiPE loading into a prepare and a finalise
phase, like the BEAM loader, where the prepare phase will be implemented
by hipe_unified_loader and the finalise phase be implemented in C by
hipe_load.c and beam_load.c, making prepare side-effect free and
finalise atomic. The finalise phase will be exposed through the
erlang:finish_loading/1 API, just like the BEAM loader, as this will
allow HiPE and BEAM modules to be mixed in the same atomic "commit".
The usage of a loader state makes it easier to keep track of all
resources allocated during loading, and will not only make it easy to
prevent leaks when hipe_unified_loader crashes, but also paves the way
for proper, leak-free, unloading of HiPE modules.
|
|
* maint:
dialyzer: Fix opaque bug
dialyzer: Fix opaque bugs
|
|
and hipe_bifs:update_code_size
|
|
A step toward better integration of hipe load and purge
Highlights:
* code_server no longer needs to call hipe_unified_loader:post_beam_load/1
Instead new internal function hipe_redirect_to_module()
is called by loading BIFs to patch native call sites if needed.
* hipe_purge_module() is called by erts_internal:purge_module/2
to purge any native code.
* struct hipe_mfa_info redesigned and only used for exported
functions that are called from or implemented by native code.
A list of native call sites (struct hipe_ref) is kept for each hipe_mfa_info.
* struct hipe_sdesc used by hipe_find_mfa_from_ra()
to build native stack traces.
|
|
The "decoration" of opaque types works better than before when opaque
types are used by other opaque types.
|
|
t_from_form() sometimes returned a more general type than it should
have done due to a bug in from_form_loop(): it stopped when the limit
was exceeded, which could mean a collapsed type. Returning a type with
smaller depth should fix this.
is_specialization() now handles opaque types before unions, which
should fix another problem.
The bugs were reported by Kostis.
|
|
* rickard/time-unit/OTP-13831:
Replace usage of deprecated time units
|
|
=== OTP-19.1 ===
Changed Applications:
- asn1-4.0.4
- common_test-1.12.3
- compiler-7.0.2
- crypto-3.7.1
- debugger-4.2.1
- dialyzer-3.0.2
- diameter-1.12.1
- edoc-0.8
- erl_docgen-0.6
- erl_interface-3.9.1
- erts-8.1
- eunit-2.3.1
- gs-1.6.2
- hipe-3.15.2
- ic-4.4.2
- inets-6.3.3
- jinterface-1.7.1
- kernel-5.1
- mnesia-4.14.1
- observer-2.2.2
- odbc-2.11.3
- parsetools-2.1.3
- reltool-0.7.2
- runtime_tools-1.10.1
- sasl-3.0.1
- snmp-5.2.4
- ssh-4.3.2
- ssl-8.0.2
- stdlib-3.1
- syntax_tools-2.1
- tools-2.8.6
- wx-1.7.1
- xmerl-1.3.12
Unchanged Applications:
- cosEvent-2.2.1
- cosEventDomain-1.2.1
- cosFileTransfer-1.2.1
- cosNotification-1.2.2
- cosProperty-1.2.1
- cosTime-1.2.2
- cosTransactions-1.3.2
- eldap-1.2.2
- et-1.6
- megaco-3.18.1
- orber-3.8.2
- os_mon-2.4.1
- otp_mibs-1.1.1
- percept-0.9
- public_key-1.2
- typer-0.9.11
Conflicts:
OTP_VERSION
lib/gs/doc/src/notes.xml
lib/gs/vsn.mk
|
|
|
|
* maint:
erl_bif_types: Properly unopaque maps:merge/2 args
|
|
into maint
* margnus1/dialyzer/fix_maps_opaque/ERL-249/PR-1161/OTP-13878:
erl_bif_types: Properly unopaque maps:merge/2 args
|
|
* sverker/hipe-speedy-reg-alloc/PR-1159:
hipe: Refactor ra callbacks to accept context arg
hipe: Reuse liveness between regalloc iterations
hipe: Add ra_partitioned to o1 and up
hipe_regalloc_prepass: Change splitting heuristic
hipe: Make sure prepass temps are below SpillLimit
hipe_regalloc_prepass: Rename coloring collisions
hipe_ppc: Add code rewrite RA callbacks
hipe_sparc: Add code rewrite RA callbacks
hipe_arm: Add code rewrite RA callbacks
hipe_x86: Add code rewrite RA callbacks
hipe: Remove defun_to_cfg/1 RA callback
Add new sanity assertion to hipe_regalloc_prepass
Simplify hipe_x86_ra_finalise:conv_ra_maplet/3
hipe_x86: Simplify ra_postconditions is_mem_opnd
hipe_x86: Fix pseudo_tailcall prettyprinting
hipe_x86: Extra sanity assertions
hipe: clean up unnecessary catches
hipe: Remove temp reuse from call_fun
hipe: Add IG partitioning to hipe_regalloc_prepass
hipe: Add hipe_regalloc_prepass
|
|
erl_bif_types:type/5 was calling erl_types:map_pairwise_merge/3 directly
with its (potentially opaque) arguments, causing Dialyzer crashes.
Bug (ERL-249) reported and minimised test case provided by Felipe
Ripoll.
|
|
|
|
This allows us to pass around the context data that
hipe_regalloc_prepass needs cleanly, without using the process dictionary
or parameterised modules (as was done before this change).
|
|
This is sound because the liveness data structure only stores liveness
info at basic block boundaries, and the rewrites that happen in
TargetSpecific:check_and_rewrite/2 preserve all existing definitions
and uses. All new liveness intervals belong to newly introduced
temporaries and are always local to a basic block, so they do not show
up in the liveout or livein sets of any basic block.
|
|
|