aboutsummaryrefslogtreecommitdiffstats
path: root/lib/hipe
AgeCommit message (Collapse)Author
2016-09-02hipe: Add hipe_regalloc_prepassMagnus Lång
hipe_regalloc_prepass speeds up register allocation by spilling any temp that is live over a call (which clobbers all register). In order to detect these, a new function was added to the target interface; defines_all_alloc/1, that takes an instruction and returns a boolean.
2016-09-02Merge branch 'sverker/hipe-performance-o1/PR-1154/OTP-13862'Sverker Eriksson
* sverker/hipe-performance-o1/PR-1154: hipe_sparc: Minimise CFG<->linear conversions hipe_ppc: Minimise CFG<->linear conversions hipe_arm: Minimise CFG<->linear conversions hipe_x86: Use lea instead of move+add hipe_arm: Improve peephole optimiser hipe_arm: Be resilient to crappy RTL hipe_ppc: Be resilient to crappy RTL hipe_sparc: Be resilient to crappy RTL hipe: Reuse liveness info for spillmin hipe_x86: Minimise CFG<->linear conversions hipe: Fix o0 and o1 hipe: Add o0 and o1 to tests hipe_rtl_binary:get_word_integer/4: Handle imms hipe_x86: Be resilient to crappy RTL hipe_x86: LSRA for SSE2
2016-09-02Merge branch 'maint'Sverker Eriksson
2016-09-02Merge branch 'sverker/hipe-sparc-19/PR-1148/OTP-13861' into maintSverker Eriksson
* sverker/hipe-sparc-19/PR-1148: Eliminate catch-all clause from two functions Increase the time limit used by the test suite
2016-09-01Merge branch 'maint'Hans Bolinder
* maint: dialyzer: Increase time limit of suites dialyzer: Remove a check that always fails dialyzer: Optimize an opaque type case
2016-08-31dialyzer: Optimize an opaque type caseHans Bolinder
Fix a mistake in commit 85f6fe3b. Instead of using the declared opaque type, the form's type is used in a case where the opaque type is turned into a non-opaque type. The result is more general types (smaller Erlang terms) and faster analyses.
2016-08-30hipe_sparc: Minimise CFG<->linear conversionsMagnus Lång
Now, there will only ever be a single Linear->CFG conversion, just after lowering from RTL, and only ever a single CFG->Linear conversion, just before the finalise pass. Both of these now happen in hipe_sparc_main.
2016-08-30hipe_ppc: Minimise CFG<->linear conversionsMagnus Lång
Now, there will only ever be a single Linear->CFG conversion, just after lowering from RTL, and only ever a single CFG->Linear conversion, just before the finalise pass. Both of these now happen in hipe_ppc_main.
2016-08-30hipe_arm: Minimise CFG<->linear conversionsMagnus Lång
Now, there will only ever be a single Linear->CFG conversion, just after lowering from RTL, and only ever a single CFG->Linear conversion, just before the finalise pass. Both of these now happen in hipe_arm_main.
2016-08-30hipe_x86: Use lea instead of move+addMagnus Lång
This is primarily useful for heap allocations, as a two-address 'add' can't be used to both copy the heap pointer to another register, and add the tag.
2016-08-30hipe_arm: Improve peephole optimiserMagnus Lång
2016-08-30hipe_arm: Be resilient to crappy RTLMagnus Lång
The ARM backend crashes if certain RTL optimisations were omitted, preventing it from being usable at lower optimisation levels. One of the problems were caused by shift-by-immediate-zero, which wraps to immediate-32 with some shiftops. TODO: Someplace should be modified to crash when these are generated so debuging further instances of this gets easier in the future.
2016-08-30hipe_ppc: Be resilient to crappy RTLMagnus Lång
The PowerPC backend crashes if certain RTL optimisations were omitted, preventing it from being usable at lower optimisation levels.
2016-08-30hipe_sparc: Be resilient to crappy RTLMagnus Lång
The SPARC backend crashes if certain RTL optimisations were omitted, preventing it from being usable at lower optimisation levels.
2016-08-30hipe: Reuse liveness info for spillminMagnus Lång
For x86, additionally reuse liveness from float LSRA for the GP LSRA.
2016-08-30hipe_x86: Minimise CFG<->linear conversionsMagnus Lång
Most x86 passes were either linearise(pass(to_cfg(Code))) or trivially rewritable to process a CFG. This saves a great deal of time and memory churn when compiling large programs. Now, there will only ever be a single Linear->CFG conversion, just after lowering from RTL, and only ever a single CFG->Linear conversion, just before the finalise pass. Both of these now happen in hipe_x86_main.
2016-08-30hipe: Fix o0 and o1Magnus Lång
These options would not do anything, because they would not supress the 'o2' in ?COMPILE_DEFAULTS. Such behaviour is added to expand_options/2.
2016-08-30hipe: Add o0 and o1 to testsMagnus Lång
Now that x86 is no longer broken with these optimisation levels, we add them to the test suite to ensure they do not break again. Bump timeout to 6min since tests are run twice as many times. The option set of o1 was changed to all optimisations that run fast on both big and small programs, incurring only a slight compile time increase compared to the old set, but with a, presumably, significant improvement to speed of compiled code. Change o0 register allocator to linear_scan.
2016-08-30hipe_rtl_binary:get_word_integer/4: Handle immsMagnus Lång
Immediate arguments to get_word_integer/4 would lead to bad but unreachable RTL being generated. We omit its generation by testing for immediates and performing the logic at compile time.
2016-08-30hipe_x86: Be resilient to crappy RTLMagnus Lång
The x86 backend crashes if certain RTL optimisations were omitted, preventing it from being usable at lower optimisation levels.
2016-08-30hipe_x86: LSRA for SSE2Magnus Lång
There is little point offering LSRA for x86 if we're still going to call hipe_graph_coloring_regalloc for the floats. In particular, all allocators except LSRA allocates an N^2 interference matrix, making them unusable for really large functions.
2016-08-26Merge branch 'maint'Sverker Eriksson
2016-08-26Eliminate catch-all clause from two functionsKostis Sagonas
A stronger version of Dialyzer complained that some case clauses in functions xaluop_is_shift/1 and xaluop_normalise/1 are unreachable. These clauses are now commented out. While at it, I thought that it would be better to eliminate the catch-all clauses in order to be certain we properly handle all RTL instructions that are used as inputs to these functions. Note: The code will now crash if there are unhandled cases.
2016-08-25Increase the time limit used by the test suiteKostis Sagonas
This is required in some really old SPARC machines running Solaris we still have access to.
2016-08-22hipe: Fix amd64 SSE2 encoding crashMagnus Lång
Register allocation could transform something like fmove u32, d99 to fmove $rdx, 0x20($rsp) which is an invalid instruction.
2016-08-22hipe: Fix tailcall stackarg clobber bugMagnus Lång
Since the link register/return address is restored before stack arguments are stored to the frame, we must not use it to store a stack argument. We do that by adding it to the registers clobbered by pseudo_tailcall_prepare.
2016-08-22hipe_arm: Fix translation of shift by 0Magnus Lång
The problem was caused by shift-by-immediate-zero, which wraps to immediate-32 with some shiftops. TODO: Someplace should be modified to crash when these are generated so debugging further instances of this gets easier in the future.
2016-08-22hipe_ppc: Fix PPC64 bug encoding large immediatesMagnus Lång
2016-08-22hipe_ppc: Fix incorrect encoding of shift by 0Magnus Lång
2016-08-22hipe_x86: Fix illegal inst from peephole optMagnus Lång
2016-07-11hipe_vectors: Change implementation to 'array'Magnus Lång
The 'array' module is highly optimised for the hipe_vectors use-case, and seems to perform slightly better than the gb_trees implementation. Also, we remove the completely unnecessary hipe_vectors.hrl header.
2016-07-11hipe: Faster unreachable basic block removalMagnus Lång
2016-07-11hipe_ssa_liveness: Use map as liveness ADTMagnus Lång
2016-07-11hipe/flow/liveness.inc: Use map for liveness typeMagnus Lång
Slightly improves performance.
2016-07-11hipe_x86_frame: speed up find_tempsMagnus Lång
2016-07-11hipe_icode_range: Use maps over gb_trees,setsMagnus Lång
Also, remove unused field 'counter' from #state{}.
2016-07-11hipe_icode_coordinator: Rewrite concurrentlyMagnus Lång
2016-07-11hipe: segment tree delete operationMagnus Lång
Profiling showed that hipe_sdi spent most of its time in updateParents, discarding nodes that were already deleted. By introducing a delete operation to the segment trees, we can pay this cost only once, when deleting the node from the graph. Instead of keeping the ranges around, we recompute the range of the node when we delete it, since this can be done in constant time, without any memory allocation. Although segment trees are not designed to be modified once built, implementing a delete operation turned out to be a simple matter of repeating insertion, but deleting the index from, instead of consing it on, the appropriate nodes' values (segment lists). This optimisation drastically sped up hipe_sdi to the point of no longer being the bottleneck in the Assembly stage.
2016-07-11hipe_sdi: Use segment trees to represent PARENTSMagnus Lång
This speeds up parentsOfChild/2 from O(n) to O(lg n + k). A new module misc/hipe_segment_trees.erl is introduced.
2016-07-11hipe_icode_{bincomp,range}: Improve complexityMagnus Lång
hipe_icode_bincomp:find_bs_get_integer/3 was quadratic for no good reason. By observing that NewSuccs and Rest are always disjoint, we can see that the worklist does not need to be a set. Furthermore, by replacing the ordset Visited with a map, we reduce complexity to (a very low) O(n lg n). On cuter_binlib, this change reduced the time for hipe_icode_bincomp from 60s to .25s. Using a gb_set for Visited gives .5s, and a sets:set 1s. We apply the same optimisation to hipe_icode_range.
2016-06-28erl_types: Normalise X:=none() pairs in t_map/3Magnus Lång
t_map/3 previously required callers to perform this normalisation, but as t_from_form/5 would sometimes fail to do so, this requirement is relaxed. Bug (ERL-177) reported and shrunk by Luke Imhoff.
2016-06-21Prepare releaseErlang/OTP
2016-06-15hipe: Pattern matching compilation of binaries and bistringsKostis Sagonas
The Core Erlang pattern matching compiler was written long ago, at a time when binaries and bistrings did not really exist in Erlang. This patch, taken from the code of CutEr where it's used for more than a year now, extends the transformation for pattern matching compilation to also include binaries and bistrings. Some code that was found erroneous and causes errors when compiling the transformed code to native code was also taken out while at it. Thanks to @aggelgian for most of the changes in the code.
2016-06-10Merge branch 'hasse/dialyzer/improve_from_form/OTP-13547'Hans Bolinder
* hasse/dialyzer/improve_from_form/OTP-13547: Update primary bootstrap stdlib: Correct types and specs dialyzer: Minor adjustments dialyzer: Suppress unmatched_return for send/2 dialyzer: Improve the translation of forms to types dialyzer: Use a cache when translating forms to types dialyzer: Prepare erl_types:t_from_form() for a cache dialyzer: Optimize erl_types:t_form_form() dialyzer: Correct types syntax_tools: Correct types erts: Correct character repr in doc of the abstract format stdlib: Correct types and specs
2016-06-09Remove support for '...' in Maps typesHans Bolinder
It is possible that '...' is added later (OTP 20.0), but for now we are not sure of all details.
2016-06-09dialyzer: Improve the translation of forms to typesHans Bolinder
Spend less of the limited resources on recursive types.
2016-06-09dialyzer: Use a cache when translating forms to typesHans Bolinder
2016-06-09dialyzer: Prepare erl_types:t_from_form() for a cacheHans Bolinder
No change of functionality.
2016-06-09dialyzer: Optimize erl_types:t_form_form()Hans Bolinder
When the translation from forms to types exceeds some limit, it is faster to try small depths first.
2016-06-09dialyzer: Correct typesHans Bolinder