path: root/lib/compiler/src/beam_ssa_pre_codegen.erl
2019-07-10  compiler: Fix compiler crash introduced by OTP-15952  (John Högberg)
An assertion in code generation would fail when the common exit block was ?BADARG_BLOCK, as some operations expect to always "fail" directly to that block (= throw an exception) and we had inserted a dummy block in between. Other operations could also get funny fail labels, jumping to blocks that immediately jumped to {f,0}, but these were all cleaned up by beam_jump, sweeping the bug under the rug.
2019-07-09  compiler: Fix broken 'receive' in try/catch blocks  (John Högberg)
This fix is rather ugly and tacked-on, but I'm not comfortable refactoring the pass in an emergency patch.
2019-05-21  Eliminate compiler crash when compiling complex receive statements  (Björn Gustavsson)
Certain complex receive statements would result in an internal compiler failure. That would happen when the compiler would fail to find the common exit block following a receive. See the added test case for an example. https://bugs.erlang.org/browse/ERL-950
2019-04-29  compiler: Propagate match context position on fail path  (John Högberg)
2019-02-19  Do the destructive setelement optimization in SSA  (Björn Gustavsson)
The expansion of record field updates, when more than one field is updated, but not a majority of the fields, will create a sequence of calls to `erlang:setelement(Index, Value, Tuple)` where Tuple in the first call is the original record tuple, and in the subsequent calls Tuple is the result of the previous call. Furthermore, all Index values are constant positive integers, and the first call to `setelement` will have the greatest index. Thus all the following calls do not actually need to test at run-time whether Tuple has type tuple, nor that the index is within the tuple bounds.

Since OTP R7, the `sys_core_dsetel` pass, run as the very last Core Erlang pass, has optimized this sequence of `setelement` calls to use a special destructive version of `setelement` (called `set_tuple_element`) for all but the very first `setelement` in the sequence.

It turns out that the presence of the `set_tuple_element` in SSA code is awkward and can prevent or complicate type analysis and aggressive optimizations.

Therefore, this commit removes the `sys_core_dsetel` pass and reimplements it for SSA code. The optimization will be done in the `beam_ssa_pre_codegen` pass (that is, just before code generation and after running all other SSA code optimization passes).

In most cases, the resulting BEAM code is identical to previous code. For a few modules, the BEAM code is actually slightly better, with smaller stack frames.
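A minimal sketch of the kind of sequence this optimization targets (the record name and fields are made up for illustration, and the expansion is shown only approximately):

    -record(r, {a, b, c, d, e}).

    update(R) ->
        R#r{c = 3, b = 2}.

    %% The update expands (roughly) to a setelement/3 chain with the
    %% greatest index first:
    %%
    %%     T = erlang:setelement(4, R, 3),
    %%     erlang:setelement(3, T, 2)
    %%
    %% Only the first call needs to check that R is a tuple and that the
    %% index is within bounds; the later call can become the destructive
    %% set_tuple_element operation.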
2019-02-07  Merge pull request #2135 from bjorng/bjorn/compiler/optimize-dominators  (Björn Gustavsson)
Optimize ssa_opt_sink for huge functions
2019-02-06  Optimize ssa_opt_sink for huge functions  (Björn Gustavsson)
The ssa_opt_sink optimization of beam_ssa_opt could get very slow for certain huge functions. 9a190cae9bd7 partly addressed this issue by terminating the optimization early if there happened to be no get_tuple_element instructions at all in the function.

This commit addresses the issue more directly by making the dominator calculation in beam_ssa:dominators/1 more efficient. The same algorithm as before is used, but it is implemented in a more efficient way based on the ideas in "A Simple, Fast Dominance Algorithm" (http://www.hipersoft.rice.edu/grads/publications/dom14.pdf).

As well as being more efficient, the new implementation also gives an explicit representation of the dominator tree, which makes it possible to simplify and optimize the ssa_opt_sink optimization.
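For reference, the core of that paper's approach is an "intersect" walk towards the root of the dominator tree. A minimal Erlang sketch, assuming blocks are identified by postorder numbers and Doms maps each block to its current immediate dominator (the function name and representation are illustrative, not the actual beam_ssa code):

    %% Walk the finger with the smaller postorder number up the tree until
    %% both fingers meet; the meeting point is the nearest common dominator.
    intersect(B, B, _Doms) ->
        B;
    intersect(B1, B2, Doms) when B1 < B2 ->
        intersect(maps:get(B1, Doms), B2, Doms);
    intersect(B1, B2, Doms) ->
        intersect(B1, maps:get(B2, Doms), Doms).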
2019-02-05  beam_ssa_type: Track the types of tuple elements  (John Högberg)
Prior to 294d66a295f6c2101fe3c2da630979ad4e736c08 there wasn't much point to keeping track of tuple element types; they were only known when we had inserted or extracted values from a tuple, and in neither case was it likely that we'd extract the same values again.

It makes a lot more sense to do so now that type optimizations are applied across functions; if we return a tuple it's very likely that its elements will be extracted soon after, and knowing their types lets us eliminate more type checks.

Co-authored-by: Björn Gustavsson <[email protected]>
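A made-up illustration of why this pays off once types flow across functions (function names are hypothetical):

    summarize(List) ->
        {ok, length(List)}.

    use(List) ->
        case summarize(List) of
            %% Element 2 of the returned tuple is known to be an integer,
            %% so the is_integer/1 guard test can be eliminated.
            {ok, N} when is_integer(N) -> N;
            _ -> 0
        end.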
2019-02-01  Prefer map syntax and guard BIFs over the maps module  (Björn Gustavsson)
Avoiding calls usually reduces the size of the stack frame and reduces register shuffling.
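A hypothetical before/after sketch of the kind of rewrite this refers to (function names are made up):

    %% Before: a remote call into the maps module.
    lookup_call(Key, Map) ->
        maps:get(Key, Map, undefined).

    %% After: map match syntax compiles to a get_map_elements instruction,
    %% so no call is made and no stack frame or register shuffling is
    %% needed for it.
    lookup_match(Key, Map) ->
        case Map of
            #{Key := Value} -> Value;
            #{} -> undefined
        end.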
2019-01-25  Merge pull request #2106 from bjorng/bjorn/compiler/fewer-moves  (Björn Gustavsson)
Reduce redundant moves and register shuffling
2019-01-24  Merge pull request #2100 from jhogberg/john/compiler/module-type-optimization  (John Högberg)
Apply type optimizations across local function calls
2019-01-24  compiler: Introduce module-level type optimization  (John Högberg)
This commit lets the type optimization pass work across functions, tracking return and argument types to eliminate redundant tests.
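A hypothetical example of what tracking argument types across local calls enables (names are made up; double/1 is assumed not to be exported):

    run(N) when is_integer(N) ->
        double(N).

    %% double/1 is only called from run/1 above, which has already
    %% established that the argument is an integer, so the guard test
    %% here becomes redundant and can be optimized away.
    double(X) when is_integer(X) ->
        X * 2.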
2019-01-24  Reduce redundant moves and register shuffling  (Björn Gustavsson)
Consider this function and its corresponding BEAM code:

    foo(Map, Key) ->
        Val = case Map of
                  #{Key:=Val0} -> Val0;
                  _ -> default
              end,
        bar(1, 2, Val).

    {label,2}.
      {test,is_map,{f,3},[{x,0}]}.
      {get_map_elements,{f,3},{x,0},{list,[{x,1},{x,0}]}}.
      {jump,{f,4}}.
    {label,3}.
      {move,{atom,default},{x,0}}.
    {label,4}.
      {move,{integer,2},{x,1}}.
      {move,{x,0},{x,2}}.
      {move,{integer,1},{x,0}}.
      {call_only,3,{f,6}}.

Note that the value of the variable `Val` will first be placed in `{x,0}` and then moved to `{x,2}`, where it needs to be when calling the `bar/3` function. The reason for the extra `move` instruction is that the register allocator picks the lowest numbered available register when choosing a register to put a variable in. In this case, `{x,0}` will be chosen.

If we only could give a hint to the register allocator that it would be better to put `Val` in `{x,2}`, the extra `move` would disappear:

    {label,2}.
      {test,is_map,{f,3},[{x,0}]}.
      {get_map_elements,{f,3},{x,0},{list,[{x,1},{x,2}]}}.
      {jump,{f,4}}.
    {label,3}.
      {move,{atom,default},{x,2}}.
    {label,4}.
      {move,{integer,2},{x,1}}.
      {move,{integer,1},{x,0}}.
      {call_only,3,{f,6}}.

There is already an existing sub pass (`reserve_regs`) in `beam_ssa_pre_codegen` that among other things tries to give the register allocator hints that some variables should be placed in specific registers, if possible. However, the existing hinting mechanism is limited, essentially only working within a single SSA block.

This commit extends the hinting mechanism, allowing hints to be passed across SSA blocks, eliminating `move` instructions and register shuffling in many places. (494 modules out of a sample of 1236 modules were changed by this commit.)
2019-01-23  beam_ssa_pre_codegen: Use lists:splitwith/2 for separating phi nodes  (Björn Gustavsson)
Use lists:splitwith/2 instead of lists:partition/2 for splitting out phi nodes. Since phi nodes are always the first instructions in a block, the result will be the same, but splitwith/2 is faster.
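A minimal sketch of the idea; #b_set{} is the beam_ssa instruction record (from beam_ssa.hrl), and split_phis/1 is a hypothetical name for the call site:

    %% splitwith/2 stops scanning at the first non-phi instruction,
    %% whereas partition/2 would traverse the whole instruction list.
    split_phis(Is) ->
        lists:splitwith(fun(#b_set{op=Op}) -> Op =:= phi end, Is).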
2019-01-11  beam_ssa_pre_codegen: Don't use z registers for 'xor' and 'is_record'  (Björn Gustavsson)
There is no easy way to convert xor or is_record/2 to test operations.
2019-01-11  beam_ssa_pre_codegen: Correct short-lived optimization  (Björn Gustavsson)
2018-12-05  beam_ssa_pre_codegen: Fix an internal consistency failure  (Björn Gustavsson)
The following function:

    is_two_tuple(Arg) ->
        case is_tuple(Arg) of
            false -> false;
            true -> tuple_size(Arg) == 2
        end.

would cause an internal consistency failure:

    Internal consistency check failed - please report this bug.
    Instruction: {bif,tuple_size,{f,0},[{x,0}],{z,0}}
    Error: {invalid_store,{z,0},{integer,[]}}:
2018-11-19  beam_ssa_pre_codegen: Improve reuse of Y registers  (Björn Gustavsson)
Enhance the copy_retval/1 optimization to allow Y registers to be reused in more circumstances. Reusing Y registers can often reduce the size of the stack frame.
2018-11-15  beam_ssa_pre_codegen: Eliminate a bottleneck in linear scan  (Björn Gustavsson)
The linear scan algorithm keeps unhandled intervals in two separate lists: one for intervals with reserved registers and one for intervals without reserved registers. When collecting the available free registers, all unhandled intervals with reserved registers must be checked for overlap.

Unhandled intervals that had a preferred register were put into the list of intervals with reserved registers, leading to a lot of unnecessary overlap checking if there were many intervals with preferred registers. Changing the partition code to put intervals with preferred registers into the general list of unhandled intervals will reduce the compilation time if there are many preferred registers.

On my computer, this change reduced the time of the linear scan pass from about 20 seconds down to about 0.5 seconds for this module: https://github.com/aggelgian/cuter/blob/master/src/cuter_binlib.erl

Noticed-by: Kostis Sagonas
2018-10-05  beam_ssa_pre_codegen: Literal funs need stack frames too  (John Högberg)
Fixes a crash during code generation of the following code:

    call_atom() ->
        fun({send = Send}) ->
            Send()
        end.
2018-10-01  Merge branch 'bjorn/compiler/fix-r21-option'  (Björn Gustavsson)
* bjorn/compiler/fix-r21-option:
  Fix code generation of binary instructions with the r21 option
2018-10-01  Merge pull request #1965 from bjorng/bjorn/compiler/misc-cleanups  (Björn Gustavsson)
Minor cleanups and bug fixes of the compiler
2018-09-28  Fix code generation of binary instructions with the r21 option  (Björn Gustavsson)
OTP 22 extends the binary instructions to support a Y register destination. When giving an option to compile for an earlier release, make sure that binary instructions don't use a Y register destination, by rewriting the binary instructions to use an X register destination and adding a `move` instruction to move the value to the Y register.
2018-09-28  Merge pull request #1958 from jhogberg/john/compiler/ssa-bsm-opt  (John Högberg)
Rewrite BSM optimizations in the new SSA-based intermediate format
2018-09-28  beam_ssa_pre_codegen: Remove unused variable aliasing support  (Björn Gustavsson)
Remove the variable aliasing support that was needed for the old beam_bsm pass.
2018-09-28  Rewrite BSM optimizations in the new SSA-based intermediate format  (John Högberg)
This commit improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations. Some highlights:

* Match contexts can be reused even after being passed to a function or being used in a try block.

* Sub-binaries are no longer eagerly extracted, making it far easier to keep "happy paths" free from binary creation.

* Trivial wrapper functions no longer disable context reuse (see the sketch below).
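A hypothetical illustration of the last point (function names are made up): the trivial wrapper no longer prevents the matching loop from reusing the match context created for Bin.

    decode(Bin) when is_binary(Bin) ->
        decode_loop(Bin, []).

    %% With the SSA-based pass, the match context can be reused across
    %% these clauses without materializing sub-binaries for Rest.
    decode_loop(<<X:8, Rest/binary>>, Acc) ->
        decode_loop(Rest, [X | Acc]);
    decode_loop(<<>>, Acc) ->
        lists:reverse(Acc).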
2018-09-26  beam_ssa_pre_codegen: Fix bug in sanitization of get_map_element  (Björn Gustavsson)
The source map for `get_map_element` is not supposed to be a literal, and the sanitization code is supposed to ensure that. The sanitization incorrectly translated this code:

    Map = get_tuple_element literal {ok,#{key=>value}}, literal 1
    Var = get_map_element Map, literal {a,key}

To this code:

    Var = get_map_element literal #{key=>value}, literal {a,key}

Make sure to substitute the arguments for `get_map_element` before looking for a literal source map.
2018-09-24  beam_ssa_pre_codegen: Correct some comments  (Björn Gustavsson)
2018-09-20  Consistently use #b_var{} instead of var_name()  (John Högberg)
We chose to refer to variables through their var_name() because we anticipated the need to annotate them, but it turned out we didn't really need that, and many things become a lot cleaner if the entire #b_var{} is used to represent variables.
2018-09-12  beam_ssa_pre_codegen: Fix bug in receive fixing  (Björn Gustavsson)
When creating a phi node for the common exit block of a receive, the code failed to take into account that there could be more than one predecessor to the exit block for each remove_message. Rename exit_predessor/3 to exit_predessors/3 and make it return a list of the predecessors.
2018-09-03  Introduce a put_tuple2 instruction  (Björn Gustavsson)
Sometimes when building a tuple, there is no way to avoid an extra `move` instruction. Consider this code:

    make_tuple(A) -> {ok,A}.

The corresponding BEAM code looks like this:

    {test_heap,3,1}.
    {put_tuple,2,{x,1}}.
    {put,{atom,ok}}.
    {put,{x,0}}.
    {move,{x,1},{x,0}}.
    return.

To avoid overwriting the source register `{x,0}`, a `move` instruction is necessary. The problem doesn't exist when building a list:

    %% build_list(A) -> [A].
    {test_heap,2,1}.
    {put_list,{x,0},nil,{x,0}}.
    return.

Introduce a new `put_tuple2` instruction that builds a tuple in a single instruction, so that the `move` instruction can be eliminated:

    %% make_tuple(A) -> {ok,A}.
    {test_heap,3,1}.
    {put_tuple2,{x,0},{list,[{atom,ok},{x,0}]}}.
    return.

Note that the BEAM loader already combines `put_tuple` and `put` instructions into an internal instruction similar to `put_tuple2`. Therefore the introduction of the new instruction will not speed up execution of tuple building itself, but it will be less work for the loader to load the new instruction.
2018-08-24  Introduce a new SSA-based intermediate format  (Björn Gustavsson)
v3_codegen is replaced by three new passes:

* beam_kernel_to_ssa, which translates the Kernel Erlang format to a new SSA-based intermediate format.

* beam_ssa_pre_codegen, which prepares the SSA-based format for code generation, including register allocation. Registers are allocated using the linear scan algorithm.

* beam_ssa_codegen, which generates BEAM assembly code from the SSA-based format.

It is easier and more effective to optimize the SSA-based format before X and Y registers have been assigned. The current optimization passes constantly have to make sure no "holes" in the X register assignments are created (that is, that no X register becomes undefined that an allocation instruction depends on).

This commit also introduces the following optimizations:

* Replacing of tuple matching of records with the is_tagged_tuple instruction. (Replacing beam_record; see the sketch after this list.)

* Sinking of get_tuple_element instructions to just before the first use of the extracted values. As well as potentially avoiding extracting tuple elements when they are not actually used on all execution paths, this optimization could also reduce the number of values that will need to be stored in Y registers. (Similar to beam_reorder, but more effective.)

* Live optimizations, removing the definition of a variable that is not subsequently used (provided that the operation has no side effects), as well as strength reduction of binary matching by replacing the extraction of a value from a binary with a skip instruction. (Used to be done by beam_block, beam_utils, and v3_codegen.)

* Removal of redundant bs_restore2 instructions. (Formerly done by beam_bs.)

* Type-based optimizations across branches. More effective than the old beam_type pass that only did type-based optimizations in basic blocks.

* Optimization of floating point instructions. (Formerly done by beam_type.)

* Optimization of receive statements to introduce recv_mark and recv_set instructions. More effective with far fewer restrictions on what instructions are allowed between creating the reference and entering the receive statement.

* Common subexpression elimination. (Formerly done by beam_block.)
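As a sketch of the first optimization in the list above (the record name and fields are made up, and the description of the generated test is simplified):

    -record(point, {x, y}).

    %% Matching this record pattern can be compiled to a single
    %% is_tagged_tuple test (checking the arity and the 'point' tag at
    %% once) instead of separate tuple, arity, and tag checks.
    magnitude(#point{x = X, y = Y}) ->
        math:sqrt(X*X + Y*Y).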