path: root/lib/compiler/src/beam_ssa_pre_codegen.erl
2019-07-10  compiler: Fix compiler crash introduced by OTP-15952  (John Högberg)
An assertion in code generation would fail when the common exit block was ?BADARG_BLOCK, as some operations expect to always "fail" directly to that block (= throw an exception) and we had inserted a dummy block in between. Other operations could also get funny fail labels, jumping to blocks that immediately jumped to {f,0}, but these were all cleaned up by beam_jump, sweeping the bug under the rug.
2019-07-09  compiler: Fix broken 'receive' in try/catch blocks  (John Högberg)
This fix is rather ugly and tacked-on, but I'm not comfortable refactoring the pass in an emergency patch.
2019-05-21  Eliminate compiler crash when compiling complex receive statements  (Björn Gustavsson)
Certain complex receive statements would result in an internal compiler failure. That would happen when the compiler would fail to find the common exit block following a receive. See the added test case for an example. https://bugs.erlang.org/browse/ERL-950
2019-04-29  compiler: Propagate match context position on fail path  (John Högberg)
2019-02-19  Do the destructive setelement optimization in SSA  (Björn Gustavsson)
The expansion of record field updates, when more than one field is updated, but not a majority of the fields, will create a sequence of calls to `erlang:setelement(Index, Value, Tuple)` where Tuple in the first call is the original record tuple, and in the subsequent calls Tuple is the result of the previous call. Furthermore, all Index values are constant positive integers, and the first call to `setelement` will have the greatest index. Thus all the following calls do not actually need to test at run-time whether Tuple has type tuple, nor that the index is within the tuple bounds.

Since OTP R7, the `sys_core_dsetel` pass, run as the very last Core Erlang pass, has optimized this sequence of `setelement` calls to use a special destructive version of `setelement` (called `set_tuple_element`) for all but the very first `setelement` in the sequence.

It turns out that the presence of the `set_tuple_element` in SSA code is awkward and can prevent or complicate type analysis and aggressive optimizations.

Therefore, this commit removes the `sys_core_dsetel` pass and reimplements it for SSA code. The optimization will be done in the `beam_ssa_pre_codegen` pass (that is, just before code generation and after running all other SSA code optimization passes).

In most cases, the resulting BEAM code is identical to previous code. For a few modules, the BEAM code is actually slightly better, with smaller stack frames.
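A minimal sketch of the kind of sequence this optimization targets (the record name and fields are made up for illustration, and the expansion is shown only approximately):

    -record(r, {a, b, c, d, e}).

    update(R) ->
        R#r{c = 3, b = 2}.

    %% The update expands (roughly) to a setelement/3 chain with the
    %% greatest index first:
    %%
    %%     T = erlang:setelement(4, R, 3),
    %%     erlang:setelement(3, T, 2)
    %%
    %% Only the first call needs to check that R is a tuple and that the
    %% index is within bounds; the later call can become the destructive
    %% set_tuple_element operation.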
2019-02-07  Merge pull request #2135 from bjorng/bjorn/compiler/optimize-dominators  (Björn Gustavsson)
Optimize ssa_opt_sink for huge functions
2019-02-06  Optimize ssa_opt_sink for huge functions  (Björn Gustavsson)
The ssa_opt_sink optimization of beam_ssa_opt could get very slow for certain huge functions. 9a190cae9bd7 partly addressed this issue by terminating the optimization early if there happened to be no get_tuple_element instructions at all in the function.

This commit addresses the issue more directly by making the dominator calculation in beam_ssa:dominators/1 more efficient. The same algorithm as before is used, but it is implemented in a more efficient way based on the ideas in "A Simple, Fast Dominance Algorithm" (http://www.hipersoft.rice.edu/grads/publications/dom14.pdf).

As well as being more efficient, the new implementation also gives an explicit representation of the dominator tree, which makes it possible to simplify and optimize the ssa_opt_sink optimization.
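For reference, the core of that paper's approach is an "intersect" walk towards the root of the dominator tree. A minimal Erlang sketch, assuming blocks are identified by postorder numbers and Doms maps each block to its current immediate dominator (the function name and representation are illustrative, not the actual beam_ssa code):

    %% Walk the finger with the smaller postorder number up the tree until
    %% both fingers meet; the meeting point is the nearest common dominator.
    intersect(B, B, _Doms) ->
        B;
    intersect(B1, B2, Doms) when B1 < B2 ->
        intersect(maps:get(B1, Doms), B2, Doms);
    intersect(B1, B2, Doms) ->
        intersect(B1, maps:get(B2, Doms), Doms).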
2019-02-05  beam_ssa_type: Track the types of tuple elements  (John Högberg)
Prior to 294d66a295f6c2101fe3c2da630979ad4e736c08 there wasn't much point to keeping track of tuple element types; they were only known when we had inserted or extracted values from a tuple, and in neither case was it likely that we'd extract the same values again.

It makes a lot more sense to do so now that type optimizations are applied across functions; if we return a tuple it's very likely that its elements will be extracted soon after, and knowing their types lets us eliminate more type checks.

Co-authored-by: Björn Gustavsson <[email protected]>
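A made-up illustration of why this pays off once types flow across functions (function names are hypothetical):

    summarize(List) ->
        {ok, length(List)}.

    use(List) ->
        case summarize(List) of
            %% Element 2 of the returned tuple is known to be an integer,
            %% so the is_integer/1 guard test can be eliminated.
            {ok, N} when is_integer(N) -> N;
            _ -> 0
        end.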
2019-02-01  Prefer map syntax and guard BIFs over the maps module  (Björn Gustavsson)
Avoiding calls usually reduces the size of the stack frame and reduces register shuffling.
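A hypothetical before/after sketch of the kind of rewrite this refers to (function names are made up):

    %% Before: a remote call into the maps module.
    lookup_call(Key, Map) ->
        maps:get(Key, Map, undefined).

    %% After: map match syntax compiles to a get_map_elements instruction,
    %% so no call is made and no stack frame or register shuffling is
    %% needed for it.
    lookup_match(Key, Map) ->
        case Map of
            #{Key := Value} -> Value;
            #{} -> undefined
        end.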
2019-01-25  Merge pull request #2106 from bjorng/bjorn/compiler/fewer-moves  (Björn Gustavsson)
Reduce redundant moves and register shuffling
2019-01-24  Merge pull request #2100 from jhogberg/john/compiler/module-type-optimization  (John Högberg)
Apply type optimizations across local function calls
2019-01-24  compiler: Introduce module-level type optimization  (John Högberg)
This commit lets the type optimization pass work across functions, tracking return and argument types to eliminate redundant tests.
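A hypothetical example of what tracking argument types across local calls enables (names are made up; double/1 is assumed not to be exported):

    run(N) when is_integer(N) ->
        double(N).

    %% double/1 is only called from run/1 above, which has already
    %% established that the argument is an integer, so the guard test
    %% here becomes redundant and can be optimized away.
    double(X) when is_integer(X) ->
        X * 2.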
2019-01-24  Reduce redundant moves and register shuffling  (Björn Gustavsson)
Consider this function and its corresponding BEAM code:

    foo(Map, Key) ->
        Val = case Map of
                  #{Key:=Val0} -> Val0;
                  _ -> default
              end,
        bar(1, 2, Val).

    {label,2}.
      {test,is_map,{f,3},[{x,0}]}.
      {get_map_elements,{f,3},{x,0},{list,[{x,1},{x,0}]}}.
      {jump,{f,4}}.
    {label,3}.
      {move,{atom,default},{x,0}}.
    {label,4}.
      {move,{integer,2},{x,1}}.
      {move,{x,0},{x,2}}.
      {move,{integer,1},{x,0}}.
      {call_only,3,{f,6}}.

Note that the value of the variable `Val` will first be placed in `{x,0}` and then moved to `{x,2}`, where it needs to be when calling the `bar/3` function. The reason for the extra `move` instruction is that the register allocator picks the lowest numbered available register when choosing a register to put a variable in. In this case, `{x,0}` will be chosen.

If we only could give a hint to the register allocator that it would be better to put `Val` in `{x,2}`, the extra `move` would disappear:

    {label,2}.
      {test,is_map,{f,3},[{x,0}]}.
      {get_map_elements,{f,3},{x,0},{list,[{x,1},{x,2}]}}.
      {jump,{f,4}}.
    {label,3}.
      {move,{atom,default},{x,2}}.
    {label,4}.
      {move,{integer,2},{x,1}}.
      {move,{integer,1},{x,0}}.
      {call_only,3,{f,6}}.

There is already an existing sub pass (`reserve_regs`) in `beam_ssa_pre_codegen` that among other things tries to give the register allocator hints that some variables should be placed in specific registers, if possible. However, the existing hinting mechanism is limited, essentially only working within a single SSA block.

This commit extends the hinting mechanism, allowing hints to be passed across SSA blocks, eliminating `move` instructions and register shuffling in many places. (494 modules out of a sample of 1236 modules were changed by this commit.)
2019-01-23  beam_ssa_pre_codegen: Use lists:splitwith/2 for separating phi nodes  (Björn Gustavsson)
Use lists:splitwith/2 instead of lists:partition/2 for splitting out phi nodes. Since phi nodes are always the first instructions in a block, the result will be the same, but splitwith/2 is faster.
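A minimal sketch of the idea; #b_set{} is the beam_ssa instruction record (from beam_ssa.hrl), and split_phis/1 is a hypothetical name for the call site:

    %% splitwith/2 stops scanning at the first non-phi instruction,
    %% whereas partition/2 would traverse the whole instruction list.
    split_phis(Is) ->
        lists:splitwith(fun(#b_set{op=Op}) -> Op =:= phi end, Is).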
2019-01-11  beam_ssa_pre_codegen: Don't use z registers for 'xor' and 'is_record'  (Björn Gustavsson)
There is no easy way to convert xor or is_record/2 to test operations.
2019-01-11  beam_ssa_pre_codegen: Correct short-lived optimization  (Björn Gustavsson)
2018-12-05  beam_ssa_pre_codegen: Fix an internal consistency failure  (Björn Gustavsson)
The following function:

    is_two_tuple(Arg) ->
        case is_tuple(Arg) of
            false -> false;
            true -> tuple_size(Arg) == 2
        end.

would cause an internal consistency failure:

    Internal consistency check failed - please report this bug.
    Instruction: {bif,tuple_size,{f,0},[{x,0}],{z,0}}
    Error: {invalid_store,{z,0},{integer,[]}}:
2018-11-19  beam_ssa_pre_codegen: Improve reuse of Y registers  (Björn Gustavsson)
Enhance the copy_retval/1 optimization to allow Y registers to be reused in more circumstances. Reusing Y registers can often reduce the size of the stack frame.
2018-11-15  beam_ssa_pre_codegen: Eliminate a bottleneck in linear scan  (Björn Gustavsson)
The linear scan algorithm keeps unhandled intervals in two separate lists: one for intervals with reserved registers and one for intervals without reserved registers. When collecting the available free registers, all unhandled intervals with reserved registers must be checked for overlap.

Unhandled intervals that had a preferred register were put into the list of intervals with reserved registers, leading to a lot of unnecessary overlap checking if there were many intervals with preferred registers. Changing the partition code to put intervals with preferred registers into the general list of unhandled intervals will reduce the compilation time if there are many preferred registers.

On my computer, this change reduced the time of the linear scan pass from about 20 seconds down to about 0.5 seconds for this module: https://github.com/aggelgian/cuter/blob/master/src/cuter_binlib.erl

Noticed-by: Kostis Sagonas
2018-10-05  beam_ssa_pre_codegen: Literal funs need stack frames too  (John Högberg)
Fixes a crash during code generation of the following code:

    call_atom() ->
        fun({send = Send}) ->
            Send()
        end.
2018-10-01  Merge branch 'bjorn/compiler/fix-r21-option'  (Björn Gustavsson)
* bjorn/compiler/fix-r21-option:
  Fix code generation of binary instructions with the r21 option
2018-10-01  Merge pull request #1965 from bjorng/bjorn/compiler/misc-cleanups  (Björn Gustavsson)
Minor cleanups and bug fixes of the compiler
2018-09-28  Fix code generation of binary instructions with the r21 option  (Björn Gustavsson)
OTP 22 extends the binary instructions to support a Y register destination. When giving an option to compile for an earlier release, make sure that binary instructions don't use a Y register destination, by rewriting the binary instructions to use an X register destination and adding a `move` instruction to move the value to the Y register.
2018-09-28  Merge pull request #1958 from jhogberg/john/compiler/ssa-bsm-opt  (John Högberg)
Rewrite BSM optimizations in the new SSA-based intermediate format
2018-09-28  beam_ssa_pre_codegen: Remove unused variable aliasing support  (Björn Gustavsson)
Remove the variable aliasing support that was needed for the old beam_bsm pass.
2018-09-28  Rewrite BSM optimizations in the new SSA-based intermediate format  (John Högberg)
This commit improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations. Some highlights:

* Match contexts can be reused even after being passed to a function or being used in a try block.

* Sub-binaries are no longer eagerly extracted, making it far easier to keep "happy paths" free from binary creation.

* Trivial wrapper functions no longer disable context reuse (see the sketch below).
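A hypothetical illustration of the last point (function names are made up): the trivial wrapper no longer prevents the matching loop from reusing the match context created for Bin.

    decode(Bin) when is_binary(Bin) ->
        decode_loop(Bin, []).

    %% With the SSA-based pass, the match context can be reused across
    %% these clauses without materializing sub-binaries for Rest.
    decode_loop(<<X:8, Rest/binary>>, Acc) ->
        decode_loop(Rest, [X | Acc]);
    decode_loop(<<>>, Acc) ->
        lists:reverse(Acc).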
2018-09-26  beam_ssa_pre_codegen: Fix bug in sanitization of get_map_element  (Björn Gustavsson)
The source map for `get_map_element` is not supposed to be a literal, and the sanitization code is supposed to ensure that. The sanitization incorrectly translated this code:

    Map = get_tuple_element literal {ok,#{key=>value}}, literal 1
    Var = get_map_element Map, literal {a,key}

To this code:

    Var = get_map_element literal #{key=>value}, literal {a,key}

Make sure to substitute the arguments for `get_map_element` before looking for a literal source map.
2018-09-24  beam_ssa_pre_codegen: Correct some comments  (Björn Gustavsson)
2018-09-20  Consistently use #b_var{} instead of var_name()  (John Högberg)
We chose to refer to variables through their var_name() because we anticipated the need to annotate them, but it turned out we didn't really need that, and many things become a lot cleaner if the entire #b_var{} is used to represent variables.
2018-09-12  beam_ssa_pre_codegen: Fix bug in receive fixing  (Björn Gustavsson)
When creating a phi node for the common exit block of a receive, the code failed to take into account that there could be more than one predecessor to the exit block for each remove_message. Rename exit_predessor/3 to exit_predessors/3 and make it return a list of the predecessors.
2018-09-03  Introduce a put_tuple2 instruction  (Björn Gustavsson)
Sometimes when building a tuple, there is no way to avoid an extra `move` instruction. Consider this code:

    make_tuple(A) -> {ok,A}.

The corresponding BEAM code looks like this:

    {test_heap,3,1}.
    {put_tuple,2,{x,1}}.
    {put,{atom,ok}}.
    {put,{x,0}}.
    {move,{x,1},{x,0}}.
    return.

To avoid overwriting the source register `{x,0}`, a `move` instruction is necessary. The problem doesn't exist when building a list:

    %% build_list(A) -> [A].
    {test_heap,2,1}.
    {put_list,{x,0},nil,{x,0}}.
    return.

Introduce a new `put_tuple2` instruction that builds a tuple in a single instruction, so that the `move` instruction can be eliminated:

    %% make_tuple(A) -> {ok,A}.
    {test_heap,3,1}.
    {put_tuple2,{x,0},{list,[{atom,ok},{x,0}]}}.
    return.

Note that the BEAM loader already combines `put_tuple` and `put` instructions into an internal instruction similar to `put_tuple2`. Therefore the introduction of the new instruction will not speed up execution of tuple building itself, but it will be less work for the loader to load the new instruction.
2018-08-24  Introduce a new SSA-based intermediate format  (Björn Gustavsson)
v3_codegen is replaced by three new passes:

* beam_kernel_to_ssa, which translates the Kernel Erlang format to a new SSA-based intermediate format.

* beam_ssa_pre_codegen, which prepares the SSA-based format for code generation, including register allocation. Registers are allocated using the linear scan algorithm.

* beam_ssa_codegen, which generates BEAM assembly code from the SSA-based format.

It is easier and more effective to optimize the SSA-based format before X and Y registers have been assigned. The current optimization passes constantly have to make sure no "holes" in the X register assignments are created (that is, that no X register becomes undefined that an allocation instruction depends on).

This commit also introduces the following optimizations:

* Replacing of tuple matching of records with the is_tagged_tuple instruction. (Replacing beam_record; see the sketch after this list.)

* Sinking of get_tuple_element instructions to just before the first use of the extracted values. As well as potentially avoiding extracting tuple elements when they are not actually used on all execution paths, this optimization could also reduce the number of values that will need to be stored in Y registers. (Similar to beam_reorder, but more effective.)

* Live optimizations, removing the definition of a variable that is not subsequently used (provided that the operation has no side effects), as well as strength reduction of binary matching by replacing the extraction of a value from a binary with a skip instruction. (Used to be done by beam_block, beam_utils, and v3_codegen.)

* Removal of redundant bs_restore2 instructions. (Formerly done by beam_bs.)

* Type-based optimizations across branches. More effective than the old beam_type pass that only did type-based optimizations in basic blocks.

* Optimization of floating point instructions. (Formerly done by beam_type.)

* Optimization of receive statements to introduce recv_mark and recv_set instructions. More effective with far fewer restrictions on what instructions are allowed between creating the reference and entering the receive statement.

* Common subexpression elimination. (Formerly done by beam_block.)
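As a sketch of the first optimization in the list above (the record name and fields are made up, and the description of the generated test is simplified):

    -record(point, {x, y}).

    %% Matching this record pattern can be compiled to a single
    %% is_tagged_tuple test (checking the arity and the 'point' tag at
    %% once) instead of separate tuple, arity, and tag checks.
    magnitude(#point{x = X, y = Y}) ->
        math:sqrt(X*X + Y*Y).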