aboutsummaryrefslogtreecommitdiffstats
path: root/lib/compiler/src
AgeCommit message (Collapse)Author
2018-02-14beam_block: Combine blocks when running beam_block the second timeBjörn Gustavsson
1a029efd1ad47f started to run the beam_block pass a second time, but it did not attempt to combine adjacent blocks. Combining adjacent blocks leads to many more opportunities for optimizations. After doing some diffing in generated code, it turns out that there is no benefit for beam_split to split out line instructions from blocks. It seems that the only reason it was done was to slightly simplify the implementation of the no_line_info option in beam_clean.
2018-02-14Disable CSE for floating point operationsBjörn Gustavsson
As a preparation for combining blocks before running beam_block for the second time, disable CSE for floating point operations because it will generate invalid code.
2018-02-14Merge branch 'maint'Björn Gustavsson
* maint: Check that the stack is initialized when an exception may occur
2018-02-12Fix unsafe use of 'allocate' where 'allocate_zero' should be usedBjörn Gustavsson
The more aggressive optimizations of 'allocate_zero' introduced in cb6fc15c35c7e could produce unsafe code such as the following: {allocate,0,1}. {bif,element,{f,0},[{integer,1},{x,0}],{x,0}}. The code is not safe because if element/2 fails, the runtime system may scan the stack and find garbage that looks like a catch tag, and would most probably crash. Fix the problem by making beam_utils:is_killed/3 be more conservative when asked whether a Y register will be killed. Also fix an unsafe move upwards of an allocation instruction in beam_block.
2018-02-09Check that the stack is initialized when an exception may occurBjörn Gustavsson
Strengthen beam_validator to check that the stack is initialized when an instruction with an {f,0} operand is executed. For example, the following code sequence: {allocate,0,1}. {bif,element,{f,0},[{integer,1},{x,0}],{x,0}}. should not be accepted because the stack may be scanned if element/2 fails. That could cause a crash or other undefined behavior if garbage on the stack looks like a catch tag.
2018-02-01Merge pull request #1701 from bjorng/bjorn/get_hd_tlBjörn Gustavsson
Eliminate get_list/3 internally in the compiler
2018-01-31Merge pull request #1696 from bjorng/bjorn/compiler/fix-beam_block-bug/ERL-555Björn Gustavsson
Fix incorrect handling of floating point instructions
2018-01-31Merge branch 'maint'Björn Gustavsson
* maint: Fix incorrect type interference of integer ranges Conflicts: lib/compiler/src/beam_type.erl
2018-01-30Fix incorrect handling of floating point instructionsBjörn Gustavsson
1a029efd1ad47f started to run the beam_block pass a second time. Since it is run after introduction of the optimized floating point instructions, it must handle those instructions correctly. In particular, it must be careful when hoisting allocation instructions. For example, the following code: {test_heap,{alloc,[{words,0},{floats,1}]},5}. . . . {fmove,{fr,2},{x,0}}. {allocate_zero,1,4}. must not be rewritten to: {test_heap,{alloc,[{words,0},{floats,1}]},5}. . . . {allocate_zero,1,4}. {fmove,{fr,2},{x,0}}. because beam_validator will not consider it safe. (The code may actually be safe depending on what the code between the two allocation instructions do.) https://bugs.erlang.org/browse/ERL-555
2018-01-29Fix incorrect type interference of integer rangesBjörn Gustavsson
2018-01-26Eliminate get_list/3 internally in the compilerBjörn Gustavsson
Instructions that produce more than one result complicate optimizations. get_list/3 is one of two instructions that produce multiple results (get_map_elements/3 is the other). Introduce the get_hd/2 and get_tl/2 instructions that return the head and tail of a cons cell, respectively, and use it internally in all optimization passes. For efficiency, we still want to use get_list/3 if both head and tail are used, so we will translate matching pairs of get_hd and get_tl back to get_list instructions.
2018-01-26beam_asm: Encode big numbers as literalsBjörn Gustavsson
Numbers that clearly are not smalls can be encoded as literals. Conservatively, we assume that integers whose absolute value is greater than 1 bsl 128 are bignums and that they can be encoded as literals. Literals are slightly easier for the loader to handle than huge integers.
2018-01-26Merge pull request #1691 from bjorng/bjorn/compiler/local-cseBjörn Gustavsson
Do local common sub expression elimination (CSE)
2018-01-25beam_type: Optimize away unnecessary test_unit instructionsBjörn Gustavsson
Optimize away unnecessary test_unit instructions that verify that binaries are byte-aligned. In a tight loop, eliminating an instruction can have a small but measurable improvement of the execution time.
2018-01-25beam_type: Refactor simplify_basic/2 and friendsBjörn Gustavsson
Separate the simplification of instructions from updating of the type data base.
2018-01-24Optimize matching of empty binariesBjörn Gustavsson
Extend an existing optimization in beam_dead to avoid creating a match context when matching an empty binary.
2018-01-24Apply common subexpression elimination in blocksBjörn Gustavsson
Eliminate repeated evaluation of guard BIFs and building of cons cells in blocks. This optimization is applicable in more places than might be expected, because code generation for binaries and record can generate common sub expressions not visible in the original source code. For example, consider this function: make_binary(Term) -> Bin = term_to_binary(Term), Size = byte_size(Bin), <<Size:32,Bin/binary>>. The compiler inserts a call to byte_size/2 to calculate the size of the binary being built: {function, make_binary, 1, 2}. {label,1}. {line,...}. {func_info,{atom,t},{atom,make_binary},1}. {label,2}. {allocate,0,1}. {line,...}. {call_ext,1,{extfunc,erlang,term_to_binary,1}}. {line,...}. {gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}. %Present in original code. {line,...}. {gc_bif,byte_size,{f,0},2,[{x,0}],{x,2}}. %Inserted by compiler. {bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}. {bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}. {bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}. {bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}. {move,{x,2},{x,0}}. {deallocate,0}. return. Common sub expression elimination (CSE) eliminates the second call to byte_size/2: {function, make_binary, 1, 2}. {label,1}. {line,...}. {func_info,{atom,t},{atom,make_binary},1}. {label,2}. {allocate,0,1}. {line,...}. {call_ext,1,{extfunc,erlang,term_to_binary,1}}. {line,...}. {gc_bif,byte_size,{f,0},1,[{x,0}],{x,1}}. {move,{x,1},{x,2}}. {bs_add,{f,0},[{x,2},{integer,4},1],{x,2}}. {bs_init2,{f,0},{x,2},0,2,{field_flags,[]},{x,2}}. {bs_put_integer,{f,0},{integer,32},1,{field_flags,[unsigned,big]},{x,1}}. {bs_put_binary,{f,0},{atom,all},8,{field_flags,[unsigned,big]},{x,0}}. {move,{x,2},{x,0}}. {deallocate,0}. return. Note: A possible future optimization would be to include binary construction instructions in blocks. If that is done, the {move,{x,1},{x,2}} instruction could also be eliminated.
2018-01-24Correct bug in beam_block:opt/2Björn Gustavsson
The folling sequence in a block: {move,{x,1},{x,2}}. {move,{x,2},{x,2}}. would be incorrectly rewritten to: {move,{x,2},{x,2}}. (Which in turn would be optimized away a little bit later.)
2018-01-24Correct unsafe optimizations in beam_blockBjörn Gustavsson
When attempting to eliminate the move/2 instruction in the following code: {bif,self,{f,0},[],{x,0}}. {move,{x,0},{x,1}}. . . . {put_tuple,2,{x,1}}. {put,{atom,ok}}. {put,{x,0}}. beam_block would produce the following unsafe code: {bif,self,{f,0},[],{x,1}}. . . . {put_tuple,2,{x,1}}. {put,{atom,ok}}. {put,{x,1}}. It is unsafe because the tuple is self-referential. The following code: {put_list,{y,6},nil,{x,4}}. {move,{x,4},{x,5}}. {put_list,{y,1},{x,5},{x,5}}. . . . {put_tuple,2,{x,6}}. {put,{x,4}}. {put,{x,5}}. would be incorrectly transformed to: {put_list,{y,6},nil,{x,5}}. {put_list,{y,1},{x,5},{x,5}}. . . . {put_tuple,2,{x,6}}. {put,{x,5}}. {put,{x,5}}. (Both elements in the built tuple get the same value.)
2018-01-23beam_validator: Validate building of tuplesBjörn Gustavsson
Make sure that there is the correct number of put/1 instructions following put_tuple/2. Also make it illegal to reference the register for the tuple being built in a put/1 instruction. That is, beam_validator will now issue a diagnostice for the the following code: {put_tuple,1,{x,0}}. {put,{x,0}}.
2018-01-22Don't build a stacktrace if it's only passed to erlang:raise/3Björn Gustavsson
Consider the following function: function({function,Name,Arity,CLabel,Is0}, Lc0) -> try %% Optimize the code for the function. catch Class:Error:Stack -> io:format("Function: ~w/~w\n", [Name,Arity]), erlang:raise(Class, Error, Stack) end. The stacktrace is retrieved, but it is only used in the call to erlang:raise/3. There is no need to build a stacktrace in this function. We can avoid the building if we introduce an instruction called raw_raise/3 that works exactly like the erlang:raise/3 BIF except that its third argument must be a raw stacktrace.
2018-01-16sys_core_bsm: Rearrange arguments to enable delayed sub binary creationBjörn Gustavsson
Argument order can prevent the delayed sub binary creation. Here is an example directly from the Efficiency Guide: non_opt_eq([H|T1], <<H,T2/binary>>) -> non_opt_eq(T1, T2); non_opt_eq([_|_], <<_,_/binary>>) -> false; non_opt_eq([], <<>>) -> true. When compiling with the bin_opt_info option, there will be a suggestion to change the argument order. It turns out sys_core_bsm can itself change the order, not the order of the arguments of themselves, but the order in which the arguments are matched. Here is how it can be rewritten in pseudo Core Erlang code: non_opt_eq(Arg1, Arg2) -> case < Arg2,Arg1 > of < <<H1,T2/binary>>, [H2|T1] > when H1 =:= H2 -> non_opt_eq(T1, T2); < <<_,T2/binary>ffff>, [_|T1] > -> false; < <<>>, [] >> -> true end. When rewritten like this, the bs_start_match2 instruction will be the first instruction in the function and it will be possible to store the match context in the same register as the binary ({x,1} in this case) and to delay the creation of sub binaries. The switching of matching order also enables many other simplifications in sys_core_bsm, since there is no longer any need to pass the position of the pattern as an argument. We will update the Efficiency Guide in a separate branch before the release of OTP 21.
2018-01-12Merge pull request #1680 from bjorng/bjorn/compiler/beam_blockBjörn Gustavsson
Run beam_block a second time
2018-01-12Merge pull request #1679 from bjorng/bjorn/compiler/sys_core_foldBjörn Gustavsson
Clean up and improve sys_core_fold optimizations
2018-01-11Run beam_block again after other optimizations have been runBjörn Gustavsson
Running beam_block again after the other optimizations have run will give it more opportunities for optimizations. In particular, more allocate_zero/2 instructions can be turned into allocate/2 instructions, and more get_tuple_element/3 instructions can store the retrieved value into the correct register at once. Out of a sample of about 700 modules in OTP, 64 modules were improved by this commit.
2018-01-11beam_bsm: Insert introduced 'move' instructions into blockBjörn Gustavsson
If possible, when adding move/2 instructions, try to insert them into a block. That could potentially allow them to be optimized.
2018-01-11Prepare beam_utils to run again after beam_splitBjörn Gustavsson
beam_utils:live_opt/1 is currently only run early (from beam_block). Prepare it to be run after beam_split when instructions with failure labels have been taken out of blocks. While we are it, also improve check_liveness/3. That will improve the optimizations in beam_record (replacing tuple matching instructions with an is_tagged_tuple instruction).
2018-01-11beam_utils: Correct handling of liveness for select_valBjörn Gustavsson
Since the select_val instruction never transfer directly to the next instruction, the incoming live registers should be ignored. This bug have not caused any problems yet, but it will in the future if we are to run the liveness optimizations again after the optimizations in beam_dead and beam_jump.
2018-01-11beam_block: Reorder element/2 calls in guardsBjörn Gustavsson
In a guard, reorder two consecutive calls to the element/2 BIF that access the same tuple and have the same failure label so that highest index is fetched first. That will allow the second element/2 to be replace with the slightly cheaper get_tuple_element/3 instruction.
2018-01-11Improve code generation for a 'case' with exported variablesBjörn Gustavsson
Consider a 'case' that exports variables and whose return value is ignored: foo(N) -> case N of 1 -> Res = one; 2 -> Res = two end, {ok,Res}. That code will be translated to the following Core Erlang code: 'foo'/1 = fun (_@c0) -> let <_@c5,Res> = case _@c0 of <1> when 'true' -> <'one','one'> <2> when 'true' -> <'two','two'> <_@c3> when 'true' -> primop 'match_fail'({'case_clause',_@c3}) end in {'ok',Res} The exported variables has been rewritten to explicit return values. Note that the original return value from the 'case' is bound to the variable _@c5, which is unused. The corresponding BEAM assembly code looks like this: {function, foo, 1, 2}. {label,1}. {line,[...]}. {func_info,{atom,t},{atom,foo},1}. {label,2}. {test,is_integer,{f,6},[{x,0}]}. {select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}. {label,3}. {move,{atom,two},{x,1}}. {move,{atom,two},{x,0}}. {jump,{f,5}}. {label,4}. {move,{atom,one},{x,1}}. {move,{atom,one},{x,0}}. {label,5}. {test_heap,3,2}. {put_tuple,2,{x,0}}. {put,{atom,ok}}. {put,{x,1}}. return. {label,6}. {line,[...]}. {case_end,{x,0}}. Because of the test_heap instruction following label 5, the assignment to {x,0} cannot be optimized away by the passes that optimize BEAM assembly code. Refactor the optimizations of 'let' in sys_core_fold to eliminate the unused variable. Thus: 'foo'/1 = fun (_@c0) -> let <Res> = case _@c0 of <1> when 'true' -> 'one' <2> when 'true' -> 'two' <_@c3> when 'true' -> primop 'match_fail'({'case_clause',_@c3}) end in {'ok',Res} The resulting BEAM code will look like: {function, foo, 1, 2}. {label,1}. {line,[...]}. {func_info,{atom,t},{atom,foo},1}. {label,2}. {test,is_integer,{f,6},[{x,0}]}. {select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}. {label,3}. {move,{atom,two},{x,0}}. {jump,{f,5}}. {label,4}. {move,{atom,one},{x,0}}. {label,5}. {test_heap,3,1}. {put_tuple,2,{x,1}}. {put,{atom,ok}}. {put,{x,0}}. {move,{x,1},{x,0}}. return. {label,6}. {line,[...]}. {case_end,{x,0}}.
2018-01-11Remove special cases in optimization of a simple letBjörn Gustavsson
Improve handling of #c_seq{}, making sure to simplify a #c_seq{} as much as possible. With that improvement, we can remove some special-case code from opt_simple_let_2/6.
2018-01-11sys_core_fold: Make it clear what part of Sub is usedBjörn Gustavsson
2018-01-11sys_core_fold: Simplify usage of move_case_into_arg/2Björn Gustavsson
2018-01-11Refactor '%live' and '%def' annotationsBjörn Gustavsson
The annotations in the optimizing passes currently looks like this: {'%live',NumRegistersUsed,RegistersUsedBitmap} {'%def',RegistersDefinedBitmap} (NumRegistersUsed is no longer used.) When I attempted to extend some optimizations, I found that I had to add additional clauses to tolerate/handle both types of annotations. That problem would only get worse if any more annotations are added in the future. To simplify annotation handling, this commit wraps both types of annotations in a {'%anno',_} tuple: {'%anno',{used,RegistersUsedBitmap}} {'%anno',{def,RegistersDefinedBitmap}} The '%live' annotation has been renamed to 'used' to make it somewhat clearer what it means, and the unused NumRegistersUsed part of the old annotation has been removed. Alternatives considered: My first attempt was to wrap the annotation in a 'set' tuple so that there would only be 'set' tuples in a block. For example: {set,[],[],{anno,{live,RegistersUsedBitmap}}} It was not as convenient as expected. Annotations often need to be handled specially from other instructions in a block. When they are wrapped in a 'set' tuple, they can very easily be handled incorrectly or passed on to the next pass. That causes subtle errors or worse code, and it can be difficult to debug. Therefore, my conclusion is that annotations should be distinct from other instructions, to make it obvious when one have missed to handle an annotation.
2018-01-11Merge pull request #1678 from ↵John Högberg
jhogberg/john/compiler/reintroduce-tuple-arity-optimizations/OTP-14857 Reintroduce the tuple arity optimizations removed in PR #1673
2018-01-10beam_block: Improve optimization of allocate_zero instructionsBjörn Gustavsson
Turn more allocate_zero instructions into allocate instructions.
2018-01-10beam_type: Enhance coalescing of allocation instructionsBjörn Gustavsson
An 'allocate' or 'allocate_zero' instruction should not be shortly followed by a 'test_heap' instruction. For example, we don't want this type of code: {allocate_zero,3,4}. {line,...}. {test_heap,7,4}. {bif,element,{f,0},...,...}. While the code is safe because 'allocate_zero' has initialized the stack frame, it is wasteful. Also note that the code would become unsafe if the 'allocate_zero' instruction were to be replaced with an 'allocate' instruction. What we want to see is this: {allocate_heap_zero,3,7,4}. {line,...}. {bif,element,{f,0},...,...}.
2018-01-10Correct beam_utils:combine_heap_needs/2Björn Gustavsson
In 21dd6e55877832, beam_utils:combine_heap_needs/2 stopped wrapping an allocation list in an {alloc,...} tuple. That was not noticed because the faulty heap need created in beam_block was discarded by beam_type.
2018-01-10Correct beam_utils:is_killed/3Björn Gustavsson
beam_utils:is_killed/3 could incorrectly indicate that a register was killed, when in fact it was referenced by an instruction that did a GC.
2018-01-10Merge branch 'maint'Björn Gustavsson
* maint: beam_validator: Strengthen validation of GC instructions
2018-01-10Merge pull request #1674 from bjorng/bjorn/compiler/beam_validatorBjörn Gustavsson
beam_validator: Strengthen validation of GC instructions OTP-14863
2018-01-08Reintroduce the arity optimization removed in OTP-14855John Högberg
We can safely tell when a test_arity or is_record instruction is superflous by keeping track of whether the size is exactly known or not.
2018-01-08Merge branch 'maint'John Högberg
2018-01-08beam_validator: Strengthen validation of GC instructionsBjörn Gustavsson
beam_validator did not verify that the Y registers were initialized before executing the following instructions that could cause a GC: bs_append/8 bs_init2/6 bs_init_bits/6 gc_bif1/5 gc_bif2/6 gc_bif3/7 test_heap/2 That means that, for example, an incorrect optimization that replaced an 'allocate_zero' instruction with an 'allocate' instruction when it was not safe, would not be rejected by beam_validtor, but would instead cause a crash or other undefined behavior at runtime. Also fix a minor bug in beam_type exposed by the stronger checking. When compiling from .S files, beam_type did not handle the init/1 instruction and could produce unsafe code.
2018-01-04Remove unsafe is_record/test_arity optimizationsJohn Högberg
The type optimizations for is_record and test_arity checked whether the arity was equal to the size stored in the type information, which is incorrect since said size is the *minimum* size of the tuple (as determined by previous instructions) and not its exact size. A future patch to the 'master' branch will restore these optimizations in a safe manner.
2017-12-20Reduce register shuffling in receive clausesBjörn Gustavsson
Handle a few more instructions in beam_utils. That will allow beam_reorder to reorder more instructions, delaying get_tuple_element instructions and reducing register shuffling in receive clauses.
2017-12-18v3_codegen: Don't let exit BIFs force a stack frameBjörn Gustavsson
This is an enhancement of the optimization added in 2e5d6201bb044, where we tried to avoid forcing a stack frame for functions that don't really need them. That optimization would not suppress the stack frame for this function: f(A) -> Res = case A of a -> x; b -> y end, {ok,Res}. The reason is that internally the compiler would rewrite the code to something like this: f(A) -> Res = case A of a -> x; b -> y; Other -> error({case_clause,Other}) end, {ok,Res}. The call to error/1 would force creation of a stack frame, even though it is not really needed because error/1 causes an exception. Handle calls to exit BIFs specially to allow suppressing the stack frame.
2017-12-18Merge pull request #1658 from bjorng/bjorn/compiler/delay-stackframeBjörn Gustavsson
Delay creation of stack frames
2017-12-15Merge branch 'bjorn/compiler/coverage'Björn Gustavsson
* bjorn/compiler/coverage: beam_utils: Refactor combine_alloc_lists() to cover more lines map_SUITE: Cover beam_utils:bif_to_test/3 beam_disasm: Remove support for obsolete instructions guard_SUITE: Test is_bitstring/1 and is_map/1 on literals
2017-12-15v3_codegen: Delay creation of stack framesBjörn Gustavsson
v3_codegen currently wraps a stack frame around each clause in a function (unless the clause is simple without any 'case' or other complex constructions). Consider this function: f({a,X}) -> A = abs(X), case A of 0 -> {result,"0"}; _ -> {result,integer_to_list(A)} end; f(_) -> error. The first clause needs a stack frame because there is a function call to integer_to_list/1 not in the tail position. v3_codegen currently wraps the entire first clause in stack frame. We can delay the creation of the stack frame, and create a stack frame in each arm of the 'case' (if needed): f({a,X}) -> A = abs(X), case A of 0 -> %% Don't create a stack frame here. {result,"0"}; _ -> %% Create a stack frame here. {result,integer_to_list(A)} end; f(_) -> error. There are pros and cons of this approach. The cons are that the code size may increase if there are many 'case' clauses and each needs its own stack frame. The allocation instructions may also interfere with other optimizations, but the new optimizations introduced in previous commits will mitigate most of those issues. The pros are the following: * For some clauses in a 'case', there is no need to create any stack frame at all. * Often when moving an allocation instruction into a 'case' clause, the slightly cheaper 'allocate' instruction can be used instead of 'allocate_zero'. There is also the possibility that the allocate instruction can be be combined with a 'test_heap' instruction. * Each stack frame for each arm of the 'case' will have exactly as many slots as needed.