aboutsummaryrefslogtreecommitdiffstats
path: root/lib/compiler/src/v3_codegen.erl
AgeCommit message (Collapse)Author
2017-12-06v3_codegen: Avoid excessive stack frame allocationBjörn Gustavsson
A 'case' or 'if' that does not occur last in a function clause will always force a stack frame. The reasoning behind this is that in most uses of 'case' there will be a function call from within the 'case'. When there is a function call, the stack frame is needed both to save the continuation pointer and to save any X registers that will need to survive the call. When there is no function call from a 'case', the resulting stack frame is annoying. There will be register shuffling, and the existence of the stack frame may thwart many optimizations (for example, in beam_dead). Therefore, add an extra pass to v3_codegen to avoid creating a stack frame when not needed. https://bugs.erlang.org/browse/ERL-514
2017-12-04Clean up and comment code generation for basic blocksBjörn Gustavsson
2017-11-30Stop trying to maximize the use of x(0)Björn Gustavsson
X register 0 used to be mapped to a hardware register, and therefore faster than the other registers. Because of that, the compiler tried to use x(0) as much as possible as a temporary register. That was changed a few releases ago. X register 0 is now placed in the array of all X registers and has no special speed advantage compared to the other registers. Remove the code in the compiler that attempts to use x(0) as much as possible. As a result, the following type of instruction will be much less frequent: {put_list,Src,{x,0},{x,0}} Instead, the following type of instruction will be more frequent: {put_list,Src,{x,X},{x,X}} (Where X is an arbitrary X register.) Update the runtime system to specialize that kind of put_list instruction.
2017-11-30Clean up collection of basic blocksBjörn Gustavsson
2017-11-29v3_codegen: Remove check of operand for bs_context_to_binaryBjörn Gustavsson
The bs_context_to_binary instruction only allows a register operand. v3_codegen has a test to ensure that the operand is a register. That test is no longer necessary. There used to be a possibility that optimizations in sys_core_fold and the inliner could change the operand for bs_context_to_binary to a binary literal. Since 09112806c15a81b that can no longer happen, because no more optimizations are run after the introduction of the bs_context_to_binary instruction.
2017-10-27Eliminate the v3_life passBjörn Gustavsson
The v3_life pass does not do enough to be worth being its own pass. Essentially it does two things: * Calculates life-time information starting from the annotations that v3_kernel provides. That part can be moved into v3_codegen. * Rewrites the Kernel Erlang records to similar plain tuples (for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}). That rewriting is not needed and can be eliminated.
2017-10-21Optimize matching of literals for single-valued typesBjörn Gustavsson
If a type only has one clause and if the pattern is literal, the matching can be done more efficiently by directly comparing with the literal. Example: find(String, "") -> String; find(String, <<>>) -> String; find(String, SearchPattern) -> . . . Without this optimization, the relevant part of the code would look this: {test,bs_start_match2,{f,3},2,[{x,1},0],{x,2}}. {test,bs_test_tail2,{f,4},[{x,2},0]}. return. {label,3}. {test,is_nil,{f,4},[{x,1}]}. return. {label,4}. . . . That is, if {x,1} is a binary, a match context will be built to test whether {x,1} is an empty binary. With the optimization, the code will look this: {test,is_eq_exact,{f,3},[{x,1},{literal,<<>>}]}. return. {label,3}. {test,is_nil,{f,4},[{x,1}]}. return. {label,4}. . . .
2017-10-02Merge pull request #1546 from michalmuskala/bs-add-optBjörn Gustavsson
Optimise size calculation for binary construction OTP-14654
2017-09-08Optimise size calculation for binary constructionMichał Muskała
It turns out it was extremely common to have a following sequence: move 0 D1 byte_size _ D2 bs_add D1 D2 D Which is equivalent to just: byte_size _ D Similarly another sequence: move S D1 byte_size _ D2 bs_add D1 D2 D Can be optimised into: byte_size _ D2 bs_add S D2 D Both of those optimisations work with byte_size and bit_size instructions.
2017-08-31Eliminate unnecessary 'move' instructionsBjörn Gustavsson
The compiler could sometimes emit unnecessary 'move' instructions in the code for binary matching, for example for this function: escape(<<Byte, Rest/bits>>, Pos) when Byte >= 127 -> escape(Rest, Pos + 1); escape(<<Byte, Rest/bits>>, Pos) -> escape(Rest, Pos + Byte); escape(<<_Rest/bits>>, Pos) -> Pos. The generated code would look like this: {function, escape, 2, 2}. {label,1}. {line,[{location,"t.erl",17}]}. {func_info,{atom,t},{atom,escape},2}. {label,2}. {test,bs_start_match2,{f,1},2,[{x,0},0],{x,0}}. {test,bs_get_integer2, {f,4}, 2, [{x,0}, {integer,8}, 1, {field_flags,[{anno,[17,{file,"t.erl"}]},unsigned,big]}], {x,2}}. {'%',{bin_opt,[17,{file,"t.erl"}]}}. {move,{x,0},{x,3}}. %% UNECESSARY! {test,is_ge,{f,3},[{x,2},{integer,127}]}. {line,[{location,"t.erl",18}]}. {gc_bif,'+',{f,0},4,[{x,1},{integer,1}],{x,1}}. {move,{x,3},{x,0}}. %% UNECESSARY! {call_only,2,{f,2}}. {label,3}. {line,[{location,"t.erl",20}]}. {gc_bif,'+',{f,0},4,[{x,1},{x,2}],{x,1}}. {move,{x,3},{x,0}}. %% UNECESSARY! {call_only,2,{f,2}}. {label,4}. {move,{x,1},{x,0}}. return. The redundant 'move' instructions have been marked. To avoid the 'move' instructions, we can extend the existing function is_context_unused/1 in v3_codegen. If v3_codegen can know that the match context will not be used again, it can reuse the register for the match context and avoid the extra 'move' instructions. https://bugs.erlang.org/browse/ERL-444
2017-01-12v3_codegen: Add missing types and specsBjörn Gustavsson
2016-11-18v3_kernel: Generate optimized code for guardsBjörn Gustavsson
The compiler produces poor code for complex guard expressions with andalso/orelse. Here is an example from the filename module: -define(IS_DRIVELETTER(Letter),(((Letter >= $A) andalso (Letter =< $Z)) orelse ((Letter >= $a) andalso (Letter =< $z)))). skip_prefix(Name, false) -> Name; skip_prefix([L, DrvSep|Name], DrvSep) when ?IS_DRIVELETTER(L) -> Name; skip_prefix(Name, _) -> Name. beam_bool fails to simplify the code for the guard, leaving several 'bif' instructions: {function, skip_prefix, 2, 49}. {label,48}. {line,[{location,"filename.erl",187}]}. {func_info,{atom,filename},{atom,skip_prefix},2}. {label,49}. {test,is_ne_exact,{f,52},[{x,1},{atom,false}]}. {test,is_nonempty_list,{f,52},[{x,0}]}. {get_list,{x,0},{x,2},{x,3}}. {test,is_nonempty_list,{f,52},[{x,3}]}. {get_list,{x,3},{x,4},{x,5}}. {bif,'=:=',{f,52},[{x,1},{x,4}],{x,6}}. {test,is_ge,{f,50},[{x,2},{integer,65}]}. {bif,'=<',{f,52},[{x,2},{integer,90}],{x,7}}. {test,is_eq_exact,{f,51},[{x,7},{atom,false}]}. {test,is_ge,{f,50},[{x,2},{integer,97}]}. {bif,'=<',{f,52},[{x,2},{integer,122}],{x,7}}. {jump,{f,51}}. {label,50}. {move,{atom,false},{x,7}}. {label,51}. {bif,'=:=',{f,52},[{x,7},{atom,true}],{x,7}}. {test,is_eq_exact,{f,52},[{x,6},{atom,true}]}. {test,is_eq_exact,{f,52},[{x,7},{atom,true}]}. {move,{x,5},{x,0}}. return. {label,52}. return. We can add optimizations of guard tests to v3_kernel to achive a better result: {function, skip_prefix, 2, 49}. {label,48}. {line,[{location,"filename.erl",187}]}. {func_info,{atom,filename},{atom,skip_prefix},2}. {label,49}. {test,is_ne_exact,{f,51},[{x,1},{atom,false}]}. {test,is_nonempty_list,{f,51},[{x,0}]}. {get_list,{x,0},{x,2},{x,3}}. {test,is_nonempty_list,{f,51},[{x,3}]}. {get_list,{x,3},{x,4},{x,5}}. {test,is_eq_exact,{f,51},[{x,1},{x,4}]}. {test,is_ge,{f,51},[{x,2},{integer,65}]}. {test,is_lt,{f,50},[{integer,90},{x,2}]}. {test,is_ge,{f,51},[{x,2},{integer,97}]}. {test,is_ge,{f,51},[{integer,122},{x,2}]}. {label,50}. {move,{x,5},{x,0}}. return. {label,51}. return. Looking at the STDLIB application, there were 112 lines of BIF calls in guards that beam_bool failed to convert to test instructions. This commit eliminates all those BIF calls. Here is how I counted the instructions: $ PATH=$ERL_TOP/bin:$PATH erlc -I ../include -I ../../kernel/include -S *.erl $ grep "bif,'[=<>]" *.S | grep -v f,0 dets.S: {bif,'=:=',{f,547},[{x,4},{atom,read_write}],{x,4}}. dets.S: {bif,'=:=',{f,547},[{x,5},{atom,saved}],{x,5}}. dets.S: {bif,'=:=',{f,589},[{x,5},{atom,read}],{x,5}}. . . . $ grep "bif,'[=<>]" *.S | grep -v f,0 | wc 112 224 6765 $
2016-09-21Simplify handling of internal BIFsBjörn Gustavsson
Do a simpler translation of internal BIFs. While we are it, also remove the dummy values of Index and Uniq from the make_fun internal operation.
2016-06-02Eliminate crash for map updates in guardsBjörn Gustavsson
beam_validator would complain that x(1) is uninitialized in a test_heap instruction when attempting to compile the following code with sys_core_fold turned off: foo(M) when not (M#{true := 0}); [M] -> ok. Simplified, the generated BEAM assembly code looked like this: test is_map BadMap x(0) put_map_exact Fail x(0) => x(1) ... jump BooleanStuff BadMap: move ok => x(1) jump Fail BooleanStuff: ... move Boolean => x(2) jump Build Fail: move false => x(2) Build: test_heap 2 3 %% x(0), x(1), x(2) must be live. ... That is, if put_map_exact failed, control would transfer to the label Fail without initializing x(1). Fix that by making sure that x(1) is initilized even if put_map_exact fails: test is_map BadMap x(0) put_map_exact BadLbl x(0) => x(1) ... jump OkLbl BadLbl: move ok => x(1) jump Fail OkLbl: jump BooleanStuff BadMap: move ok => x(1) jump Fail BooleanStuff: ... move Boolean => x(2) jump Build Fail: move false => x(2) Build: test_heap 2 3 %% x(0), x(1), x(2) must be live. ... Note that this situation is rare, and that other optimization passes (beam_dead and beam_jump in particular) will clean up this mess.
2016-05-25v3_codegen: Don't confuse beam_validatorBjörn Gustavsson
Generate code that not only is safe, but can easily be seen by beam_validator to be safe.
2016-05-25v3_codegen: Correct code generation for an error/1 call in a guardBjörn Gustavsson
Sometimes v3_codegen would generate unsafe code when there was a call to error/1 in a guard.
2016-03-15update copyright-yearHenrik Nord
2016-01-11Merge branch 'maint'Björn Gustavsson
* maint: Eliminate crash in v3_codegen
2016-01-11Eliminate crash in v3_codegenBjörn Gustavsson
The following code would crash v3_codegen: order(From) -> catch if From#{[] => sufficient} -> saint end. Before explaining the crash, first some background on the stack frame and the Y registers. Certain instructions, most notably the 'call' instructions, clobber all X registers. Before any such instruction, all X registers that have values that will be used after the call must be saved to Y registers (i.e. to the stack frame). adjust_stack/4 will be called when X registers must be saved. There is also another situation when X registers must be saved, namely within a 'catch' if we are about to execute any instruction that may cause an exception. Examples of such instructions are some guard BIFs (such as length/1) and construction of binaries or maps. Within a 'catch', X registers must be be saved because if an exception is thrown and catched all X registers will be destroyed. The same adjust_stack/4 function will be called for those instructions, but only if they occur within a 'catch'. There is actually one more complication. If there is code in a guard within a catch, the X registers should not be saved, because the code in a guard never clobbers any X registers that were alive before the guard code was entered. v3_codegen is written with the implicit assumption that code in guards never cause anything to be saved to Y registers. The code for building maps and binaries would incorrectly save X registers within a guard inside a 'catch'. For construction of binaries, that would mean that a useless but harmelss 'move' instruction was generated. But for construction of maps, the saving of the Y register would not be harmless. There would be a crash when attempting to merge #sr{} records. #sr{} records keeps track of the contents of X and Y registers. When two separate code paths are joined (e.g. at the end of 'case' statement), the register descriptors must be reconciled. Basically, the register descriptors for both paths must be identical. The #sr{} record for one path must not claim that {y,0} contains a certain value, while another path claims that {y,0} is dead. Thus, the crash occurs in sr_merge/2 when failing to reconcile the Y registers. To fix this bug this bug we will introduce a new function called maybe_adjust_stack/5. It will save X registers on the stack only if the code is inside a catch but not inside a guard. We will change all existing code to use this new function when appropriate. Reported-by: Thomas Arts
2015-09-28v3_codegen: Optimize matching of the final size-less binary segmentBjörn Gustavsson
Consider the following function: f(Bin, Bool) -> case Bin of <<Val:16/binary,_/binary>> when Bool -> Val end. Simplified, the generated code looks like: bs_start_match2 Fail Live Bin => Bin bs_get_integer2 Fail Live Bin size=Sz unit=1 => Val bs_skip_bits2 Fail Bin size=all unit=8 is_eq_exact Fail Bool true The code generator will replace the bs_skip_bits2 instruction with a bs_test_unit instruction if it can be clearly seen that the context register will not be used again. In this case, it is not obvious without looking at the code at the Fail label. However, it turns out that bs_test_unit instruction is always safe beacuse of the way v3_kernel compiles pattern matching. It doesn't matter whether the match context will be used again. If it will be used again, the position in it will *not* be used. Instead, a bs_restore2 instruction will restore one of the saved instructions.
2015-06-18Change license text to APLv2Bruce Yinhe
2015-05-21v3_codegen: Use Maps to map local functionsBjörn-Egil Dahlberg
2015-05-21compiler: Use lc instead of map/1 in v3_codegenBjörn-Egil Dahlberg
Small speed increase for large files.
2015-04-29v3_core, v3_codegen: Eliminate old-style catchesBjörn Gustavsson
2015-04-22v3_codegen: Reduce cost for fixing up bs_match_string instructionsBjörn Gustavsson
Commit b76588fb5a introduced an optimization of the compile time of huge functions with many bs_match_string instructions. The optimization is done in two passes. The first pass coalesces adjacent bs_match_string instructions. To avoid copying bitstrings multiple times, the bitstrings in the instructions are combined in to a (deep) list. The second pass goes through all instructions in the function and combines the list of bitstrings to a single bitstring in all bs_match_string instructions. The second pass (fix_bs_match_string) is run on all instructions in each function, even if there are no bs_match_instructions in the function. While fix_bs_match_string is not a bottleneck (it is a linear pass), its execution time is noticeable when profiling some modules. Move the execution of the second pass to the select_binary() function so that it will only be executed for instructions that do binary matching. Also take the opportunity to optimize away uses of bs_restore2 that occour directly after a bs_save2. That optimimization is currently done in beam_block, but it can be done essentially for free in the same pass that fixes up bs_match_string instructions.
2015-04-22v3_codegen: Optimize "turning" of y registersBjörn Gustavsson
Profiling shows that the execution time for "turning" y registers is noticeable for some modules (e.g. S1AP-PDU-Contents from the asn1 test suite). We can reduce the impact on running time by special-casing important instructions. In particular, there is no need to look for y registers in the list argument for a select_val instruction.
2015-04-13v3_codegen: Don't sort map keys in map creation/updateBjörn Gustavsson
It is no longer necessary to sort the keys, since the loader does the sorting.
2015-03-11v3_life: Combine literal/2 and literal2/2Björn Gustavsson
For unclear reasons, there are two functions in v3_life that are almost identical: literal/2 and literal2/2. literal/2 is used for expressions and literal2/2 for patterns. It turns out that literal2/2 can do everything that literal/2 can do, except that it transforms maps differently. If we adjust v3_codegen to accept the same format of maps in expressions and patterns, we can rename literal2/2 to literal/2 and use it for expressions and patterns.
2015-03-09v3_codegen: Don't save options in the process dictionaryBjörn Gustavsson
v3_codegen puts the compilation in the process dictionary, but never uses them.
2015-03-09v3_codegen: Teach the put_map_* instructions to reuse source registersBjörn Gustavsson
The put_map_assoc and put_map_exact instructions in the run-time system will support that the target register is the same as one of the source registers. Teach the code generator to take advantage of that. The disadvantages of not reusing register when possible is that the garbage collector may retain dead terms longer than necessary.
2014-10-01compiler: Fix harmless need_heap error for MapsBjörn-Egil Dahlberg
Need heap for maps is zero and fall through is also zero.
2014-08-26compiler: Use variables in Map beam assmeblerBjörn-Egil Dahlberg
2014-07-03compiler: Maps are always patterns never values in matchingBjörn-Egil Dahlberg
Removed dead code.
2014-03-24compiler: Remove redudant code in v3_codegenBjörn-Egil Dahlberg
2014-03-24compiler: Remove redundant clause in v3_codegenBjörn-Egil Dahlberg
Multiple 'nil' cannot happen on instruction level. Map keys are always unique with literals.
2014-03-04Properly sort map pairs in v3_codegenAnthony Ramine
Literal nil values aren't tagged tuple but the bare atom nil. The function lists:sort/2 expects the passed function to return true if the first element is less than or equal to the second, not strictly less than. The original base clause is changed accordingly. Reported-by: Ulf Norell
2014-02-13compiler: Change map instructions for fetching valuesBjörn-Egil Dahlberg
* Combine multiple get values with one instruction * Combine multiple check keys with one instruction
2014-02-07compiler: Fix codegen multiple updates for MapsBjörn-Egil Dahlberg
This fixes an error on multiple updates optimization for map pairs. The error was introduced with moving to term order in Maps. This also fixes an error where register life time was lost for values and could result in erroneuos values being emitted in for map pairs. Simplified v3_codegen by moving multiple update optimizations to v3_kernel.
2014-01-29compiler: Fix term order compiler for mapsBjörn-Egil Dahlberg
2014-01-28compiler: Implement different instructions for => and :=Björn Gustavsson
2014-01-28Pass the map pair operators through to the v3_codegen passBjörn Gustavsson
2014-01-28compiler: Fix sorted keys in put_map instructionBjörn-Egil Dahlberg
All pairs in a Map needs to be in strict ascending key order.
2014-01-28compiler: Fix no basic blocks for mapsBjörn-Egil Dahlberg
2014-01-28compiler: Fix stack register reorderingBjörn-Egil Dahlberg
Can now handle {list [reg()]} elements in instructions.
2014-01-28Implement support for maps in the compilerBjörn Gustavsson
To make it possible to build the entire OTP system, also define dummys for the instructions in ops.tab.
2014-01-16compiler: Correct line number in exception from binary constructionBjörn Gustavsson
Reported-by: Stanislav Seletskiy
2012-11-12Merge branch 'maint'Björn Gustavsson
* maint: Fix compiler crash for binary matching and a complicated guard
2012-11-06Fix compiler crash for binary matching and a complicated guardBjörn Gustavsson
The compiler would crash when attempting to compile a function head that did binary matching and had a complex expression using 'andalso' and 'not'. Noticed-by: José Valim
2012-10-09v3_codegen: Combine adjacent bs_match_string instructionsBjörn Gustavsson
In modules with huge functions with many bs_match_string instructions, we can speed up the compilation by combining adjacent bs_match_strings instruction in v3_codegen (as opposed to in beam_block where we used to do it). For instance, on my computer the v3_codegen became more than twice as fast when compiling the re_testoutput1_split_test module in the STDLIB test suites.
2012-10-09v3_codegen: Clarify code generation for binary constructionBjörn Gustavsson
Add some comments to make the code generation for binary construction somewhat clearer. Also get rid of the cg_bo_newregs/2 function that serves no useful purpose. (It was probably intended to undo the effect of cg_live/2, but note that the value passed to cg_bo_newregs/2 has not passed through cg_live/2.)