aboutsummaryrefslogtreecommitdiffstats
path: root/lib/compiler/src/v3_codegen.erl
AgeCommit message (Collapse)Author
2018-08-13Correct error behavior of is_map_key/2 in guardsBjörn Gustavsson
Consider the following functions: foo() -> bar(not_a_map). bar(M) when not is_map_key(a, M) -> ok; bar(_) -> error. What will `foo/0` return? It depends. If the module is compiled with the default compiler options, the return value will be `ok`. If the module is compiled with the `inline` option, the return value will be `error`. The correct value is `error`, because the call to `is_map_key/2` when the second argument is not a map should fail the entire guard. That is the way other failing guards BIFs are handled. For example: foo() -> bar(not_a_tuple). bar(T) when not element(1, T) -> ok; bar(_) -> error. `foo/0` always returns `error` (whether the code is inlined or not). This bug can be fixed by changing the classification of `is_map_key/2` in the `erl_internal` module. It is now classified as a type test, which is incorrect because type tests should not fail. Reclassifying it as a plain guard BIF corrects the bug. This correction also fixes the internal consistency check failure which was reported in: https://bugs.erlang.org/browse/ERL-699
2018-06-18Update copyright yearHenrik Nord
2018-04-29Introduce is_map_key/2 guard BIFMichał Muskała
This complements the `map_get/2` guard BIF introduced in #1784. Rationale. `map_get/2` allows accessing map fields in guards, but it might be problematic in more complex guard expressions, for example: foo(X) when map_get(a, X) =:= 1 or is_list(X) -> ... The `is_list/1` part of the guard could never succeed since the `map_get/2` guard would fail the whole guard expression. In this situation, this could be solved by using `;` instead of `or` to separate the guards, but it is not possible in every case. To solve this situation, this PR proposes a `is_map_key/2` guard that allows to check if a map has key inside a guard before trying to access that key. When combined with `is_map/1` this allows to construct a purely boolean guard expression testing a value of a key in a map. Implementation. Given the use case motivating the introduction of this function, the PR contains compiler optimisations that produce optimial code for the following guard expression: foo(X) when is_map(X) and is_map_key(a, X) and map_get(a, X) =:= 1 -> ok; foo(_) -> error. Given all three tests share the failure label, the `is_map_key/2` and `is_map/2` tests are optimised away. As with `map_get/2` the `is_map_key/2` BIF is allowed in match specs.
2018-04-24Introduce map_get guard-safe functionMichał Muskała
Rationale Today all compound data types except for maps can be deconstructed in guards. For tuples we have `element/2` and for lists `hd/1` and `tl/1`. Maps are completely opaque to guards. This means matching on maps can't be abstracted into macros, which is often done with repetitive guards. It also means that maps have to be always selected whole from ETS tables, even when only one field would be enough, which creates a potential efficiency issue. This PR introduces an `erlang:map_get/2` guard-safe function that allows extracting a map field in guard. An alternative to this function would be to introduce the syntax for extracting a value from a map that was planned in the original EEP: `Map#{Key}`. Even outside of guards, since this function is a guard-BIF it is more efficient than using `maps:get/2` (since it does not need to set up the stack), and more convenient from pattern matching on the map (compare: `#{key := Value} = Map, Value` to `map_get(key, Map)`). Performance considerations A common concern against adding this function is the notion that "guards have to be fast" and ideally execute in constant time. While there are some counterexamples (`length/1`), what is more important is the fact that adding those functions does not change in any way the time complexity of pattern matching - it's already possible to match on map fields today directly in patterns - adding this ability to guards will niether slow down or speed up the execution, it will only make certain programs more convenient to write. This first version is very naive and does not perform any optimizations.
2018-03-23Allow the match context identifier to be any termBjörn Gustavsson
During compilation, the bs_save2 and bs_restore2 instructions contain a match context reference. That reference is the variable name that holds the match context. beam_clean assumes that the reference always is an atom, which is not a safe assumption since integers are legal variable names in Core Erlang.
2018-01-26Eliminate get_list/3 internally in the compilerBjörn Gustavsson
Instructions that produce more than one result complicate optimizations. get_list/3 is one of two instructions that produce multiple results (get_map_elements/3 is the other). Introduce the get_hd/2 and get_tl/2 instructions that return the head and tail of a cons cell, respectively, and use it internally in all optimization passes. For efficiency, we still want to use get_list/3 if both head and tail are used, so we will translate matching pairs of get_hd and get_tl back to get_list instructions.
2018-01-22Don't build a stacktrace if it's only passed to erlang:raise/3Björn Gustavsson
Consider the following function: function({function,Name,Arity,CLabel,Is0}, Lc0) -> try %% Optimize the code for the function. catch Class:Error:Stack -> io:format("Function: ~w/~w\n", [Name,Arity]), erlang:raise(Class, Error, Stack) end. The stacktrace is retrieved, but it is only used in the call to erlang:raise/3. There is no need to build a stacktrace in this function. We can avoid the building if we introduce an instruction called raw_raise/3 that works exactly like the erlang:raise/3 BIF except that its third argument must be a raw stacktrace.
2017-12-18v3_codegen: Don't let exit BIFs force a stack frameBjörn Gustavsson
This is an enhancement of the optimization added in 2e5d6201bb044, where we tried to avoid forcing a stack frame for functions that don't really need them. That optimization would not suppress the stack frame for this function: f(A) -> Res = case A of a -> x; b -> y end, {ok,Res}. The reason is that internally the compiler would rewrite the code to something like this: f(A) -> Res = case A of a -> x; b -> y; Other -> error({case_clause,Other}) end, {ok,Res}. The call to error/1 would force creation of a stack frame, even though it is not really needed because error/1 causes an exception. Handle calls to exit BIFs specially to allow suppressing the stack frame.
2017-12-18Merge pull request #1658 from bjorng/bjorn/compiler/delay-stackframeBjörn Gustavsson
Delay creation of stack frames
2017-12-15v3_codegen: Delay creation of stack framesBjörn Gustavsson
v3_codegen currently wraps a stack frame around each clause in a function (unless the clause is simple without any 'case' or other complex constructions). Consider this function: f({a,X}) -> A = abs(X), case A of 0 -> {result,"0"}; _ -> {result,integer_to_list(A)} end; f(_) -> error. The first clause needs a stack frame because there is a function call to integer_to_list/1 not in the tail position. v3_codegen currently wraps the entire first clause in stack frame. We can delay the creation of the stack frame, and create a stack frame in each arm of the 'case' (if needed): f({a,X}) -> A = abs(X), case A of 0 -> %% Don't create a stack frame here. {result,"0"}; _ -> %% Create a stack frame here. {result,integer_to_list(A)} end; f(_) -> error. There are pros and cons of this approach. The cons are that the code size may increase if there are many 'case' clauses and each needs its own stack frame. The allocation instructions may also interfere with other optimizations, but the new optimizations introduced in previous commits will mitigate most of those issues. The pros are the following: * For some clauses in a 'case', there is no need to create any stack frame at all. * Often when moving an allocation instruction into a 'case' clause, the slightly cheaper 'allocate' instruction can be used instead of 'allocate_zero'. There is also the possibility that the allocate instruction can be be combined with a 'test_heap' instruction. * Each stack frame for each arm of the 'case' will have exactly as many slots as needed.
2017-12-09v3_codegen: Eliminate unused function argumentsBjörn Gustavsson
758712d6294 changed the need_heap/2 function so that it stopped using its second argument. Remove the second argument from need_heap(), and update all callers to similarly remove unused arguments.
2017-12-08Use the new syntax for retrieving stack tracesBjörn Gustavsson
2017-12-08Merge pull request #1634 from bjorng/bjorn/get_stacktrace-syntax/OTP-14692Björn Gustavsson
Add syntax in try/catch to retrieve the stacktrace directly
2017-12-07v3_codegen: Simplify #k_guard_break{}Björn Gustavsson
It turns out that we don't need to keep track of locked variables, because the locked variables are always the same variables that will be alive after a #k_guard_break{}.
2017-12-07v3_codegen: Remove uncovered clause in bs_rename_ctx/4Björn Gustavsson
Remove handling of #k_match{} in bsm_rename_ctx/4. It can never be reached because bsm_rename_ctx/4 will never recurse into a block that is not in the scope of a #k_protected{}, and in a #k_protected{}, #k_match{} is not allowed.
2017-12-07v3_codegen: Move guard_cg_list/6 to a more logical placeBjörn Gustavsson
Put guard_cg_list/6 directly after guard_cg/5.
2017-12-07v3_codegen: Remove unnecessary clause for handling #cg_block{}Björn Gustavsson
The function guard_cg/5 handles constructs found within the records #k_guard_clause{] and #k_protected{}. Since #k_guard_clause{} can only contain a #k_protected{}, and #k_protected{} in turn cannot contain a #cg_block{}, the clause for handling #cg_block{} in guard_cg/5 is never executed and can be removed.
2017-12-07v3_codegen: Remove unnecessary adding of variable to setBjörn Gustavsson
The variable being added will already be there (added by v3_kernel).
2017-12-06v3_codegen: Avoid excessive stack frame allocationBjörn Gustavsson
A 'case' or 'if' that does not occur last in a function clause will always force a stack frame. The reasoning behind this is that in most uses of 'case' there will be a function call from within the 'case'. When there is a function call, the stack frame is needed both to save the continuation pointer and to save any X registers that will need to survive the call. When there is no function call from a 'case', the resulting stack frame is annoying. There will be register shuffling, and the existence of the stack frame may thwart many optimizations (for example, in beam_dead). Therefore, add an extra pass to v3_codegen to avoid creating a stack frame when not needed. https://bugs.erlang.org/browse/ERL-514
2017-12-04Clean up and comment code generation for basic blocksBjörn Gustavsson
2017-11-30Add syntax in try/catch to retrieve the stacktrace directlyBjörn Gustavsson
This commit adds a new syntax for retrieving the stacktrace without calling erlang:get_stacktrace/0. That allow us to deprecate erlang:get_stacktrace/0 and ultimately remove it. The problem with erlang:get_stacktrace/0 is that it can keep huge terms in a process for an indefinite time after an exception. The stacktrace can be huge after a 'function_clause' exception or a failed call to a BIF or operator, because the arguments for the call will be included in the stacktrace. For example: 1> catch abs(lists:seq(1, 1000)). {'EXIT',{badarg,[{erlang,abs, [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]], []}, {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]}, {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]}, {shell,exprs,7,[{file,"shell.erl"},{line,687}]}, {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]}, {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}} 2> erlang:get_stacktrace(). [{erlang,abs, [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22, 23,24|...]], []}, {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]}, {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]}, {shell,exprs,7,[{file,"shell.erl"},{line,687}]}, {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]}, {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}] 3> We can extend the syntax for clauses in try/catch to optionally bind the stacktrace to a variable. Here is an example using the current syntax: try Expr catch C:E -> Stk = erlang:get_stacktrace(), . . . In the new syntax, it would look like: try Expr catch C:E:Stk -> . . . Only a variable (not a pattern) is allowed in the stacktrace position, to discourage matching of the stacktrace. (Matching would also be expensive, because the raw format of the stacktrace would have to be converted to the cooked form before matching.) Note that: try Expr catch E -> . . . is a shorthand for: try Expr catch throw:E -> . . . If the stacktrace is to be retrieved for a throw, the 'throw:' prefix must be explicitly included: try Expr catch throw:E:Stk -> . . .
2017-11-30Stop trying to maximize the use of x(0)Björn Gustavsson
X register 0 used to be mapped to a hardware register, and therefore faster than the other registers. Because of that, the compiler tried to use x(0) as much as possible as a temporary register. That was changed a few releases ago. X register 0 is now placed in the array of all X registers and has no special speed advantage compared to the other registers. Remove the code in the compiler that attempts to use x(0) as much as possible. As a result, the following type of instruction will be much less frequent: {put_list,Src,{x,0},{x,0}} Instead, the following type of instruction will be more frequent: {put_list,Src,{x,X},{x,X}} (Where X is an arbitrary X register.) Update the runtime system to specialize that kind of put_list instruction.
2017-11-30Clean up collection of basic blocksBjörn Gustavsson
2017-11-29v3_codegen: Remove check of operand for bs_context_to_binaryBjörn Gustavsson
The bs_context_to_binary instruction only allows a register operand. v3_codegen has a test to ensure that the operand is a register. That test is no longer necessary. There used to be a possibility that optimizations in sys_core_fold and the inliner could change the operand for bs_context_to_binary to a binary literal. Since 09112806c15a81b that can no longer happen, because no more optimizations are run after the introduction of the bs_context_to_binary instruction.
2017-10-27Eliminate the v3_life passBjörn Gustavsson
The v3_life pass does not do enough to be worth being its own pass. Essentially it does two things: * Calculates life-time information starting from the annotations that v3_kernel provides. That part can be moved into v3_codegen. * Rewrites the Kernel Erlang records to similar plain tuples (for example, #k_cons{hd=Hd,tl=Tl} is rewritten to {cons,Hd,Tl}). That rewriting is not needed and can be eliminated.
2017-10-21Optimize matching of literals for single-valued typesBjörn Gustavsson
If a type only has one clause and if the pattern is literal, the matching can be done more efficiently by directly comparing with the literal. Example: find(String, "") -> String; find(String, <<>>) -> String; find(String, SearchPattern) -> . . . Without this optimization, the relevant part of the code would look this: {test,bs_start_match2,{f,3},2,[{x,1},0],{x,2}}. {test,bs_test_tail2,{f,4},[{x,2},0]}. return. {label,3}. {test,is_nil,{f,4},[{x,1}]}. return. {label,4}. . . . That is, if {x,1} is a binary, a match context will be built to test whether {x,1} is an empty binary. With the optimization, the code will look this: {test,is_eq_exact,{f,3},[{x,1},{literal,<<>>}]}. return. {label,3}. {test,is_nil,{f,4},[{x,1}]}. return. {label,4}. . . .
2017-10-02Merge pull request #1546 from michalmuskala/bs-add-optBjörn Gustavsson
Optimise size calculation for binary construction OTP-14654
2017-09-08Optimise size calculation for binary constructionMichał Muskała
It turns out it was extremely common to have a following sequence: move 0 D1 byte_size _ D2 bs_add D1 D2 D Which is equivalent to just: byte_size _ D Similarly another sequence: move S D1 byte_size _ D2 bs_add D1 D2 D Can be optimised into: byte_size _ D2 bs_add S D2 D Both of those optimisations work with byte_size and bit_size instructions.
2017-08-31Eliminate unnecessary 'move' instructionsBjörn Gustavsson
The compiler could sometimes emit unnecessary 'move' instructions in the code for binary matching, for example for this function: escape(<<Byte, Rest/bits>>, Pos) when Byte >= 127 -> escape(Rest, Pos + 1); escape(<<Byte, Rest/bits>>, Pos) -> escape(Rest, Pos + Byte); escape(<<_Rest/bits>>, Pos) -> Pos. The generated code would look like this: {function, escape, 2, 2}. {label,1}. {line,[{location,"t.erl",17}]}. {func_info,{atom,t},{atom,escape},2}. {label,2}. {test,bs_start_match2,{f,1},2,[{x,0},0],{x,0}}. {test,bs_get_integer2, {f,4}, 2, [{x,0}, {integer,8}, 1, {field_flags,[{anno,[17,{file,"t.erl"}]},unsigned,big]}], {x,2}}. {'%',{bin_opt,[17,{file,"t.erl"}]}}. {move,{x,0},{x,3}}. %% UNECESSARY! {test,is_ge,{f,3},[{x,2},{integer,127}]}. {line,[{location,"t.erl",18}]}. {gc_bif,'+',{f,0},4,[{x,1},{integer,1}],{x,1}}. {move,{x,3},{x,0}}. %% UNECESSARY! {call_only,2,{f,2}}. {label,3}. {line,[{location,"t.erl",20}]}. {gc_bif,'+',{f,0},4,[{x,1},{x,2}],{x,1}}. {move,{x,3},{x,0}}. %% UNECESSARY! {call_only,2,{f,2}}. {label,4}. {move,{x,1},{x,0}}. return. The redundant 'move' instructions have been marked. To avoid the 'move' instructions, we can extend the existing function is_context_unused/1 in v3_codegen. If v3_codegen can know that the match context will not be used again, it can reuse the register for the match context and avoid the extra 'move' instructions. https://bugs.erlang.org/browse/ERL-444
2017-01-12v3_codegen: Add missing types and specsBjörn Gustavsson
2016-11-18v3_kernel: Generate optimized code for guardsBjörn Gustavsson
The compiler produces poor code for complex guard expressions with andalso/orelse. Here is an example from the filename module: -define(IS_DRIVELETTER(Letter),(((Letter >= $A) andalso (Letter =< $Z)) orelse ((Letter >= $a) andalso (Letter =< $z)))). skip_prefix(Name, false) -> Name; skip_prefix([L, DrvSep|Name], DrvSep) when ?IS_DRIVELETTER(L) -> Name; skip_prefix(Name, _) -> Name. beam_bool fails to simplify the code for the guard, leaving several 'bif' instructions: {function, skip_prefix, 2, 49}. {label,48}. {line,[{location,"filename.erl",187}]}. {func_info,{atom,filename},{atom,skip_prefix},2}. {label,49}. {test,is_ne_exact,{f,52},[{x,1},{atom,false}]}. {test,is_nonempty_list,{f,52},[{x,0}]}. {get_list,{x,0},{x,2},{x,3}}. {test,is_nonempty_list,{f,52},[{x,3}]}. {get_list,{x,3},{x,4},{x,5}}. {bif,'=:=',{f,52},[{x,1},{x,4}],{x,6}}. {test,is_ge,{f,50},[{x,2},{integer,65}]}. {bif,'=<',{f,52},[{x,2},{integer,90}],{x,7}}. {test,is_eq_exact,{f,51},[{x,7},{atom,false}]}. {test,is_ge,{f,50},[{x,2},{integer,97}]}. {bif,'=<',{f,52},[{x,2},{integer,122}],{x,7}}. {jump,{f,51}}. {label,50}. {move,{atom,false},{x,7}}. {label,51}. {bif,'=:=',{f,52},[{x,7},{atom,true}],{x,7}}. {test,is_eq_exact,{f,52},[{x,6},{atom,true}]}. {test,is_eq_exact,{f,52},[{x,7},{atom,true}]}. {move,{x,5},{x,0}}. return. {label,52}. return. We can add optimizations of guard tests to v3_kernel to achive a better result: {function, skip_prefix, 2, 49}. {label,48}. {line,[{location,"filename.erl",187}]}. {func_info,{atom,filename},{atom,skip_prefix},2}. {label,49}. {test,is_ne_exact,{f,51},[{x,1},{atom,false}]}. {test,is_nonempty_list,{f,51},[{x,0}]}. {get_list,{x,0},{x,2},{x,3}}. {test,is_nonempty_list,{f,51},[{x,3}]}. {get_list,{x,3},{x,4},{x,5}}. {test,is_eq_exact,{f,51},[{x,1},{x,4}]}. {test,is_ge,{f,51},[{x,2},{integer,65}]}. {test,is_lt,{f,50},[{integer,90},{x,2}]}. {test,is_ge,{f,51},[{x,2},{integer,97}]}. {test,is_ge,{f,51},[{integer,122},{x,2}]}. {label,50}. {move,{x,5},{x,0}}. return. {label,51}. return. Looking at the STDLIB application, there were 112 lines of BIF calls in guards that beam_bool failed to convert to test instructions. This commit eliminates all those BIF calls. Here is how I counted the instructions: $ PATH=$ERL_TOP/bin:$PATH erlc -I ../include -I ../../kernel/include -S *.erl $ grep "bif,'[=<>]" *.S | grep -v f,0 dets.S: {bif,'=:=',{f,547},[{x,4},{atom,read_write}],{x,4}}. dets.S: {bif,'=:=',{f,547},[{x,5},{atom,saved}],{x,5}}. dets.S: {bif,'=:=',{f,589},[{x,5},{atom,read}],{x,5}}. . . . $ grep "bif,'[=<>]" *.S | grep -v f,0 | wc 112 224 6765 $
2016-09-21Simplify handling of internal BIFsBjörn Gustavsson
Do a simpler translation of internal BIFs. While we are it, also remove the dummy values of Index and Uniq from the make_fun internal operation.
2016-06-02Eliminate crash for map updates in guardsBjörn Gustavsson
beam_validator would complain that x(1) is uninitialized in a test_heap instruction when attempting to compile the following code with sys_core_fold turned off: foo(M) when not (M#{true := 0}); [M] -> ok. Simplified, the generated BEAM assembly code looked like this: test is_map BadMap x(0) put_map_exact Fail x(0) => x(1) ... jump BooleanStuff BadMap: move ok => x(1) jump Fail BooleanStuff: ... move Boolean => x(2) jump Build Fail: move false => x(2) Build: test_heap 2 3 %% x(0), x(1), x(2) must be live. ... That is, if put_map_exact failed, control would transfer to the label Fail without initializing x(1). Fix that by making sure that x(1) is initilized even if put_map_exact fails: test is_map BadMap x(0) put_map_exact BadLbl x(0) => x(1) ... jump OkLbl BadLbl: move ok => x(1) jump Fail OkLbl: jump BooleanStuff BadMap: move ok => x(1) jump Fail BooleanStuff: ... move Boolean => x(2) jump Build Fail: move false => x(2) Build: test_heap 2 3 %% x(0), x(1), x(2) must be live. ... Note that this situation is rare, and that other optimization passes (beam_dead and beam_jump in particular) will clean up this mess.
2016-05-25v3_codegen: Don't confuse beam_validatorBjörn Gustavsson
Generate code that not only is safe, but can easily be seen by beam_validator to be safe.
2016-05-25v3_codegen: Correct code generation for an error/1 call in a guardBjörn Gustavsson
Sometimes v3_codegen would generate unsafe code when there was a call to error/1 in a guard.
2016-03-15update copyright-yearHenrik Nord
2016-01-11Merge branch 'maint'Björn Gustavsson
* maint: Eliminate crash in v3_codegen
2016-01-11Eliminate crash in v3_codegenBjörn Gustavsson
The following code would crash v3_codegen: order(From) -> catch if From#{[] => sufficient} -> saint end. Before explaining the crash, first some background on the stack frame and the Y registers. Certain instructions, most notably the 'call' instructions, clobber all X registers. Before any such instruction, all X registers that have values that will be used after the call must be saved to Y registers (i.e. to the stack frame). adjust_stack/4 will be called when X registers must be saved. There is also another situation when X registers must be saved, namely within a 'catch' if we are about to execute any instruction that may cause an exception. Examples of such instructions are some guard BIFs (such as length/1) and construction of binaries or maps. Within a 'catch', X registers must be be saved because if an exception is thrown and catched all X registers will be destroyed. The same adjust_stack/4 function will be called for those instructions, but only if they occur within a 'catch'. There is actually one more complication. If there is code in a guard within a catch, the X registers should not be saved, because the code in a guard never clobbers any X registers that were alive before the guard code was entered. v3_codegen is written with the implicit assumption that code in guards never cause anything to be saved to Y registers. The code for building maps and binaries would incorrectly save X registers within a guard inside a 'catch'. For construction of binaries, that would mean that a useless but harmelss 'move' instruction was generated. But for construction of maps, the saving of the Y register would not be harmless. There would be a crash when attempting to merge #sr{} records. #sr{} records keeps track of the contents of X and Y registers. When two separate code paths are joined (e.g. at the end of 'case' statement), the register descriptors must be reconciled. Basically, the register descriptors for both paths must be identical. The #sr{} record for one path must not claim that {y,0} contains a certain value, while another path claims that {y,0} is dead. Thus, the crash occurs in sr_merge/2 when failing to reconcile the Y registers. To fix this bug this bug we will introduce a new function called maybe_adjust_stack/5. It will save X registers on the stack only if the code is inside a catch but not inside a guard. We will change all existing code to use this new function when appropriate. Reported-by: Thomas Arts
2015-09-28v3_codegen: Optimize matching of the final size-less binary segmentBjörn Gustavsson
Consider the following function: f(Bin, Bool) -> case Bin of <<Val:16/binary,_/binary>> when Bool -> Val end. Simplified, the generated code looks like: bs_start_match2 Fail Live Bin => Bin bs_get_integer2 Fail Live Bin size=Sz unit=1 => Val bs_skip_bits2 Fail Bin size=all unit=8 is_eq_exact Fail Bool true The code generator will replace the bs_skip_bits2 instruction with a bs_test_unit instruction if it can be clearly seen that the context register will not be used again. In this case, it is not obvious without looking at the code at the Fail label. However, it turns out that bs_test_unit instruction is always safe beacuse of the way v3_kernel compiles pattern matching. It doesn't matter whether the match context will be used again. If it will be used again, the position in it will *not* be used. Instead, a bs_restore2 instruction will restore one of the saved instructions.
2015-06-18Change license text to APLv2Bruce Yinhe
2015-05-21v3_codegen: Use Maps to map local functionsBjörn-Egil Dahlberg
2015-05-21compiler: Use lc instead of map/1 in v3_codegenBjörn-Egil Dahlberg
Small speed increase for large files.
2015-04-29v3_core, v3_codegen: Eliminate old-style catchesBjörn Gustavsson
2015-04-22v3_codegen: Reduce cost for fixing up bs_match_string instructionsBjörn Gustavsson
Commit b76588fb5a introduced an optimization of the compile time of huge functions with many bs_match_string instructions. The optimization is done in two passes. The first pass coalesces adjacent bs_match_string instructions. To avoid copying bitstrings multiple times, the bitstrings in the instructions are combined in to a (deep) list. The second pass goes through all instructions in the function and combines the list of bitstrings to a single bitstring in all bs_match_string instructions. The second pass (fix_bs_match_string) is run on all instructions in each function, even if there are no bs_match_instructions in the function. While fix_bs_match_string is not a bottleneck (it is a linear pass), its execution time is noticeable when profiling some modules. Move the execution of the second pass to the select_binary() function so that it will only be executed for instructions that do binary matching. Also take the opportunity to optimize away uses of bs_restore2 that occour directly after a bs_save2. That optimimization is currently done in beam_block, but it can be done essentially for free in the same pass that fixes up bs_match_string instructions.
2015-04-22v3_codegen: Optimize "turning" of y registersBjörn Gustavsson
Profiling shows that the execution time for "turning" y registers is noticeable for some modules (e.g. S1AP-PDU-Contents from the asn1 test suite). We can reduce the impact on running time by special-casing important instructions. In particular, there is no need to look for y registers in the list argument for a select_val instruction.
2015-04-13v3_codegen: Don't sort map keys in map creation/updateBjörn Gustavsson
It is no longer necessary to sort the keys, since the loader does the sorting.
2015-03-11v3_life: Combine literal/2 and literal2/2Björn Gustavsson
For unclear reasons, there are two functions in v3_life that are almost identical: literal/2 and literal2/2. literal/2 is used for expressions and literal2/2 for patterns. It turns out that literal2/2 can do everything that literal/2 can do, except that it transforms maps differently. If we adjust v3_codegen to accept the same format of maps in expressions and patterns, we can rename literal2/2 to literal/2 and use it for expressions and patterns.
2015-03-09v3_codegen: Don't save options in the process dictionaryBjörn Gustavsson
v3_codegen puts the compilation in the process dictionary, but never uses them.
2015-03-09v3_codegen: Teach the put_map_* instructions to reuse source registersBjörn Gustavsson
The put_map_assoc and put_map_exact instructions in the run-time system will support that the target register is the same as one of the source registers. Teach the code generator to take advantage of that. The disadvantages of not reusing register when possible is that the garbage collector may retain dead terms longer than necessary.
2014-10-01compiler: Fix harmless need_heap error for MapsBjörn-Egil Dahlberg
Need heap for maps is zero and fall through is also zero.