aboutsummaryrefslogtreecommitdiffstats
path: root/erts/emulator/beam/ops.tab
AgeCommit message (Collapse)Author
2019-01-21Optimize the is_function/2 guard testBjörn Gustavsson
The is_function2 instruction is executed surprisingly frequently when running dialyzer or the compiler. It cannot hurt to optimize it a little.
2019-01-07Merge pull request #2059 from michalmuskala/mm/bif-microopsBjörn Gustavsson
Use microops for BIFs
2018-12-18Make length/1 yieldingBjörn Gustavsson
The guard BIF `length/1` would calculate the length of the list in one go without yielding, even if the list was were long. To make it even worse, the call to `length/1` would only cost a single reduction. This commit reimplements `length/1` so that it eats a number of reductions proportional to the length of the list, and yields if the available reductions run out.
2018-12-17Use microops for BIFsMichał Muskała
This allows bif1/2/3 to share the main part of the code. The price is that we always need to copy all three temporary registers when error handling in bodies, but that should be infrequent. Additionally it makes it a bit harder to read the disasembly since now the arguments to BIFs are in the reverse order.
2018-12-13Simplify GC BIFsBjörn Gustavsson
Summary: This commit simplifies the implementation of the "GC BIFs" so that they no longer need to do a garbage collection, removing duplicate code for all GC BIFs in the runtime system, as well as potentially reducing the size of the loaded BEAM code by using shorter instructions calling those BIFs. A GC BIF is a guard BIF that will do a garbage collection if it needs to build anything on the heap. For example, `abs/1` is a GC BIF because it might need to allocate space on the heap (if the result is a floating point number or the resulting integer is a bignum). Before R12, a guard BIF (such as `abs/1`) that need to allocate heap space would allocate outside of process's main heap, in a heap fragment. GC BIFs were introduced in R12B to support literals. During garbage collection it become necessary to quickly test whether a term was a literal. To make the check simple, guards BIFs were no longer allowed to create heap fragments. Instead GC BIFs were introduced. In OTP 19, the implementation of literals was changed to support storing messages in heap fragments outside of the main heap for a process. That change again made it allowed for guard BIFs to create heap fragments when they need to build terms on the heap. It would even be possible for the guard BIFs to build directly on the main heap if there is room there, because the compiler assumes that a new `test_heap/2` instruction must be emitted when building anything after calling a GC BIF. (We don't do that in this commit; see below.) This commit simplifies the implementation of the GC BIFs in the runtime system. Each GC BIF had a dual implementation: one that was used when the GC BIF was called directly and one used when it was called via `apply/3`. For example, `abs/1` was implemented in `abs_1()` and `erts_gc_abs_1()`. This commit removes the GC version of each BIF. The other version that allocates heap space using `HAlloc()` is updated to use the new `HeapFragOnlyAlloc()` macro that will allocate heap space in a heap fragment outside of the main heap. Because the BIFs will allocate outside of the main heap, the same `bif` instructions used by nonbuilding BIFs can be used for the (former) GC BIFs. Those instructions don't use the macros that save and restore the heap and stack pointers (SWAPOUT/SWAPIN). If the former GC BIFs would build on the main heap, either new instructions would be needed, or SWAPOUT/SWAPIN instructions would need to be added to the `bif` instructions. Instructions that call the former GC BIFs don't need the operand that specifies the number of live X registers. Therefore, the instructions that call the BIFs are usually one word shorter.
2018-09-28Rewrite BSM optimizations in the new SSA-based intermediate formatJohn Högberg
This commit improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations. Some highlights: * Watch contexts can be reused even after being passed to a function or being used in a try block. * Sub-binaries are no longer eagerly extracted, making it far easier to keep "happy paths" free from binary creation. * Trivial wrapper functions no longer disable context reuse.
2018-09-24Support using match contexts from Y registersJohn Högberg
The upcoming beam_ssa_bsm pass allows match contexts to be used across function calls that take said context as an argument, which means it's fairly common for them to end up in Y registers.
2018-09-03Introduce a put_tuple2 instructionBjörn Gustavsson
Sometimes when building a tuple, there is no way to avoid an extra `move` instruction. Consider this code: make_tuple(A) -> {ok,A}. The corresponding BEAM code looks like this: {test_heap,3,1}. {put_tuple,2,{x,1}}. {put,{atom,ok}}. {put,{x,0}}. {move,{x,1},{x,0}}. return. To avoid overwriting the source register `{x,0}`, a `move` instruction is necessary. The problem doesn't exist when building a list: %% build_list(A) -> [A]. {test_heap,2,1}. {put_list,{x,0},nil,{x,0}}. return. Introduce a new `put_tuple2` instruction that builds a tuple in a single instruction, so that the `move` instruction can be eliminated: %% make_tuple(A) -> {ok,A}. {test_heap,3,1}. {put_tuple2,{x,0},{list,[{atom,ok},{x,0}]}}. return. Note that the BEAM loader already combines `put_tuple` and `put` instructions into an internal instruction similar to `put_tuple2`. Therefore the introduction of the new instruction will not speed up execution of tuple building itself, but it will be less work for the loader to load the new instruction.
2018-09-03Merge branch 'maint'Björn Gustavsson
* maint: ops.tab: Fix potentially unsafe optimization of raise/2
2018-09-03ops.tab: Fix potentially unsafe optimization of raise/2Björn Gustavsson
The operands for the raise/2 instruction are almost always in x(2) and x(1). Therefore the loader translates the raise/2 instruction to an i_raise/0 instruction which uses the values in x(2) and x(1). If the operands happens to be in other registers, the loader inserts move/2 instruction to move them to x(2) and x(1). The problem is that x(3) is used as a temporary register when generating the move/2 instructions. That is unsafe if the Value operand for raise/2 is x(3). Thus: raise x(0) x(3) will be translated to: move x(0) x(3) move x(3) x(1) move x(3) x(2) i_raise The Trace will be written to both x(2) and x(1). The current compiler will never use x(3) for the Value operand, so there is no need to patch previous releases. But a future compiler version might allocate registers differently.
2018-08-17Teach binary instructions to use Y registers as destinationBjörn Gustavsson
The new code generator will use Y registers as a destination for binary construction and matching instructions. v3_codegen would always first store terms in an X register and it would be the responsibility of the optimization passes to optimize the extra moves.
2018-07-13Fix a rare crash when matching on literal mapsJohn Högberg
When matching on a literal map, the map is placed into the general scratch register first. This is fine in isolation, but when the key to be matched was in a Y register it would also be placed in the scratch register, overwriting the map and crashing the emulator.
2018-06-18Update copyright yearHenrik Nord
2018-04-04Merge pull request #1725 from michalmuskala/fun-literalsBjörn Gustavsson
Compile external fun expressions to literals OTP-15003
2018-03-26Compile external fun expressions to literalsMichał Muskała
The expressions fun M:F/A, when all elements are literals are also treated as a literal. Since they have consistent representation and don't depend on the code currently loaded in the VM, this is safe. This can provide significant performance improvements in code using such functions extensively - a full function call to erlang:make_fun/3 is replaced by a single move instruction and no register shuffling or saving registers to stack is necessary. Additionally, compound data types that contain such external functions as elements can be treated as literals too. The commit also changes the representation of external funs to be a valid Erlang syntax and adds support for literal external funs to core Erlang.
2018-03-19Correctly handle get_map_elements with a literal mapBjörn Gustavsson
A get_map_elements instruction that has a literal map operand would never be translated to a i_get_map_element instruction. That would be a problem for the following instruction: get_map_elements Fail #{} {x,0}, {x,1} Since the key is not a literal, get_map_element must be used, since get_map_elements requires that a hash value can be calculated for each element. When the instruction is translated to i_get_map_element, the hash value will be set to 0 and an assertion will trigger in the debug build.
2018-01-26Eliminate get_list/3 internally in the compilerBjörn Gustavsson
Instructions that produce more than one result complicate optimizations. get_list/3 is one of two instructions that produce multiple results (get_map_elements/3 is the other). Introduce the get_hd/2 and get_tl/2 instructions that return the head and tail of a cons cell, respectively, and use it internally in all optimization passes. For efficiency, we still want to use get_list/3 if both head and tail are used, so we will translate matching pairs of get_hd and get_tl back to get_list instructions.
2018-01-22Don't build a stacktrace if it's only passed to erlang:raise/3Björn Gustavsson
Consider the following function: function({function,Name,Arity,CLabel,Is0}, Lc0) -> try %% Optimize the code for the function. catch Class:Error:Stack -> io:format("Function: ~w/~w\n", [Name,Arity]), erlang:raise(Class, Error, Stack) end. The stacktrace is retrieved, but it is only used in the call to erlang:raise/3. There is no need to build a stacktrace in this function. We can avoid the building if we introduce an instruction called raw_raise/3 that works exactly like the erlang:raise/3 BIF except that its third argument must be a raw stacktrace.
2017-12-08Merge pull request #1634 from bjorng/bjorn/get_stacktrace-syntax/OTP-14692Björn Gustavsson
Add syntax in try/catch to retrieve the stacktrace directly
2017-11-30Add syntax in try/catch to retrieve the stacktrace directlyBjörn Gustavsson
This commit adds a new syntax for retrieving the stacktrace without calling erlang:get_stacktrace/0. That allow us to deprecate erlang:get_stacktrace/0 and ultimately remove it. The problem with erlang:get_stacktrace/0 is that it can keep huge terms in a process for an indefinite time after an exception. The stacktrace can be huge after a 'function_clause' exception or a failed call to a BIF or operator, because the arguments for the call will be included in the stacktrace. For example: 1> catch abs(lists:seq(1, 1000)). {'EXIT',{badarg,[{erlang,abs, [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]], []}, {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]}, {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]}, {shell,exprs,7,[{file,"shell.erl"},{line,687}]}, {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]}, {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}} 2> erlang:get_stacktrace(). [{erlang,abs, [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22, 23,24|...]], []}, {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]}, {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]}, {shell,exprs,7,[{file,"shell.erl"},{line,687}]}, {shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]}, {shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}] 3> We can extend the syntax for clauses in try/catch to optionally bind the stacktrace to a variable. Here is an example using the current syntax: try Expr catch C:E -> Stk = erlang:get_stacktrace(), . . . In the new syntax, it would look like: try Expr catch C:E:Stk -> . . . Only a variable (not a pattern) is allowed in the stacktrace position, to discourage matching of the stacktrace. (Matching would also be expensive, because the raw format of the stacktrace would have to be converted to the cooked form before matching.) Note that: try Expr catch E -> . . . is a shorthand for: try Expr catch throw:E -> . . . If the stacktrace is to be retrieved for a throw, the 'throw:' prefix must be explicitly included: try Expr catch throw:E:Stk -> . . .
2017-11-30Stop trying to maximize the use of x(0)Björn Gustavsson
X register 0 used to be mapped to a hardware register, and therefore faster than the other registers. Because of that, the compiler tried to use x(0) as much as possible as a temporary register. That was changed a few releases ago. X register 0 is now placed in the array of all X registers and has no special speed advantage compared to the other registers. Remove the code in the compiler that attempts to use x(0) as much as possible. As a result, the following type of instruction will be much less frequent: {put_list,Src,{x,0},{x,0}} Instead, the following type of instruction will be more frequent: {put_list,Src,{x,X},{x,X}} (Where X is an arbitrary X register.) Update the runtime system to specialize that kind of put_list instruction.
2017-11-14Fix broken receive optimizationBjörn Gustavsson
When a ref is created before performing a receive that will only receive message containing that ref, there is a compiler optimization to avoid scanning messages that can't possible contain the newly created ref. Magnus Lång pointed out that the implementation of the optimization is flawed. Exceptions or recursive calls could cause the receive operation to scan the receive queue from a position beyond the expected message (that is, the message containing the ref would never be matched out). See the receive_opt_exception/1 and receive_opt_recursion/1 test cases in receive_SUITE. It turns out that we can simplify the implementation of the optimization while fixing the bug (suggested by Magnus Lång). We actually don't need the c_p->msg.mark field. It is enough to have c_p->msg.saved_pos; if it is non-zero, it is a valid position in the message qeueue. All we need to do is to ensure that we clear c_p->msg.saved_pos when a receive is exited normally or abnormally. We can clear c_p->msg.saved_pos in JOIN_MESSAGE(), since it is called both when leaving a receive because a message matched and because there was a timeout and the 'after' clause was executed. In addition, we need to clear c_p->msg.saved_pos when an exception is caught. https://bugs.erlang.org/browse/ERL-511
2017-11-07Strengthen tests of definition of specific instructionsBjörn Gustavsson
Don't allow defining an specific operation more than once with the exact same operands. Don't allow a specific operation to be defined with different arities.
2017-10-09Slightly speed up try/catchBjörn Gustavsson
The try_end and try_case instructions are implemented the same way (try_case is translated to try_end by the loader). We can do better than that. We know that try_case will only be executed when an exception has been caught. Therefore, we know that x(0) is the non-value and that x(1) through x(3) need to be shifted down to x(0) through x(2). There is no need to test x(0) before shifting down. try_end does not need the register shifting code at all.
2017-10-05Introduce a syntax for marking operands as "optional use"Björn Gustavsson
Introduce a syntax to mark an operand that is not always used when an instrution is executed. Example of such operands are the fail label for is_nil or the number of live registers for an allocate instruction. Use a question mark to annotate optional use: is_nil f? xy allocate t t?
2017-10-05ops.tab: Slightly optimize badmatch on a Y registerBjörn Gustavsson
2017-10-01Change operand from 'P' to 'Q' for i_apply_last and i_apply_fun_lastBjörn Gustavsson
All other instructions that increment the stack pointer takes a 'Q' operand.
2017-09-14Merge branch 'bjorn/erts/relative-jumps'Björn Gustavsson
* bjorn/erts/relative-jumps: Pack failure labels in i_select_val2 and i_select_tuple_arity2 Optimize i_select_tuple_arity2 and is_select_lins Rewrite select_val_bins so that its labels can be packed Pack sequences of trailing 'f' operands Implement packing of 'f' and 'j' Make sure that mask literals are 64 bits Use relative failure labels Add information about offset to common group start position Remove JUMP_OFFSET Refactor instructions to support relative jumps Introduce a new trace_jump/1 instruction for tracing Avoid using $Src more than once
2017-09-14Pack failure labels in i_select_val2 and i_select_tuple_arity2Björn Gustavsson
2017-09-13Merge branch 'bjorn/erts/improve-beam-ops'Björn Gustavsson
* bjorn/erts/improve-beam-ops: Add built-in macros $ARG_POSITION() and $IS_PACKED() Use 't' instead of 'I' bit syntax operands Optimize operand type for match context in i_bs_get_integer Change operand type from 's' to 'S' for a few instructions Use the correct name of the parameter
2017-09-13Merge pull request #1544 from michalmuskala/eq-optBjörn Gustavsson
Optimise equality comparisons
2017-09-11Use 't' instead of 'I' bit syntax operandsBjörn Gustavsson
The number of live registers, unit and flags, and the number of slots all fit comortably in 16 bits. This change will give more opportunities for packing.
2017-09-11Optimize operand type for match context in i_bs_get_integerBjörn Gustavsson
The match context is always in an X register. Change the 's' operand to 'x'. While we are it, also change the operands for Live and FlagsAndUnit to 't' (they will both fit comfortably in 16 bits).
2017-09-11Change operand type from 's' to 'S' for a few instructionsBjörn Gustavsson
'S' is slightly more efficient. Using 'S' may also enable more packing.
2017-09-11Introduce a new trace_jump/1 instruction for tracingBjörn Gustavsson
As a preparation for introducing relative jumps, introduce "trace_jump W" that can be used for tracing. This instruction will continue to have an absolute address for the jump target. (Note: This instruction is never created during loading; it is only created in stubs when tracing is active.)
2017-09-08Optimise equality comparisonsMichał Muskała
* In both loader and compiler, make sure constants are always the second operand - many passes of the compiler assume that's always the case. * In loader rewrite is_eq_exact with same arguments to skip the instruction and with different constants move one to an x register to maintain the properly outlined above. * The same (but in reverse) is done with the is_ne_exact, where we rewrite to an unconditional jump or add a move to an x register. * All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
2017-09-04Annotate try_case_end as coldBjörn Gustavsson
try_case_end is an excepting-generating instruction that is infrequently executed.
2017-08-31ops.tab: Mark infrequently used instructions as %coldBjörn Gustavsson
Instructions that used to be implemented in beam_emu.c were not marked as cold as it would make no difference.
2017-08-31Introduce '%warm' and beam_warm.hBjörn Gustavsson
The bit syntax instructions are mixed among other instructions in beam_hot.h and beam_cold.h. Introduce a new hotness level called '%warm' with is associated file beam_warm.h. Mark all bit syntax instructions as '%warm'.
2017-08-23Add the 'S' type for a register sourceBjörn Gustavsson
The type 'd' could be used both for destination registers and source register. Restrict the 'd' type to only be used for destinations, and introduce the new 'S' type to be used when a source must be a register.
2017-08-23Introduce more packable typesBjörn Gustavsson
The 'I' type can be replaced with 't' if we know that the value will fit in 16 bits. The 'P' type can be replaced with 'Q' if it is used for deallocating a stack frame.
2017-08-23Make map update instruction functions indepedent of instruction formatBjörn Gustavsson
Having the helper functions for map update knowing all the details of operands for the instruction will make it difficult to make improvements such as better packing.
2017-08-23beam_makeops: Introduce the new type 'W' (machine word)Björn Gustavsson
As a preparation for potentially improving packing in the future, we will need to make sure that packable types have a defined maximum size. The packer algorithm assumes that two 'I' operands can be packed into one 64-bit word, but there are instructions that use an 'I' operand to store a pointer. It only works because those instructions are not packed for other reasons. Introduce the 'W' type and use it for operands that don't fit in 32 bits.
2017-08-23Use the wait_timeout_{un}locked_int instructionsBjörn Gustavsson
The transformations were incorrect.
2017-08-15Slightly optimize updating of mapsBjörn Gustavsson
The instruction put_map_assoc/5 (used for updating a map) has a failure operand, but it can't actually fail provided that its "map" argument is a map. The following code: M#{key=>value}. will be compiled to: {test,is_map,{f,3},[{x,0}]}. {line,[...]}. {put_map_assoc,{f,0},{x,0},{x,0},1,{list,[{atom,key},{atom,value}]}}. return. {label,3}. %% Code that produces a 'badmap' exception follows. Because of the is_map instruction, {x,0} always contains a map when the put_map_assoc instruction is executed. Therefore we can remove the failure operand. That will save one word, and also eliminate two tests at run-time. The only problem is that the compiler in OTP 17 did not emit a is_map instruction before the put_map_assoc instruction. Therefore, we must add an instruction that tests for a map if the code was compiled with the OTP 17 compiler. Unfortunately, there is no safe and relatively easy way to known that the OTP 17 compiler was used, so we will check whether a compiler before OTP 20 was used. OTP 20 introduced a new chunk type for atoms, which is trivial to check.
2017-08-11Break out most instructions from beam_emu.cBjörn Gustavsson
2017-08-08Simplify specifying implementation of instructionsBjörn Gustavsson
Eliminate the need to write pre-processor macros for each instruction. Instead allow the implementation of instruction to be written in C directly in the .tab files. Rewrite all existing macros in this way and remove the %macro directive.
2017-07-06Introduce new_small_map_lit opMichal Muskala
Take advantage of the fact that small maps have a tuple for keys. When new map is constructed and all keys are literals, we can construct the entire keys tuple as a literal. This should reduce the memory of maps created with literal keys almost by half, since they all can share the same keys tuple.
2017-06-28Reduce C code size for bit syntax instructionsBjörn Gustavsson
Bit syntax instructions never store their result in a Y register. Therefore, change the bit syntax instructions to use 'x' as the destination instead of 'd'. That will simplify the code that stores the result, and will be a slight reduction in code size and execution time.
2017-06-28ops.tab: Use combined operandsBjörn Gustavsson