otp.git - Mirror of Erlang/OTP repository.

Age	Commit message	Author
2019-06-24	Merge branch 'bjorn/erts/unoptimized-plus' into maint	Björn Gustavsson
	* bjorn/erts/unoptimized-plus: Handle addition of bignum + variable in unoptimized code
2019-06-13	erts: Relax the constraint introduced in OTP-15871	John Högberg

2019-06-12	Handle addition of bignum + variable in unoptimized code	Björn Gustavsson
	Also handles code compiled by OTP 21 and earlier.
2019-06-10	erts: Fix bad loader optimization of get_tuple_element	John Högberg
	The following sequence would be wrongly optimized into an i_get_tuple_element2 instruction, reading an element from the wrong tuple:
	    {get_tuple_element,{x,0},1,{x,0}}.
	    {get_tuple_element,{x,0},2,{x,1}}.
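	The hazard can be illustrated with a small Python model. This is only a sketch: the register dictionary, the zero-based element index, and the `run_fused` model of a combined instruction that reads both elements from the tuple originally in x0 are assumptions made for illustration, not the loader's actual code.

```python
def get_tuple_element(regs, src, index, dst):
    """Model of get_tuple_element: copy element `index` (zero-based, assumed)
    of the tuple in register `src` into register `dst`."""
    regs[dst] = regs[src][index]

def run_sequential(regs):
    """Execute the two instructions one after the other."""
    get_tuple_element(regs, "x0", 1, "x0")  # x0 now holds the inner tuple
    get_tuple_element(regs, "x0", 2, "x1")  # reads from the NEW value of x0
    return regs

def run_fused(regs):
    """Hypothetical fused instruction: both reads use the original tuple."""
    original = regs["x0"]
    regs["x0"] = original[1]
    regs["x1"] = original[2]
    return regs

def can_fuse(src, first_dst):
    """Fusing is only safe when the first instruction does not overwrite
    the register that the second instruction reads from."""
    return src != first_dst
```

	With `{"x0": ("a", ("p", "q", "r"), "c")}` as input, the sequential model leaves `"r"` in x1 while the fused model leaves `"c"`, which is the "wrong tuple" described above.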
2019-04-05	Optimize the i_minus instruction	Björn Gustavsson
	Measurements show that i_minus instructions (subtraction) are frequent enough to warrant creating specialized i_minus instructions. Thanks to José Valim for doing instruction counting on Elixir code.
2019-03-28	Merge branch 'john/erts/remove-destructive-bs_get_binary2/ERL-901'	John Högberg
	* john/erts/remove-destructive-bs_get_binary2/ERL-901: erts: Remove unsafe bs_get_binary2 optimization from loader
2019-03-27	erts: Remove unsafe bs_get_binary2 optimization from loader	John Högberg
	A load-time optimization assumed that match contexts had no further uses when a bs_get_binary2 overwrote the match context's register, and figured it would be safe to reuse the match context's memory for the resulting binary. This is no longer safe as of OTP 22, as a match context may be reused after being passed to another function.
2019-03-27	Fix unsafe optimization made by the loader	Björn Gustavsson
	Fix the unsafe load-time optimization introduced in 07bdbb3a1edc. https://bugs.erlang.org/browse/ERL-899
2019-03-20	Optimize moving of several Y registers to X registers	Björn Gustavsson
	Introduce move_src_window[234] instructions for moving several consecutively numbered Y registers to discontiguously numbered X registers. This optimization is effective because the compiler has sorted the `move` instructions in Y register order.
2019-03-19	Optimize map updating instructions	Björn Gustavsson

2019-03-19	Optimize funs converted to literals	Björn Gustavsson

2019-03-19	Combine move and init to move_shift	Björn Gustavsson

2019-03-19	Remove the move_dup instruction	Björn Gustavsson
	move_dup is used very infrequently.
2019-03-19	Optimize some common uses of '+' and '-'	Björn Gustavsson

2019-03-19	Extend move_shift to accept a literal Src operand	Björn Gustavsson

2019-03-19	Tune the move_jump instruction	Björn Gustavsson
	With the new compiler, a move to x(0) before a jump has become less common. Change the move_jump instruction to take a destination as well as a source.
2019-03-19	Eliminate i_length_setup with a literal list operand	Björn Gustavsson

2019-03-19	Replace swap_temp with swap more aggressively	Björn Gustavsson
	Also support swap of Y registers.
2019-03-19	Add another move_shift variation	Björn Gustavsson
	It turns out that sequences such as the following are common:
	    move x0 Y1
	    move Y2 x0
2019-03-19	Combine move with trim	Björn Gustavsson
	It is relatively common to move something from a Y register to an X register before trimming.
2019-03-19	Refactor put_list instructions for readability	Björn Gustavsson
	Apart from the refactoring, the instruction "put_list x c y" is replaced with "put_list x n y".
2019-03-19	Combine is_tuple with is_tagged_tuple	Björn Gustavsson

2019-03-09	Optimize tail-recursive calls of BIFs	Björn Gustavsson
	BEAM currently does not call BIFs at the end of a function in a tail-recursive way. That is, when calling a BIF at the end of a function, the BIF is first called, then the stack frame is deallocated, and then control is transferred to the caller.
	If there is no stack frame when a BIF is called in the tail position, the loader will emit a sequence of three instructions: first an instruction that allocates a stack frame and saves the continuation pointer (`allocate`), then an instruction that calls the BIF (`call_bif`), and lastly an instruction that deallocates the stack frame and returns to the caller (`deallocate_return`).
	The old compiler would essentially allocate a stack frame for each clause in a function, so a BIF was rarely called in the tail position without a stack frame, and the three-instruction sequence was deemed acceptable.
	The new compiler only allocates stack frames when truly needed, so the three-instruction BIF call sequence has become much more common.
	This commit introduces a new `call_bif_only` instruction so that only one instruction will be needed when calling a BIF in the tail position when there is no stack frame. This instruction is also used when there is a stack frame to make it possible to deallocate the stack frame before calling the BIF, which may make a subsequent garbage collection at the end of the BIF call cheaper (copying less garbage).
	The one downside of this change is that the function that called the BIF will not be included in the stack backtrace (similar to how a tail-recursive call to an Erlang function will not be included in the backtrace).
	That was the quick summary of the commit. Here comes a detailed look at how BIF calls are translated by the loader. The first example is a function that calls `setelement/3` in the tail position:
	    update_no_stackframe(X) ->
	        setelement(5, X, new_value).
	Here is the BEAM code:
	    {function, update_no_stackframe, 1, 12}.
	    {label,11}.
	    {line,[...]}.
	    {func_info,{atom,t},{atom,update_no_stackframe},1}.
	    {label,12}.
	    {move,{x,0},{x,1}}.
	    {move,{atom,new_value},{x,2}}.
	    {move,{integer,5},{x,0}}.
	    {line,[...]}.
	    {call_ext_only,3,{extfunc,erlang,setelement,3}}.
	Because there is no stack frame, the `call_ext_only` instruction will be used to call `setelement/3`:
	    {call_ext_only,3,{extfunc,erlang,setelement,3}}.
	The loader will transform this instruction to a three-instruction sequence:
	    0000000020BD8130: allocate_tt 0 3
	    0000000020BD8138: call_bif_e erlang:setelement/3
	    0000000020BD8148: deallocate_return_Q 0
	Using the `call_bif_only` instruction introduced in this commit, only one instruction is needed:
	    000000005DC377F0: call_bif_only_e erlang:setelement/3
	`call_bif_only` calls the BIF and returns to the caller.
	Now let's look at a function that already has a stack frame when `setelement/3` is called:
	    update_with_stackframe(X) ->
	        foobar(X),
	        setelement(5, X, new_value).
	Here is the BEAM code:
	    {function, update_with_stackframe, 1, 14}.
	    {label,13}.
	    {line,[...]}.
	    {func_info,{atom,t},{atom,update_with_stackframe},1}.
	    {label,14}.
	    {allocate,1,1}.
	    {move,{x,0},{y,0}}.
	    {line,[...]}.
	    {call,1,{f,16}}.
	    {move,{y,0},{x,1}}.
	    {move,{atom,new_value},{x,2}}.
	    {move,{integer,5},{x,0}}.
	    {line,[...]}.
	    {call_ext_last,3,{extfunc,erlang,setelement,3},1}.
	Since there is a stack frame, the `call_ext_last` instruction will be used to deallocate the stack frame and call the function:
	    {call_ext_last,3,{extfunc,erlang,setelement,3},1}.
	Before this commit, the loader would translate this instruction to:
	    0000000020BD81B8: call_bif_e erlang:setelement/3
	    0000000020BD81C8: deallocate_return_Q 1
	That is, the BIF is called before deallocating the stack frame and returning to the calling function.
	After this commit, the loader will translate the `call_ext_last` like this:
	    000000005DC37868: deallocate_Q 1
	    000000005DC37870: call_bif_only_e erlang:setelement/3
	There are still two instructions, but now the stack frame will be deallocated before calling the BIF, which could make the potential garbage collection after the BIF call slightly more efficient (copying less garbage).
	We could have introduced a `call_bif_last` instruction, but the code for calling a BIF is relatively large and there does not seem to be a practical way to share the code between `call_bif` and `call_bif_only` (since the difference is at the end, after the BIF call). Therefore, we did not want to clone the BIF calling code yet another time to make a `call_bif_last` instruction.
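	The before/after translations described in this commit message can be summarized as a small lookup function. This is a Python sketch for illustration only: the tuple encoding of instructions and the function name `translate_bif_tail_call` are assumptions, and the real loader works from transformation rules rather than code like this.

```python
def translate_bif_tail_call(generic, after_commit):
    """Map a generic tail call of a BIF to the specific instruction sequence
    described in the commit message, before or after the change.

    `generic` is ("call_ext_only", arity, mfa) when there is no stack frame,
    or ("call_ext_last", arity, mfa, frame_size) when there is one.
    """
    op = generic[0]
    if op == "call_ext_only":
        _, arity, mfa = generic
        if after_commit:
            return [("call_bif_only", mfa)]
        # Old scheme: build a frame just to call the BIF, then tear it down.
        return [("allocate", 0, arity),
                ("call_bif", mfa),
                ("deallocate_return", 0)]
    if op == "call_ext_last":
        _, arity, mfa, frame_size = generic
        if after_commit:
            # The frame is dropped BEFORE the BIF runs.
            return [("deallocate", frame_size), ("call_bif_only", mfa)]
        return [("call_bif", mfa), ("deallocate_return", frame_size)]
    raise ValueError("not a tail call handled by this sketch: %r" % (op,))
```

	For `("call_ext_only", 3, "erlang:setelement/3")` the sketch yields three instructions before the commit and one after; for `call_ext_last` the count stays at two, but the order of deallocation and BIF call is swapped.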
2019-03-07	Slightly optimize binary construction	Björn Gustavsson
	Use S operands instead of s operands for a slight speed increase and reduction in code size of process_main(). Use micro instructions for frequently executed instructions. While at it, use safe multiplication in gen_get_integer() in beam_load.c.
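	As a rough illustration of what "safe multiplication" usually means, here is a Python sketch of an overflow-checked multiply on fixed-width unsigned words. The 64-bit word size and the `safe_mul` name are assumptions; this is not the code used in beam_load.c.

```python
WORD_BITS = 64                      # assumed word size for this sketch
WORD_MAX = (1 << WORD_BITS) - 1

def safe_mul(a, b):
    """Multiply two unsigned words.

    Returns (True, product) if the product fits in a word, and (False, 0)
    if it would overflow. The check divides instead of multiplying, so it
    never needs a value wider than one word.
    """
    if not (0 <= a <= WORD_MAX and 0 <= b <= WORD_MAX):
        raise ValueError("operands must be unsigned words")
    if a != 0 and b > WORD_MAX // a:
        return (False, 0)
    return (True, a * b)
```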
2019-03-06	Slightly optimize is_eq and is_ne	Björn Gustavsson

2019-03-06	Eliminate unused i_bs_skip_bits_all2 instruction	Björn Gustavsson
	Starting in OTP 19 (in commit 9504c0dd71d0), the compiler emits a test_unit instruction instead of a skip instruction at the end of a binary. We can do the same replacement in the loader to get rid of the i_bs_skip_bits_all2 instruction.
2019-03-06	Reduce code size for binary matching instructions	Björn Gustavsson
	The new compiler required adding support for Y registers for all binary matching instructions. That was (intentionally) done in a naive way that simply duplicated the entire body of each instruction. Now it's time to be less naive. Rewrite the binary matching instructions using micro instructions. Because some of the binary instructions are huge, that will significantly decrease the size of process_main(). When compiling with clang, a huge process_main() would mess up profile-guided optimization, resulting in a significant performance degradation. On my Mac, profile-guided optimization would decrease the estone benchmark by 100K estones (about 20 percent). This commit gives me back the lost estones.
2019-03-06	Deoptimize obsoleted binary matching instructions	Björn Gustavsson
	Mark the obsoleted instructions bs_start_match2, bs_save2, bs_restore2, and bs_context_to_binary as cold. Remove support of a Y operand for bs_save2 and bs_restore2.
2019-03-06	Reclassify get_tuple_element with a Y destination as hot	Björn Gustavsson
	get_tuple_element with a Y register has become more frequent with the new compiler.
2019-03-06	Remove optimization that has become a pessimization	Björn Gustavsson
	The compiler used to generate "move Literal y(Y)" instructions very rarely. Therefore, there was a transformation to avoid having a "move c y" instruction. With the new compiler, "move Literal y(Y)" instructions are relatively frequent, so we will need a "move c y" instruction.
2019-03-06	Introduce move_window2 and remove move2_par_xyxy	Björn Gustavsson

2019-03-06	Optimize hd/1 and tl/1 in guards	Björn Gustavsson

2019-03-01	Combine is_nonempty_list with get_hd/get_tl	Björn Gustavsson
	The is_nonempty_list test is very frequently followed by get_tl, and frequently followed by get_hd.
2019-03-01	Remove the combined instruction is_nonempty_list_test_heap	Björn Gustavsson
	It turns out that the combination of is_nonempty_list and test_heap is no longer frequent.
2019-03-01	Combine more init instructions	Björn Gustavsson

2019-03-01	Combine test_arity with get_tuple_element	Björn Gustavsson
	The test_arity instruction is often followed by get_tuple_element.
2019-03-01	Combine get_tuple_element when destinations are not consecutive	Björn Gustavsson

2019-02-28	Remove rarely used swap instructions	Björn Gustavsson
	`swap x y` is rarely or never used. I found a single use of `swap_temp x y x` in the sample of modules compiled by `scripts/diffable`.
2019-02-28	Tune move instructions	Björn Gustavsson
	Of the `move_dup` instructions, only `move_dup x x x` was frequently used. Remove the other register combinations. With those `move_dup` instructions removed, it is necessary to add new predicates to avoid unsafe translation to `move_shift` and `move2_par`. Also add additional transformations to transform more `move` instructions into `move2_par`. The existing transformation would require the `move` instructions to be in the "right" order in order to be transformed. Remove `move3 x y x y x y` because it turns out to be rarely executed.
2019-01-21	Optimize the is_function/2 guard test	Björn Gustavsson
	The is_function2 instruction is executed surprisingly frequently when running dialyzer or the compiler. It cannot hurt to optimize it a little.
2019-01-07	Merge pull request #2059 from michalmuskala/mm/bif-microops	Björn Gustavsson
	Use microops for BIFs
2018-12-18	Make length/1 yielding	Björn Gustavsson
	The guard BIF `length/1` would calculate the length of the list in one go without yielding, even if the list was very long. To make it even worse, the call to `length/1` would only cost a single reduction. This commit reimplements `length/1` so that it eats a number of reductions proportional to the length of the list, and yields if the available reductions run out.
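	The idea of charging reductions per element and yielding when they run out can be modelled in a few lines of Python. This is a sketch under stated assumptions: lists are modelled as nested `(head, tail)` pairs ending in `None`, one reduction is charged per element, and the names `length_slice` and `length_yielding` are made up for illustration.

```python
def from_list(items):
    """Build a cons-cell style list: (head, tail) pairs ending in None."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell

def length_slice(cell, counted, reductions):
    """Count cells until the list ends or the reduction budget is used up.

    Returns ("done", total) or ("yield", remaining_cell, counted_so_far).
    """
    while cell is not None:
        if reductions == 0:
            return ("yield", cell, counted)
        cell = cell[1]
        counted += 1
        reductions -= 1
    return ("done", counted)

def length_yielding(cell, budget_per_slice):
    """Drive length_slice like a scheduler would, resuming after each yield.

    Returns (length, number_of_time_slices_used).
    """
    if budget_per_slice <= 0:
        raise ValueError("budget_per_slice must be positive")
    counted, slices = 0, 0
    while True:
        slices += 1
        result = length_slice(cell, counted, budget_per_slice)
        if result[0] == "done":
            return (result[1], slices)
        _, cell, counted = result
```

	With a budget of 4 reductions per slice, a 10-element list needs three slices in this model, whereas the old behaviour corresponds to finishing any list in a single slice at the cost of one reduction.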
2018-12-17	Use microops for BIFs	Michał Muskała
	This allows bif1/2/3 to share the main part of the code. The price is that we always need to copy all three temporary registers when error handling in bodies, but that should be infrequent. Additionally, it makes the disassembly a bit harder to read, since the arguments to BIFs are now in reverse order.
2018-12-13	Simplify GC BIFs	Björn Gustavsson
	Summary: This commit simplifies the implementation of the "GC BIFs" so that they no longer need to do a garbage collection, removing duplicate code for all GC BIFs in the runtime system, as well as potentially reducing the size of the loaded BEAM code by using shorter instructions calling those BIFs.
	A GC BIF is a guard BIF that will do a garbage collection if it needs to build anything on the heap. For example, `abs/1` is a GC BIF because it might need to allocate space on the heap (if the result is a floating point number or the resulting integer is a bignum).
	Before R12, a guard BIF (such as `abs/1`) that needed to allocate heap space would allocate outside of the process's main heap, in a heap fragment. GC BIFs were introduced in R12B to support literals. During garbage collection it became necessary to quickly test whether a term was a literal. To make the check simple, guard BIFs were no longer allowed to create heap fragments. Instead GC BIFs were introduced.
	In OTP 19, the implementation of literals was changed to support storing messages in heap fragments outside of the main heap for a process. That change once again allowed guard BIFs to create heap fragments when they need to build terms on the heap. It would even be possible for the guard BIFs to build directly on the main heap if there is room there, because the compiler assumes that a new `test_heap/2` instruction must be emitted when building anything after calling a GC BIF. (We don't do that in this commit; see below.)
	This commit simplifies the implementation of the GC BIFs in the runtime system. Each GC BIF had a dual implementation: one that was used when the GC BIF was called directly and one used when it was called via `apply/3`. For example, `abs/1` was implemented in `abs_1()` and `erts_gc_abs_1()`. This commit removes the GC version of each BIF. The other version that allocates heap space using `HAlloc()` is updated to use the new `HeapFragOnlyAlloc()` macro that will allocate heap space in a heap fragment outside of the main heap.
	Because the BIFs will allocate outside of the main heap, the same `bif` instructions used by nonbuilding BIFs can be used for the (former) GC BIFs. Those instructions don't use the macros that save and restore the heap and stack pointers (SWAPOUT/SWAPIN). If the former GC BIFs were to build on the main heap, either new instructions would be needed, or SWAPOUT/SWAPIN instructions would need to be added to the `bif` instructions.
	Instructions that call the former GC BIFs don't need the operand that specifies the number of live X registers. Therefore, the instructions that call the BIFs are usually one word shorter.
2018-09-28	Rewrite BSM optimizations in the new SSA-based intermediate format	John Högberg
	This commit improves the bit-syntax match optimization pass, leveraging the new SSA intermediate format to perform much more aggressive optimizations. Some highlights:
	* Match contexts can be reused even after being passed to a function or being used in a try block.
	* Sub-binaries are no longer eagerly extracted, making it far easier to keep "happy paths" free from binary creation.
	* Trivial wrapper functions no longer disable context reuse.
2018-09-24	Support using match contexts from Y registers	John Högberg
	The upcoming beam_ssa_bsm pass allows match contexts to be used across function calls that take said context as an argument, which means it's fairly common for them to end up in Y registers.
2018-09-03	Introduce a put_tuple2 instruction	Björn Gustavsson
	Sometimes when building a tuple, there is no way to avoid an extra `move` instruction. Consider this code:
	    make_tuple(A) -> {ok,A}.
	The corresponding BEAM code looks like this:
	    {test_heap,3,1}.
	    {put_tuple,2,{x,1}}.
	    {put,{atom,ok}}.
	    {put,{x,0}}.
	    {move,{x,1},{x,0}}.
	    return.
	To avoid overwriting the source register `{x,0}`, a `move` instruction is necessary. The problem doesn't exist when building a list:
	    %% build_list(A) -> [A].
	    {test_heap,2,1}.
	    {put_list,{x,0},nil,{x,0}}.
	    return.
	Introduce a new `put_tuple2` instruction that builds a tuple in a single instruction, so that the `move` instruction can be eliminated:
	    %% make_tuple(A) -> {ok,A}.
	    {test_heap,3,1}.
	    {put_tuple2,{x,0},{list,[{atom,ok},{x,0}]}}.
	    return.
	Note that the BEAM loader already combines `put_tuple` and `put` instructions into an internal instruction similar to `put_tuple2`. Therefore the introduction of the new instruction will not speed up execution of tuple building itself, but it will be less work for the loader to load the new instruction.
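	Why the old sequence needs a separate destination register can be shown with a small Python model. It is a sketch with assumed semantics: `put_tuple` is modelled as writing the destination register before the following `put` instructions read their operands, and the register encoding and function names are illustrative only.

```python
X0, X1 = ("x", 0), ("x", 1)
UNDER_CONSTRUCTION = "<tuple under construction>"

def read(regs, operand):
    """Read an operand: X registers come from `regs`, anything else
    (such as ("atom", "ok")) is treated as an immediate value."""
    kind, value = operand
    return regs[operand] if kind == "x" else value

def put_tuple_then_put(regs, dst, sources):
    """Model of put_tuple followed by one put per element (assumed semantics):
    the destination is written first, the operands are read afterwards."""
    regs[dst] = UNDER_CONSTRUCTION
    elements = [read(regs, src) for src in sources]
    regs[dst] = tuple(elements)

def put_tuple2(regs, dst, sources):
    """Model of put_tuple2: all operands are read before dst is written."""
    regs[dst] = tuple(read(regs, src) for src in sources)
```

	In this model, building `{ok,A}` directly into `X0` with the old sequence captures the placeholder instead of `A`, which is why the compiler had to build into `X1` and add a `move`; `put_tuple2` can safely use `X0` as both source and destination.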
2018-09-03	Merge branch 'maint'	Björn Gustavsson
	* maint: ops.tab: Fix potentially unsafe optimization of raise/2
2018-09-03	ops.tab: Fix potentially unsafe optimization of raise/2	Björn Gustavsson
	The operands for the raise/2 instruction are almost always in x(2) and x(1). Therefore the loader translates the raise/2 instruction to an i_raise/0 instruction which uses the values in x(2) and x(1). If the operands happen to be in other registers, the loader inserts move/2 instructions to move them to x(2) and x(1).
	The problem is that x(3) is used as a temporary register when generating the move/2 instructions. That is unsafe if the Value operand for raise/2 is x(3). Thus:
	    raise x(0) x(3)
	will be translated to:
	    move x(0) x(3)
	    move x(3) x(1)
	    move x(3) x(2)
	    i_raise
	The Trace will be written to both x(2) and x(1).
	The current compiler will never use x(3) for the Value operand, so there is no need to patch previous releases. But a future compiler version might allocate registers differently.
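	The clobbering can be reproduced by simulating the three move instructions on a register file. This is a Python sketch: the dictionary of X registers and the function name `translated_raise` are assumptions made for illustration, while the move sequence is the one quoted in the commit message.

```python
def move(regs, src, dst):
    """Model of move/2 on X registers: copy regs[src] into regs[dst]."""
    regs[dst] = regs[src]

def translated_raise(regs, trace_reg, value_reg):
    """Run the move sequence that uses x(3) as a temporary, then return
    the pair (x(2), x(1)) that i_raise would read."""
    move(regs, trace_reg, 3)   # Trace -> x(3), the temporary
    move(regs, value_reg, 1)   # Value -> x(1)
    move(regs, 3, 2)           # temporary -> x(2)
    return (regs[2], regs[1])
```

	Calling `translated_raise({0: "Trace", 3: "Value"}, 0, 3)` returns `("Trace", "Trace")` because the first move overwrites the Value held in x(3), while the same call with the Value in x(4) returns `("Trace", "Value")`.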
2018-08-17	Teach binary instructions to use Y registers as destination	Björn Gustavsson
	The new code generator will use Y registers as a destination for binary construction and matching instructions. v3_codegen would always first store terms in an X register and it would be the responsibility of the optimization passes to optimize the extra moves.