otp.git - Mirror of Erlang/OTP repository.

Age	Commit message (Collapse)	Author
2015-08-21	Delay get_tuple_element instructions until they are needed	Björn Gustavsson
	When matching tuples, the pattern matching compiler would generate code that would fetch all elements of the tuple that will ultimately be used, before testing that (for example) the first element is the correct record tag. For example: is_tuple Fail {x,0} test_arity Fail {x,0} 3 get_tuple_element {x,0} 0 {x,1} get_tuple_element {x,0} 1 {x,2} get_tuple_element {x,0} 2 {x,3} is_eq_exact Fail {x,1} some_tag If {x,2} and {x,3} are not used at label Fail, we can re-arrange the code like this: is_tuple Fail {x,0} test_arity Fail {x,0} 3 get_tuple_element {x,0} 0 {x,1} is_eq_exact Fail {x,1} some_tag get_tuple_element {x,0} 1 {x,2} get_tuple_element {x,0} 2 {x,3} Doing that may be beneficial in two ways. If the branch is taken, we have eliminated the execution of two unnecessary instructions. Even if the branch is never or rarely taken, there is the possibility for more optimizations following the is_eq_exact instructions. For example, imagine that the code looks like this: get_tuple_element {x,0} 1 {x,2} get_tuple_element {x,0} 2 {x,3} move {x,2} {y,0} move {x,3} {y,1} Assuming that {x,2} and {x,3} have no further uses in the code that follows, that can be rewritten to: get_tuple_element {x,0} 1 {y,0} get_tuple_element {x,0} 2 {y,1} When should we perform this optimization? At the very latest, it must be done before opt_blocks/1 in beam_block which does the elimination of unnecessary moves. Actually, we want do the optimization before the blocks have been established, since moving instructions out of one block into another is cumbersome. Therefore, we will do the optimization in a new pass that is run before beam_block. A new pass will make debugging easier, and beam_block already has a fair number of sub passes.
2015-08-21	Optimize get_tuple_element instructions by moving them forward	Björn Gustavsson

2015-08-21	beam_block: Improve the move optimizations	Björn Gustavsson
	Here is an example of a move instruction that could not be optimized away because the {x,2} register was not killed: get_tuple_element Reg Pos {x,2} . . . move {x,2} {y,0} put_list {x,2} nil Any We can do the optimization if we replace all occurrences of the {x,2} register as a source with {y,0}: get_tuple_element Reg Pos {y,0} . . . put_list {y,0} nil Dst
2015-08-21	beam_block: Clean up optimization of move optimizations	Björn Gustavsson
	The 'move' optimization was relatively clean until GC BIFs were introduced. Instead of re-thinking the implementation, the existing code was fixed and patched. The current code unsuccessfully attempts to eliminate 'move' instructions across GC BIF and allocation instructions. We can simplify the code if we give up as soon as we encounter any instruction that allocates.
2015-08-21	beam_block: Eliminate redundant wasteful call to opt/1	Björn Gustavsson
	opt_alloc/1 makes a redundant call to opt/1. It is redundant because the opt/1 function has already been applied to the instruction sequence prior to calling opt_alloc/1.
2015-08-06	Teach the compiler the 'da' and 'dz' options	Björn Gustavsson
	Add the 'da' option to create a list after the beam_a pass. Seeing how the code looks after beam_a, but before the blocks have been established, is sometimes useful. For symmetry, add the 'dz' option, even though it is just a synonym for 'S'.
2015-07-07	Merge branch 'bjorn/erts/clang-opt'	Björn Gustavsson
	* bjorn/erts/clang-opt: (27 commits) Improve unpacking performance on x86_64 Slightly tweak the peformance for get_list Speed up list matching Eliminate the variable temp_bits at the top scope of process_main() Eliminate prefetch for conditional instructions Teach beam_makeops to pack operands for move3 and move_window Ensure that the move_call_ext_{last,only} instructions are used beam_makeops: Eliminate unnecessary masking when packing 3 operands Use a cheaper tag scheme for 'd' operands Introduce swap_temp/3 and swap/2 Introduce specialized versions of move2 Add back frequently used x(0) instructions Rewrite the hipe_mode_switch instructions Remove the last use of tmp_arg1 Eliminate use of tmp_arg1 and tmp_arg2 in bit syntax Remove the i_fetch instruction Eliminate use of i_fetch for bit syntax instructions Eliminate the use of i_fetch for BIF instructions Eliminate the use of i_fetch for relational operators Eliminate the use of i_fetch in arithmetic instructions ...
2015-07-06	Improve unpacking performance on x86_64	Björn Gustavsson
	When unpacking operands on 64-bit CPUs, use a smarter mask to help the compiler optimize the code. It turns out that on x86_64, if we use the mask 0xFFFFUL (selecting the 16 least significant bits), the compiler can combine a move and a mask operation into the single insruction 'movzwl', which will eliminate one instruction.
2015-07-06	Slightly tweak the peformance for get_list	Björn Gustavsson
	Fetch the head and tail parts to temporary variables before writing them to their destinations. That should allow the CPU to perform the moves in parallel, which might improve performance.
2015-07-06	Speed up list matching	Björn Gustavsson
	The combination is_non_empty_list followed by get_list is extremly common (but not in estone_SUITE, which is why it has not been noticed before). Therefore it is worthwile to introduce a combined instruction.
2015-07-06	Eliminate the variable temp_bits at the top scope of process_main()	Björn Gustavsson

2015-07-06	Eliminate prefetch for conditional instructions	Björn Gustavsson
	Not pre-fetching in conditional instructions (instructions that use -fail_action) seems to improve performance slightly. The reason for that is that conditional instructions may jump to the failure label, wasting the pre-fetched instruction. Another reason is that that many conditional instructions do a function call, and the register pressure is therefore high. Avoiding the pre-fetch may reduce the register pressure and pontentially result in more efficient code.
2015-07-05	Merge branch 'maint'	Hans Nilsson
	* maint: ssh: be more generous about disconnect expects ssh: add disjunction to ssh_trpt_test_lib:match
2015-07-05	Merge branch 'hans/ssh/no_common_algs/OTP-11531' into maint	Hans Nilsson
	* hans/ssh/no_common_algs/OTP-11531: ssh: be more generous about disconnect expects ssh: add disjunction to ssh_trpt_test_lib:match
2015-07-04	ssh: be more generous about disconnect expects	Hans Nilsson

2015-07-04	ssh: add disjunction to ssh_trpt_test_lib:match	Hans Nilsson

2015-07-03	Teach beam_makeops to pack operands for move3 and move_window	Björn Gustavsson
	It is currently only possible to pack up to 4 operands. However, the move_window4 instrucion has 5 operands and move_window5 and move3 instrucations have 6 operands. Teach beam_makeops to pack instructions with 5 or 6 operands. Also rewrite the move_window instructions in beam_emu.c to macros to allow their operands to get packed.
2015-07-03	Ensure that the move_call_ext_{last,only} instructions are used	Björn Gustavsson
	Update transformations to ensure that the move_call_ext_last and move_call_ext_last are used.
2015-07-03	beam_makeops: Eliminate unnecessary masking when packing 3 operands	Björn Gustavsson
	When packing 3 operands into one word, there would be an unnecessary mask operation when extracting the last operand.
2015-07-03	Use a cheaper tag scheme for 'd' operands	Björn Gustavsson
	Since 'd' operands can only either an X register or an Y register, we only need a single bit to distinguish them. Furthermore, we can pre-multiply the register number with the word size to speed up address calculation.
2015-07-03	Introduce swap_temp/3 and swap/2	Björn Gustavsson
	Sequences of three move instructionst that effectively swap the contents of two registers are fairly common. We can replace them with a swap_temp/3 instruction. The third operand is the temporary register to be used for swapping, since the temporary register may actually be used. If swap_temp/3 instruction is followed by a call, the temporary register will often (but not always) be killed by the call. If it is killed, we can replace the swap_temp/3 instruction with a slightly cheaper swap/2 instruction.
2015-07-03	Introduce specialized versions of move2	Björn Gustavsson
	Currently, move2/2 does the two moves sequentially to ensure that the instruction will always work correctly. We can do better than that. If the two move instructions have any registers in common, we can introduce simpler and slightly more efficient instructions to handle those cases: move_shift/3 move_dup/3 For the remaining cases when the the move instructions have no common registers, the move2/4 instruction can perform the moves in parallel which is probably slightly more efficient. For clarity's sake, we will remain the instruction to move2_par/4.
2015-07-03	Add back frequently used x(0) instructions	Björn Gustavsson

2015-07-03	Rewrite the hipe_mode_switch instructions	Björn Gustavsson
	The 'cmd' variable that were shared by several hipe_mode_switch instructions would cause clang to produce sub-optimal code, probably because it considered the instructions as part of of loop that needed to be optimized. What would was that 'cmd' would be assigned to the ESI register (lower 32 bits of the RSI register). It would use ESI for other purposes in instructions, but at the end of every instruction it would set ESI to 1 just in case the next instruction happened to be hipe_trap_return. This can be seen clearly if this commit is omitted and the define HIPE_MODE_SWITCH_CMD_RETURN in hipe/hipe_mode_switch.h is changed from 1 to some other number such as 42. You will see that 42 is assigned to ESI at the end of every instruction. Eliminate this problem by elimininating the shared 'cmd' variable.
2015-07-03	Remove the last use of tmp_arg1	Björn Gustavsson

2015-07-03	Eliminate use of tmp_arg1 and tmp_arg2 in bit syntax	Björn Gustavsson

2015-07-03	Remove the i_fetch instruction	Björn Gustavsson

2015-07-03	Eliminate use of i_fetch for bit syntax instructions	Björn Gustavsson

2015-07-03	Eliminate the use of i_fetch for BIF instructions	Björn Gustavsson

2015-07-03	Eliminate the use of i_fetch for relational operators	Björn Gustavsson

2015-07-03	Eliminate the use of i_fetch in arithmetic instructions	Björn Gustavsson
	The i_fetch instruction fetches two operands and places them in the tmp_arg1 and tmp_arg2 variables. The next instruction (such as i_plus) does not have to handle different types of operands, but can get get them simply from the tmp_arg* variables. Thus, i_fetch was introduced as a way to temper a potentail combinatorial explosion. Unfortunately, clang will generate terrible code because of the tmp_arg1 and tmp_arg2 variables being live across multiple instructions. Note that Clang has no way to predict the control flow from one instruction to another. Clang must assume that any instruction can jump to any other instruction. Somehow GCC manages to cope with this situation much better. Therefore, to improve the quality of the code generated by clang, we must eliminate all uses of the tmp_arg1 and tmp_arg2 variables. This commit eliminates the use of i_fetch in combination with the arithmetic and logical instructions. While we are touching the code for the bsr and bsl instructions, also move the tmp_big[] array from top scope of process main into the block that encloses the bsr and bsl instructions.
2015-07-03	Make the 'r' operand type optional	Björn Gustavsson
	The 'r' type is now mandatory. That means in order to handle both of the following instructions: move x(0) y(7) move x(1) y(7) we would need to define two specific operations in ops.tab: move r y move x y We want to make 'r' operands optional. That is, if we have only this specific instruction: move x y it will match both of the following instructions: move x(0) y(7) move x(1) y(7) Make 'r' optional allows us to save code space when we don't want to make handling of x(0) a special case, but we can still use 'r' to optimize commonly used instructions.
2015-07-03	Allow X and Y registers to be overloaded with any literal	Björn Gustavsson
	Consider the try_case_end instruction: try_case_end s The 's' operand type means that the operand can either be a literal of one of the types atom, integer, or empty list, or a register. That worked well before R12. In R12 additional types of literals where introduced. Because of way the overloading was done, an 's' operand cannot handle the new types of literals. Therefore, code such as the following is necessary in ops.tab to avoid giving an 's' operand a literal: try_case_end Literal=q => move Literal x \| try_case_end x While this work, it is error-prone in that it is easy to forget to add that kind of rule. It would also be complicated in case we wanted to introduce a new kind of addition operator such as: i_plus jssd Since there are two 's' operands, two scratch registers and two 'move' instructions would be needed. Therefore, we'll need to find a smarter way to find tag register operands. We will overload the pid and port tags for X and Y register, respectively. That works because pids and port are immediate values (fit in one word), and there are no literals for pids and ports.
2015-07-03	Eliminate R_REG_DEF	Björn Gustavsson

2015-07-03	Store r(0) and x(0) in the same location	Björn Gustavsson
	As part of improving code generation for clang, we want to eliminate the special variable that stores the content of X register zero most of the time. In a future, that will allow us to eliminate the special case of handling r(0) for most instructions, thus reducing the code size and allow other simplifcations. Therefore, in this commit, eliminate the variable that is used to store r(0) and make r(0) as synonym for x(0). I have chosen to keep the r(0) define to keep the size of the diff managable.
2015-07-03	Change the meaning of 'x' in a transformation	Björn Gustavsson
	The purpose of this series of commits is to improve code generation for the Clang compiler. As a first step we want to change the meaning of 'x' in a transformation such as: operation Literal=q => move Literal x \| operation x Currently, a plain 'x' means reg[0] or x(0), which is the first element in the X register array. That element is distinct from r(0) which is a variable in process_main(). Therefore, since r(0) and x(0) are currently distinct it is fine to use x(0) as a scratch register. However, in the next commit we will eliminate the separate variable for storing the contents of X register zero (thus, x(0) and r(0) will point to the same location in the X register array). Therefore, we must use another scratch register in transformation. Redefine a plain 'x' in a transformation to mean x(1023). Also define SCRATCH_X_REG so that we can refer to the register by name from C code.
2015-07-03	beam_emu.c: Remove unused MoveGenDest macro	Björn Gustavsson

2015-07-03	beam_makeops: Eliminate crash because of unsafe packing	Björn Gustavsson
	Consider an hypothetical instruction: do_something x x c The loader would crash if we tried to load an instance of the instruction with the last operand referencing a literal: {do_something,{x,0},{x,1},{literal,{a,b,c}}} Teach beam_makeops to turn off packing for such unsafe instructions.
2015-07-03	Merge branch 'maint'	Björn Gustavsson
	* maint: Add a smoke test of erts_debug:df/1 Correct disassembly of the i_get_map_elements instruction
2015-07-03	Merge branch 'bjorn/erts/beam_debug' into maint	Björn Gustavsson
	* bjorn/erts/beam_debug: Add a smoke test of erts_debug:df/1 Correct disassembly of the i_get_map_elements instruction
2015-07-03	Merge branch 'maint'	Björn-Egil Dahlberg

2015-07-03	Merge branch 'egil/fix-configure-pthread_getname/OTP-12887' into maint	Björn-Egil Dahlberg
	* egil/fix-configure-pthread_getname/OTP-12887: erts: Fix configure pthread_getname
2015-07-03	Merge branch 'egil/update-gitgnore' into maint	Björn-Egil Dahlberg

2015-07-03	mnesia: Update gitignore	Björn-Egil Dahlberg

2015-07-03	Merge branch 'maint'	Hans Nilsson
	* maint: ssh: testcases for no common algorithms in key exchange
2015-07-03	Merge branch 'hans/ssh/no_common_algs/OTP-11531' into maint	Hans Nilsson
	* hans/ssh/no_common_algs/OTP-11531: ssh: testcases for no common algorithms in key exchange
2015-07-03	Merge branch 'maint'	Hans Nilsson
	* maint: ssh: Initial ssh_tprt_test_lib.erl and ssh_protocol_SUITE
2015-07-03	Merge branch 'hans/ssh/trpt_test_lib/OTP-12858' into maint	Hans Nilsson

2015-07-03	Merge branch 'maint'	Ingela Anderton Andin

2015-07-03	Merge branch 'ia/ssl/tune-tests' into maint	Ingela Anderton Andin
	* ia/ssl/tune-tests: ssl: Exclude broken OpenSSL version from ECC test ssl: Tune timeouts