aboutsummaryrefslogtreecommitdiffstats
path: root/erts/emulator/beam/arith_instrs.tab
AgeCommit message (Collapse)Author
2019-04-08erts: Optimize arithmetic ops using overflow intrinsicsJohn Högberg
2019-03-06Optimize the '*' operator when multiplying two small integersBjörn Gustavsson
2018-12-13Simplify GC BIFsBjörn Gustavsson
Summary: This commit simplifies the implementation of the "GC BIFs" so that they no longer need to do a garbage collection, removing duplicate code for all GC BIFs in the runtime system, as well as potentially reducing the size of the loaded BEAM code by using shorter instructions calling those BIFs. A GC BIF is a guard BIF that will do a garbage collection if it needs to build anything on the heap. For example, `abs/1` is a GC BIF because it might need to allocate space on the heap (if the result is a floating point number or the resulting integer is a bignum). Before R12, a guard BIF (such as `abs/1`) that need to allocate heap space would allocate outside of process's main heap, in a heap fragment. GC BIFs were introduced in R12B to support literals. During garbage collection it become necessary to quickly test whether a term was a literal. To make the check simple, guards BIFs were no longer allowed to create heap fragments. Instead GC BIFs were introduced. In OTP 19, the implementation of literals was changed to support storing messages in heap fragments outside of the main heap for a process. That change again made it allowed for guard BIFs to create heap fragments when they need to build terms on the heap. It would even be possible for the guard BIFs to build directly on the main heap if there is room there, because the compiler assumes that a new `test_heap/2` instruction must be emitted when building anything after calling a GC BIF. (We don't do that in this commit; see below.) This commit simplifies the implementation of the GC BIFs in the runtime system. Each GC BIF had a dual implementation: one that was used when the GC BIF was called directly and one used when it was called via `apply/3`. For example, `abs/1` was implemented in `abs_1()` and `erts_gc_abs_1()`. This commit removes the GC version of each BIF. The other version that allocates heap space using `HAlloc()` is updated to use the new `HeapFragOnlyAlloc()` macro that will allocate heap space in a heap fragment outside of the main heap. Because the BIFs will allocate outside of the main heap, the same `bif` instructions used by nonbuilding BIFs can be used for the (former) GC BIFs. Those instructions don't use the macros that save and restore the heap and stack pointers (SWAPOUT/SWAPIN). If the former GC BIFs would build on the main heap, either new instructions would be needed, or SWAPOUT/SWAPIN instructions would need to be added to the `bif` instructions. Instructions that call the former GC BIFs don't need the operand that specifies the number of live X registers. Therefore, the instructions that call the BIFs are usually one word shorter.
2017-10-01Move out variables from the head of combined instructionsBjörn Gustavsson
Move out from the head the variables that are only used in the excute phase.
2017-10-01Eliminate the OpCode() macroBjörn Gustavsson
Introduce the IsOpCode() macro that can be used to compare instructions.
2017-09-28Eliminate MY_IS_SSMALL()Björn Gustavsson
For a long time, there has been the two macros IS_SSMALL() and MY_IS_SSMALL() that do exactly the same thing. There should only be one, and it should be called IS_SSMALL(). However, we must decide which implementation to use. When MY_IS_SSMALL() was introduced a long time ago, it was the most efficient. In a modern C compiler, there might not be any difference. To find out, I used the following small C program to examine the code generation: #include <stdio.h> typedef unsigned int Uint32; typedef unsigned long Uint64; typedef long Sint; #define SWORD_CONSTANT(Const) Const##L #define SMALL_BITS (64-4) #define MAX_SMALL ((SWORD_CONSTANT(1) << (SMALL_BITS-1))-1) #define MIN_SMALL (-(SWORD_CONSTANT(1) << (SMALL_BITS-1))) #define MY_IS_SSMALL32(x) (((Uint32) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2) #define MY_IS_SSMALL64(x) (((Uint64) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2) #define MY_IS_SSMALL(x) (sizeof(x) == sizeof(Uint32) ? MY_IS_SSMALL32(x) : MY_IS_SSMALL64(x)) #define IS_SSMALL(x) (((x) >= MIN_SMALL) && ((x) <= MAX_SMALL)) void original(Sint n) { if (IS_SSMALL(n)) { printf("yes\n"); } } void enhanced(Sint n) { if (MY_IS_SSMALL(n)) { printf("yes\n"); } } gcc 7.2 produced the following code for the original() function: .LC0: .string "yes" original(long): movabs rax, 576460752303423488 add rdi, rax movabs rax, 1152921504606846975 cmp rdi, rax jbe .L4 rep ret .L4: mov edi, OFFSET FLAT:.LC0 jmp puts clang 5.0.0 produced the following code which is slightly better: original(long): movabs rax, 576460752303423488 add rax, rdi shr rax, 60 jne .LBB0_1 mov edi, .Lstr jmp puts # TAILCALL .LBB0_1: ret .Lstr: .asciz "yes" However, in the context of beam_emu.c, clang could produce similar to what gcc produced. gcc 7.2 produced the following code when MY_IS_SSMALL() was used: .LC0: .string "yes" enhanced(long): sar rdi, 59 add rdi, 1 cmp rdi, 1 jbe .L4 rep ret .L4: mov edi, OFFSET FLAT:.LC0 jmp puts clang produced similar code. This code seems to be the cheapest. There are four instructions, and there is no loading of huge integer constants.
2017-08-31Annotate arithmetic instructions with likely/unlikelyBjörn Gustavsson
We expect that: * An arithmetic instruction is more likely to succeed than to fail. * An arithmetic instruction is more likely to have small operands than bignum operands.
2017-08-23arith_instrs.tab: Clean up bsl/bsrBjörn Gustavsson
Eliminate the all-purpose variable 'ires'. Replace it with several variables used that are used for their specific purpose. Narrow the scope of the variables. Do this to improve readability. It is not expected that it should any impact on performance.
2017-08-23Introduce more packable typesBjörn Gustavsson
The 'I' type can be replaced with 't' if we know that the value will fit in 16 bits. The 'P' type can be replaced with 'Q' if it is used for deallocating a stack frame.
2017-08-23Improve performance for bsl/bsrBjörn Gustavsson
Eliminate the variable used for holding which BIF (bsl or bsr). It seems to improve performance slightly.
2017-08-23arith_instrs.tab: Eliminate warning for uninitialized valueBjörn Gustavsson
Make it clear to GCC that shift_left_count is initialized.
2017-08-11Break out most instructions from beam_emu.cBjörn Gustavsson