Age | Commit message (Collapse) | Author |
|
Summary: This commit simplifies the implementation of the "GC BIFs" so
that they no longer need to do a garbage collection, removing duplicate
code for all GC BIFs in the runtime system, as well as potentially
reducing the size of the loaded BEAM code by using shorter
instructions calling those BIFs.
A GC BIF is a guard BIF that will do a garbage
collection if it needs to build anything on the heap.
For example, `abs/1` is a GC BIF because it might need to
allocate space on the heap (if the result is a floating point
number or the resulting integer is a bignum).
Before R12, a guard BIF (such as `abs/1`) that need to allocate
heap space would allocate outside of process's main heap, in
a heap fragment.
GC BIFs were introduced in R12B to support literals. During garbage
collection it become necessary to quickly test whether a term was
a literal. To make the check simple, guards BIFs were no longer
allowed to create heap fragments. Instead GC BIFs were introduced.
In OTP 19, the implementation of literals was changed to support
storing messages in heap fragments outside of the main heap for a
process. That change again made it allowed for guard BIFs to create
heap fragments when they need to build terms on the heap.
It would even be possible for the guard BIFs to build directly
on the main heap if there is room there, because the compiler
assumes that a new `test_heap/2` instruction must be emitted
when building anything after calling a GC BIF. (We don't do that
in this commit; see below.)
This commit simplifies the implementation of the GC BIFs in
the runtime system.
Each GC BIF had a dual implementation: one that was used when the GC
BIF was called directly and one used when it was called via
`apply/3`. For example, `abs/1` was implemented in `abs_1()` and
`erts_gc_abs_1()`. This commit removes the GC version of each BIF. The
other version that allocates heap space using `HAlloc()` is updated to
use the new `HeapFragOnlyAlloc()` macro that will allocate heap
space in a heap fragment outside of the main heap.
Because the BIFs will allocate outside of the main heap, the same
`bif` instructions used by nonbuilding BIFs can be used for the
(former) GC BIFs. Those instructions don't use the macros that save
and restore the heap and stack pointers (SWAPOUT/SWAPIN). If the
former GC BIFs would build on the main heap, either new instructions
would be needed, or SWAPOUT/SWAPIN instructions would need to be added
to the `bif` instructions.
Instructions that call the former GC BIFs don't need the operand
that specifies the number of live X registers. Therefore, the
instructions that call the BIFs are usually one word shorter.
|
|
|
|
For a long time, there has been the two macros IS_SSMALL() and
MY_IS_SSMALL() that do exactly the same thing.
There should only be one, and it should be called IS_SSMALL().
However, we must decide which implementation to use. When
MY_IS_SSMALL() was introduced a long time ago, it was the most
efficient. In a modern C compiler, there might not be any
difference.
To find out, I used the following small C program to examine
the code generation:
#include <stdio.h>
typedef unsigned int Uint32;
typedef unsigned long Uint64;
typedef long Sint;
#define SWORD_CONSTANT(Const) Const##L
#define SMALL_BITS (64-4)
#define MAX_SMALL ((SWORD_CONSTANT(1) << (SMALL_BITS-1))-1)
#define MIN_SMALL (-(SWORD_CONSTANT(1) << (SMALL_BITS-1)))
#define MY_IS_SSMALL32(x) (((Uint32) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL64(x) (((Uint64) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL(x) (sizeof(x) == sizeof(Uint32) ? MY_IS_SSMALL32(x) : MY_IS_SSMALL64(x))
#define IS_SSMALL(x) (((x) >= MIN_SMALL) && ((x) <= MAX_SMALL))
void original(Sint n)
{
if (IS_SSMALL(n)) {
printf("yes\n");
}
}
void enhanced(Sint n)
{
if (MY_IS_SSMALL(n)) {
printf("yes\n");
}
}
gcc 7.2 produced the following code for the original() function:
.LC0:
.string "yes"
original(long):
movabs rax, 576460752303423488
add rdi, rax
movabs rax, 1152921504606846975
cmp rdi, rax
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang 5.0.0 produced the following code which is slightly better:
original(long):
movabs rax, 576460752303423488
add rax, rdi
shr rax, 60
jne .LBB0_1
mov edi, .Lstr
jmp puts # TAILCALL
.LBB0_1:
ret
.Lstr:
.asciz "yes"
However, in the context of beam_emu.c, clang could produce
similar to what gcc produced.
gcc 7.2 produced the following code when MY_IS_SSMALL() was used:
.LC0:
.string "yes"
enhanced(long):
sar rdi, 59
add rdi, 1
cmp rdi, 1
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang produced similar code.
This code seems to be the cheapest. There are four instructions, and
there is no loading of huge integer constants.
|
|
'0 bsl 134217728' would fail with a system limit exception on a 32-bit
BEAM machine, but not on a 64-bit BEAM machine. Smaller values
on the right would always work.
Make erlang:bsl(0, BigNumber) always return 0 to make for consistency.
(The previous commit accidentally did that change for
'0 bsl BigNumber'.)
|
|
|
|
|
|
|
|
|
|
As a preparation for changing the calling convention for
BIFs, make sure that all BIFs use the macros. Also, eliminate
all calls from one BIF to another, since that also breaks
the calling convention abstraction.
|
|
This is the first step in the implementation of the half-word emulator,
a 64-bit emulator where all pointers to heap data will be stored
in 32-bit words. Code specific for this emulator variant is
conditionally compiled when the HALFWORD_HEAP define has
a non-zero value.
First force all pointers to heap data to fall into a single 32-bit range,
but still store them in 64-bit words.
Temporary term data stored on C stack is moved into scheduler specific
storage (allocated as heaps) and macros are added to make this
happen only in emulators where this is needed. For a vanilla VM the
temporary terms are still stored on the C stack.
|
|
fragments was created. This will mainly benefit NIFs that return
large compound terms.
|
|
|