Age | Commit message (Collapse) | Author |
|
The BeamOp() macro in erl_vm.h is clumsy to use. All users
cast the return value to BeamInstr.
Define new macros that are easier to use. In the future,
we might want to pack an operand into the same word as
the pointer to the instruction, so we will define two macros.
BeamIsOpCode() is used to rewrite code like this:
if (Instr == (BeamInstr) BeamOp(op_i_func_info_IaaI) {
...
}
to:
if (BeamIsOpCode(Instr, op_i_func_info_IaaI)) {
...
}
BeamOpCodeAddr(op_apply_bif) is used when we need the address
for an instruction.
Also elimiminate the global variables em_* in beam_emu.c.
They are not really needed. Use the BeamOpCodeAddr() macro
instead.
|
|
The inconsistent order has annoyed me for a long time.
While at it, also remove the unecessary definition of LabelAddr() if
NO_JUMP_TABLE is defined.
|
|
For a long time, there has been the two macros IS_SSMALL() and
MY_IS_SSMALL() that do exactly the same thing.
There should only be one, and it should be called IS_SSMALL().
However, we must decide which implementation to use. When
MY_IS_SSMALL() was introduced a long time ago, it was the most
efficient. In a modern C compiler, there might not be any
difference.
To find out, I used the following small C program to examine
the code generation:
#include <stdio.h>
typedef unsigned int Uint32;
typedef unsigned long Uint64;
typedef long Sint;
#define SWORD_CONSTANT(Const) Const##L
#define SMALL_BITS (64-4)
#define MAX_SMALL ((SWORD_CONSTANT(1) << (SMALL_BITS-1))-1)
#define MIN_SMALL (-(SWORD_CONSTANT(1) << (SMALL_BITS-1)))
#define MY_IS_SSMALL32(x) (((Uint32) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL64(x) (((Uint64) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL(x) (sizeof(x) == sizeof(Uint32) ? MY_IS_SSMALL32(x) : MY_IS_SSMALL64(x))
#define IS_SSMALL(x) (((x) >= MIN_SMALL) && ((x) <= MAX_SMALL))
void original(Sint n)
{
if (IS_SSMALL(n)) {
printf("yes\n");
}
}
void enhanced(Sint n)
{
if (MY_IS_SSMALL(n)) {
printf("yes\n");
}
}
gcc 7.2 produced the following code for the original() function:
.LC0:
.string "yes"
original(long):
movabs rax, 576460752303423488
add rdi, rax
movabs rax, 1152921504606846975
cmp rdi, rax
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang 5.0.0 produced the following code which is slightly better:
original(long):
movabs rax, 576460752303423488
add rax, rdi
shr rax, 60
jne .LBB0_1
mov edi, .Lstr
jmp puts # TAILCALL
.LBB0_1:
ret
.Lstr:
.asciz "yes"
However, in the context of beam_emu.c, clang could produce
similar to what gcc produced.
gcc 7.2 produced the following code when MY_IS_SSMALL() was used:
.LC0:
.string "yes"
enhanced(long):
sar rdi, 59
add rdi, 1
cmp rdi, 1
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang produced similar code.
This code seems to be the cheapest. There are four instructions, and
there is no loading of huge integer constants.
|
|
* lukas/erts/fix_threads_error_printout:
erts: Print the error reason when threads fail to start
|
|
|
|
The byte_offset of sub-binaries wasn't taken into account for
ProcBins, subtly ruining the results. The test suite didn't catch
it since it didn't check for sub-binaries in particular, and only
checked for equality between variations -- not whether the output
was equal to the input.
|
|
|
|
|
|
* bjorn/erts/pack-combined:
Pack combined instructions
beam_makeops: Refactor code generation
Correct disassembly of select instructions
|
|
* lukas/erts/remove-dirty-scheduler-defines/OTP-14613:
erts: Remove possibility to disable dirty schedulers
|
|
Make sure to use the 'unpacked[-1]' when accessing the unpacked
arguments.
|
|
* bjorn/erts/relative-jumps:
Pack failure labels in i_select_val2 and i_select_tuple_arity2
Optimize i_select_tuple_arity2 and is_select_lins
Rewrite select_val_bins so that its labels can be packed
Pack sequences of trailing 'f' operands
Implement packing of 'f' and 'j'
Make sure that mask literals are 64 bits
Use relative failure labels
Add information about offset to common group start position
Remove JUMP_OFFSET
Refactor instructions to support relative jumps
Introduce a new trace_jump/1 instruction for tracing
Avoid using $Src more than once
|
|
|
|
Don't save a pointer to the default failure label. That could
relieve register pressure. Instead, if we'll need to signal an
error in i_select_tuple_arity2, delay to the execution phase.
That should be a clear win because i_select_tuple_arity2 very
rarely fails because of the term being selected is not a tuple.
|
|
|
|
Pack sequences of trailing 'f' operands for instructions
such at jump_on_val or i_select_val_lins.
|
|
|
|
Relative failure in itself is not an optimization, but we plan to
pack failure labels in the future to save memory.
|
|
|
|
It has served its purpose.
|
|
sverker/on_load-on_load-bug/OTP-14612
|
|
Symptom: VM crash when erlang:trace_pattern(on_load, ..) is set and
module with -on_load is loaded.
Problem: Tracing and -on_load clash in their use of Export.beam[1]
Solution: Do not do call set_default_trace_pattern in finish_loading_1
for modules with -on_load. finish_after_on_load_2 will do that anyway.
|
|
Introduce new macros that can be used for relative jumps and
use them consistently.
Test that everything works by using a non-zero constant JUMP_OFFSET.
The loader subtracts JUMP_OFFSET from loaded labels, and all
instructions that use 'f' operands add it back.
|
|
* bjorn/erts/improve-beam-ops:
Add built-in macros $ARG_POSITION() and $IS_PACKED()
Use 't' instead of 'I' bit syntax operands
Optimize operand type for match context in i_bs_get_integer
Change operand type from 's' to 'S' for a few instructions
Use the correct name of the parameter
|
|
Optimise equality comparisons
|
|
* potatosalad/erts/binary_find_bif_improved/PR-1480/OTP-14610:
stdlib: Improved BIF for binary matches and split.
|
|
|
|
* sverker/ets-fix-assert-fix:
erts: Fix faulty ASSERT of table fixation counter
|
|
|
|
* sverker/valgrind-fixes/OTP-14609:
erts: Suppress false memory leak for dlerror
[ct] Cleanup and rename purify related functions as valgrind
Revert "remove unused purify functions"
erts: Fix memory leak when sending to terminating port
erts: Fix harmless use of uninitialised value
|
|
The number of live registers, unit and flags, and the number
of slots all fit comortably in 16 bits. This change will give
more opportunities for packing.
|
|
The match context is always in an X register. Change the 's' operand
to 'x'.
While we are it, also change the operands for Live and FlagsAndUnit
to 't' (they will both fit comfortably in 16 bits).
|
|
'S' is slightly more efficient. Using 'S' may also enable more
packing.
|
|
|
|
As a preparation for introducing relative jumps, introduce
"trace_jump W" that can be used for tracing. This instruction
will continue to have an absolute address for the jump target.
(Note: This instruction is never created during loading; it
is only created in stubs when tracing is active.)
|
|
The C compiler will probably optimize this, but just to be sure...
|
|
Conflicts:
erts/emulator/sys/unix/sys.c
|
|
* lukas/erts/fix_warnings:
Fix some clang warnings
Fix unused-functions warnings
|
|
|
|
|
|
|
|
The 'S' operand type was added in 5bf73db9fd77.
|
|
|
|
* bjorn/erts/improve-beam-ops:
Don't allow macros to assign to hard-coded variables
Annotate try_case_end as cold
beam_emu.c: Mark initialization code as unlikely
Try to avoid generating unecessary do/while wrappers
Eliminate unecessary instruction macro i_move_call_only()
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
Besides being noisy, they were already defined by a global Unix-
specific header, causing the Windows build to fail if one forgot to
define them.
|
|
|
|
|
|
OTP-14520
|
|
OTP-14520
|