Age | Commit message (Collapse) | Author |
|
Before 2d2e78ad6e66 that introduced tail-recursive calls of BIFs, the
stack was guaranteed not to be empty when `erlang:raise/3` was called
from the `catch` block of a `try` (because the `try` had set up a
stack frame that would be deallocated after the `raise` call).
Now the stack can be empty, so the ASSERT() call in next_catch()
that checks that there is a continuation pointer at the top of
the stack may fail.
Move the ASSERT() call to after check for empty stack.
While at it, also add a comment of the reason for the assertion.
|
|
|
|
|
|
OTP-15610
|
|
The is_function2 instruction is executed surprisingly
frequently when running dialyzer or the compiler. It
cannot hurt to optimize it a little.
|
|
Summary: This commit simplifies the implementation of the "GC BIFs" so
that they no longer need to do a garbage collection, removing duplicate
code for all GC BIFs in the runtime system, as well as potentially
reducing the size of the loaded BEAM code by using shorter
instructions calling those BIFs.
A GC BIF is a guard BIF that will do a garbage
collection if it needs to build anything on the heap.
For example, `abs/1` is a GC BIF because it might need to
allocate space on the heap (if the result is a floating point
number or the resulting integer is a bignum).
Before R12, a guard BIF (such as `abs/1`) that need to allocate
heap space would allocate outside of process's main heap, in
a heap fragment.
GC BIFs were introduced in R12B to support literals. During garbage
collection it become necessary to quickly test whether a term was
a literal. To make the check simple, guards BIFs were no longer
allowed to create heap fragments. Instead GC BIFs were introduced.
In OTP 19, the implementation of literals was changed to support
storing messages in heap fragments outside of the main heap for a
process. That change again made it allowed for guard BIFs to create
heap fragments when they need to build terms on the heap.
It would even be possible for the guard BIFs to build directly
on the main heap if there is room there, because the compiler
assumes that a new `test_heap/2` instruction must be emitted
when building anything after calling a GC BIF. (We don't do that
in this commit; see below.)
This commit simplifies the implementation of the GC BIFs in
the runtime system.
Each GC BIF had a dual implementation: one that was used when the GC
BIF was called directly and one used when it was called via
`apply/3`. For example, `abs/1` was implemented in `abs_1()` and
`erts_gc_abs_1()`. This commit removes the GC version of each BIF. The
other version that allocates heap space using `HAlloc()` is updated to
use the new `HeapFragOnlyAlloc()` macro that will allocate heap
space in a heap fragment outside of the main heap.
Because the BIFs will allocate outside of the main heap, the same
`bif` instructions used by nonbuilding BIFs can be used for the
(former) GC BIFs. Those instructions don't use the macros that save
and restore the heap and stack pointers (SWAPOUT/SWAPIN). If the
former GC BIFs would build on the main heap, either new instructions
would be needed, or SWAPOUT/SWAPIN instructions would need to be added
to the `bif` instructions.
Instructions that call the former GC BIFs don't need the operand
that specifies the number of live X registers. Therefore, the
instructions that call the BIFs are usually one word shorter.
|
|
* maint:
Allow disabling retpoline in interpreter loop
Add a ./configure flag for spectre mitigation
|
|
into maint
* john/erts/spectre-configure-flag-otp_20/OTP-15430/ERIERL-237:
Allow disabling retpoline in interpreter loop
Add a ./configure flag for spectre mitigation
|
|
john/erts/spectre-configure-flag-otp_20/OTP-15430/ERIERL-237
* john/erts/spectre-configure-flag/OTP-15430/ERIERL-237:
Allow disabling retpoline in interpreter loop
Add a ./configure flag for spectre mitigation
|
|
We only do this when the user has explicitly told us it's okay to
partially disable mitigation (spectre-mitigation=incomplete). The
macro is inert if it isn't.
|
|
This patch optimizes map operations to not allocate new maps
when the key is being replaced by the exact same value in memory.
Imagine this very common idiom:
Map#{key := compute_new_value(Value, Condition)}
where:
compute_new_x(X, true) -> X + 1;
compute_new_x(X, false) -> X;
In many cases, we are not changing the value in `Key`, however
the code prior to this patch would still allocate a new array
for the map values. This optimization changes this.
The cost of optimization is minimum, as in the worst case scenario
it only adds a pointer comparison and boolean check. The major benefit
is reducing the GC pressure by avoiding allocating data.
Next we list the operations we have changed alongside the benchmark
results. The benchmarks basically create a map and perform the same
operations, roughly 20000 times, once replacing the key with the same
value, and another with a different value.
* Map#{Key := Value}
For a map with 4 keys, replacing the fourth key 20000 times went from
718us to 539us.
For a map with 8 keys, replacing the fourth key 20000 times went from
976us to 555us.
* maps:update/3
For a map with 4 keys, replacing the fourth key 20000 times went from
673us to 575us.
For a map with 8 keys, replacing the fourth key 20000 times went from
827us to 585us.
* maps:put/3
For a map with 4 keys, replacing the fourth key 20000 times went from
763us to 553us.
For a map with 8 keys, replacing the fourth key 20000 times went from
788us to 561us.
Note that we have ported some optimizations found in maps:update/3
to maps:put/3 while creating this patch.
|
|
|
|
|
|
Communication between Erlang processes has conceptually always been
performed through asynchronous signaling. The runtime system
implementation has however previously preformed most operation
synchronously. In a system with only one true thread of execution, this
is not problematic (often the opposite). In a system with multiple threads
of execution (as current runtime system implementation with SMP support)
it becomes problematic. This since it often involves locking of structures
when updating them which in turn cause resource contention. Utilizing
true asynchronous communication often avoids these resource contention
issues.
The case that triggered this change was contention on the link lock due
to frequent updates of the monitor trees during communication with a
frequently used server. The signal order delivery guarantees of the
language makes it hard to change the implementation of only some signals
to use true asynchronous signaling. Therefore the implementations
of (almost) all signals have been changed.
Currently the following signals have been implemented as true
asynchronous signals:
- Message signals
- Exit signals
- Monitor signals
- Demonitor signals
- Monitor triggered signals (DOWN, CHANGE, etc)
- Link signals
- Unlink signals
- Group leader signals
All of the above already defined as asynchronous signals in the
language. The implementation of messages signals was quite
asynchronous to begin with, but had quite strict delivery constraints
due to the ordering guarantees of signals between a pair of processes.
The previously used message queue partitioned into two halves has been
replaced by a more general signal queue partitioned into three parts
that service all kinds of signals. More details regarding the signal
queue can be found in comments in the erl_proc_sig_queue.h file.
The monitor and link implementations have also been completely replaced
in order to fit the new asynchronous signaling implementation as good
as possible. More details regarding the new monitor and link
implementations can be found in the erl_monitor_link.h file.
|
|
|
|
When a ref is created before performing a receive that will only
receive message containing that ref, there is a compiler optimization
to avoid scanning messages that can't possible contain the newly
created ref.
Magnus Lång pointed out that the implementation of the optimization
is flawed. Exceptions or recursive calls could cause the receive
operation to scan the receive queue from a position beyond the expected
message (that is, the message containing the ref would never be
matched out). See the receive_opt_exception/1 and receive_opt_recursion/1
test cases in receive_SUITE.
It turns out that we can simplify the implementation of the
optimization while fixing the bug (suggested by Magnus Lång). We
actually don't need the c_p->msg.mark field. It is enough to have
c_p->msg.saved_pos; if it is non-zero, it is a valid position in the
message qeueue. All we need to do is to ensure that we clear
c_p->msg.saved_pos when a receive is exited normally or abnormally.
We can clear c_p->msg.saved_pos in JOIN_MESSAGE(), since it is called
both when leaving a receive because a message matched and because there
was a timeout and the 'after' clause was executed. In addition, we
need to clear c_p->msg.saved_pos when an exception is caught.
https://bugs.erlang.org/browse/ERL-511
|
|
In beam_emu, use "erts_gc_" for any function that does garbage
collection, as preparation for adding more sanity checks.
|
|
erlang:is_builtin(erlang, M, F) returns false for apply/2 and
yield/0. The documentation for erlang:is_builtin/3 says that it
returns true for BIFs that are implemented in C. apply/2 and
yield/0 are implemented in C (as BEAM instructions), and
therefore the correct return value is true.
Also see a similar argument that was made for apply/3 in the past:
http://erlang.org/pipermail/erlang-bugs/2015-October/005101.html
https://bugs.erlang.org/browse/ERL-500
|
|
|
|
On 64-bit machines where the C code is always at address below 4Gb,
pack one or more operands into the instruction word.
|
|
On a 64-bit machine, we only need 32 bits to store a pointer to
the C code that implements a BEAM instruction. Refactor the code
to only use the lower 32 bits of each instruction word, and take
care to preserve the high 32 bits.
|
|
Add the CHECK_ALIGNED() macro that can be used for testing that
the storage destination is word-aligned.
|
|
process_main() is already too big.
|
|
Introduce the IsOpCode() macro that can be used to compare
instructions.
|
|
Consider the types in the code below:
BeamInstr* I;
.
.
.
BeamInstr* next;
next = (BeamInstr *) *I;
Goto(next);
This is illogical. If 'I' points to a BeamInstr, then 'next' should
be a BeamInstr, not a pointer to a BeamInstr. The Goto() macros does
not require a pointer, because it will cast its argument to a void*
anyway.
Therefore, this code example can be simplified to:
BeamInstr* I;
.
.
.
BeamInstr next;
next = *I;
Goto(next);
Similarly, we can remove the casts in the macros when NO_JUMP_TABLE
is defined.
|
|
The BeamOp() macro in erl_vm.h is clumsy to use. All users
cast the return value to BeamInstr.
Define new macros that are easier to use. In the future,
we might want to pack an operand into the same word as
the pointer to the instruction, so we will define two macros.
BeamIsOpCode() is used to rewrite code like this:
if (Instr == (BeamInstr) BeamOp(op_i_func_info_IaaI) {
...
}
to:
if (BeamIsOpCode(Instr, op_i_func_info_IaaI)) {
...
}
BeamOpCodeAddr(op_apply_bif) is used when we need the address
for an instruction.
Also elimiminate the global variables em_* in beam_emu.c.
They are not really needed. Use the BeamOpCodeAddr() macro
instead.
|
|
The inconsistent order has annoyed me for a long time.
While at it, also remove the unecessary definition of LabelAddr() if
NO_JUMP_TABLE is defined.
|
|
* lukas/erts/remove-dirty-scheduler-defines/OTP-14613:
erts: Remove possibility to disable dirty schedulers
|
|
|
|
|
|
|
|
Besides being noisy, they were already defined by a global Unix-
specific header, causing the Windows build to fail if one forgot to
define them.
|
|
|
|
We don't need to pass x(0), x(1), and x(2) because they
can already be found in the register array.
|
|
We don't need to pass x(0), x(1), and x(2) because they
can already be found in the register array.
|
|
Keep frequently used variables in machine registers.
|
|
The bit syntax instructions are mixed among other instructions
in beam_hot.h and beam_cold.h.
Introduce a new hotness level called '%warm' with is associated
file beam_warm.h. Mark all bit syntax instructions as '%warm'.
|
|
The beam_instrs.h file serves no useful purpose. Put the
instructions in beam_hot.h instead.
|
|
The type 'd' could be used both for destination registers and
source register.
Restrict the 'd' type to only be used for destinations, and
introduce the new 'S' type to be used when a source must be
a register.
|
|
The packer had several bugs and limitations. For instance, on
a 32-bit Erlang virtual machine it would gladly pack three
't' values into one word even though it would be not safe.
The rewritten version will be more careful how much it packs
into each word. It will also be able to do packing for more
instructions.
|
|
Having the helper functions for map update knowing all the details
of operands for the instruction will make it difficult to make
improvements such as better packing.
|
|
|
|
The instruction put_map_assoc/5 (used for updating a map) has a
failure operand, but it can't actually fail provided that its "map"
argument is a map.
The following code:
M#{key=>value}.
will be compiled to:
{test,is_map,{f,3},[{x,0}]}.
{line,[...]}.
{put_map_assoc,{f,0},{x,0},{x,0},1,{list,[{atom,key},{atom,value}]}}.
return.
{label,3}.
%% Code that produces a 'badmap' exception follows.
Because of the is_map instruction, {x,0} always contains a map when
the put_map_assoc instruction is executed. Therefore we can remove
the failure operand. That will save one word, and also eliminate
two tests at run-time.
The only problem is that the compiler in OTP 17 did not emit a
is_map instruction before the put_map_assoc instruction. Therefore,
we must add an instruction that tests for a map if the code was
compiled with the OTP 17 compiler.
Unfortunately, there is no safe and relatively easy way to known that
the OTP 17 compiler was used, so we will check whether a compiler
before OTP 20 was used. OTP 20 introduced a new chunk type for atoms,
which is trivial to check.
|
|
|
|
Eliminate the need to write pre-processor macros for each instruction.
Instead allow the implementation of instruction to be written in
C directly in the .tab files. Rewrite all existing macros in this
way and remove the %macro directive.
|
|
|
|
This refactor was done using the unifdef tool like this:
for file in $(find erts/ -name *.[ch]); do unifdef -t -f defile -o $file $file; done
where defile contained:
#define ERTS_SMP 1
#define USE_THREADS 1
#define DDLL_SMP 1
#define ERTS_HAVE_SMP_EMU 1
#define SMP 1
#define ERL_BITS_REENTRANT 1
#define ERTS_USE_ASYNC_READY_Q 1
#define FDBLOCK 1
#undef ERTS_POLL_NEED_ASYNC_INTERRUPT_SUPPORT
#define ERTS_POLL_ASYNC_INTERRUPT_SUPPORT 0
#define ERTS_POLL_USE_WAKEUP_PIPE 1
#define ERTS_POLL_USE_UPDATE_REQUESTS_QUEUE 1
#undef ERTS_HAVE_PLAIN_EMU
#undef ERTS_SIGNAL_STATE
|
|
|
|
Add stacktrace entries to BIF calls from emulator
|
|
The goal of this change is to improve debugging of
emulator calls. For example, the following code
rem(1, y)
will error with atom `badarith` when y is 0 and the
stacktrace has no entry for `erlang:rem/2`, making
such cases very hard to debug. This patch makes it
so the stacktrace includes `erlang:rem(1, 0)`.
The following emulator BIFs have been changed:
* band/2
* bnot/1
* bor/2
* bsl/2
* bsr/2
* bxor/2
* div/2
* element/2
* int_div/2
* rem/2
* sminus/2
* splus/2
* stimes/2
|