Age | Commit message (Collapse) | Author |
|
This reverts commit f4c121b1d98bf3db7e6eecbb9fb5b292f2bc3bb0.
|
|
|
|
* maint:
Implement a tab for persistent terms in crashdump viewer
Add tests of persistent terms for crashdump_viewer
Add a persistent term storage
Refactor releasing of literals
Extend the sharing-preserving routines to optionally copy literals
Conflicts:
erts/emulator/Makefile.in
erts/emulator/beam/erl_process_dump.c
erts/preloaded/ebin/erts_internal.beam
erts/preloaded/ebin/init.beam
lib/sasl/src/systools_make.erl
|
|
Persistent terms are useful for storing Erlang terms that are never
or infrequently updated. They have the following advantages:
* Constant time access. A persistent term is not copied when it is
looked up. The constant factor is lower than for ETS, and no locks
are taken when looking up a term.
* Persistent terms are not copied in garbage collections.
* There is only ever one copy of a persistent term (until it is
deleted). That makes them useful for storing configuration data
that needs to be easily accessible by all processes.
Persistent terms have the following drawbacks:
* Updates are expensive. The hash table holding the keys for the
persistent terms are updated whenever a persistent term is added,
updated or deleted.
* Updating or deleting a persistent term triggers a "global GC", which
will schedule a heap scan of all processes to search the heap of all
processes for the deleted term. If a process still holds a reference
to the deleted term, the process will be garbage collected and the
term copied to the heap of the process. This global GC can make the
system less responsive for some time.
Three BIFs (implemented in C in the emulator) is the entire
interface to the persistent term functionality:
* put(Key, Value) to store a persistent term.
* get(Key) to look up a persistent term.
* erase(Key) to delete a persistent term.
There are also two additional BIFs to obtain information about
persistent terms:
* info() to return a map with information about persistent terms.
* get() to return a list of a {Key,Value} tuples for all persistent
terms. (The values are not copied.)
|
|
We no longer lock more than one base node at a time.
We do however trylock a second base node at join.
|
|
for different lock instances.
|
|
|
|
|
|
|
|
Lock order is reverse tree depth, from leafs toward root.
This solution may eventually fail if running too long
as route nodes do not increase their 'lc_order' in a join operation
when they move up in the tree.
But who runs a VM with lock-checker for such a long time?
|
|
Lock order is key term order, so each base node needs its own key
if lock check is enabled.
|
|
|
|
The current ETS ordered_set implementation can quickly become a
scalability bottleneck on multicore machines when an application updates
an ordered_set table from concurrent processes [1][2]. The current
implementation is based on an AVL tree protected from concurrent writes
by a single readers-writer lock. Furthermore, the current implementation
has an optimization, called the stack optimization [3], that can improve
the performance when only a single process accesses a table but can
cause bad scalability even in read-only scenarios. It is possible to
pass the option {write_concurrency, true} to ets:new/2 when creating an
ETS table of type ordered_set but this option has no effect for tables
of type ordered_set without this commit. The new ETS ordered_set
implementation, added by this commit, is only activated when one passes
the options ordered_set and {write_concurrency, true} to the ets:new/2
function. Thus, the previous ordered_set implementation (from here on
called the default implementation) can still be used in applications
that do not benefit from the new implementation. The benchmark results
on the following web page show that the new implementation is many times
faster than the old implementation in some scenarios and that the old
implementation is still better than the new implementation in some
scenarios.
http://winsh.me/ets_catree_benchmark/ets_ca_tree_benchmark_results.html
The new implementation is expected to scale better than the default
implementation when concurrent processes use the following ETS
operations to operate on a table:
delete/2, delete_object/2, first/1, insert/2 (single object),
insert_new/2 (single object), lookup/2, lookup_element/2, member/2,
next/2, take/2 and update_element/3 (single object).
Currently, the new implementation does not have scalable support for the
other operations (e.g., select/2). However, when these operations are
used infrequently, the new implantation may still scale better than the
default implementation as the benchmark results at the URL above shows.
Description of the New Implementation
----------------------------------
The new implementation is based on a data structure which is called the
contention adapting search tree (CA tree for short). The following
publication contains a detailed description of the CA tree:
A Contention Adapting Approach to Concurrent Ordered Sets
Journal of Parallel and Distributed Computing, 2018
Kjell Winblad and Konstantinos Sagonas
https://doi.org/10.1016/j.jpdc.2017.11.007
http://www.it.uu.se/research/group/languages/software/ca_tree/catree_proofs.pdf
A discussion of how the CA tree can be used as an ETS back-end can be
found in another publication [1]. The CA tree is a data structure that
dynamically changes its synchronization granularity based on detected
contention. Internally, the CA tree uses instances of a sequential data
structure to store items. The CA tree implementation contained in this
commit uses the same AVL tree implementation as is used for the default
ordered set implementation. This AVL tree implementation is reused so
that much of the existing code to implement the ETS operations can be
reused.
Tests
-----
The ETS tests in `lib/stdlib/test/ets_SUITE.erl` have been extended to
also test the new ordered_set implementation. The function
ets_SUITE:throughput_benchmark/0 has also been added to this file. This
function can be used to measure and compare the performance of the
different ETS table types and options. This function writes benchmark
data to standard output that can be visualized by the HTML page
`lib/stdlib/test/ets_SUITE_data/visualize_throughput.html`.
[1]
More Scalable Ordered Set for ETS Using Adaptation.
In Thirteenth ACM SIGPLAN workshop on Erlang (2014).
Kjell Winblad and Konstantinos Sagonas.
https://doi.org/10.1145/2633448.2633455
http://www.it.uu.se/research/group/languages/software/ca_tree/erlang_paper.pdf
[2]
On the Scalability of the Erlang Term Storage
In Twelfth ACM SIGPLAN workshop on Erlang (2013)
Kjell Winblad, David Klaftenegger and Konstantinos Sagonas
https://doi.org/10.1145/2505305.2505308
http://winsh.me/papers/erlang_workshop_2013.pdf
[3]
The stack optimization works by keeping one preallocated stack instance
in every ordered_set table. This stack is updated so that it contains
the search path in some read operations (e.g., ets:next/2). This makes
it possible for a subsequent ets:next/2 to avoid traversing some nodes
in some cases. Unfortunately, the preallocated stack needs to be flagged
so that it is not updated concurrently by several threads which cause
bad scalability.
|
|
Leak introduced in 865ac3b740d9efa1a0583349929591c757300412.
|
|
Also removed unused ERTS_LC_STATIC_ALLOC.
|
|
Run debug VM or config with --enable-lock-checking.
Exercise VM and then run
erts_debug:lc_graph().
to create a file "lc_graph.<pid>" in current working directory.
|
|
with just lots of renaming, nothing else.
erts_lc_locked_locks_t -> lc_thread_t
create_locked_locks -> create_thread_data
l_lcks -> thr
l_lck -> ll
And dropped erts_ prefix on some file scope types.
|
|
|
|
Communication between Erlang processes has conceptually always been
performed through asynchronous signaling. The runtime system
implementation has however previously preformed most operation
synchronously. In a system with only one true thread of execution, this
is not problematic (often the opposite). In a system with multiple threads
of execution (as current runtime system implementation with SMP support)
it becomes problematic. This since it often involves locking of structures
when updating them which in turn cause resource contention. Utilizing
true asynchronous communication often avoids these resource contention
issues.
The case that triggered this change was contention on the link lock due
to frequent updates of the monitor trees during communication with a
frequently used server. The signal order delivery guarantees of the
language makes it hard to change the implementation of only some signals
to use true asynchronous signaling. Therefore the implementations
of (almost) all signals have been changed.
Currently the following signals have been implemented as true
asynchronous signals:
- Message signals
- Exit signals
- Monitor signals
- Demonitor signals
- Monitor triggered signals (DOWN, CHANGE, etc)
- Link signals
- Unlink signals
- Group leader signals
All of the above already defined as asynchronous signals in the
language. The implementation of messages signals was quite
asynchronous to begin with, but had quite strict delivery constraints
due to the ordering guarantees of signals between a pair of processes.
The previously used message queue partitioned into two halves has been
replaced by a more general signal queue partitioned into three parts
that service all kinds of signals. More details regarding the signal
queue can be found in comments in the erl_proc_sig_queue.h file.
The monitor and link implementations have also been completely replaced
in order to fit the new asynchronous signaling implementation as good
as possible. More details regarding the new monitor and link
implementations can be found in the erl_monitor_link.h file.
|
|
|
|
This improves the latency of file operations as dirty schedulers
are a bit more eager to run jobs than async threads, and use a
single global queue rather than per-thread queues, eliminating the
risk of a job stalling behind a long-running job on the same thread
while other async threads sit idle.
There's no such thing as a free lunch though; the lowered latency
comes at the cost of increased busy-waiting which may have an
adverse effect on some applications. This behavior can be tweaked
with the +sbwt flag, but unfortunately it affects all types of
schedulers and not just dirty ones. We plan to add type-specific
flags at a later stage.
sendfile has been moved to inet_drv to lessen the effect of a nasty
race; the cooperation between inet_drv and efile has never been
airtight and the socket dying at the wrong time (Regardless of
reason) could result in fd aliasing. Moving it to the inet driver
makes it impossible to trigger this by closing the socket in the
middle of a sendfile operation, while still allowing it to be
aborted -- something that can't be done if it stays in the file
driver.
The race still occurs if the controlling process dies in the short
window between dispatching the sendfile operation and the dup(2)
call in the driver, but it's much less likely to happen now.
A proper fix is in the works.
--
Notable functional differences:
* The use_threads option for file:sendfile/5 no longer has any
effect.
* The file-specific DTrace probes have been removed. The same
effect can be achieved with normal tracing together with the
nif__entry/nif__return probes to track scheduling.
--
OTP-14256
|
|
If a NIF monitor fired while the resource was present in an ETS
table, the lock checker would erroneously report a lock order
violation.
This has no effect outside of debug builds.
|
|
|
|
|
|
* maint:
Bug fixes of statistics(wall_clock) and statistics(runtime)
Conflicts:
erts/emulator/beam/erl_time_sup.c
|
|
* rickard/statistics-time-fixes/OTP-14597/ERL-465:
Bug fixes of statistics(wall_clock) and statistics(runtime)
Conflicts:
erts/emulator/beam/erl_time_sup.c
|
|
|
|
|
|
This refactor was done using the unifdef tool like this:
for file in $(find erts/ -name *.[ch]); do unifdef -t -f defile -o $file $file; done
where defile contained:
#define ERTS_SMP 1
#define USE_THREADS 1
#define DDLL_SMP 1
#define ERTS_HAVE_SMP_EMU 1
#define SMP 1
#define ERL_BITS_REENTRANT 1
#define ERTS_USE_ASYNC_READY_Q 1
#define FDBLOCK 1
#undef ERTS_POLL_NEED_ASYNC_INTERRUPT_SUPPORT
#define ERTS_POLL_ASYNC_INTERRUPT_SUPPORT 0
#define ERTS_POLL_USE_WAKEUP_PIPE 1
#define ERTS_POLL_USE_UPDATE_REQUESTS_QUEUE 1
#undef ERTS_HAVE_PLAIN_EMU
#undef ERTS_SIGNAL_STATE
|
|
|
|
The implementation is still hidden behind ERTS_ENABLE_LOCK_COUNT, and
all categories are still enabled by default, but the actual counting can be
toggled at will.
OTP-13170
|
|
|
|
\o/
O
/ \
Also removed the body for CHECK_TABLES enabled by HARDDEBUG.
Removed quite useless check for hash,
but kept dead check code for tree.
|
|
from process psd through all owned/fixed tables.
As meta_pid_to{_fixed}_tab maps to slot in meta_main_tab
which is planned for destruction.
In this commit we no longer seize table lock while
freeing the table (free_table_cont) as it's not needed
and makes the code a bit simpler. Any concurrent operation
on the table will only access lock, owner and status
and then bail out.
|
|
Conflicts:
erts/emulator/beam/erl_binary.h
erts/emulator/beam/erl_monitors.c
erts/emulator/beam/erl_nif.c
erts/emulator/beam/global.h
erts/emulator/test/nif_SUITE_data/nif_SUITE.c
|
|
|
|
Magic references are *intentionally* indistinguishable from ordinary
references for the Erlang software. Magic references do not change
the language, and are intended as a pure runtime internal optimization.
An ordinary reference is typically used as a key in some table. A
magic reference has a direct pointer to a reference counted magic
binary. This makes it possible to implement various things without
having to do lookups in a table, but instead access the data directly.
Besides very fast lookups this can also improve scalability by
removing a potentially contended table. A couple of examples of
planned future usage of magic references are ETS table identifiers,
and BIF timer identifiers.
Besides future optimizations using magic references it should also
be possible to replace the exposed magic binary cludge with magic
references. That is, magic binaries that are exposed as empty
binaries to the Erlang software.
|
|
* maint:
Fix call time tracing with dirty schedulers
|
|
|
|
* maint:
Multi scheduling block bug-fixes
Fix VM global GC info for dirty schedulers
Leave dirty work in dirty run-queues on multi scheduling block
Fix premature removal of process struct
Fix crash due to GC of node entry on dirty scheduler
|
|
|
|
|
|
Ensure that we cannot get any dangling pointers into code that
has been purged. This is done by a two phase purge. At first
phase all fun entries pointing into the code to purge are marked
for purge. All processes trying to call these funs will be suspended
and by this we avoid getting new direct references into the code.
When all processes has been checked, these processes are resumed.
The new purge strategy now also completely ignore the existence of
indirect references to the code (funs). If such exist, they will
cause bad fun exceptions to the caller, but will not prevent a
soft purge or cause a kill of a process having such live references
during a hard purge. This since it is impossible to give any
guarantees that no processes in the system have such indirect
references. Even when the system is completely clean from such
references, new ones can appear via distribution and/or disk.
|
|
|
|
Add the possibility to use modules as trace data receivers. The functions
in the module have to be nifs as otherwise complex trace probes will be
very hard to handle (complex means trace probes for ports for example).
This commit changes the way that the ptab->tracer field works from always
being an immediate, to now be NIL if no tracer is present or else be
the tuple {TracerModule, TracerState} where TracerModule is an atom that
is later used to lookup the appropriate tracer callbacks to call and
TracerState is just passed to the tracer callback. The default process and
port tracers have been rewritten to use the new API.
This commit also changes the order which trace messages are delivered to the
potential tracer process. Any enif_send done in a tracer module may be delayed
indefinitely because of lock order issues. If a message is delayed any other
trace message send from that process is also delayed so that order is preserved
for each traced entity. This means that for some trace events (i.e. send/receive)
the events may come in an unintuitive order (receive before send) to the
trace receiver. Timestamps are taken when the trace message is generated so
trace messages from differented processes may arrive with the timestamp
out of order.
Both the erlang:trace and seq_trace:set_system_tracer accept the new tracer
module tracers and also the backwards compatible arguments.
OTP-10267
|
|
|
|
Microstate accounting is a way to track which state the
different threads within ERTS are in. The main usage area
is to pin point performance bottlenecks by checking which
states the threads are in and then from there figuring out
why and where to optimize.
Since checking whether microstate accounting is on or off is
relatively expensive if done in a short loop only a few of the
states are enabled by default and more states can be enabled
through configure.
I've done some benchmarking and the overhead with it turned off
is not noticible and with it on it is a fraction of a percent.
If you enable the extra states, depending on the benchmark,
the ovehead when turned off is about 1% and when turned on
somewhere inbetween 5-15%.
OTP-12345
|
|
Had to move the hashing because of a race that can otherwise happen
where a new os_pid value was inserted into the hash before the
previous value had been removed.
Also replaced the protocol inbetween erts and child setup to be
a binary protocol. This was done in order to deal with the varying
size of Eterm.
|
|
Instead of forking from the beam process, we create a separate
process in which all forks are done. This has several advantages:
1) performance:
* don't have to close all fd's in the world
* fork only has to copy stuff from a small process
* work is done in a completely seperate process
* a 3x performance increase has been measured,
can be made even greater (10x) if we cache the
environment in child setup
2) stability
* the exec is done in another process than beam, which means that
if the file that we exec to is on an nfs that is not available
right now we will not block a scheduler until the nfs returns.
3) simplicity
* don't have to deal with SIGCHLD in the erts
Unfortunately, this solution also implies some badness.
1) There will always be a seperate process running together with
beam on unix. This could be confusing and undesirable.
2) We have to transfer the entire environment to child_setup
for each command.
OTP-13088
|
|
|