Age | Commit message (Collapse) | Author |
|
Queues used for communication between async threads and scheduler threads
have been replaced with lock-free queues.
Drivers using the driver_async functionality are not automatically locked
to the system anymore, and can be unloaded as any dynamically linked in
driver.
Scheduling of ready async jobs is now also interleaved in between other
jobs. Previously all ready async jobs was performed at once.
|
|
|
|
The implementation of an ERTS internal, generic, many to one, lock-free
queue for communication between threads. The many to one scenario is
very common in ERTS, so it can be used in a lot of places in the future.
Changing to this queue from a lock based queue, however, often requires
some redesigning. This since we have often used the lock of the queue
to protect other information too.
|
|
The ERTS internal system block functionality has been replaced by
new functionality for blocking the system. The old system block
functionality had contention issues and complexity issues. The
new functionality piggy-backs on thread progress tracking functionality
needed by newly introduced lock-free synchronization in the runtime
system. When the functionality for blocking the system isn't used
there is more or less no overhead at all. This since the functionality
for tracking thread progress is there and needed anyway.
|
|
A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler specific allocator
instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific
pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now use scheduler specific
instances instead of one instance. Apart from reducing contention
this also ensures that memory allocators always create memory
segments on the local NUMA node on a NUMA system.
|
|
Conflicts:
erts/aclocal.m4
erts/emulator/beam/erl_db.c
erts/emulator/sys/win32/sys.c
erts/include/internal/ethread_header_config.h.in
|
|
|
|
|
|
Code area allocation was done twice; first in read_code_header()
and then in erts_make_stub_module() itself.
|
|
* rickard/aux-work-bug/OTP-9567:
Fix lost wakeup of scheduler when enqueuing auxiliary work
|
|
When auxiliary work was enqueued on a scheduler, the wakeup of
the scheduler in order to handle this work could be lost. Wakeups
in order to handle ordinary work were not effected by this bug.
The bug only effected runtime systems with SMP support as follows:
* Deallocation of some ETS data structures could be delayed.
* On Linux systems not using the NPTL thread library (typically
ancient systems with kernel versions prior to 2.6) and Windows
systems, the {Port, {exit_status, Status}} message from a
terminating port program could be delayed. That is, it only
effected port programs which had been started by passing
exit_status as an option to open_port/2.
|
|
* ta/docs-fixes:
Fix misspelling of intermediate
Fix typos in erts/preloaded/src
Fix more misspellings of compatibility
Fix misspelling of kept
Fix misspelling of compatibility in ssl_basic_SUITE
Fix misspelling of compatibility
Fix misspelling of accommodate
Fix misspelling of exceed
Fix misspelling of accidentally
Fix misspelling of erroneous in xmerl_xsd
Fix misspelling of erroneous
Fix misspelling of successful
Fix typos in instrument(3)
Fix typos in dbg(3)
dialyzer: fix a small typo in list_to_bitstring test
Fix typos in cover.erl
Fix typos (variable name) in erl_nif(3)
Fix typos in mod_esi(3)
Fix trivial typos in erlang(3)
OTP-9555
|
|
* rickard/glibc-mutex-destroy-bug/OTP-9373:
Do not abort emulator when buggy pthread impl return EBUSY
|
|
* pan/erl-bif-types/OTP-9496:
Cleanup ETS bif's in hipe:erl_bif_types.erl (for dialyzer)
|
|
* pan/win_erlsrv_stop/OTP-9344:
Move init of smp rw mutex from init to sys_args to make sure that it is initialized before the first erts_sys_getenv call
Move erts_sys_env_init() to erts_sys_pre_init()
Remove _DEBUG from start_erl.c
Spelling correction in erlsrv doc
Convert windows start_erl to take rootdir on command line
Add command start_disabled to erlsrv
Add global lock for erlsrv to avoid races
Change start order so that service_event gets initialized before it's used
|
|
|
|
* In rare cases we could have no run_queue in schedule misc ops
|
|
|
|
* fm/enif_compare-64-to-32bits-cast:
Fix enif_compare on 64bits machines
OTP-9533
|
|
In 64bits machines the Sint type has a size of 8 bytes,
while on 32bits machines it has a 4 bytes size.
enif_compare was ignoring this and therefore returning
incorrect values when the result of the CMP function
(which returns a Sint value) doesn't fit in 4 bytes.
For example, passing the operands -1294536544000 and
-1178704800000 to enif_compare would trigger the bug.
|
|
This BIF's second parameter is a list of options.
Currently the only allowed option is {minor_version, Version}
where version is either 0 (default) or 1.
|
|
Add erlang:check_old_code/1 to quickly check whether a module
has old code. If there is no old code, there is no need to call
erlang:check_process_code/2 for all processes, which will save
some time if there are many processes.
|
|
There is no need to suspend the process if the module has no old
code. Measurements show that this change will make
erlang:check_process_code/2 in an SMP emulator about four times
faster if the module has no old code.
|
|
* rc/r14-gc-fix:
fix 64-bit issues in the garbage collection
OTP-9488
|
|
We discovered that if a single Erlang process tried to grow above 32
GB (i.e., more 64-bit words than can be counted by a 32-bit number),
the VM failed to find the next larger heap size, even though there
were plenty more heap sizes left to pick from and even though we had a
lot more memory available on the machine. (Obviously, this is only
applicable on 64-bit Erlang.)
It turned out to be due to some 'int' variables in the heap resizing
parts of erl_gc.c not being properly updated to 'Uint' or 'Sint'. Once
that was fixed, I got segfaults instead as soon as the heap got larger
than 2^32 words, due to even more 'int' declarations in the same file,
but now in the GC code.
After fixing this as well, I successfully ran an Erlang node in which
a single Erlang process had a heap so large that I'm not at liberty to
divulge the exact size, but I think the scientific term is
"humongous", and I'm confident that there are no further immediate
problems with very very large individual process heaps.
|
|
Constructing binaries using the bit syntax with literals sizes
that would not fit in an Uint will either cause an emulator crash
or the loading to be aborted.
Use the new TAG_o tag introduced in the previous commit to make sure
that the attempt to create huge binary literals will generate a
system_limit exception at run-time.
|
|
The handling of large values for other tags than TAG_i (integer) is
buggy. Any tag value equal to or greater than 2^40 (5 bytes) will
abort loading. Tag values fitting in 5 bytes will be truncated to 4
bytes values.
Those bugs cause real problems because the bs_init2/6 and
bs_init_bits/6 instructions unfortunately use TAG_u to encode literal
sizes (using TAG_i would have been a better choice, but it is too late
to change that now). Any binary size that cannot fit in an Uint
should cause a system_limit exception at run-time, but instead the
buggy handling will either cause an emulator crash (for values in the
range 2^32 to 2^40-1) or abort loading.
In this commit, implement overflow checking of tag values as a
preparation for fixing the binary construction instructions. If any
tag value cannot fit in an Uint (except for TAG_i), change the
tag to the special TAG_o overflow tag.
|
|
Attempting to construct <<0:((1 bsl 32)-1)>>, the largest bitstring
allowed in a 32 bit emulator, would cause an emulator crash because
of integer overflow.
Fix the problem by using an Uint64 to avoid integer overflow.
Do not attempt to handle construction of <<0:((1 bsl 64)-1>> in
a 64-bit emulator, because that will certainly cause the emulator
to terminate anyway because of insufficient memory.
|
|
* sverker/allocator-aoff/OTP-9424:
New allocator: Address order first fit (aoff)
|
|
alloc_no of sbmbc_low_alloc was set to ERTS_ALC_A_STANDARD_LOW
|
|
|
|
* sverker/enif_make_int64-halfword/OTP-9394:
Fix halfword bug in enif_make_int64
|
|
* rickard/sbmbc/OTP-9339:
Use separate memory carriers for small blocks
|
|
* sverker/ets_delete-deadlock-race/OTP-9423:
Fix bug in ets:delete for write_concurrency that could lead to deadlock
|
|
|
|
A trace matchspec with 'enable_trace' or 'disable_trace' in body could
cause an emulator crash if a concurrent process altered the trace
setting of the traced function by calling erlang:trace_pattern.
The effect was a deallocation of the binary holding the matchspec
program while it was running. Fixed by increasing reference count of
ms-binary in the cases when 'enable_trace' or 'disable_trace' may
cause a system block that may alter the ongoing trace.
The paradox here is that db_prog_match() is using erts_smp_block_system()
to do 'enable_trace' and 'disable_trace' in a safe (atomic) way. But that
also have the (non-atomic) effect that racing thread might block the
system and change the trace settings with erlang:trace_pattern.
|
|
Relocking in ets_delete_1() and remove_named_tab() was done by
unlocking the table without clearing the is_thread_safe flag. A racing
thread could then read-lock the table and then incorrectly
write-unlock the table as db_unlock() looked at is_thread_safe to
determine which kind of lock to unlock.
Several fixes:
1. Make db_unlock() use argument 'kind' instead of is_thread_safe to
determine lock type.
2. Make relock logic use db_lock() and db_unlock() instead of directly
accessing lock primitives.
3. Do ownership transfer earlier in ets_delete_1 to avoid racing owner
process to also start deleting the same table.
|
|
The bug was creating an invalid bignum instead of a small integer,
causing strange comparing behavior (=:= failed but == succeeded).
|
|
All uses of the old deprecated atomic API in the runtime system
have been replaced with the use of the new atomic API. In a lot of
places this change imply a relaxation of memory barriers used.
|
|
The ethread atomics API now also provide double word size atomics.
Double word size atomics are implemented using native atomic
instructions on x86 (when the cmpxchg8b instruction is available)
and on x86_64 (when the cmpxchg16b instruction is available). On
other hardware where 32-bit atomics or word size atomics are
available, an optimized fallback is used; otherwise, a spinlock,
or a mutex based fallback is used.
The ethread library now performs runtime tests for presence of
hardware features, such as for example SSE2 instructions, instead
of requiring this to be determined at compile time.
There are now functions implementing each atomic operation with the
following implied memory barrier semantics: none, read, write,
acquire, release, and full. Some of the operation-barrier
combinations aren't especially useful. But instead of filtering
useful ones out, and potentially miss a useful one, we implement
them all.
A much smaller set of functionality for native atomics are required
to be implemented than before. More or less only cmpxchg and a
membar macro are required to be implemented for each atomic size.
Other functions will automatically be constructed from these. It is,
of course, often wise to implement more that this if possible from a
performance perspective.
|
|
|
|
|
|
|
|
* rickard/driver_async_cancel/OTP-9302:
Fix driver_async_cancel()
|
|
|
|
* sverker/hipe-misc-fixing/OTP-9298:
hipe_mkliterals print argv[0] in generated files
Fix code:is_module_native segv on deleted module
lock checking fix in hipe_bif2.c
|
|
* rickard/barriers/OTP-9281:
Silence warnings
Fix build with hipe on amd64
Reduce number of atomic ops
Use 32-bit atomic for port snapshot
Remove pointless erts_ports_alive variable
Ensure quick break
Ensure that all rehashing information are seen when done
Ensure that stack updates are seen when stack is released
Add needed barriers for write_concurrency tables
Homogenize memory barriers on atomics
|
|
|
|
|
|
Counters for active, and used schedulers have been coalesced in
order to reduce the amount of atomic operations needed.
Some currently not strictly necessary barriers have also been added
in order to be future proof.
|