Age | Commit message (Collapse) | Author |
|
|
|
* rickard/dealloc/OTP-10162:
Improve the enqueue operation of delayed dealloc
Implement delayed aux work wake up
|
|
By using a delayed aux work wake-up approach, a memory barrier
can be omitted in the delayed dealloc enqueue operation. The
number of operations on the potentially contended wake-up
structure is also reduced.
|
|
The hybrid heap emulator was last working in the non-SMP R11B
run-time system. When the constant pools were introduced in R12B,
the hybrid heap emulator was not updated to handle them.
At this point, the harm from reduced readability of the code is
greater than any potential usefulness of keeping the code.
|
|
Conflicts:
erts/vsn.mk
|
|
* rickard/sched-busy-wait/OTP-10044:
Add switch controlling scheduler busy wait
Conflicts:
erts/emulator/beam/erl_process.c
erts/emulator/beam/erl_process.h
|
|
rickard/sched-wakeup-other-r15b01/OTP-10033
Conflicts:
erts/emulator/beam/erl_process.c
erts/vsn.mk
|
|
User tags in a dynamic trace enabled VM are spread throughout the system
in the same way as seq_trace tokens. This is used by the file module
and various other modules to get hold of the tag from the user process
without changing the protocol.
|
|
|
|
* rickard/barriers/OTP-9922:
Reduce thread progress read operations in handle_aux_work()
Misc memory barrier fixes
|
|
|
|
- Document barrier semantics
- Introduce ddrb suffix on atomic ops
- Barrier macros for both non-SMP and SMP case
- Make the thread progress API a bit more intuitive
|
|
|
|
* rickard/rm-common-runq/OTP-9727:
Remove common run-queue in SMP case
Fix scheduler suspend bug
Conflicts:
erts/emulator/beam/erl_init.c
|
|
The common run-queue implementation is removed since it is unused,
untested, undocumented, unsupported, and only complicates the code.
A spinlock used by the run-queue management sometimes got heavily
contended. This code has now been rewritten, and the spinlock
has been removed.
|
|
* sverk/hipe-without-fpe/OTP-9724:
otp_build: Disable FPE by default on Linux
stdlib: Make sure qlc_SUITE:otp_6964 restores backtrace_depth
erts: Add test for inf/NaN intermediate float results
hipe,erts: Allow hipe without floating point exceptions
hipe: Fix bug in hipe_rtl_lcm:calc_killed_expr_bb
erts: Rename macros used by float instructions without FPE
|
|
* rickard/sched-compact-load/OTP-9695:
Add switch that can disable scheduler compaction of load
|
|
|
|
|
|
* rickard/generic-thr-queue/OTP-9632:
Use generic lock-free queue for async threads
Use generic lock-free queue for misc aux work
Implement generic lock-free queue
|
|
* rickard/thr-progress-block/OTP-9631:
Replace system block with thread progress block
|
|
* rickard/alloc-opt/OTP-7775:
Optimize memory allocation
Conflicts:
erts/aclocal.m4
erts/emulator/hipe/hipe_bif_list.m4
erts/preloaded/ebin/erl_prim_loader.beam
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/init.beam
erts/preloaded/ebin/otp_ring0.beam
erts/preloaded/ebin/prim_file.beam
erts/preloaded/ebin/prim_inet.beam
erts/preloaded/ebin/prim_zip.beam
erts/preloaded/ebin/zlib.beam
|
|
Queues used for communication between async threads and scheduler threads
have been replaced with lock-free queues.
Drivers using the driver_async functionality are no longer automatically
locked to the system, and can be unloaded like any other dynamically
linked-in driver.
Scheduling of ready async jobs is now also interleaved with other
jobs. Previously all ready async jobs were performed at once.
|
|
|
|
The ERTS internal system block functionality has been replaced by
new functionality for blocking the system. The old system block
functionality had contention issues and complexity issues. The
new functionality piggy-backs on thread progress tracking functionality
needed by newly introduced lock-free synchronization in the runtime
system. When the functionality for blocking the system isn't used,
there is more or less no overhead at all, since the functionality
for tracking thread progress is there and needed anyway.
|
|
A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler specific allocator
instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific
pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now uses scheduler specific
instances instead of one instance. Apart from reducing contention
this also ensures that memory allocators always create memory
segments on the local NUMA node on a NUMA system.
|
|
In the half-word emulator, smp emulator, and non-smp emulator
the X register and float register arrays were allocated in
different ways.
Always allocate the registers and store the pointers to the
allocated register arrays in the scheduler data.
|
|
All uses of the old deprecated atomic API in the runtime system
have been replaced with the use of the new atomic API. In a lot of
places this change implies a relaxation of the memory barriers used.
|
|
|
|
Fix thread unsafe access to process status field introduced in OTP-9125.
|
|
|
|
* rickard/temp_alloc_check/OTP-9028:
Verify that temp allocated memory is released
|
|
|
|
Introduce HAllocX to allocate heap fragments with a larger capacity
than requested, thereby reducing the number of fragments allocated.
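The idea of over-allocating a fragment so that later requests fit into it can be sketched as follows. This is only an illustration under assumptions: `frag_t` and `halloc_x` are hypothetical names, not the ERTS data structures or the real HAllocX macro, and overflow of `need + extra` is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap fragment: a header followed by `capacity` words. */
typedef struct frag {
    struct frag *next;     /* older fragments */
    size_t used;           /* words handed out so far */
    size_t capacity;       /* words available in mem[] */
    uintptr_t mem[];       /* fragment storage */
} frag_t;

/* Return `need` words. If the newest fragment has no room, create a new
 * one with `extra` words of spare capacity so that later requests can be
 * served without allocating yet another fragment. */
static uintptr_t *halloc_x(frag_t **head, size_t need, size_t extra)
{
    frag_t *f;
    uintptr_t *p;

    assert(head != NULL);
    f = *head;
    if (f == NULL || f->capacity - f->used < need) {
        size_t cap = need + extra;
        f = malloc(sizeof(frag_t) + cap * sizeof(uintptr_t));
        if (f == NULL)
            return NULL;
        f->next = *head;
        f->used = 0;
        f->capacity = cap;
        *head = f;
    }
    p = f->mem + f->used;
    f->used += need;
    return p;
}
```

With `extra` set to 0 every request that does not fit creates a new fragment; a larger `extra` trades some unused memory for fewer fragments.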
|
|
* rickard/ets-tab-delete/OTP-8999:
Safe deallocation of ETS-table structures
Fix rwlock resource leak when hitting system limit
Conflicts:
erts/emulator/beam/erl_process.h
erts/emulator/beam/erl_process.c
|
|
|
|
Ensure that all threads potentially accessing an ETS-table have dropped
all references to the table before deallocating it.
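One common way to guarantee that a shared structure is freed only after every thread has dropped its reference is atomic reference counting. The sketch below illustrates that general technique only; the commit message does not say which mechanism ERTS uses, and `table_t`, `table_ref` and `table_unref` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical table structure guarded by a reference count. */
typedef struct {
    atomic_int refc;   /* number of outstanding references */
    int payload;       /* stand-in for the table contents */
} table_t;

/* Create a table holding one reference, owned by the caller. */
static table_t *table_create(int payload)
{
    table_t *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    atomic_init(&t->refc, 1);
    t->payload = payload;
    return t;
}

/* Take an additional reference before handing the table to another thread. */
static void table_ref(table_t *t)
{
    assert(t != NULL);
    atomic_fetch_add(&t->refc, 1);
}

/* Drop a reference. The thread that drops the last one frees the table.
 * Returns true if this call deallocated it. */
static bool table_unref(table_t *t)
{
    assert(t != NULL);
    if (atomic_fetch_sub(&t->refc, 1) == 1) {
        free(t);
        return true;
    }
    return false;
}
```

Deallocation happens exactly once, in whichever thread releases the final reference, so no thread can still be reading the structure when it is freed.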
|
|
The scheduler wakeup threshold can now be adjusted at system boot.
For more information see the `+swt' command line argument of `erl'.
|
|
* egil/R14A/binary-gc-wrap/OTP-8730:
Increase vheap counter to Uint64
Fix wrapping in next vheap calculation
|
|
Calling erlang:system_info/1 with the new argument 'update_cpu_info'
will make the runtime system reread and update the internally stored
CPU information. For more information see the documentation of
erlang:system_info(update_cpu_info).
|
|
This will reduce the risk of integer wrapping in bin vheap counting.
The vheap size series will now use the golden ratio instead of doubling
and fibonacci sequences.
OTP #8730
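A size series that grows by the golden ratio on a 64-bit counter can be sketched as below. This is a minimal illustration, not the ERTS code: `next_vheap_size` is a hypothetical name, the ratio is approximated as 1618/1000, and saturating at `UINT64_MAX` is an assumption about how wrapping might be avoided.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the next size in a series that grows by
 * roughly the golden ratio (about 1.618) per step. */
static uint64_t next_vheap_size(uint64_t size)
{
    /* Compute about 0.618 * size in two parts so that the intermediate
     * products cannot overflow 64 bits. */
    uint64_t extra = (size / 1000) * 618 + ((size % 1000) * 618) / 1000;

    if (extra == 0)
        extra = 1;                 /* always make progress for tiny sizes */
    if (size > UINT64_MAX - extra)
        return UINT64_MAX;         /* saturate instead of wrapping */
    return size + extra;
}
```

Compared with doubling, each step over-allocates less, and the explicit saturation check keeps the counter from wrapping around to a small value.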
|
|
* rickard/ethread-rewrite/OTP-8544:
Rewrite ethread library
|
|
Large parts of the ethread library have been rewritten. The
ethread library is an Erlang runtime system internal, portable
thread library used by the runtime system itself.
The most notable improvement is a reader optimized rwlock
implementation which dramatically improves the performance of
read-lock/read-unlock operations on multiprocessor systems by
avoiding ping-ponging of the rwlock cache lines. The reader
optimized rwlock implementation is used by miscellaneous
rwlocks in the runtime system that are known to be read-locked
frequently, and can be enabled on ETS tables by passing the
`{read_concurrency, true}' option upon table creation. See the
documentation of `ets:new/2' for more information.
The ethread library can now also use the libatomic_ops library
for atomic memory accesses. This makes it possible for the
Erlang runtime system to utilize optimized atomic operations
on more platforms than before. Use the
`--with-libatomic_ops=PATH' configure command line argument
when specifying where the libatomic_ops installation is
located. The libatomic_ops library can be downloaded from:
http://www.hpl.hp.com/research/linux/atomic_ops/
The changed API of the ethread library has also caused
modifications in the Erlang runtime system. Preparations for
the upcoming "delayed deallocation" feature have also been made
since it depends on the ethread library.
Note: When building for x86, the ethread library will now use
instructions that first appeared on the Pentium 4 processor. If
you want the runtime system to be compatible with older
processors (back to 486) you need to pass the
`--enable-ethread-pre-pentium4-compatibility' configure command
line argument when configuring the system.
|
|
Merge the three off-heap lists (binaries, funs and externals) into
one list. This reduces memory consumption by two words (pointers) per
ETS object.
|