Age | Commit message | Author |
|
* rickard/rm-common-runq/OTP-9727:
Remove common run-queue in SMP case
Fix scheduler suspend bug
Conflicts:
erts/emulator/beam/erl_init.c
|
|
The common run-queue implementation is removed since it is unused,
untested, undocumented, unsupported, and only complicates the code.
A spinlock used by the run-queue management sometimes got heavily
contended. This code has now been rewritten, and the spinlock
has been removed.
|
|
* sverk/hipe-without-fpe/OTP-9724:
otp_build: Disable FPE by default on Linux
stdlib: Make sure qlc_SUITE:otp_6964 restores backtrace_depth
erts: Add test for inf/NaN intermediate float results
hipe,erts: Allow hipe without floating point exceptions
hipe: Fix bug in hipe_rtl_lcm:calc_killed_expr_bb
erts: Rename macros used by float instructions without FPE
|
|
* rickard/sched-compact-load/OTP-9695:
Add switch that can disable scheduler compaction of load
|
|
|
|
|
|
* rickard/generic-thr-queue/OTP-9632:
Use generic lock-free queue for async threads
Use generic lock-free queue for misc aux work
Implement generic lock-free queue
|
|
* rickard/thr-progress-block/OTP-9631:
Replace system block with thread progress block
|
|
* rickard/alloc-opt/OTP-7775:
Optimize memory allocation
Conflicts:
erts/aclocal.m4
erts/emulator/hipe/hipe_bif_list.m4
erts/preloaded/ebin/erl_prim_loader.beam
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/init.beam
erts/preloaded/ebin/otp_ring0.beam
erts/preloaded/ebin/prim_file.beam
erts/preloaded/ebin/prim_inet.beam
erts/preloaded/ebin/prim_zip.beam
erts/preloaded/ebin/zlib.beam
|
|
Queues used for communication between async threads and scheduler threads
have been replaced with lock-free queues.
Drivers using the driver_async functionality are no longer automatically
locked to the system, and can be unloaded like any other dynamically
linked-in driver.
Scheduling of ready async jobs is now also interleaved in between other
jobs. Previously all ready async jobs were performed at once.
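The commit message does not show the queue itself. As a rough illustration of the idea only, here is a minimal single-producer/single-consumer ring buffer built on C11 atomics. This is a simpler technique than a generic lock-free queue, it is not the ERTS implementation, and every name in it (`jobq_t`, `jobq_push`, `jobq_pop`, `JOBQ_CAPACITY`) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define JOBQ_CAPACITY 8  /* hypothetical size; must be a power of two */

/* Hypothetical single-producer/single-consumer job queue. */
typedef struct {
    void *slots[JOBQ_CAPACITY];
    atomic_size_t head;  /* next slot to read; written only by the consumer */
    atomic_size_t tail;  /* next slot to write; written only by the producer */
} jobq_t;

void jobq_init(jobq_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side (for example an async thread). Returns false when full. */
bool jobq_push(jobq_t *q, void *job)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == JOBQ_CAPACITY)
        return false;
    q->slots[tail & (JOBQ_CAPACITY - 1)] = job;
    /* Release: the slot write must become visible before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (for example a scheduler thread). Returns NULL when empty. */
void *jobq_pop(jobq_t *q)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return NULL;
    void *job = q->slots[head & (JOBQ_CAPACITY - 1)];
    /* Release: the slot read must finish before the slot is reused. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return job;
}
```

Because `jobq_pop` hands out one job per call, a consumer can take a single ready job, do other work, and come back later, which is one way to picture the interleaving described above.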
|
|
|
|
The ERTS internal system block functionality has been replaced by
new functionality for blocking the system. The old system block
functionality had contention issues and complexity issues. The
new functionality piggy-backs on thread progress tracking functionality
needed by newly introduced lock-free synchronization in the runtime
system. When the functionality for blocking the system isn't used
there is more or less no overhead at all, since the functionality
for tracking thread progress is needed anyway.
|
|
A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler specific allocator
instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific
pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now uses scheduler specific
instances instead of one instance. Apart from reducing contention
this also ensures that memory allocators always create memory
segments on the local NUMA node on a NUMA system.
|
|
In the half-word emulator, smp emulator, and non-smp emulator
the X register and float register arrays were allocated in
different ways.
Always allocate the registers and store the pointers to the
allocated register arrays in the scheduler data.
|
|
All uses of the old deprecated atomic API in the runtime system
have been replaced with the use of the new atomic API. In a lot of
places this change implies a relaxation of the memory barriers used.
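To illustrate what relaxing a memory barrier can mean, the sketch below uses C11 `<stdatomic.h>` as a stand-in. It is not the ERTS atomic API, and all names (`stat_bump`, `publish`, `try_consume`) are hypothetical. A counter that only needs atomicity can use relaxed ordering, while publishing plain data through a flag still needs a release/acquire pair.

```c
#include <stdatomic.h>

static atomic_long stat_counter;  /* hypothetical statistics counter */
static atomic_int data_ready;     /* hypothetical publication flag */
static long payload;              /* plain data guarded by data_ready */

/* Only atomicity is required here, so no ordering is requested. */
void stat_bump(void)
{
    atomic_fetch_add_explicit(&stat_counter, 1, memory_order_relaxed);
}

long stat_read(void)
{
    return atomic_load_explicit(&stat_counter, memory_order_relaxed);
}

/* Writer: the release store orders the payload write before the flag. */
void publish(long value)
{
    payload = value;
    atomic_store_explicit(&data_ready, 1, memory_order_release);
}

/* Reader: the acquire load orders the flag read before the payload read.
 * Returns 1 and fills *out if a value has been published, else 0. */
int try_consume(long *out)
{
    if (!atomic_load_explicit(&data_ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```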
|
|
|
|
Fix thread unsafe access to process status field introduced in OTP-9125.
|
|
|
|
* rickard/temp_alloc_check/OTP-9028:
Verify that temp allocated memory is released
|
|
|
|
Introduce HAllocX to allocate heap fragments with a larger capacity
than requested and by that reduce the number of fragments allocated.
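The effect of over-allocating can be sketched as follows. This is a hypothetical illustration, not the HAllocX macro or the ERTS heap fragment structure: `frag_t`, `fraglist_t` and `frag_alloc_x` are made-up names, and `malloc` stands in for the runtime's allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap fragment holding `capacity` words. */
typedef struct frag {
    struct frag *next;
    size_t used;          /* words handed out so far */
    size_t capacity;      /* words available in mem[] */
    unsigned long mem[];  /* fragment storage */
} frag_t;

typedef struct {
    frag_t *frags;   /* most recently created fragment first */
    size_t nfrags;   /* number of fragments created */
} fraglist_t;

/* Allocate `need` words. When a new fragment has to be created, reserve
 * `extra` additional words so that later requests can reuse it.
 * Returns NULL if memory is exhausted. */
unsigned long *frag_alloc_x(fraglist_t *fl, size_t need, size_t extra)
{
    frag_t *f = fl->frags;
    if (f == NULL || f->capacity - f->used < need) {
        size_t cap = need + extra;
        f = malloc(sizeof(*f) + cap * sizeof(f->mem[0]));
        if (f == NULL)
            return NULL;
        f->next = fl->frags;
        f->used = 0;
        f->capacity = cap;
        fl->frags = f;
        fl->nfrags++;
    }
    unsigned long *p = &f->mem[f->used];
    f->used += need;
    return p;
}

/* Release every fragment in the list. */
void frag_free_all(fraglist_t *fl)
{
    while (fl->frags != NULL) {
        frag_t *next = fl->frags->next;
        free(fl->frags);
        fl->frags = next;
    }
    fl->nfrags = 0;
}
```

With `extra` set to 8, a request for 2 words followed by a request for 3 words is served from a single fragment instead of two.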
|
|
* rickard/ets-tab-delete/OTP-8999:
Safe deallocation of ETS-table structures
Fix rwlock resource leak when hitting system limit
Conflicts:
erts/emulator/beam/erl_process.h
erts/emulator/beam/erl_process.c
|
|
|
|
Ensure that all threads potentially accessing an ETS-table have dropped
all references to the table before deallocating it.
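The commit message does not say which mechanism is used to detect that all references are gone. One common way to get the same guarantee is atomic reference counting, sketched below as a swapped-in technique. It should not be read as the ERTS implementation, and `table_t`, `table_ref` and `table_unref` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical table structure; only the reference count is shown. */
typedef struct {
    atomic_int refc;
} table_t;

/* The creator owns the first reference. Returns NULL if out of memory. */
table_t *table_create(void)
{
    table_t *t = malloc(sizeof(*t));
    if (t != NULL)
        atomic_init(&t->refc, 1);
    return t;
}

/* A thread takes a reference before it starts accessing the table. */
void table_ref(table_t *t)
{
    assert(t != NULL);
    atomic_fetch_add_explicit(&t->refc, 1, memory_order_relaxed);
}

/* A thread drops its reference when done. The structure is freed only by
 * the call that drops the last reference. Returns 1 if it freed the table. */
int table_unref(table_t *t)
{
    if (atomic_fetch_sub_explicit(&t->refc, 1, memory_order_acq_rel) == 1) {
        free(t);
        return 1;
    }
    return 0;
}
```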
|
|
|
|
|
|
The scheduler wakeup threshold can now be adjusted at system boot.
For more information see the `+swt' command line argument of `erl'.
|
|
* egil/R14A/binary-gc-wrap/OTP-8730:
Increase vheap counter to Uint64
Fix wrapping in next vheap calculation
|
|
Calling erlang:system_info/1 with the new argument 'update_cpu_info'
will make the runtime system reread and update the internally stored
CPU information. For more information see the documentation of
erlang:system_info(update_cpu_info).
|
|
This will reduce the risk of integer wrapping in bin vheap counting.
The vheap size series will now use the golden ratio instead of doubling
and Fibonacci sequences.
OTP #8730
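A size series that grows by the golden ratio on a 64-bit counter without wrapping could look like the sketch below. It is an illustration under stated assumptions, not the code from this commit: `vheap_next_size` is a hypothetical name, the golden ratio is approximated by the integer ratio 1597/987, and saturating at `UINT64_MAX` is a design choice made here.

```c
#include <stdint.h>

/* Assumption: approximate the golden ratio (about 1.618) by the ratio of
 * two consecutive Fibonacci numbers so only integer arithmetic is needed. */
#define PHI_NUM UINT64_C(1597)
#define PHI_DEN UINT64_C(987)

/* Return the next size in the series, saturating instead of wrapping. */
uint64_t vheap_next_size(uint64_t size)
{
    uint64_t q = size / PHI_DEN;
    uint64_t r = size % PHI_DEN;

    /* q * PHI_NUM plus a remainder term below PHI_NUM must fit in 64 bits. */
    if (q > (UINT64_MAX - PHI_NUM) / PHI_NUM)
        return UINT64_MAX;

    /* Split the multiplication so the intermediate values cannot overflow. */
    return q * PHI_NUM + (r * PHI_NUM) / PHI_DEN;
}
```

With this approximation a size of 46368 words grows to 75025 words, and a size close to the 64-bit limit stays at the limit instead of wrapping around to a small value.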
|
|
* rickard/ethread-rewrite/OTP-8544:
Rewrite ethread library
|
|
Large parts of the ethread library have been rewritten. The
ethread library is an Erlang runtime system internal, portable
thread library used by the runtime system itself.
The most notable improvement is a reader optimized rwlock
implementation which dramatically improves the performance of
read-lock/read-unlock operations on multi processor systems by
avoiding ping-ponging of the rwlock cache lines. The reader
optimized rwlock implementation is used by miscellaneous
rwlocks in the runtime system that are known to be read-locked
frequently, and can be enabled on ETS tables by passing the
`{read_concurrency, true}' option upon table creation. See the
documentation of `ets:new/2' for more information.
The ethread library can now also use the libatomic_ops library
for atomic memory accesses. This makes it possible for the
Erlang runtime system to utilize optimized atomic operations
on more platforms than before. Use the
`--with-libatomic_ops=PATH' configure command line argument
when specifying where the libatomic_ops installation is
located. The libatomic_ops library can be downloaded from:
http://www.hpl.hp.com/research/linux/atomic_ops/
The changed API of the ethread library has also caused
modifications in the Erlang runtime system. Preparations for
the upcoming "delayed deallocation" feature have also been made
since it depends on the ethread library.
Note: When building for x86, the ethread library will now use
instructions that first appeared on the Pentium 4 processor. If
you want the runtime system to be compatible with older
processors (back to 486) you need to pass the
`--enable-ethread-pre-pentium4-compatibility' configure command
line argument when configuring the system.
|
|
Merging the three off-heap lists (binaries, funs and externals) into
one list. This reduces memory consumption by two words (pointers) per
ETS object.
|
|
|
|
Initial commit with a new breakpoint instruction and PSD areas
for temporary time storage during tracing.
|
|
New NIF features:
Send messages from a NIF, or from a thread created by a NIF, to any
local process (enif_send)
Store terms between NIF calls (enif_alloc_env, enif_make_copy)
Create binary terms with user defined memory management
(enif_make_resource_binary)
|
|
Some test suites need to differentiate between 32-bit terms
and 32-bit pointers.
While at it, remove some more warnings in process.c for SMP and debug.
|
|
For cleanliness, use BeamInstr instead of the UWord
data type for any machine-sized words that are used
for BEAM instructions. Only use UWord for untyped
words in general.
|
|
Store Erlang terms in 32-bit entities on the heap, expanding the
pointers to 64-bit when needed. This works because all terms are stored
on addresses in the 32-bit address range (the 32 most significant bits
of pointers to term data are always 0).
Introduce a new datatype called UWord (along with its companion SWord),
which is an integer having the exact same size as the machine word
(a void *), but might be larger than Eterm/Uint.
Store code as machine words, as the instructions are pointers to
executable code which might reside outside the 32-bit address range.
Continuation pointers are stored on the 32-bit stack and hence must
point to addresses in the low range, which means that loaded beam code
must be placed in the low 32-bit address range (but, as said earlier,
the instructions themselves are full words).
No Erlang term data can be stored on C stacks (enforced by an
earlier commit).
This version gives a prompt, but test cases still fail (and dump core).
The loader (and emulator loop) has instruction packing disabled.
The main issues have been in rewriting the loader and the actual virtual
machine. Subsystems (like distribution) do not work yet.
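The pointer handling described in this commit message can be pictured with the following sketch. It is not taken from the emulator source: `HalfTerm`, `UWordSketch`, `fits_low_range`, `ptr_compress` and `ptr_expand` are hypothetical names, and addresses are treated as plain integers so that nothing is dereferenced.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t HalfTerm;      /* hypothetical 32-bit heap cell */
typedef uintptr_t UWordSketch;  /* integer with the size of a machine word */

_Static_assert(sizeof(UWordSketch) == sizeof(void *),
               "the word type must have the same size as a pointer");

/* Returns 1 if the address lies in the low 32-bit range, i.e. its upper
 * 32 bits are zero on a 64-bit machine. */
int fits_low_range(UWordSketch addr)
{
    return addr <= (UWordSketch)UINT32_MAX;
}

/* Store a term pointer in a 32-bit cell. This is only valid because all
 * term data is assumed to live in the low 32-bit address range. */
HalfTerm ptr_compress(UWordSketch addr)
{
    assert(fits_low_range(addr));
    return (HalfTerm)addr;
}

/* Expand a 32-bit cell back to a full machine word by zero extension. */
UWordSketch ptr_expand(HalfTerm half)
{
    return (UWordSketch)half;
}
```

Code pointers are deliberately left out of the sketch, since the commit message states that instructions are kept as full machine words.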
|
|
This is the first step in the implementation of the half-word emulator,
a 64-bit emulator where all pointers to heap data will be stored
in 32-bit words. Code specific for this emulator variant is
conditionally compiled when the HALFWORD_HEAP define has
a non-zero value.
First force all pointers to heap data to fall into a single 32-bit range,
but still store them in 64-bit words.
Temporary term data stored on C stack is moved into scheduler specific
storage (allocated as heaps) and macros are added to make this
happen only in emulators where this is needed. For a vanilla VM the
temporary terms are still stored on the C stack.
|
|
The garbage collector in R13B03 is too aggressive in some cases. This
commit raises the level of default initial allowed binary garbage
(virtual heap for binaries) before collecting from 233 words to
46368 words (181 kB on 32-bit).
A new option, min_bin_vheap_size, has been added to spawn_opt,
system_flag and process_flag, and can be used to change the default values.
The option can also be used with system_info and process_info to
inspect the values.
For symmetry the option min_heap_size has been added to the above
functions where it was previously missing.
Add testcases for min_bin_vheap_size and min_heap_size for
functions process_flag/2, process_info/2, system_info/2 and
spawn_opt/2.
|
|
fragments was created. This will mainly benefit NIFs that return
large compound terms.
|
|
|
|
|