Age | Commit message (Collapse) | Author |
|
Add initial support for dirty schedulers.
There are two types of dirty schedulers: CPU schedulers and I/O
schedulers. By default, there are as many dirty CPU schedulers as there are
normal schedulers and as many dirty CPU schedulers online as normal
schedulers online. There are 10 dirty I/O schedulers (similar to the choice
of 10 as the default for async threads).
By default, dirty schedulers are disabled and conditionally compiled
out. To enable them, you must pass --enable-dirty-schedulers to the
top-level configure script when building Erlang/OTP.
Current dirty scheduler support requires the emulator to be built with SMP
support. This restriction will be lifted in the future.
You can specify the number of dirty schedulers with the command-line
options +SDcpu (for dirty CPU schedulers) and +SDio (for dirty I/O
schedulers). The +SDcpu option is similar to the +S option in that it takes
two numbers separated by a colon: C1:C2, where C1 specifies the number of
dirty schedulers available and C2 specifies the number of dirty schedulers
online. The +SDPcpu option allows numbers of dirty CPU schedulers available
and dirty CPU schedulers online to be specified as percentages, similar to
the existing +SP option for normal schedulers. The number of dirty CPU
schedulers created and dirty CPU schedulers online may not exceed the
number of normal schedulers created and normal schedulers online,
respectively. The +SDio option takes only a single number specifying the
number of dirty I/O schedulers available and online. There is no support
yet for programmatically changing at run time the number of dirty CPU
schedulers online via erlang:system_flag/2. Also, changing the number of
normal schedulers online via erlang:system_flag(schedulers_online,
NewSchedulersOnline) should ensure that there are no more dirty CPU
schedulers than normal schedulers, but this is not yet implemented. You can
retrieve the number of dirty schedulers by passing dirty_cpu_schedulers,
dirty_cpu_schedulers_online, or dirty_io_schedulers to
erlang:system_info/1.
Currently only NIFs are able to access dirty scheduler
functionality. Neither drivers nor BIFs currently support dirty
schedulers. This restriction will be addressed in the future.
If dirty scheduler support is present in the runtime, the initial status
line Erlang prints before presenting its interactive prompt will include
the indicator "[ds:C1:C2:I]" where "ds" indicates "dirty schedulers", "C1"
indicates the number of dirty CPU schedulers available, "C2" indicates the
number of dirty CPU schedulers online, and "I" indicates the number of
dirty I/O schedulers.
Document The dirty NIF API in the erl_nif man page. The API closely follows
Rickard Green's presentation slides from his talk "Future Extensions to the
Native Interface", presented at the 2011 Erlang Factory held in the San
Francisco Bay Area. Rickard's slides are available online at
http://bit.ly/1m34UHB .
Document the new erl command-line options, the additions to
erlang:system_info/1, and also add the erlang:system_flag/2 dirty scheduler
documentation even though it's not yet implemented.
To determine whether the dirty NIF API is available, native code can check
to see whether the C preprocessor macro ERL_NIF_DIRTY_SCHEDULER_SUPPORT is
defined. To check if dirty schedulers are available at run time, native
code can call the boolean enif_have_dirty_schedulers() function, and Erlang
code can call erlang:system_info(dirty_cpu_schedulers), which raises
badarg if no dirty scheduler support is available.
Add a simple dirty NIF test to the emulator NIF suite.
|
|
* lukas/erts/ethr_smp_req_native_compiletime/OTP-11196:
Bailout if no native implementations are found
|
|
Some basic tests are already done in configure. This makes sure we
cover all cases by bailing out when compiling as well.
|
|
Since b29ecbd (OTP-10418, R15B03) Erlang does not compile anymore with
old versions of GCC that do not have atomic ops builtins on platforms
where there is no native ethread implementation (e.g. ARM):
In file included from ../include/internal/gcc/ethread.h:29,
from ../include/internal/ethread.h:354,
from beam/erl_threads.h:264,
from beam/erl_smp.h:27,
from beam/sys.h:413,
from hipe/hipe_mkliterals.c:29:
../include/internal/gcc/ethr_membar.h:49:4: error: #error "No __sync_val_compare_and_swap"
This patch adds a header guard in "gcc/ethread.h", as is present in
"libatomic_ops/ethread.h".
|
|
|
|
|
|
* sverk/win-64-pointer-fix:
erts: Correct term type for printf %T
erts: Correct internal printf integer type for win64
erts: Correct some printf type formatting
erts: Fix type bug in get_proc_affinity for windows
OTP-10887
Forgot this ticket for sverk/erlang_pid-revert:
OTP-10885
|
|
|
|
|
|
An attempt to speedup valgrind
|
|
|
|
A faulty #if 0 caused healthy gcc builtin atomic to be ignored.
|
|
- Document barrier semantics
- Introduce ddrb suffix on atomic ops
- Barrier macros for both non-SMP and SMP case
- Make the thread progress API a bit more intuitive
|
|
|
|
Removed symbolic links from repository.
|
|
Windows native critical sections are now used internally in the
runtime system as mutex implementation. This since they perform
better under extreme contention than our own implementation.
|
|
The ethread atomics API now also provide double word size atomics.
Double word size atomics are implemented using native atomic
instructions on x86 (when the cmpxchg8b instruction is available)
and on x86_64 (when the cmpxchg16b instruction is available). On
other hardware where 32-bit atomics or word size atomics are
available, an optimized fallback is used; otherwise, a spinlock,
or a mutex based fallback is used.
The ethread library now performs runtime tests for presence of
hardware features, such as for example SSE2 instructions, instead
of requiring this to be determined at compile time.
There are now functions implementing each atomic operation with the
following implied memory barrier semantics: none, read, write,
acquire, release, and full. Some of the operation-barrier
combinations aren't especially useful. But instead of filtering
useful ones out, and potentially miss a useful one, we implement
them all.
A much smaller set of functionality for native atomics are required
to be implemented than before. More or less only cmpxchg and a
membar macro are required to be implemented for each atomic size.
Other functions will automatically be constructed from these. It is,
of course, often wise to implement more that this if possible from a
performance perspective.
|
|
|
|
* rickard/barriers/OTP-9281:
Silence warnings
Fix build with hipe on amd64
Reduce number of atomic ops
Use 32-bit atomic for port snapshot
Remove pointless erts_ports_alive variable
Ensure quick break
Ensure that all rehashing information are seen when done
Ensure that stack updates are seen when stack is released
Add needed barriers for write_concurrency tables
Homogenize memory barriers on atomics
|
|
Atomic operations with specified barriers have specified barrier semantics.
Set and read operations have undefined barrier semantics. All other atomic
operations implied full memory barriers, except when using the libatomic_ops
library and the tilera atomics api.
Some code in the runtime system assumed that all operations used (except for
set, read and specified) implied full memory barriers. The use of the
libatomic_ops library and the tilera atomics api have therefore been modified
to behave as the other implementations.
Some atomic operations with specified barrier semantics on sparc32 have also
been been relaxed in this commit.
|
|
Conflicts:
erts/emulator/beam/erl_printf_term.c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The atomic memory operations interface used the 'long' type and assumed that
it was of the same size as 'void *'. This is true on most platforms, however,
not on Windows 64.
|
|
* rickard/rwmutex-bug/OTP-8925:
Use correct argument types on rwlock_wake_set_flags()
|
|
|
|
* rickard/rwmutex-bug/OTP-8925:
Miscellaneous rwmutex bug fixes and improvements
Don't use more reader groups than schedulers
New test suite containing stress tests of the rwmutex implementation
Conflicts:
erts/emulator/beam/erl_init.c
|
|
The ERTS internal rwlock implementation could get
into an inconsistent state. This bug was very seldom
triggered, but could be during heavy contention. The
bug was introduced in R14B (erts-5.8.1).
The bug was most likely to be triggered when using the
read_concurrency option on an ETS table that
was frequently accessed from multiple processes doing
lots of writes and reads. That is, in a situation where
you typically don't want to use the read_concurrency
option in the first place.
|
|
* ta/fix-ethread-void-return:
ethread: do not return from void ethr_atomic_set_relb
OTP-8944
|
|
|
|
Reported-by: Patrick Baggett <[email protected]>
|
|
* sv/ethread-atomic-mips:
add MIPS architecture to GCC ethread atomics support
|
|
|
|
Gcc for MIPS supports immediate atomic gets and sets, and also
supports a working __sync_synchronize() for gcc 4.2 and greater.
|
|
The CPU topology is now automatically detected on Windows
systems with less than 33 logical processors. The runtime system
will now, also on Windows, by default bind schedulers to logical
processors using the 'default_bind' bind type if the amount of
schedulers is at least equal to the amount of logical processors
configured, binding of schedulers is supported, and a CPU topology
is available at startup.
|
|
Calling erlang:system_info/1 with the new argument 'update_cpu_info'
will make the runtime system reread and update the internally stored
CPU information. For more information see the documentation of
erlang:system_info(update_cpu_info).
|
|
Large parts of the ethread library have been rewritten. The
ethread library is an Erlang runtime system internal, portable
thread library used by the runtime system itself.
Most notable improvement is a reader optimized rwlock
implementation which dramatically improve the performance of
read-lock/read-unlock operations on multi processor systems by
avoiding ping-ponging of the rwlock cache lines. The reader
optimized rwlock implementation is used by miscellaneous
rwlocks in the runtime system that are known to be read-locked
frequently, and can be enabled on ETS tables by passing the
`{read_concurrency, true}' option upon table creation. See the
documentation of `ets:new/2' for more information.
The ethread library can now also use the libatomic_ops library
for atomic memory accesses. This makes it possible for the
Erlang runtime system to utilize optimized atomic operations
on more platforms than before. Use the
`--with-libatomic_ops=PATH' configure command line argument
when specifying where the libatomic_ops installation is
located. The libatomic_ops library can be downloaded from:
http://www.hpl.hp.com/research/linux/atomic_ops/
The changed API of the ethread library has also caused
modifications in the Erlang runtime system. Preparations for
the to come "delayed deallocation" feature has also been done
since it depends on the ethread library.
Note: When building for x86, the ethread library will now use
instructions that first appeared on the pentium 4 processor. If
you want the runtime system to be compatible with older
processors (back to 486) you need to pass the
`--enable-ethread-pre-pentium4-compatibility' configure command
line argument when configuring the system.
|
|
Writer preferred pthread read/write locks has been enabled on Linux.
|
|
The number of spinlocks used when implementing atomic fall-backs when no
native atomic implementation is available has been increased from 16 to
1024.
|
|
Support for using gcc's built-in functions for atomic memory access has
been added. This functionallity will be used if available and no other
native atomic implementation in ERTS is available.
|
|
Missing memory barriers in erts_poll() could cause the runtime system to
hang indefinitely.
|
|
* pan/otp_8332_halfword:
Teach testcase in driver_suite the new prototype for driver_async
wx: Correct usage of driver callbacks from wx thread
Adopt the new (R13B04) Nif functionality to the halfword codebase
Support monitoring and demonitoring from driver threads
Fix further test-suite problems
Correct the VM to work for more test suites
Teach {wordsize,internal|external} to system_info/1
Make tracing and distribution work
Turn on instruction packing in the loader and virtual machine
Add the BeamInstr data type for loaded BEAM code
Fix the BEAM dissambler for the half-word emulator
Store pointers to heap data in 32-bit words
Add a custom mmap wrapper to force heaps into the lower address range
Fit all heap data into the 32-bit address range
|
|
Change erl_int_sizes_config to include HALFWORD_HEAP_EMULATOR,
which make it possible for the NIFs to figure out the term size.
|