Age | Commit message (Collapse) | Author |
|
|
|
Can still not setup -a, but cerl works.
|
|
Commit 8781932b3b8769b6f208ac7c00471122ec7dd055 broke
erlang:system_info(system_version) in the non-SMP emulator so
that it would typically dump core.
"rq:%d" was removed from the format string for erts_printf(),
but the corresponding argument in the argument list was not removed,
which ultimately caused "kernel-poll:%s" to be passed an integer
(typically 0).
|
|
* rickard/rm-common-runq/OTP-9727:
Remove common run-queue in SMP case
Fix scheduler suspend bug
Conflicts:
erts/emulator/beam/erl_init.c
|
|
The common run-queue implementation is removed since it is unused,
untested, undocumented, unsupported, and only complicates the code.
A spinlock used by the run-queue management sometimes got heavily
contended. This code has now been rewritten, and the spinlock
has been removed.
|
|
* rickard/generic-thr-queue/OTP-9632:
Use generic lock-free queue for async threads
Use generic lock-free queue for misc aux work
Implement generic lock-free queue
|
|
* rickard/thr-progress-block/OTP-9631:
Replace system block with thread progress block
|
|
* rickard/alloc-opt/OTP-7775:
Optimize memory allocation
Conflicts:
erts/aclocal.m4
erts/emulator/hipe/hipe_bif_list.m4
erts/preloaded/ebin/erl_prim_loader.beam
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/init.beam
erts/preloaded/ebin/otp_ring0.beam
erts/preloaded/ebin/prim_file.beam
erts/preloaded/ebin/prim_inet.beam
erts/preloaded/ebin/prim_zip.beam
erts/preloaded/ebin/zlib.beam
|
|
Queues used for communication between async threads and scheduler threads
have been replaced with lock-free queues.
Drivers using the driver_async functionality are not automatically locked
to the system anymore, and can be unloaded as any dynamically linked in
driver.
Scheduling of ready async jobs is now also interleaved in between other
jobs. Previously all ready async jobs was performed at once.
|
|
The ERTS internal system block functionality has been replaced by
new functionality for blocking the system. The old system block
functionality had contention issues and complexity issues. The
new functionality piggy-backs on thread progress tracking functionality
needed by newly introduced lock-free synchronization in the runtime
system. When the functionality for blocking the system isn't used
there is more or less no overhead at all. This since the functionality
for tracking thread progress is there and needed anyway.
|
|
A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler specific allocator
instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific
pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now use scheduler specific
instances instead of one instance. Apart from reducing contention
this also ensures that memory allocators always create memory
segments on the local NUMA node on a NUMA system.
|
|
* sverk/bif-args/OTP-9662:
erts,hipe: Limited support for hipe cross compilation
erts-hipe: Change THE_NON_VALUE for HiPE enabled debug emulator
erts-hipe: Enable debug compiled hipe-VM with lock checker
erts-hipe: Rename fail_bif_interface_0 to standard_bif_interface_0
erts-hipe: Deliberate leak of native fun entries
erts-hipe: Fix new trap conventions for x86, amd64 and ppc
Store the trap address in p->i
Store the trap arguments in the X register array
erts-hipe: Make some primops use new BIF calling convention
erts-hipe: Adapt generated BIF wrappers for new calling convention
erts-hipe: Remove obscuring macros in generated assembler code
erts-hipe: Make hipe enabled emulator compile with new BIF calls
Simplify the instructions for calling BIFs
Change the calling convention for BIFs
Use the proper macros in all BIFs
Conflicts:
erts/emulator/beam/bif.h
erts/emulator/beam/erl_bif_info.c
|
|
As a preparation for changing the calling convention for
BIFs, make sure that all BIFs use the macros. Also, eliminate
all calls from one BIF to another, since that also breaks
the calling convention abstraction.
|
|
Pre-building the tuples returned by os:type/0 (system_info(os_type))
and os:version() (system_info(os_version)) saves some run-time and
avoids building garbage in the calling process.
|
|
|
|
All uses of the old deprecated atomic API in the runtime system
have been replaced with the use of the new atomic API. In a lot of
places this change imply a relaxation of memory barriers used.
|
|
The ethread atomics API now also provide double word size atomics.
Double word size atomics are implemented using native atomic
instructions on x86 (when the cmpxchg8b instruction is available)
and on x86_64 (when the cmpxchg16b instruction is available). On
other hardware where 32-bit atomics or word size atomics are
available, an optimized fallback is used; otherwise, a spinlock,
or a mutex based fallback is used.
The ethread library now performs runtime tests for presence of
hardware features, such as for example SSE2 instructions, instead
of requiring this to be determined at compile time.
There are now functions implementing each atomic operation with the
following implied memory barrier semantics: none, read, write,
acquire, release, and full. Some of the operation-barrier
combinations aren't especially useful. But instead of filtering
useful ones out, and potentially miss a useful one, we implement
them all.
A much smaller set of functionality for native atomics are required
to be implemented than before. More or less only cmpxchg and a
membar macro are required to be implemented for each atomic size.
Other functions will automatically be constructed from these. It is,
of course, often wise to implement more that this if possible from a
performance perspective.
|
|
The io_list_len() function returns an int, where a negative return
value indicates a type error. One problem is that an int only consists
of 32 bits in a 64-bit emulator. Changing the return type to Sint
will solve that problem, but in the 32-bit emulator, a large iolist
and a iolist with a type error will both return a negative number.
(Noticed by Jon Meredith.)
Another problem is that for iolists whose total size exceed the
word size, the result would be truncated, leading to a subsequent
buffer overflow and emulator crash.
Therefore, introduce the new erts_iolist_size() function which
returns a status indication and writes the result size through
a passed pointer. If the result size does not fit in a word,
return an overflow indication.
|
|
* sverker/erts_printf-halfword:
erts_printf %be to print integers of size Eterm
Fix use of type BeamInstr in hipe_debug.c
Conflicts:
erts/emulator/hipe/hipe_debug.c
|
|
|
|
Existing %bp to print pointer size integers does not work in halfword
emulator to print Eterm size integers.
|
|
|
|
The new_binary() function takes a size argument that is an
int. In the 64-bit emulator (sizeof(int) == 4, sizeof(Uint) == 8),
any sizes >= 0x8000000 become 0xffffffff80000000 and above and
triggers a memory allocation failure.
Change the type of the size argument to Uint, and change any
callers that cast the argument to an int.
Correction-by: Jon Meredith
|
|
|
|
|
|
The compressed format is using a slighty modified variant of the extern format
(term_to_binary). To not worsen key lookup's too much, the top tuple itself
and the key element are not compressed. Table objects with only immediate
non-key elements will therefor not gain anything (but actually consume one
extra word for "alloc_size").
|
|
* rickard/cpu-groups/OTP-8861:
Generalize reader groups
Move cpu topology functionality into erl_cpu_topology.[ch]
Do not use more reader groups for schedulers than schedulers
Conflicts:
erts/emulator/beam/erl_init.c
|
|
|
|
* pg/fix-system_info-cpu_topology-segfault:
Fix crash with erlang:system_info({cpu_topology,junk})
OTP-8914
|
|
Id: OTP-8912
This patch creates a new family of flags with the "+z" prefix. It
further creates a new configuration option called "dbbl" (which is the
first letter of the name dist_buf_busy_limit). Example usage of this
flag would be "+zdbbl 1048576".
This patch creates an adjustable buffer limit for the amount of data
that may be buffered by the erlang distribution code (in dist.c
specifically). Before this patch, this hard-coded constant was used:
#define ERTS_DE_BUSY_LIMIT (128*1024)
When large binaries are transmitted between nodes (or simply a lot of
medium-sized binaries), it is very easy to hit the old 128KB limit.
Processes that use the erlang:system_monitor() BIF to monitor system
events can be spammed by {monitor, busy_dist_port, ...} message tuples
at rates of tens to even hundreds of messages/second.
A larger buffer limit will allow processes to buffer more outgoing
messages over the distribution. When the buffer limit has been
reached, sending processes will be suspended until the buffer size has
shrunk. The buffer limit is per distribution channel. A higher limit
will give lower latency and higher throughput at the expense of
higher memory usage.
A variation of this patch has been in commercial production use in at
least two companies that the author is aware of. Larger buffer values
can reduce the number of {monitor, busy_dist_port, ...} system
messages drastically, lower overall messaging latencies, and prevent
false timeouts and 'nodedown' messages in extremely busy Mnesia systems.
Test suite: there are two tests:
a. In erlexec_SUITE.erl to test basic set & get of the value
b. In distribution_SUITE.erl, to verify that setting +zdbbl very
low will actually change behavior.
|
|
There is a bug in system_info BIF causing a crash if
erts_get_cpu_topology_term fails. The fix comes with a non-regression
test.
|
|
Added erlang:system_info(build_type) which makes it
easier to chose drivers, NIF libraries, etc based
on build type of the runtime system.
|
|
* rickard/cpu-info/OTP-8765:
Initialize environment functionality after thread lib
Fix faulty assertions
Implement automatic detection of CPU topology on Windows
Make it possible to reread and update detected CPU information
|
|
Calling erlang:system_info/1 with the new argument 'update_cpu_info'
will make the runtime system reread and update the internally stored
CPU information. For more information see the documentation of
erlang:system_info(update_cpu_info).
|
|
elib_malloc is an alternate memory allocator that
is no longer possible to build.
|
|
* rickard/ethread-rewrite/OTP-8544:
Rewrite ethread library
|
|
Large parts of the ethread library have been rewritten. The
ethread library is an Erlang runtime system internal, portable
thread library used by the runtime system itself.
Most notable improvement is a reader optimized rwlock
implementation which dramatically improve the performance of
read-lock/read-unlock operations on multi processor systems by
avoiding ping-ponging of the rwlock cache lines. The reader
optimized rwlock implementation is used by miscellaneous
rwlocks in the runtime system that are known to be read-locked
frequently, and can be enabled on ETS tables by passing the
`{read_concurrency, true}' option upon table creation. See the
documentation of `ets:new/2' for more information.
The ethread library can now also use the libatomic_ops library
for atomic memory accesses. This makes it possible for the
Erlang runtime system to utilize optimized atomic operations
on more platforms than before. Use the
`--with-libatomic_ops=PATH' configure command line argument
when specifying where the libatomic_ops installation is
located. The libatomic_ops library can be downloaded from:
http://www.hpl.hp.com/research/linux/atomic_ops/
The changed API of the ethread library has also caused
modifications in the Erlang runtime system. Preparations for
the to come "delayed deallocation" feature has also been done
since it depends on the ethread library.
Note: When building for x86, the ethread library will now use
instructions that first appeared on the pentium 4 processor. If
you want the runtime system to be compatible with older
processors (back to 486) you need to pass the
`--enable-ethread-pre-pentium4-compatibility' configure command
line argument when configuring the system.
|
|
Merging the three off-heap lists (binaries, funs and externals) into
one list. This reduces memory consumption by two words (pointers) per
ETS object.
|
|
erlang:system_info(snifs) lists all static native implemented
functions. The function presents the lists with three tuple
values containing MFAs [{Module, Function, Arity}, ...].
|
|
New NIF features:
Send messages from a NIF, or from thread created by NIF, to any local
process (enif_send)
Store terms between NIF calls (enif_alloc_env, enif_make_copy)
Create binary terms with user defined memory management
(enif_make_resource_binary)
|
|
Set loop factors to 10.
Teach erts_debug:set_internal_state to limit loop factor for binary.
Add random tests for matches and match with multiple searchstrings.
|
|
Fix safe_mul in the loader, which caused failures in the bit
syntax test cases.
Fix yet another Uint in erl_alloc.h (ERTS_CACHE_LINE_SIZE) causing
segmentation fault when we have many schedulers (why only in that
situation?).
Clean up erl_mseg (remove old code for the Linux 32-bit mmap flag).
While at it, also remove compilation warnings.
|
|
Some test suites need to differentiate between 32-bit terms
and 32-bit pointers.
While at it, remove some more warnings in process.c for SMP and debug.
|
|
For cleanliness, use BeamInstr instead of the UWord
data type to any machine-sized words that are used
for BEAM instructions. Only use UWord for untyped
words in general.
|
|
Store Erlang terms in 32-bit entities on the heap, expanding the
pointers to 64-bit when needed. This works because all terms are stored
on addresses in the 32-bit address range (the 32 most significant bits
of pointers to term data are always 0).
Introduce a new datatype called UWord (along with its companion SWord),
which is an integer having the exact same size as the machine word
(a void *), but might be larger than Eterm/Uint.
Store code as machine words, as the instructions are pointers to
executable code which might reside outside the 32-bit address range.
Continuation pointers are stored on the 32-bit stack and hence must
point to addresses in the low range, which means that loaded beam code
much be placed in the low 32-bit address range (but, as said earlier,
the instructions themselves are full words).
No Erlang term data can be stored on C stacks (enforced by an
earlier commit).
This version gives a prompt, but test cases still fail (and dump core).
The loader (and emulator loop) has instruction packing disabled.
The main issues has been in rewriting loader and actual virtual
machine. Subsystems (like distribution) does not work yet.
|
|
This is the first step in the implementation of the half-word emulator,
a 64-bit emulator where all pointers to heap data will be stored
in 32-bit words. Code specific for this emulator variant is
conditionally compiled when the HALFWORD_HEAP define has
a non-zero value.
First force all pointers to heap data to fall into a single 32-bit range,
but still store them in 64-bit words.
Temporary term data stored on C stack is moved into scheduler specific
storage (allocated as heaps) and macros are added to make this
happen only in emulators where this is needed. For a vanilla VM the
temporary terms are still stored on the C stack.
|
|
Add erts_debug:lock_counters({copy_save, bool()}). This option
enables or disables statistics saving for destroyed processes and
ets-tables. Enabling this might consume a lot of memory.
Add id-numbering for lock classes which is otherwise undefined.
|
|
The garbage collector in r13b03 is too aggressive in some cases. This
commit raises the level of default initial allowed binary garbage
(virtual heap for binaries) before collecting from 233 words to
46368 words (181 kB on 32-bit).
A new option, min_bin_vheap_size, has been added to spawn_opt,
system_flag and process_flag can be used to change the default values.
The option can also be used with system_info and process_info to
inspect the values.
For symmetry the option min_heap_size has been added to the above
functions where it was previously missing.
Add testcases for min_bin_vheap_size and min_heap_size for
functions process_flag/2, process_info/2, system_info/2 and
spawn_opt/2.
|
|
fragments was created. This will mainly benefit NIFs that return
large compound terms.
|
|
|