Age | Commit message (Collapse) | Author |
|
|
|
The max_heap_size process flag can be used to limit the
growth of a process heap by killing it before it becomes
too large to handle. It is possible to set the maximum
using the `erl +hmax` option, `system_flag(max_heap_size, ...)`,
`spawn_opt(Fun, [{max_heap_size, ...}])` and
`process_flag(max_heap_size, ...)`.
It is possible to configure the behaviour of the process
when the maximum heap size is reached. The process may be
sent an untrappable exit signal with reason kill, and/or an
error_logger message with details on the process state may
be sent. A new trace event called gc_max_heap_size is
also triggered for the garbage_collection trace flag
when the heap grows larger than the configured size.
If kill and error_logger are disabled, it is still
possible to see that the maximum has been reached by
doing garbage collection tracing on the process.
The heap size is defined as the sum of the heap memory
that the process is currently using. This includes
all generational heaps, the stack, any messages that
are considered to be part of the heap and any extra
memory the garbage collector may need during collection.
In the current implementation this means that when a process
uses the on_heap message queue data mode, the messages
that are in the internal message queue are counted towards
this value. For off_heap, only matched messages count towards
the size of the heap. For mixed, it depends on race conditions
within the VM whether a message is part of the heap or not.
Below is an example run of the new behaviour:
Eshell V8.0 (abort with ^G)
1> f(P),P = spawn_opt(fun() -> receive ok -> ok end end, [{max_heap_size, 512}]).
<0.60.0>
2> erlang:trace(P, true, [garbage_collection, procs]).
1
3> [P ! lists:duplicate(M,M) || M <- lists:seq(1,15)],ok.
ok
4>
=ERROR REPORT==== 26-Apr-2016::16:25:10 ===
Process: <0.60.0>
Context: maximum heap size reached
Max heap size: 512
Total heap size: 723
Kill: true
Error Logger: true
GC Info: [{old_heap_block_size,0},
{heap_block_size,609},
{mbuf_size,145},
{recent_size,0},
{stack_size,9},
{old_heap_size,0},
{heap_size,211},
{bin_vheap_size,0},
{bin_vheap_block_size,46422},
{bin_old_vheap_size,0},
{bin_old_vheap_block_size,46422}]
flush().
Shell got {trace,<0.60.0>,gc_start,
[{old_heap_block_size,0},
{heap_block_size,233},
{mbuf_size,145},
{recent_size,0},
{stack_size,9},
{old_heap_size,0},
{heap_size,211},
{bin_vheap_size,0},
{bin_vheap_block_size,46422},
{bin_old_vheap_size,0},
{bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,gc_max_heap_size,
[{old_heap_block_size,0},
{heap_block_size,609},
{mbuf_size,145},
{recent_size,0},
{stack_size,9},
{old_heap_size,0},
{heap_size,211},
{bin_vheap_size,0},
{bin_vheap_block_size,46422},
{bin_old_vheap_size,0},
{bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,exit,killed}
|
|
|
|
All 'EXIT' and monitor messages are sent from 'system'
Timeouts are "sent" from 'clock_service'
|
|
|
|
|
|
* bjorn/erts/beam_load:
Optimize get_tuple_element instructions that target Y registers
Mend beam_SUITE:packed_registers/1
Correct unpacking of 3 operands on 32-bit architectures
Eliminate misleading #ifdef ARCH_64 in beam_opcodes.h
beam_debug: Correct masking when unpacking packed operands
|
|
Any heap fragment created during a nif call to a tracer nif
should be free'd immediately in order for the GC not to treat
it as live data.
|
|
erts_block/unblock_fpe should only be called at entry to/exit from
native user code.
|
|
Add the possibility to use modules as trace data receivers. The functions
in the module have to be nifs as otherwise complex trace probes will be
very hard to handle (complex means trace probes for ports for example).
This commit changes the way that the ptab->tracer field works from always
being an immediate, to now be NIL if no tracer is present or else be
the tuple {TracerModule, TracerState} where TracerModule is an atom that
is later used to lookup the appropriate tracer callbacks to call and
TracerState is just passed to the tracer callback. The default process and
port tracers have been rewritten to use the new API.
This commit also changes the order in which trace messages are delivered to the
potential tracer process. Any enif_send done in a tracer module may be delayed
indefinitely because of lock order issues. If a message is delayed any other
trace message sent from that process is also delayed so that order is preserved
for each traced entity. This means that for some trace events (e.g. send/receive)
the events may come in an unintuitive order (receive before send) to the
trace receiver. Timestamps are taken when the trace message is generated so
trace messages from different processes may arrive with the timestamp
out of order.
Both the erlang:trace and seq_trace:set_system_tracer accept the new tracer
module tracers and also the backwards compatible arguments.
OTP-10267
|
|
Several improvements in the compiler (e.g. c288ab87fd6) have
led to a Y register being the target for get_tuple_element
instructions. Therefore, introduce i_get_tuple_element2y
that combines two consecutive get_tuple_element instructions
that target Y registers.
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
The raise/2 instruction is almost always used like this:
raise x(2) x(1)
Therefore, we can translate it to an internal i_raise/0
instruction that uses x(2) x(1) as its implicit operands.
We will also remove the backward compatibility with R10-0. It is
unlikely that anyone is still using BEAM files compiled with the R10-0
compiler, especially since most of those modules cannot be loaded. The
loader will refuse to load any module that uses the old non-GCing
arithmetic instructions or the non-GCing versions of length/1 or
size/1.
Doing these changes will reduce both the size of the loaded BEAM
code and size of the code in process_main().
|
|
There is no reason to rename bs_put_utf16/3.
(We rename instructions if we'll need to change the operands or
if we will need to avoid an endless transformation loop. Neither
of these reasons applies to bs_put_utf16/3.)
|
|
|
|
|
|
This is mostly a pure refactoring.
The exception is the buggy case where erlang:halt() was called with a
positive integer in the range -(INT_MIN+2) to -INT_MIN, which got
confused with ERTS_ABORT_EXIT, ERTS_DUMP_EXIT and ERTS_INTR_EXIT.
Outcome          OLD erl_exit(n, )    NEW erts_exit(n, )
---------------  -------------------  -------------------------------------------
exit(Status)     n = -Status <= 0     n = Status >= 0
crashdump+abort  n > 0, ignore n      n = ERTS_ERROR_EXIT < 0
The outcome of the old ERTS_ABORT_EXIT, ERTS_INTR_EXIT and
ERTS_DUMP_EXIT are the same as before (even though their values have
changed).
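
The collision described above can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the placement of the old reserved codes at INT_MIN..INT_MIN+2 is inferred from the range quoted in this message rather than taken from the ERTS headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Old convention (inferred, see lead-in): exit(Status) was requested as
 * n = -Status, and the reserved codes sat at INT_MIN..INT_MIN+2. */
static int old_encode_status(int status)
{
    assert(status >= 0);
    return -status;
}

/* Hypothetical check for "n is one of the old reserved codes". */
static bool old_is_reserved(int n)
{
    return n <= INT_MIN + 2;
}

/* New convention: n >= 0 is the exit status itself and every reserved
 * code is negative, so a status can never be mistaken for one. */
static bool new_is_status(int n)
{
    return n >= 0;
}
```

With the old encoding a status of INT_MAX - 1 or INT_MAX negates into the reserved range, while under the new convention any non-negative value is unambiguously a status.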
|
|
* jv/erts/optimize-cmp:
Unify comparison macros in erl_utils.h
Avoid erts_cmp jump in atom, int and float comparisons
|
|
Given the function definition below:
check(X) when X >= 0, X <= 20 -> true.
@nox originally noticed that lt and ge guard tests
were performing slower than they should.
Further investigation revealed that most of the cost
was in jumping to the erts_cmp function. This patch
brings the operations already inlined in erts_cmp
into the emulator, removing the jump cost.
After applying these changes, the time to invoke the
check/1 function defined above 30000 times with different
values from 0 to 20 has fallen from 367us to 213us
(measured as average of 3 runs). This is a
considerable improvement over Erlang 18, which takes
556us on average.
Floats have also dropped their time from 1126us
(on Erlang 18) to 613us.
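
The idea of handling the common cases inline and only jumping to the general comparison otherwise might look roughly like the sketch below. The term representation, the tag names and generic_cmp() are hypothetical stand-ins, not the emulator's actual types or its erts_cmp function.

```c
#include <assert.h>

typedef enum { T_SMALL, T_FLOAT } tag_t;          /* hypothetical tags */
typedef struct { tag_t tag; long i; double f; } term_t;

static int generic_calls = 0;   /* counts trips through the slow path */

static double as_double(term_t t)
{
    assert(t.tag == T_SMALL || t.tag == T_FLOAT);
    return t.tag == T_SMALL ? (double)t.i : t.f;
}

/* Stand-in for the out-of-line general comparison. */
static int generic_cmp(term_t a, term_t b)
{
    double x = as_double(a), y = as_double(b);
    generic_calls++;
    return (x > y) - (x < y);
}

/* Same-type comparisons are resolved inline; only mixed types pay for
 * the call into the general routine. */
static inline int cmp_fast(term_t a, term_t b)
{
    if (a.tag == T_SMALL && b.tag == T_SMALL)
        return (a.i > b.i) - (a.i < b.i);
    if (a.tag == T_FLOAT && b.tag == T_FLOAT)
        return (a.f > b.f) - (a.f < b.f);
    return generic_cmp(a, b);
}
```

A guard such as X >= 0 with an integer X would then stay entirely on the inline path in this model.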
|
|
* lukas/erts/msacc:
Update preloaded modules
erts: Make msacc allocator type thread safe
Silence compiler
erts: Fix msacc testcase on some windowses
erts: Add power saving cpu feature tests and use them
erts: Refactor perf counter internal interface
erts: Add rdtscp instruction check
erts: Fix hrtime for windows
erts: use correct function for perf counter on non-x86
erts: Fix msacc win32 debug compile error
erts: Add microstate accounting
erts, kernel: Add os:perf_counter function
erts: Add ERTS_WRITE_UNLIKELY
|
|
Conflicts:
erts/emulator/beam/beam_emu.c
|
|
perf counter is now part of the function pointer interface
and also the function returns the value instead of writing
to a memory buffer.
|
|
|
|
Microstate accounting is a way to track which state the
different threads within ERTS are in. The main usage area
is to pin point performance bottlenecks by checking which
states the threads are in and then from there figuring out
why and where to optimize.
Since checking whether microstate accounting is on or off is
relatively expensive if done in a short loop only a few of the
states are enabled by default and more states can be enabled
through configure.
I've done some benchmarking and the overhead with it turned off
is not noticeable and with it on it is a fraction of a percent.
If you enable the extra states, depending on the benchmark,
the overhead when turned off is about 1% and when turned on
somewhere between 5% and 15%.
OTP-12345
|
|
The perf_counter is a very cheap, high-resolution timer
that can be used to timestamp system events. It does not have
monotonicity guarantees, but should on most OSs expose a monotonic
time.
A special instruction has been created for this counter to further
speed up fetching it.
OTP-12908
|
|
* egil/pd-opt-get/OTP-13167:
erts: Add i_get_hash instruction
erts: Use internal hash for process dictionaries
|
|
* rickard/ohmq-fixup/OTP-13047:
Replace off_heap_message_queue option with message_queue_data option
Always use literal_alloc
Distinguish between GC disabled by BIFs and other disabled GC
Fix process_info(_, off_heap_message_queue)
Off heap message queue test suite
Remove unused variable
Fix memory leaks
|
|
Processes remember heap fragments that are known to be fully
live due to creation in a just called BIF that yields in the
live_hf_end field. This field must not be used if we have not
disabled GC in a BIF. F_DELAY_GC has been introduced in order
to distinguish between the two different scenarios.
- F_DISABLE_GC should *only* be used by BIFs. This is when
the BIF needs to yield while preventing a GC.
- F_DELAY_GC should only be used when GC is temporarily
disabled while the process is scheduled. A process must
not be scheduled out while F_DELAY_GC is set.
|
|
Calculate hashvalue in load-time for constant process dictionary gets.
|
|
The test whether the result would fit in a smallnum could overflow into
a negative number that would fit a smallnum. A test that reproduces the
issue was added to bs_construct_SUITE.
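
As a generic illustration of this class of bug (not the actual code that was fixed), the sketch below contrasts a range check done after a wrapping multiplication with one done on the operands beforehand. The 60-bit small-integer range and all names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical small-integer range: 60-bit signed, for illustration only. */
#define SKETCH_MAX_SMALL ((INT64_C(1) << 59) - 1)
#define SKETCH_MIN_SMALL (-(INT64_C(1) << 59))

static bool in_small_range(int64_t v)
{
    return SKETCH_MIN_SMALL <= v && v <= SKETCH_MAX_SMALL;
}

/* Fragile: multiply first (wrapping modulo 2^64, two's complement
 * conversion assumed), range-check afterwards. */
static bool fits_after_wrap(int64_t size, int64_t unit)
{
    int64_t product = (int64_t)((uint64_t)size * (uint64_t)unit);
    return in_small_range(product);
}

/* Safer: decide from the operands before any multiplication happens. */
static bool fits_checked(int64_t size, int64_t unit)
{
    assert(size >= 0 && unit > 0);
    return size <= SKETCH_MAX_SMALL / unit;
}
```

For example, INT64_MAX * 2 wraps to -2, which the fragile check accepts as a valid small number, while the operand-based check rejects it.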
|
|
|
|
|
|
* rickard/gc-bump-reds/OTP-13097:
Bump reductions on GC
|
|
* rickard/gc-after-bif-cond/OTP-13098:
Use the same conditions when triggering GC after BIF
|
|
* rickard/ohmq/OTP-13047:
Fragmented young heap generation and off_heap_message_queue option
Refactor GC
Introduce literal tag
Conflicts:
erts/doc/src/erlang.xml
erts/emulator/beam/erl_gc.c
|
|
* sverk/literal-memory-range:
erts: Refactor line table in loaded beam code
erts: Refactor header of loaded beam code
fix check_process_code for separate literal area
erts: Add support for fast erts_is_literal()
erts: Refactor erl_mmap to allow several mapper instances
erts: Add new allocator LITERAL
erts: Fix strangeness in treatment of MSEG_ALIGN_BITS
erts: Cleanup main carrier creation
erts: Remove unused erts_have_erts_mmap
erts: Refactor config test for posix_memalign
|
|
|
|
|
|
* The youngest generation of the heap can now consist of multiple
blocks. Heap fragments and message fragments are added to the
youngest generation when needed without triggering a GC. After
a GC the youngest generation is contained in one single block.
* The off_heap_message_queue process flag has been added. When
enabled all message data in the queue is kept off heap. When
a message is selected from the queue, the message fragment (or
heap fragment) containing the actual message is attached to the
youngest generation. Messages stored off heap are not part of GC.
|
|
to use a real C struct instead of array.
|
|
|
|
erlang:is_builtin(erlang, apply, 3) returns 'false'. That seems to be
an oversight in the implementation of erlang:is_builtin/3 rather than
a conscious design decision. Part of apply/3 is implemented in C (as a
special instruction), and part of it in Erlang (only used if apply/3
is used recursively). That makes apply/3 special compared to all other
BIFs.
From the viewpoint of the user, apply/3 is a built-in function,
since it cannot possibly be implemented in pure Erlang.
Noticed-by: Stavros Aronis
|
|
|
|
Conflicts:
OTP_VERSION
erts/doc/src/notes.xml
erts/vsn.mk
lib/runtime_tools/doc/src/notes.xml
lib/runtime_tools/vsn.mk
otp_versions.table
|
|
|
|
Fetch the head and tail parts to temporary variables before
writing them to their destinations. That should allow the CPU to
perform the moves in parallel, which might improve performance.
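
A minimal sketch of the load-then-store pattern described above, using a hypothetical cons cell layout and function name rather than the emulator's real definitions:

```c
#include <assert.h>

typedef long word_t;                          /* hypothetical machine word */
typedef struct { word_t head, tail; } cons_t; /* hypothetical cons cell */

static void get_list_sketch(const cons_t *cell, word_t *head_dst, word_t *tail_dst)
{
    assert(cell && head_dst && tail_dst);
    word_t h = cell->head;   /* both loads are issued first ... */
    word_t t = cell->tail;
    *head_dst = h;           /* ... and only then are the stores done */
    *tail_dst = t;
}
```

Because neither store sits between the two loads, the loads do not depend on a preceding store. A side effect in this sketch is that a destination may alias the source cell without the second load seeing a clobbered value.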
|
|
The combination is_non_empty_list followed by get_list is extremely
common (but not in estone_SUITE, which is why it has not been noticed
before). Therefore it is worthwhile to introduce a combined
instruction.
|
|
|
|
It is currently only possible to pack up to 4 operands. However,
the move_window4 instruction has 5 operands and the move_window5 and
move3 instructions have 6 operands.
Teach beam_makeops to pack instructions with 5 or 6 operands.
Also rewrite the move_window instructions in beam_emu.c to macros
to allow their operands to get packed.
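
Packing several small operands into one word generally comes down to shifts and masks. The sketch below assumes a 64-bit word and a fixed 10-bit field per operand; both numbers and all names are hypothetical and do not reflect the layout that beam_makeops actually generates.

```c
#include <assert.h>
#include <stdint.h>

#define FIELD_BITS 10u                          /* hypothetical operand width */
#define FIELD_MASK ((UINT64_C(1) << FIELD_BITS) - 1)
#define MAX_FIELDS 6u                           /* 6 * 10 bits fit in 64 */

/* Pack up to MAX_FIELDS operands, operand i occupying bits
 * [i*FIELD_BITS, (i+1)*FIELD_BITS). */
static uint64_t pack_operands(const unsigned *ops, unsigned n)
{
    uint64_t word = 0;
    assert(n <= MAX_FIELDS);
    for (unsigned i = 0; i < n; i++) {
        assert(ops[i] <= FIELD_MASK);
        word |= (uint64_t)ops[i] << (i * FIELD_BITS);
    }
    return word;
}

/* Shift the wanted field down and mask off its neighbours. */
static unsigned unpack_operand(uint64_t word, unsigned index)
{
    assert(index < MAX_FIELDS);
    return (unsigned)((word >> (index * FIELD_BITS)) & FIELD_MASK);
}
```

Forgetting the mask on any field except the topmost one would leak the neighbouring operands into the result, which is why the unpacking step masks after shifting.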
|
|
Since 'd' operands can only be either an X register or a Y register,
we only need a single bit to distinguish them. Furthermore, we can
pre-multiply the register number with the word size to speed up
address calculation.
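
One way to picture this encoding is sketched below. The names, the choice of the lowest bit as the tag, and the register arrays are assumptions for illustration; the loader's real encoding may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word_t;          /* hypothetical machine word */

#define Y_REG_BIT ((word_t)1)      /* hypothetical tag: set means Y register */

/* The register number is stored already multiplied by the word size.
 * That byte offset always has a zero low bit, so the tag fits there. */
static word_t encode_d(unsigned reg, int is_y)
{
    assert(is_y == 0 || is_y == 1);
    return ((word_t)reg * sizeof(word_t)) | (is_y ? Y_REG_BIT : 0);
}

/* Decoding needs no multiplication: pick the base from the tag bit and
 * add the byte offset directly. */
static word_t *decode_d(word_t operand, word_t *x_regs, word_t *y_regs)
{
    unsigned char *base =
        (unsigned char *)((operand & Y_REG_BIT) ? y_regs : x_regs);
    return (word_t *)(base + (operand & ~Y_REG_BIT));
}
```

In this model, x(3) and y(2) decode to &x_regs[3] and &y_regs[2] using one bit test, one mask and one addition.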
|