Age | Commit message (Collapse) | Author |
|
* bjorn/erts/beam_load:
Optimize get_tuple_element instructions that target Y registers
Mend beam_SUITE:packed_registers/1
Correct unpacking of 3 operands on 32-bit archictectures
Eliminate misleading #ifdef ARCH_64 in beam_opcodes.h
beam_debug: Correct masking when unpacking packed operands
|
|
Any heap fragment created during a nif call to a tracer nif
should be free'd immediately in order for the GC not to treat
it as live data.
|
|
erts_block/unblock_fpe should only be called at entry to/exit from
native user code.
|
|
Add the possibility to use modules as trace data receivers. The functions
in the module have to be nifs as otherwise complex trace probes will be
very hard to handle (complex means trace probes for ports for example).
This commit changes the way that the ptab->tracer field works from always
being an immediate, to now be NIL if no tracer is present or else be
the tuple {TracerModule, TracerState} where TracerModule is an atom that
is later used to lookup the appropriate tracer callbacks to call and
TracerState is just passed to the tracer callback. The default process and
port tracers have been rewritten to use the new API.
This commit also changes the order which trace messages are delivered to the
potential tracer process. Any enif_send done in a tracer module may be delayed
indefinitely because of lock order issues. If a message is delayed any other
trace message send from that process is also delayed so that order is preserved
for each traced entity. This means that for some trace events (i.e. send/receive)
the events may come in an unintuitive order (receive before send) to the
trace receiver. Timestamps are taken when the trace message is generated so
trace messages from differented processes may arrive with the timestamp
out of order.
Both the erlang:trace and seq_trace:set_system_tracer accept the new tracer
module tracers and also the backwards compatible arguments.
OTP-10267
|
|
Several improvements in the compiler (e.g. c288ab87fd6) has
lead to an Y register being the target for get_tuple_element
instructions. Therefore, introduce i_get_tuple_element2y
that combines two consecutive get_tuple_element instructions
that target Y registers.
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
The raise/2 instruction is almost always used like this:
raise x(2) x(1)
Therefore, we can translate it to an internal i_raise/0
instruction that uses x(2) x(1) as its implicit operands.
We will also remove the backward compatibility with R10-0. It is
unlikely that anyone still is using BEAM files compiled with the R10-0
compiler, especially since most of those modules cannot be loaded. The
loader will refuse to load any module that uses the old non-GCIng
arithmetic instructions or the non-GCing versions of length/1 or
size/1.
Doing these changes will reduce both the size of the loaded BEAM
code and size of the code in process_main().
|
|
There is no reason to rename bs_put_utf16/3.
(We rename instructions if we'll need to change the operands or
if we will need to avoid an endless transformation loop. Neither
of these reasons apply to bs_put_utf16/3.)
|
|
|
|
|
|
This is mostly a pure refactoring.
Except for the buggy cases when calling erlang:halt() with a positive
integer in the range -(INT_MIN+2) to -INT_MIN that got confused with
ERTS_ABORT_EXIT, ERTS_DUMP_EXIT and ERTS_INTR_EXIT.
Outcome OLD erl_exit(n, ) NEW erts_exit(n, )
------- ------------------- -------------------------------------------
exit(Status) n = -Status <= 0 n = Status >= 0
crashdump+abort n > 0, ignore n n = ERTS_ERROR_EXIT < 0
The outcome of the old ERTS_ABORT_EXIT, ERTS_INTR_EXIT and
ERTS_DUMP_EXIT are the same as before (even though their values have
changed).
|
|
* jv/erts/optimize-cmp:
Unify comparison macros in erl_utils.h
Avoid erts_cmp jump in atom, int and float comparisons
|
|
Given the function definition below:
check(X) when X >= 0, X <= 20 -> true.
@nox has originally noticed that perfoming lt and ge
guard tests were performing slower than they should be.
Further investigation revealed that most of the cost
was in jumping to the erts_cmp function. This patch
brings the operations already inlined in erts_cmp
into the emulator, removing the jump cost.
After applying these changes, invoking the check/1
function defined above 30000 times with different
values from 0 to 20 has fallen from 367us to 213us
(measured as average of 3 runs). This is a
considerably improvement over Erlang 18 which takes
556us on average.
Floats have also dropped their time from 1126us
(on Erlang 18) to 613us.
|
|
* lukas/erts/msacc:
Update preloaded modules
erts: Make msacc alloctor type thread safe
Silence compiler
erts: Fix msacc testcase on some windowses
erts: Add power saving cpu feature tests and use them
erts: Refactor perf counter internal interface
erts: Add rdtscp instruction check
erts: Fix hrtime for windows
erts: use correct function for perf counter on non-x86
erts: Fix msacc win32 debug compile error
erts: Add microstate accounting
erts, kernel: Add os:perf_counter function
erts: Add ERTS_WRITE_UNLIKELY
|
|
Conflicts:
erts/emulator/beam/beam_emu.c
|
|
perf counter is now part of the function pointer interface
and also the function returns the value instead of writing
to a memory buffer.
|
|
|
|
Microstate accounting is a way to track which state the
different threads within ERTS are in. The main usage area
is to pin point performance bottlenecks by checking which
states the threads are in and then from there figuring out
why and where to optimize.
Since checking whether microstate accounting is on or off is
relatively expensive if done in a short loop only a few of the
states are enabled by default and more states can be enabled
through configure.
I've done some benchmarking and the overhead with it turned off
is not noticible and with it on it is a fraction of a percent.
If you enable the extra states, depending on the benchmark,
the ovehead when turned off is about 1% and when turned on
somewhere inbetween 5-15%.
OTP-12345
|
|
The perf_counter is a very very cheap and high resolution timer
that can be used to timestamp system events. It does not have
monoticity guarantees, but should on most OS's expose a monotonous
time.
A special instruction has been created for this counter to further
speed up fetching it.
OTP-12908
|
|
* egil/pd-opt-get/OTP-13167:
erts: Add i_get_hash instruction
erts: Use internal hash for process dictionaries
|
|
* rickard/ohmq-fixup/OTP-13047:
Replace off_heap_message_queue option with message_queue_data option
Always use literal_alloc
Distinguish between GC disabled by BIFs and other disabled GC
Fix process_info(_, off_heap_message_queue)
Off heap message queue test suite
Remove unused variable
Fix memory leaks
|
|
Processes remember heap fragments that are known to be fully
live due to creation in a just called BIF that yields in the
live_hf_end field. This field must not be used if we have not
disabled GC in a BIF. F_DELAY_GC has been introduced in order
to distinguish between to two different scenarios.
- F_DISABLE_GC should *only* be used by BIFs. This when
the BIF needs to yield while preventig a GC.
- F_DELAY_GC should only be used when GC is temporarily
disabled while the process is scheduled. A process must
not be scheduled out while F_DELAY_GC is set.
|
|
Calculate hashvalue in load-time for constant process dictionary gets.
|
|
The test whether the result would fit in a smallnum could overflow into
a negative number that would fit a smallnum. A test that reproduces the
issue was added to bs_construct_SUITE.
|
|
|
|
|
|
* rickard/gc-bump-reds/OTP-13097:
Bump reductions on GC
|
|
* rickard/gc-after-bif-cond/OTP-13098:
Use the same conditions when triggering GC after BIF
|
|
* rickard/ohmq/OTP-13047:
Fragmented young heap generation and off_heap_message_queue option
Refactor GC
Introduce literal tag
Conflicts:
erts/doc/src/erlang.xml
erts/emulator/beam/erl_gc.c
|
|
* sverk/literal-memory-range:
erts: Refactor line table in loaded beam code
erts: Refactor header of loaded beam code
fix check_process_code for separate literal area
erts: Add support for fast erts_is_literal()
erts: Refactor erl_mmap to allow several mapper instances
erts: Add new allocator LITERAL
erts: Fix strangeness in treatment of MSEG_ALIGN_BITS
erts: Cleanup main carrier creation
erts: Remove unused erts_have_erts_mmap
erts: Refactor config test for posix_memalign
|
|
|
|
|
|
* The youngest generation of the heap can now consist of multiple
blocks. Heap fragments and message fragments are added to the
youngest generation when needed without triggering a GC. After
a GC the youngest generation is contained in one single block.
* The off_heap_message_queue process flag has been added. When
enabled all message data in the queue is kept off heap. When
a message is selected from the queue, the message fragment (or
heap fragment) containing the actual message is attached to the
youngest generation. Messages stored off heap is not part of GC.
|
|
to use a real C struct instead of array.
|
|
|
|
erlang:is_builtin(erlang, apply, 3) returns 'false'. That seems to be
an oversight in the implementation of erlang:is_builtin/3 rather than
a conscious design decision. Part of apply/3 is implemented in C (as a
special instruction), and part of it in Erlang (only used if apply/3
is used recursively). That makes apply/3 special compared to all other
BIFs.
From the viewpoint of the user, apply/3 is a built-in function,
since it cannot possibly be implemented in pure Erlang.
Noticed-by: Stavros Aronis
|
|
|
|
Conflicts:
OTP_VERSION
erts/doc/src/notes.xml
erts/vsn.mk
lib/runtime_tools/doc/src/notes.xml
lib/runtime_tools/vsn.mk
otp_versions.table
|
|
|
|
Fetch the head and tail parts to temporary variables before
writing them to their destinations. That should allow the CPU to
perform the moves in parallel, which might improve performance.
|
|
The combination is_non_empty_list followed by get_list is extremly
common (but not in estone_SUITE, which is why it has not been noticed
before). Therefore it is worthwile to introduce a combined
instruction.
|
|
|
|
It is currently only possible to pack up to 4 operands. However,
the move_window4 instrucion has 5 operands and move_window5 and
move3 instrucations have 6 operands.
Teach beam_makeops to pack instructions with 5 or 6 operands.
Also rewrite the move_window instructions in beam_emu.c to macros
to allow their operands to get packed.
|
|
Since 'd' operands can only either an X register or an Y register,
we only need a single bit to distinguish them. Furthermore, we can
pre-multiply the register number with the word size to speed up
address calculation.
|
|
Sequences of three move instructionst that effectively swap the
contents of two registers are fairly common. We can replace them
with a swap_temp/3 instruction. The third operand is the temporary
register to be used for swapping, since the temporary register
may actually be used.
If swap_temp/3 instruction is followed by a call, the temporary
register will often (but not always) be killed by the call. If
it is killed, we can replace the swap_temp/3 instruction with a
slightly cheaper swap/2 instruction.
|
|
Currently, move2/2 does the two moves sequentially to ensure
that the instruction will always work correctly.
We can do better than that. If the two move instructions have
any registers in common, we can introduce simpler and slightly
more efficient instructions to handle those cases:
move_shift/3
move_dup/3
For the remaining cases when the the move instructions
have no common registers, the move2/4 instruction can perform
the moves in parallel which is probably slightly more efficient.
For clarity's sake, we will remain the instruction to move2_par/4.
|
|
|
|
The 'cmd' variable that were shared by several hipe_mode_switch
instructions would cause clang to produce sub-optimal code,
probably because it considered the instructions as part of of
loop that needed to be optimized.
What would was that 'cmd' would be assigned to the ESI register
(lower 32 bits of the RSI register). It would use ESI for other
purposes in instructions, but at the end of every instruction
it would set ESI to 1 just in case the next instruction happened
to be hipe_trap_return. This can be seen clearly if this commit
is omitted and the define HIPE_MODE_SWITCH_CMD_RETURN in
hipe/hipe_mode_switch.h is changed from 1 to some other number
such as 42. You will see that 42 is assigned to ESI at the end
of every instruction.
Eliminate this problem by elimininating the shared 'cmd' variable.
|
|
|
|
|