Age | Commit message (Collapse) | Author |
|
|
|
Seen on SSL application where substraction with x registers were prevalent:
* i_minus specialization on x registers
* i_plus specialization on x registers
|
|
Common pattern seen in SSL:
move y x | move r x -> move2
move r x | move y x -> move2
Common pattern seen in SSL and Compiler:
move x r | move x x -> move2
|
|
* i_rem specialization on x registers
|
|
* i_band specialization on x registers and constants
|
|
Move an entire region of x registers to the stack.
This reduces the dispatch pressure of move instructions.
Also introduce a move2 specialization for some common move patterns:
move r y | move x y -> move2 : As above, moving regions to the stack
move x r | move x y -> move2 : A seemingly common pattern
|
|
|
|
* i_is_lt for r, x registers and constants
* i_is_ge for x registers and constants
* i_is_exact_eq for r and x registers
|
|
See the previous commit for justification and use cases.
|
|
Let the loader pre-compute the hash value when a single, literal key
is matched as in:
#{<<"some_key">>:=V} = Map
In my measurements, this optimization resulted in a 30 percent
speedup for short binary keys.
Unfortunately, this optimizization makes no difference for small
maps with less than 32 keys, since the hash value is not used.
Still, there are the following use cases:
* A map used instead of a record with more than 32 entries. I have
seen some applications with huge records.
* Lookup in JSON dictionaries represented as maps.
The hash value will only be used when the map is a hash map
(currently, that means at least 32 entries).
|
|
In the i_get_map_element/4 instruction, for literal keys other than
atoms, the key would be put into x[0] instead of used directly in the
instruction. The reason is that the original implementation of maps
only supported atom keys.
|
|
The map instructions require that the keys in the instructions
are sorted (for flatmaps). But that is an implementation detail
that should not exposed outside of the BEAM virtual machine.
Therefore, make the sorting of the keys the responsibility of
the loader and not the compiler.
Also note that the sort order for maps with numeric keys or keys
with numeric components has changed in OTP 18. That means that
code compiled for OTP 17 that operated on maps with map keys
might not work in OTP 18 without the sorting in the loader
(although it is unlikely to be an issue in practice).
|
|
The has_map_fields instruction is infrequently used. Thus there
is no need to have the fastest possible implementation; it is
better to have an implementation that reduces the code size in
the already big process_main() function.
We can transform has_map_fields to a get_map_elements instruction,
targeting the same unused x[0] register for all keys. That
instruction will only be marginally slower than existing
implementation.
|
|
The compiler will only emit is_map/1 instructions with literal
argument if optimization is turned off. Therefore, the only
reason for this commit is cleanliness.
|
|
The new_map instruction cannot fail, and thus needs no fail label.
|
|
A put_map_assoc instruction with an empty source map should be
converted to a simpler new_map instruction. The transformation
didn't happen because an empty source map is no longer represented
as a NIL term (as it was in the beginning before map literals
were implemented).
|
|
Using the exact operator (':=') is only allowed when an existing map
is being updated. Thus the following causes a compilation error:
#{k:=v}
Therefore there is no need to support the put_map_exact instruction
without a source map.
|
|
For searching a key in an array we use linear search in arrays
up to 10 elements.
Selecting on tuple arity will always use linear search.
Instead of using two different instructions we assume selecting on
different tuple arities are always few in numbers.
|
|
|
|
Move src to a register if it is a literal.
|
|
|
|
Map source may be anything, not only registers.
|
|
|
|
|
|
To make it possible to build the entire OTP system, also define
dummys for the instructions in ops.tab.
|
|
A process requesting a system task to be executed in the context of
another process will be notified by a message when the task has
executed. This message will be on the form:
{RequestType, RequestId, Pid, Result}.
A process requesting a system task to be executed can set priority
on the system task. The requester typically set the same priority
on the task as its own process priority, and by this avoiding
priority inversion. A request for execution of a system task is
made by calling the statically linked in NIF
erts_internal:request_system_task(Pid, Prio, Request). This is an
undocumented ERTS internal function that should remain so. It
should *only* be called from BIF implementations.
Currently defined system tasks are:
* garbage_collect
* check_process_code
Further system tasks can and will be implemented in the future.
The erlang:garbage_collect/[1,2] and erlang:check_process_code/[2,3]
BIFs are now implemented using system tasks. Both the
'garbage_collect' and the 'check_process_code' operations perform
or may perform garbage_collections. By doing these via the
system task functionality all garbage collect operations in the
system will be performed solely in the context of the process
being garbage collected. This makes it possible to later implement
functionality for disabling garbage collection of a process over
context switches.
Newly introduced BIFs:
* erlang:garbage_collect/2 - The new second argument is an option
list. Introduced option:
* {async, RequestId} - making it possible for users to issue
asynchronous garbage collect requests.
* erlang:check_process_code/3 - The new third argument is an
option list. Introduced options:
* {async, RequestId} - making it possible for users to issue
asynchronous check process code requests.
* {allow_gc, boolean()} - making it possible to issue requests
that aren't allowed to garbage collect (operation will abort
if gc should be needed).
These options have been introduced as a preparation for
parallelization of check_process_code operations when the
code_server is about to purge a module.
|
|
|
|
The loader failed to load non-optimized BEAM code generated from:
element(2, not_a_tuple)
Commit ece4c17d2288a3161c995 introduced such code into
core_fold_SUITE, leading to core_fold_no_opt_SUITE and
core_fold_post_opt_SUITE failing to load.
|
|
Calls to erlang:set_trace_pattern/3 will no longer block all
other schedulers.
We will still go to single-scheduler mode when new code is loaded
for a module that is traced, or when loading code when there is a
default trace pattern set. That is not impossible to fix, but that
requires much closer cooperation between tracing BIFs and the loader
BIFs.
|
|
Change the data structures for breakpoints to make it possible
(in a future commit) to manage breakpoints without taking down the
system to single-scheduling mode.
The current "breakpoint wheel" data structure (a circular,
double-linked list of breakpoints) was invented before the
SMP emulator. To support it in the SMP emulator, there is essentially
one breakpoint wheel per scheduler. As more breakpoint types have
been added, the implementation has become messy and hard to understand
and maintain.
Therefore, the time for a rewrite has come. Use one struct to hold
all breakpoint data for a breakpoint in a function. Use a flag field
to indicate what different type of break actions that are enabled.
|
|
|
|
|
|
Conflicts:
erts/emulator/beam/beam_emu.c
erts/emulator/beam/bif.tab
erts/preloaded/ebin/prim_file.beam
lib/hipe/cerl/erl_bif_types.erl
|
|
|
|
|
|
It is wrongly assumed in the BEAM loader that apply/2 is a BIF
and must be treated specially. Also make it clearer in ops.tab
that apply/3 is a BIF, but apply/2 is not.
|
|
Taking advantage of the new calling convention for BIFs, we
only need one instruction to handle BIFs with any number of
arguments. This change eliminates the limit of three arguments
for BIFs, but traps are still limited to three arguments.
|
|
Each gc_bif[123] instruction must have both a transformation in
ops.tab and special code in gen_guard_bif[123]().
Rewrite it to do most of the work in gen_guard_bif[123]().
|
|
In the handling of generic instructions, we used to always
test whether the instruction was 'too_old_compiler' and abort
loading with a special error message.
Refactor the code so that we only do test if we an error
has occurred. That will allow us to make the test more expensive
in the future, allowing us to customize error messages for certain
opcode without any cost in the successful case.
|
|
We don't need type constraints that essentially are assertions;
the wrong type will be detected and loading aborted when no specific
instruction can be found.
|
|
If the left part of a transformation will always match, omit the
the 'try_me_else' and 'fail' instructions.
As part of this optimization, make it an error to have a
transformation that can never be reached because of a previous
transformation that will always match. (Remove one transformation
from ops.tab that was found to be unreachable.)
|
|
is_list/2 and other test instructions with a zero label was last
generated by the v1 BEAM compiler which was last supported in R6B.
Since BEAM modules produced by that compiler will be rejected with
a nice error message for other reasons (e.g. by the test for the
module_info/0,1 functions), retaining those transformations serves
no useful purpose.
|
|
The hybrid-heap emulator is broken since R12, so there is no
need to keep those instructions.
|
|
Ancient versions of BEAM compiler could generate move instruction with
the same source and destination registers, so the loader would optimize
away such instructions.
|
|
|
|
|
|
Introduce the line/1 instruction in the compiler and the BEAM
virtual machine. It will not yet be generated by the compiler and
will not actually carry any information.
|
|
Constructing binaries using the bit syntax with literals sizes
that would not fit in an Uint will either cause an emulator crash
or the loading to be aborted.
Use the new TAG_o tag introduced in the previous commit to make sure
that the attempt to create huge binary literals will generate a
system_limit exception at run-time.
|
|
|
|
For some historical reason, the transformation of a func_info/3
instruction to the internal i_func_info/4 instruction is more
involved than it needs to be. Remove the gen_func_info() function
in the loader and use a simple transformation.
|