Age | Commit message (Collapse) | Author |
|
Introduce a syntax to mark an operand that is not always used when
an instrution is executed. Example of such operands are the fail
label for is_nil or the number of live registers for an
allocate instruction.
Use a question mark to annotate optional use:
is_nil f? xy
allocate t t?
|
|
|
|
All other instructions that increment the stack pointer takes a 'Q'
operand.
|
|
* bjorn/erts/relative-jumps:
Pack failure labels in i_select_val2 and i_select_tuple_arity2
Optimize i_select_tuple_arity2 and is_select_lins
Rewrite select_val_bins so that its labels can be packed
Pack sequences of trailing 'f' operands
Implement packing of 'f' and 'j'
Make sure that mask literals are 64 bits
Use relative failure labels
Add information about offset to common group start position
Remove JUMP_OFFSET
Refactor instructions to support relative jumps
Introduce a new trace_jump/1 instruction for tracing
Avoid using $Src more than once
|
|
|
|
* bjorn/erts/improve-beam-ops:
Add built-in macros $ARG_POSITION() and $IS_PACKED()
Use 't' instead of 'I' bit syntax operands
Optimize operand type for match context in i_bs_get_integer
Change operand type from 's' to 'S' for a few instructions
Use the correct name of the parameter
|
|
Optimise equality comparisons
|
|
The number of live registers, unit and flags, and the number
of slots all fit comortably in 16 bits. This change will give
more opportunities for packing.
|
|
The match context is always in an X register. Change the 's' operand
to 'x'.
While we are it, also change the operands for Live and FlagsAndUnit
to 't' (they will both fit comfortably in 16 bits).
|
|
'S' is slightly more efficient. Using 'S' may also enable more
packing.
|
|
As a preparation for introducing relative jumps, introduce
"trace_jump W" that can be used for tracing. This instruction
will continue to have an absolute address for the jump target.
(Note: This instruction is never created during loading; it
is only created in stubs when tracing is active.)
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
try_case_end is an excepting-generating instruction that is
infrequently executed.
|
|
Instructions that used to be implemented in beam_emu.c
were not marked as cold as it would make no difference.
|
|
The bit syntax instructions are mixed among other instructions
in beam_hot.h and beam_cold.h.
Introduce a new hotness level called '%warm' with is associated
file beam_warm.h. Mark all bit syntax instructions as '%warm'.
|
|
The type 'd' could be used both for destination registers and
source register.
Restrict the 'd' type to only be used for destinations, and
introduce the new 'S' type to be used when a source must be
a register.
|
|
The 'I' type can be replaced with 't' if we know that the value
will fit in 16 bits. The 'P' type can be replaced with 'Q' if
it is used for deallocating a stack frame.
|
|
Having the helper functions for map update knowing all the details
of operands for the instruction will make it difficult to make
improvements such as better packing.
|
|
As a preparation for potentially improving packing in the future,
we will need to make sure that packable types have a defined maximum
size.
The packer algorithm assumes that two 'I' operands can be packed
into one 64-bit word, but there are instructions that use an 'I'
operand to store a pointer. It only works because those instructions
are not packed for other reasons.
Introduce the 'W' type and use it for operands that don't fit in
32 bits.
|
|
The transformations were incorrect.
|
|
The instruction put_map_assoc/5 (used for updating a map) has a
failure operand, but it can't actually fail provided that its "map"
argument is a map.
The following code:
M#{key=>value}.
will be compiled to:
{test,is_map,{f,3},[{x,0}]}.
{line,[...]}.
{put_map_assoc,{f,0},{x,0},{x,0},1,{list,[{atom,key},{atom,value}]}}.
return.
{label,3}.
%% Code that produces a 'badmap' exception follows.
Because of the is_map instruction, {x,0} always contains a map when
the put_map_assoc instruction is executed. Therefore we can remove
the failure operand. That will save one word, and also eliminate
two tests at run-time.
The only problem is that the compiler in OTP 17 did not emit a
is_map instruction before the put_map_assoc instruction. Therefore,
we must add an instruction that tests for a map if the code was
compiled with the OTP 17 compiler.
Unfortunately, there is no safe and relatively easy way to known that
the OTP 17 compiler was used, so we will check whether a compiler
before OTP 20 was used. OTP 20 introduced a new chunk type for atoms,
which is trivial to check.
|
|
|
|
Eliminate the need to write pre-processor macros for each instruction.
Instead allow the implementation of instruction to be written in
C directly in the .tab files. Rewrite all existing macros in this
way and remove the %macro directive.
|
|
Take advantage of the fact that small maps have a tuple for keys.
When new map is constructed and all keys are literals, we can construct
the entire keys tuple as a literal.
This should reduce the memory of maps created with literal keys almost by half,
since they all can share the same keys tuple.
|
|
Bit syntax instructions never store their result in a Y register.
Therefore, change the bit syntax instructions to use 'x' as the
destination instead of 'd'. That will simplify the code that stores
the result, and will be a slight reduction in code size and execution
time.
|
|
|
|
a3407eaa2104d6 eliminated the -gen_dest flag for macros in ops.tab.
It turns out that the new implementation (taking the address of the
X or X destination register) is unsafe if the destination is a Y
register and there can be a GC. The problem is that the address to
the Y register will change if there is a GC.
Fortunately, the few instructions in OTP 20 that have a general
destinations are safe. The put_list_ssd instruction never does a GC.
The bit syntax instructions that may do a GC will always store the
result to an X register.
To be completely sure, rewrite the destination register from 'd' to
'x' for the bit syntax instructions. That means that a bit syntax
instruction with a Y register destionation will abort the loading
if it is encountered.
|
|
|
|
Inroduce syntactic sugar so that we can write:
get_list xy xy xy
instead of:
get_list x x x
get_list x x y
get_list x y x
get_list x y y
get_list y x x
get_list y x y
get_list y y x
get_list y y y
|
|
Instructions that take a 'd' argument needs a -gen_dest flag in their
macros. For example:
%macro:put_list PutList -pack -gen_dest
put_list s s d
-gen_dest was needed when x(0) was stored in a register, since it is
not possible to take the address of a register. Now that x(0) is stored
in memory and we can take the address, we can eliminate gen_dest.
|
|
Rewrite the instruction stream on tagged tuple tests.
Tagged tuples means a tuple of any arity with an atom as its first element.
Typically records, ok-tuples and error-tuples.
from:
...
{test,is_tuple,Fail,[Src]}.
{test,test_arity,Fail,[Src,Sz]}.
...
{get_tuple_element,Src,0,Dst}.
...
{test,is_eq_exact,Fail,[Dst,Atom]}.
...
to:
...
{test,is_tagged_tuple,Fail,[Src,Sz,Atom]}.
...
|
|
|
|
* kvakvs/erts/gc_minor_option/OTP-11695:
erts: Fix req_system_task gc typespec
Fix process_SUITE system_task_blast and no_priority_inversion2
Option to erlang:garbage_collect to request minor (generational) GC
Conflicts:
erts/emulator/beam/erl_process.c
erts/preloaded/src/erts_internal.erl
|
|
|
|
Note: Minor GC option is a hint, and GC may still decide to run fullsweep.
Test case for major and minor gc on self
Test case for major and minor gs on some other process + async gc test check
docs fix
|
|
* bjorn/compiler/misc-opt:
v3_kernel: Construct literal lists properly
Use the register map in %live in beam_utils:is_killed_block/2
Teach beam_utils to check liveness for put_map instructions
beam_peep: Help out beam_jump
|
|
* bjorn/erts/beam_load:
Optimize get_tuple_element instructions that target Y registers
Mend beam_SUITE:packed_registers/1
Correct unpacking of 3 operands on 32-bit archictectures
Eliminate misleading #ifdef ARCH_64 in beam_opcodes.h
beam_debug: Correct masking when unpacking packed operands
|
|
Use cerl:make_list/1 instead of a home-made make_list/1 to ensure that
literal lists are constructed as literals. In a future release, we
would like to forbid in the loader construction of literal lists using
instructions like:
put_list {atom,a} [] Dst
The proper way is:
move {literal,[a]} {x,0}
Also update the comment about "put_list Const [] Dst" in ops.tab.
|
|
Several improvements in the compiler (e.g. c288ab87fd6) has
lead to an Y register being the target for get_tuple_element
instructions. Therefore, introduce i_get_tuple_element2y
that combines two consecutive get_tuple_element instructions
that target Y registers.
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
The raise/2 instruction is almost always used like this:
raise x(2) x(1)
Therefore, we can translate it to an internal i_raise/0
instruction that uses x(2) x(1) as its implicit operands.
We will also remove the backward compatibility with R10-0. It is
unlikely that anyone still is using BEAM files compiled with the R10-0
compiler, especially since most of those modules cannot be loaded. The
loader will refuse to load any module that uses the old non-GCIng
arithmetic instructions or the non-GCing versions of length/1 or
size/1.
Doing these changes will reduce both the size of the loaded BEAM
code and size of the code in process_main().
|
|
There is no reason to rename bs_put_utf16/3.
(We rename instructions if we'll need to change the operands or
if we will need to avoid an endless transformation loop. Neither
of these reasons apply to bs_put_utf16/3.)
|
|
Optimizations that are possible to do by the compiler should be
done by the compiler and not by the loader.
If the compiler has done its job correctly, attempting to do the two
transformations only wastes time.
|
|
The transformation on the following line will do the job.
|
|
62473daf introduced an unsafe optimization in the loader.
See the comments in the test case for an explanation of
the problem.
|
|
|
|
The perf_counter is a very very cheap and high resolution timer
that can be used to timestamp system events. It does not have
monoticity guarantees, but should on most OS's expose a monotonous
time.
A special instruction has been created for this counter to further
speed up fetching it.
OTP-12908
|
|
* egil/pd-opt-get/OTP-13167:
erts: Add i_get_hash instruction
erts: Use internal hash for process dictionaries
|
|
Calculate hashvalue in load-time for constant process dictionary gets.
|
|
The combination is_non_empty_list followed by get_list is extremly
common (but not in estone_SUITE, which is why it has not been noticed
before). Therefore it is worthwile to introduce a combined
instruction.
|