Age | Commit message (Collapse) | Author |
|
Compile external fun expressions to literals
OTP-15003
|
|
The expressions fun M:F/A, when all elements are literals are also
treated as a literal. Since they have consistent representation and
don't depend on the code currently loaded in the VM, this is safe.
This can provide significant performance improvements in code using such
functions extensively - a full function call to erlang:make_fun/3 is
replaced by a single move instruction and no register shuffling or
saving registers to stack is necessary. Additionally, compound data
types that contain such external functions as elements can be treated as
literals too.
The commit also changes the representation of external funs to be a
valid Erlang syntax and adds support for literal external funs to core
Erlang.
|
|
A get_map_elements instruction that has a literal map operand
would never be translated to a i_get_map_element instruction.
That would be a problem for the following instruction:
get_map_elements Fail #{} {x,0}, {x,1}
Since the key is not a literal, get_map_element must be used,
since get_map_elements requires that a hash value can be
calculated for each element.
When the instruction is translated to i_get_map_element, the
hash value will be set to 0 and an assertion will trigger in
the debug build.
|
|
Instructions that produce more than one result complicate
optimizations. get_list/3 is one of two instructions that
produce multiple results (get_map_elements/3 is the other).
Introduce the get_hd/2 and get_tl/2 instructions
that return the head and tail of a cons cell, respectively,
and use it internally in all optimization passes.
For efficiency, we still want to use get_list/3 if both
head and tail are used, so we will translate matching pairs
of get_hd and get_tl back to get_list instructions.
|
|
Consider the following function:
function({function,Name,Arity,CLabel,Is0}, Lc0) ->
try
%% Optimize the code for the function.
catch
Class:Error:Stack ->
io:format("Function: ~w/~w\n", [Name,Arity]),
erlang:raise(Class, Error, Stack)
end.
The stacktrace is retrieved, but it is only used in the call
to erlang:raise/3. There is no need to build a stacktrace
in this function. We can avoid the building if we introduce
an instruction called raw_raise/3 that works exactly like
the erlang:raise/3 BIF except that its third argument must
be a raw stacktrace.
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allow us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
Expr
catch C:E ->
Stk = erlang:get_stacktrace(),
.
.
.
In the new syntax, it would look like:
try
Expr
catch
C:E:Stk ->
.
.
.
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
Expr
catch E ->
.
.
.
is a shorthand for:
try
Expr
catch throw:E ->
.
.
.
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
Expr
catch throw:E:Stk ->
.
.
.
|
|
X register 0 used to be mapped to a hardware register, and therefore
faster than the other registers. Because of that, the compiler
tried to use x(0) as much as possible as a temporary register.
That was changed a few releases ago. X register 0 is now placed
in the array of all X registers and has no special speed
advantage compared to the other registers.
Remove the code in the compiler that attempts to use x(0) as
much as possible. As a result, the following type of instruction
will be much less frequent:
{put_list,Src,{x,0},{x,0}}
Instead, the following type of instruction will be more frequent:
{put_list,Src,{x,X},{x,X}}
(Where X is an arbitrary X register.)
Update the runtime system to specialize that kind of put_list
instruction.
|
|
When a ref is created before performing a receive that will only
receive message containing that ref, there is a compiler optimization
to avoid scanning messages that can't possible contain the newly
created ref.
Magnus Lång pointed out that the implementation of the optimization
is flawed. Exceptions or recursive calls could cause the receive
operation to scan the receive queue from a position beyond the expected
message (that is, the message containing the ref would never be
matched out). See the receive_opt_exception/1 and receive_opt_recursion/1
test cases in receive_SUITE.
It turns out that we can simplify the implementation of the
optimization while fixing the bug (suggested by Magnus Lång). We
actually don't need the c_p->msg.mark field. It is enough to have
c_p->msg.saved_pos; if it is non-zero, it is a valid position in the
message qeueue. All we need to do is to ensure that we clear
c_p->msg.saved_pos when a receive is exited normally or abnormally.
We can clear c_p->msg.saved_pos in JOIN_MESSAGE(), since it is called
both when leaving a receive because a message matched and because there
was a timeout and the 'after' clause was executed. In addition, we
need to clear c_p->msg.saved_pos when an exception is caught.
https://bugs.erlang.org/browse/ERL-511
|
|
Don't allow defining an specific operation more than once with
the exact same operands. Don't allow a specific operation to be
defined with different arities.
|
|
The try_end and try_case instructions are implemented the same way
(try_case is translated to try_end by the loader).
We can do better than that. We know that try_case will only be executed
when an exception has been caught. Therefore, we know that x(0) is
the non-value and that x(1) through x(3) need to be shifted down to
x(0) through x(2). There is no need to test x(0) before shifting down.
try_end does not need the register shifting code at all.
|
|
Introduce a syntax to mark an operand that is not always used when
an instrution is executed. Example of such operands are the fail
label for is_nil or the number of live registers for an
allocate instruction.
Use a question mark to annotate optional use:
is_nil f? xy
allocate t t?
|
|
|
|
All other instructions that increment the stack pointer takes a 'Q'
operand.
|
|
* bjorn/erts/relative-jumps:
Pack failure labels in i_select_val2 and i_select_tuple_arity2
Optimize i_select_tuple_arity2 and is_select_lins
Rewrite select_val_bins so that its labels can be packed
Pack sequences of trailing 'f' operands
Implement packing of 'f' and 'j'
Make sure that mask literals are 64 bits
Use relative failure labels
Add information about offset to common group start position
Remove JUMP_OFFSET
Refactor instructions to support relative jumps
Introduce a new trace_jump/1 instruction for tracing
Avoid using $Src more than once
|
|
|
|
* bjorn/erts/improve-beam-ops:
Add built-in macros $ARG_POSITION() and $IS_PACKED()
Use 't' instead of 'I' bit syntax operands
Optimize operand type for match context in i_bs_get_integer
Change operand type from 's' to 'S' for a few instructions
Use the correct name of the parameter
|
|
Optimise equality comparisons
|
|
The number of live registers, unit and flags, and the number
of slots all fit comortably in 16 bits. This change will give
more opportunities for packing.
|
|
The match context is always in an X register. Change the 's' operand
to 'x'.
While we are it, also change the operands for Live and FlagsAndUnit
to 't' (they will both fit comfortably in 16 bits).
|
|
'S' is slightly more efficient. Using 'S' may also enable more
packing.
|
|
As a preparation for introducing relative jumps, introduce
"trace_jump W" that can be used for tracing. This instruction
will continue to have an absolute address for the jump target.
(Note: This instruction is never created during loading; it
is only created in stubs when tracing is active.)
|
|
* In both loader and compiler, make sure constants are always the second
operand - many passes of the compiler assume that's always the case.
* In loader rewrite is_eq_exact with same arguments to skip the instruction
and with different constants move one to an x register to maintain
the properly outlined above.
* The same (but in reverse) is done with the is_ne_exact, where we rewrite
to an unconditional jump or add a move to an x register.
* All of the above allow to replace is_eq_exact_fss with is_eq_exact_fyy and
is_ne_exact_fss with is_ne_exact_fSS as those are the only possibilities left.
|
|
try_case_end is an excepting-generating instruction that is
infrequently executed.
|
|
Instructions that used to be implemented in beam_emu.c
were not marked as cold as it would make no difference.
|
|
The bit syntax instructions are mixed among other instructions
in beam_hot.h and beam_cold.h.
Introduce a new hotness level called '%warm' with is associated
file beam_warm.h. Mark all bit syntax instructions as '%warm'.
|
|
The type 'd' could be used both for destination registers and
source register.
Restrict the 'd' type to only be used for destinations, and
introduce the new 'S' type to be used when a source must be
a register.
|
|
The 'I' type can be replaced with 't' if we know that the value
will fit in 16 bits. The 'P' type can be replaced with 'Q' if
it is used for deallocating a stack frame.
|
|
Having the helper functions for map update knowing all the details
of operands for the instruction will make it difficult to make
improvements such as better packing.
|
|
As a preparation for potentially improving packing in the future,
we will need to make sure that packable types have a defined maximum
size.
The packer algorithm assumes that two 'I' operands can be packed
into one 64-bit word, but there are instructions that use an 'I'
operand to store a pointer. It only works because those instructions
are not packed for other reasons.
Introduce the 'W' type and use it for operands that don't fit in
32 bits.
|
|
The transformations were incorrect.
|
|
The instruction put_map_assoc/5 (used for updating a map) has a
failure operand, but it can't actually fail provided that its "map"
argument is a map.
The following code:
M#{key=>value}.
will be compiled to:
{test,is_map,{f,3},[{x,0}]}.
{line,[...]}.
{put_map_assoc,{f,0},{x,0},{x,0},1,{list,[{atom,key},{atom,value}]}}.
return.
{label,3}.
%% Code that produces a 'badmap' exception follows.
Because of the is_map instruction, {x,0} always contains a map when
the put_map_assoc instruction is executed. Therefore we can remove
the failure operand. That will save one word, and also eliminate
two tests at run-time.
The only problem is that the compiler in OTP 17 did not emit a
is_map instruction before the put_map_assoc instruction. Therefore,
we must add an instruction that tests for a map if the code was
compiled with the OTP 17 compiler.
Unfortunately, there is no safe and relatively easy way to known that
the OTP 17 compiler was used, so we will check whether a compiler
before OTP 20 was used. OTP 20 introduced a new chunk type for atoms,
which is trivial to check.
|
|
|
|
Eliminate the need to write pre-processor macros for each instruction.
Instead allow the implementation of instruction to be written in
C directly in the .tab files. Rewrite all existing macros in this
way and remove the %macro directive.
|
|
Take advantage of the fact that small maps have a tuple for keys.
When new map is constructed and all keys are literals, we can construct
the entire keys tuple as a literal.
This should reduce the memory of maps created with literal keys almost by half,
since they all can share the same keys tuple.
|
|
Bit syntax instructions never store their result in a Y register.
Therefore, change the bit syntax instructions to use 'x' as the
destination instead of 'd'. That will simplify the code that stores
the result, and will be a slight reduction in code size and execution
time.
|
|
|
|
a3407eaa2104d6 eliminated the -gen_dest flag for macros in ops.tab.
It turns out that the new implementation (taking the address of the
X or X destination register) is unsafe if the destination is a Y
register and there can be a GC. The problem is that the address to
the Y register will change if there is a GC.
Fortunately, the few instructions in OTP 20 that have a general
destinations are safe. The put_list_ssd instruction never does a GC.
The bit syntax instructions that may do a GC will always store the
result to an X register.
To be completely sure, rewrite the destination register from 'd' to
'x' for the bit syntax instructions. That means that a bit syntax
instruction with a Y register destionation will abort the loading
if it is encountered.
|
|
|
|
Inroduce syntactic sugar so that we can write:
get_list xy xy xy
instead of:
get_list x x x
get_list x x y
get_list x y x
get_list x y y
get_list y x x
get_list y x y
get_list y y x
get_list y y y
|
|
Instructions that take a 'd' argument needs a -gen_dest flag in their
macros. For example:
%macro:put_list PutList -pack -gen_dest
put_list s s d
-gen_dest was needed when x(0) was stored in a register, since it is
not possible to take the address of a register. Now that x(0) is stored
in memory and we can take the address, we can eliminate gen_dest.
|
|
Rewrite the instruction stream on tagged tuple tests.
Tagged tuples means a tuple of any arity with an atom as its first element.
Typically records, ok-tuples and error-tuples.
from:
...
{test,is_tuple,Fail,[Src]}.
{test,test_arity,Fail,[Src,Sz]}.
...
{get_tuple_element,Src,0,Dst}.
...
{test,is_eq_exact,Fail,[Dst,Atom]}.
...
to:
...
{test,is_tagged_tuple,Fail,[Src,Sz,Atom]}.
...
|
|
|
|
* kvakvs/erts/gc_minor_option/OTP-11695:
erts: Fix req_system_task gc typespec
Fix process_SUITE system_task_blast and no_priority_inversion2
Option to erlang:garbage_collect to request minor (generational) GC
Conflicts:
erts/emulator/beam/erl_process.c
erts/preloaded/src/erts_internal.erl
|
|
|
|
Note: Minor GC option is a hint, and GC may still decide to run fullsweep.
Test case for major and minor gc on self
Test case for major and minor gs on some other process + async gc test check
docs fix
|
|
* bjorn/compiler/misc-opt:
v3_kernel: Construct literal lists properly
Use the register map in %live in beam_utils:is_killed_block/2
Teach beam_utils to check liveness for put_map instructions
beam_peep: Help out beam_jump
|
|
* bjorn/erts/beam_load:
Optimize get_tuple_element instructions that target Y registers
Mend beam_SUITE:packed_registers/1
Correct unpacking of 3 operands on 32-bit archictectures
Eliminate misleading #ifdef ARCH_64 in beam_opcodes.h
beam_debug: Correct masking when unpacking packed operands
|
|
Use cerl:make_list/1 instead of a home-made make_list/1 to ensure that
literal lists are constructed as literals. In a future release, we
would like to forbid in the loader construction of literal lists using
instructions like:
put_list {atom,a} [] Dst
The proper way is:
move {literal,[a]} {x,0}
Also update the comment about "put_list Const [] Dst" in ops.tab.
|
|
Several improvements in the compiler (e.g. c288ab87fd6) has
lead to an Y register being the target for get_tuple_element
instructions. Therefore, introduce i_get_tuple_element2y
that combines two consecutive get_tuple_element instructions
that target Y registers.
|