Delay creation of stack frames

* bjorn/compiler/coverage:
beam_utils: Refactor combine_alloc_lists() to cover more lines
map_SUITE: Cover beam_utils:bif_to_test/3
beam_disasm: Remove support for obsolete instructions
guard_SUITE: Test is_bitstring/1 and is_map/1 on literals

* maint:
Updated OTP version
Prepare release
ssh: Special treatment of OpenSSH clients >= 7.2 rsa-sha2-* public keys
Conflicts:
OTP_VERSION

* maint-20:
Updated OTP version
Prepare release
ssh: Special treatment of OpenSSH clients >= 7.2 rsa-sha2-* public keys

v3_codegen currently wraps a stack frame around each clause in
a function (unless the clause is simple without any 'case' or
other complex constructions).
Consider this function:
    f({a,X}) ->
        A = abs(X),
        case A of
            0 ->
                {result,"0"};
            _ ->
                {result,integer_to_list(A)}
        end;
    f(_) ->
        error.
The first clause needs a stack frame because the call to
integer_to_list/1 is not in tail position. v3_codegen currently
wraps the entire first clause in a stack frame.
We can delay the creation of the stack frame and instead create
one in each arm of the 'case' that needs it:
    f({a,X}) ->
        A = abs(X),
        case A of
            0 ->
                %% Don't create a stack frame here.
                {result,"0"};
            _ ->
                %% Create a stack frame here.
                {result,integer_to_list(A)}
        end;
    f(_) ->
        error.
There are pros and cons to this approach.
The cons are that the code size may increase if there are many
'case' clauses and each needs its own stack frame. The allocation
instructions may also interfere with other optimizations, but
the new optimizations introduced in previous commits will mitigate
most of those issues.
The pros are the following:
* Some clauses in a 'case' do not need a stack frame at all.
* When an allocation instruction is moved into a 'case' clause,
  the slightly cheaper 'allocate' instruction can often be used
  instead of 'allocate_zero'. The allocate instruction may also
  be combined with a 'test_heap' instruction.
* The stack frame in each arm of the 'case' has exactly as many
  slots as that arm needs.

When matching on the first element of a tuple is rewritten to an
is_tagged_tuple instruction, the get_tuple_element instruction that
fetches the tag is left in place unless the register it fetches into
is subsequently killed.
We can do better than that. If the register is referenced in an
allocating instruction, but its value is never actually used, we
can do one of two things: if the value is known to be defined earlier
(using annotations added by beam_utils:anno_defs/1) the instruction
can be removed altogether; if not, it can be replaced with a
'move nil TagRegister' instruction.

Use annotations added by beam_utils:anno_defs/1 to move more
allocations upwards in the instruction stream. That in turn
allows us to optimize away more 'move' instructions.

To avoid having to call both is_killed/3 and is_not_used/3,
add usage/3 to answer both questions in one call.

Add beam_utils:anno_defs/1, which adds an annotation to the
beginning of each block indicating which X registers are
defined. Having that information can improve some optimizations.

Those clients sign with sha instead of sha2-*. First try to verify
with the correct algorithm, and if that fails, retry with sha1.
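
A minimal sketch of the fallback described above, written in Erlang. The module and function names (ssh_sig_fallback, verify/4) are hypothetical and are not the ssh application's internal API; the sketch assumes an RSA public key, the standard public_key:verify/4 function, and that the negotiated rsa-sha2-* digest (sha256 or sha512) is passed in by the caller.

```erlang
%% Hypothetical helper, not part of the ssh application: verify an RSA
%% signature with the negotiated rsa-sha2-* digest first, and fall back
%% to SHA-1 for clients that sign with sha instead.
-module(ssh_sig_fallback).
-export([verify/4]).

-spec verify(binary(), binary(), sha256 | sha512, public_key:public_key()) ->
          boolean().
verify(Msg, Signature, Sha2Digest, PublicKey) ->
    case public_key:verify(Msg, Sha2Digest, Signature, PublicKey) of
        true ->
            true;
        false ->
            %% The client may have signed with sha (SHA-1) even though
            %% an rsa-sha2-* algorithm was negotiated.
            public_key:verify(Msg, sha, Signature, PublicKey)
    end.
```

Trying the correct digest first keeps the common case to a single verification; the second attempt only costs extra time for signatures that would otherwise be rejected.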

Add -MMD option to erlc
OTP-14830

digraph: Document a bad_edge error
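
For illustration, a small Erlang sketch of one situation in which digraph:add_edge/3 returns a bad_edge error: adding an edge that would close a cycle in a digraph created with the acyclic option. The module name is hypothetical, and the commit title does not say which bad_edge case was documented; this is just one case.

```erlang
-module(digraph_bad_edge).
-export([demo/0]).

%% Returns the result of adding an edge that would create a cycle
%% in an acyclic digraph; expected to be {error, {bad_edge, Path}}.
demo() ->
    G = digraph:new([acyclic]),
    try
        a = digraph:add_vertex(G, a),
        b = digraph:add_vertex(G, b),
        %% a -> b is accepted; b -> a would close a cycle.
        ['$e' | _] = digraph:add_edge(G, a, b),
        digraph:add_edge(G, b, a)
    after
        %% Digraphs are backed by ETS tables and must be deleted explicitly.
        digraph:delete(G)
    end.
```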

There are four uncovered lines in combine_heap_needs/2 and
combine_alloc_lists/2. There is no way to reach them starting from
Erlang source code using the standard compiler. However, they
can be reached starting from BEAM assembly code, so we don't
want to remove them.
We could add a test case that covers the lines using assembly
code, but an easier solution is to rewrite the code in a more
generic way using sofs so that the code can be covered with
existing test cases.

* bjorn/compiler/use-stacktrace-syntax:
Use the new syntax for retrieving stack traces

Slightly optimize reading of cooked files in list mode

01835845579e9 fixed some problems, but introduced a bug where
is_not_used/3 would report that a register was not used when it
in fact was.

758712d6294 changed the need_heap/2 function so that it stopped
using its second argument.
Remove the second argument from need_heap(), and update all callers
to similarly remove unused arguments.

* lars/ssl/update-runtime-dependencies:
[ssl] Update runtime dependencies

* maint:
ssh: Update runtime dependencies of ssh

Add syntax in try/catch to retrieve the stacktrace directly
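
A short sketch of the syntax this refers to, available from OTP 21: a catch clause pattern of the form Class:Reason:Stacktrace binds the stack trace directly, instead of calling erlang:get_stacktrace/0 inside the handler. The module and function names are hypothetical.

```erlang
-module(stacktrace_demo).
-export([safe_divide/2]).

%% The third element of the catch pattern binds the stack trace
%% of the caught exception.
safe_divide(A, B) ->
    try A / B of
        Result -> {ok, Result}
    catch
        error:Reason:Stacktrace ->
            {error, Reason, Stacktrace}
    end.
```

Calling safe_divide(1, 0) returns {error, badarith, Stacktrace}, where Stacktrace is a list of stack frames.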

* bjorn/compiler/cover-v3_codegen:
v3_codegen: Simplify #k_guard_break{}
v3_codegen: Remove uncovered clause in bs_rename_ctx/4
Cover handling of #k_call{} in v3_codegen:bsm_rename_ctx/4
v3_codegen: Move guard_cg_list/6 to a more logical place
v3_codegen: Remove unnecessary clause for handling #cg_block{}
v3_codegen: Remove unnecessary adding of variable to set

In general, the new NIF-based file routines are faster than the old
efile driver.
However, on some computers, building the entire OTP system is somewhat
slower. It turns out that this is because 'erlc' cheated by turning off
the IO thread pool (using '+A0') to avoid context switches between
scheduler threads and threads in the IO thread pool. The new file
routines perform IO on dirty IO threads, and there is (by intent) no
way to force the operations to occur on scheduler threads to avoid
the context switches.
What we can do is to use a small (4 KB) read-ahead buffer for files
opened for reading (only) in list mode (which is how the compiler
opens its input files). The buffering reduces the number of context
switches between scheduler threads and dirty IO threads. On my
computer that seems to slightly speed up building of the entire OTP
system.
The buffer should do no harm. The only potential for harm I can
think of is random access to a file opened in read mode, where
the read-ahead buffer could slightly decrease performance. That
does not seem to be a likely use case in practice, though.
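
According to the message above, the buffer is added automatically for files opened for reading in list mode. As an illustration only, a comparable buffer can also be requested explicitly through the documented {read_ahead, Size} option of file:open/2. The sketch opens a file for reading in list mode (the default, since the binary option is not given) with a 4096-byte read-ahead buffer; the module and function names are hypothetical.

```erlang
-module(read_ahead_demo).
-export([read_lines/1]).

%% Reads all lines of a file opened in list mode (no 'binary' option)
%% with an explicit 4096-byte read-ahead buffer, so that most calls to
%% file:read_line/1 are served from the buffer instead of doing IO.
read_lines(Path) ->
    {ok, Fd} = file:open(Path, [read, {read_ahead, 4096}]),
    try
        read_lines(Fd, [])
    after
        ok = file:close(Fd)
    end.

read_lines(Fd, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} -> read_lines(Fd, [Line | Acc]);
        eof -> lists:reverse(Acc);
        {error, Reason} -> error(Reason)
    end.
```

In list mode each line is returned as a string, including its trailing newline.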

* hasse/stdlib/base64/OTP-14624:
stdlib: Add base64 benchmarks
stdlib: Do not check base64 input more than needed
stdlib: Minor optimization of base64
stdlib: Use binary_to_list in base64 when it is faster
stdlib: Optimize base64 functions

* anders/diameter/typo/OTP-14805:
vsn -> 2.1.3
Update appup for 20.2
Fix doc typo

It turns out that we don't need to keep track of locked
variables, because the locked variables are always the same
variables that will be alive after a #k_guard_break{}.

Remove handling of #k_match{} in bsm_rename_ctx/4.
It can never be reached because bsm_rename_ctx/4 will never recurse
into a block that is not in the scope of a #k_protected{}, and
in a #k_protected{}, #k_match{} is not allowed.

Put guard_cg_list/6 directly after guard_cg/5.

The function guard_cg/5 handles constructs found within
the records #k_guard_clause{} and #k_protected{}.
Since #k_guard_clause{} can only contain a #k_protected{},
and #k_protected{} in turn cannot contain a #cg_block{},
the clause for handling #cg_block{} in guard_cg/5 is never
executed and can be removed.

The variable being added will already be there (added by v3_kernel).

bjorng/bjorn/compiler/fix-excessive-allocations/ERL-514
Avoid excessive stack frame allocation
OTP-14808