Age | Commit message (Collapse) | Author |
|
Argument order can prevent the delayed sub binary creation.
Here is an example directly from the Efficiency Guide:
non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.
When compiling with the bin_opt_info option, there will be a
suggestion to change the argument order.
It turns out sys_core_bsm can itself change the order, not the
order of the arguments of themselves, but the order in which
the arguments are matched. Here is how it can be rewritten in
pseudo Core Erlang code:
non_opt_eq(Arg1, Arg2) ->
case < Arg2,Arg1 > of
< <<H1,T2/binary>>, [H2|T1] > when H1 =:= H2 ->
non_opt_eq(T1, T2);
< <<_,T2/binary>ffff>, [_|T1] > ->
false;
< <<>>, [] >> ->
true
end.
When rewritten like this, the bs_start_match2 instruction will be
the first instruction in the function and it will be possible to
store the match context in the same register as the binary
({x,1} in this case) and to delay the creation of sub binaries.
The switching of matching order also enables many other simplifications
in sys_core_bsm, since there is no longer any need to pass the position
of the pattern as an argument.
We will update the Efficiency Guide in a separate branch before the
release of OTP 21.
|
|
* bjorn/compiler/cuddle-with-tests:
beam_match_SUITE: Eliminate warnings for unused variables
bs_match_SUITE: Add tests case written when walking into a dead end
|
|
* maint:
stdlib: Correct a filelib test case
stdlib: Let filelib:find_source() search subdirs
|
|
* hasse/stdlib/find_src/OTP-14832/ERL-527:
stdlib: Correct a filelib test case
stdlib: Let filelib:find_source() search subdirs
|
|
* maint:
ssl: Call clean version function
|
|
* ingela/ssl/test-cuddle:
ssl: Call clean version function
|
|
Make sure tests are run with intended version settings.
|
|
|
|
'john/runtime_tools/reduce-sysinfo-to_file-memory-use/OTP-14816' into maint
|
|
* maint:
crypto: Disable RSA sslv23 padding for LibreSSL >= 2.6.1
|
|
* hans/crypto/no_rsa_pad_sym/ERL-546/OTP-14873:
crypto: Disable RSA sslv23 padding for LibreSSL >= 2.6.1
|
|
Conflicts:
lib/observer/src/crashdump_viewer.erl
|
|
Not supported in newer LibreSSL.
|
|
|
|
Add some tests cases written when attempting some new optimizations
that turned out to be unsafe.
|
|
* siri/cdv/many-links/OTP-14725:
[observer] Improve performance for many links or monitors
|
|
OTP:13713: Add documentation and typespecs for inet:i/0
|
|
Run beam_block a second time
|
|
Clean up and improve sys_core_fold optimizations
|
|
Refactor '%live' and '%def' annotations
|
|
|
|
* ingela/ssl/rc4-suites/OTP-14871:
ssl: Correct function for listing RC4 suites
|
|
* maint:
Fix encoding of filenames in stacktraces
|
|
* rickard/file-encoding-stacktraces/OTP-14847/ERL-544:
Fix encoding of filenames in stacktraces
|
|
|
|
* maint:
Do not add -lz to LIBS; keep it in Z_LIB
|
|
* rickard/libs-libz/ERL-529/OTP-14840:
Do not add -lz to LIBS; keep it in Z_LIB
|
|
|
|
Running beam_block again after the other optimizations have run will
give it more opportunities for optimizations. In particular, more
allocate_zero/2 instructions can be turned into allocate/2
instructions, and more get_tuple_element/3 instructions can store the
retrieved value into the correct register at once.
Out of a sample of about 700 modules in OTP, 64 modules were improved
by this commit.
|
|
If possible, when adding move/2 instructions, try to insert
them into a block. That could potentially allow them to
be optimized.
|
|
beam_utils:live_opt/1 is currently only run early (from
beam_block). Prepare it to be run after beam_split when
instructions with failure labels have been taken out of
blocks.
While we are it, also improve check_liveness/3. That will
improve the optimizations in beam_record (replacing tuple
matching instructions with an is_tagged_tuple instruction).
|
|
Since the select_val instruction never transfer directly to the next
instruction, the incoming live registers should be ignored. This
bug have not caused any problems yet, but it will in the future
if we are to run the liveness optimizations again after
the optimizations in beam_dead and beam_jump.
|
|
In a guard, reorder two consecutive calls to the element/2 BIF that
access the same tuple and have the same failure label so that highest
index is fetched first. That will allow the second element/2 to be
replace with the slightly cheaper get_tuple_element/3 instruction.
|
|
Consider a 'case' that exports variables and whose return
value is ignored:
foo(N) ->
case N of
1 ->
Res = one;
2 ->
Res = two
end,
{ok,Res}.
That code will be translated to the following Core Erlang code:
'foo'/1 =
fun (_@c0) ->
let <_@c5,Res> =
case _@c0 of
<1> when 'true' ->
<'one','one'>
<2> when 'true' ->
<'two','two'>
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The exported variables has been rewritten to explicit return
values. Note that the original return value from the 'case' is bound to
the variable _@c5, which is unused.
The corresponding BEAM assembly code looks like this:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,1}}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,1}}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,2}.
{put_tuple,2,{x,0}}.
{put,{atom,ok}}.
{put,{x,1}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
Because of the test_heap instruction following label 5, the assignment
to {x,0} cannot be optimized away by the passes that optimize BEAM assembly
code.
Refactor the optimizations of 'let' in sys_core_fold to eliminate the
unused variable. Thus:
'foo'/1 =
fun (_@c0) ->
let <Res> =
case _@c0 of
<1> when 'true' ->
'one'
<2> when 'true' ->
'two'
<_@c3> when 'true' ->
primop 'match_fail'({'case_clause',_@c3})
end
in
{'ok',Res}
The resulting BEAM code will look like:
{function, foo, 1, 2}.
{label,1}.
{line,[...]}.
{func_info,{atom,t},{atom,foo},1}.
{label,2}.
{test,is_integer,{f,6},[{x,0}]}.
{select_val,{x,0},{f,6},{list,[{integer,2},{f,3},{integer,1},{f,4}]}}.
{label,3}.
{move,{atom,two},{x,0}}.
{jump,{f,5}}.
{label,4}.
{move,{atom,one},{x,0}}.
{label,5}.
{test_heap,3,1}.
{put_tuple,2,{x,1}}.
{put,{atom,ok}}.
{put,{x,0}}.
{move,{x,1},{x,0}}.
return.
{label,6}.
{line,[...]}.
{case_end,{x,0}}.
|
|
Improve handling of #c_seq{}, making sure to simplify a #c_seq{}
as much as possible. With that improvement, we can remove some
special-case code from opt_simple_let_2/6.
|
|
|
|
|
|
The annotations in the optimizing passes currently looks like this:
{'%live',NumRegistersUsed,RegistersUsedBitmap}
{'%def',RegistersDefinedBitmap}
(NumRegistersUsed is no longer used.)
When I attempted to extend some optimizations, I found that I had to
add additional clauses to tolerate/handle both types of
annotations. That problem would only get worse if any more annotations
are added in the future.
To simplify annotation handling, this commit wraps both types of
annotations in a {'%anno',_} tuple:
{'%anno',{used,RegistersUsedBitmap}}
{'%anno',{def,RegistersDefinedBitmap}}
The '%live' annotation has been renamed to 'used' to make it somewhat
clearer what it means, and the unused NumRegistersUsed part of the
old annotation has been removed.
Alternatives considered: My first attempt was to wrap the annotation
in a 'set' tuple so that there would only be 'set' tuples in a block.
For example:
{set,[],[],{anno,{live,RegistersUsedBitmap}}}
It was not as convenient as expected. Annotations often need to be
handled specially from other instructions in a block. When they are
wrapped in a 'set' tuple, they can very easily be handled incorrectly
or passed on to the next pass. That causes subtle errors or worse
code, and it can be difficult to debug.
Therefore, my conclusion is that annotations should be distinct from
other instructions, to make it obvious when one have missed to handle
an annotation.
|
|
|
|
* ingela/ssl/timeout-cuddle:
ssl: Tune timeouts
|
|
|
|
The previous implementation generated a term, converted it to plain
text with io_lib:format/2, and then converted that to a binary
before writing it to disk.
We now emit the term as we go, which should make it a bit safer to
extract this information under load.
|
|
|
|
jhogberg/john/compiler/reintroduce-tuple-arity-optimizations/OTP-14857
Reintroduce the tuple arity optimizations removed in PR #1673
|
|
Optimize allocation of stack frames
|
|
When a process has many links and/or monitors, it could earlier take
very long time to display the process information window. This is now
improved by only showing a few links and monitors, and then an link
named "more..." to expand the rest.
Reading of the "Link list" from a crashdump is also improved.
|
|
Turn more allocate_zero instructions into allocate instructions.
|
|
An 'allocate' or 'allocate_zero' instruction should not be
shortly followed by a 'test_heap' instruction. For example,
we don't want this type of code:
{allocate_zero,3,4}.
{line,...}.
{test_heap,7,4}.
{bif,element,{f,0},...,...}.
While the code is safe because 'allocate_zero' has initialized the
stack frame, it is wasteful. Also note that the code would become
unsafe if the 'allocate_zero' instruction were to be replaced with
an 'allocate' instruction.
What we want to see is this:
{allocate_heap_zero,3,7,4}.
{line,...}.
{bif,element,{f,0},...,...}.
|
|
In 21dd6e55877832, beam_utils:combine_heap_needs/2 stopped
wrapping an allocation list in an {alloc,...} tuple. That was
not noticed because the faulty heap need created in beam_block
was discarded by beam_type.
|
|
beam_utils:is_killed/3 could incorrectly indicate that a
register was killed, when in fact it was referenced by
an instruction that did a GC.
|