In the future we might want to add more bit syntax optimizations,
but beam_block is already sufficiently complicated. Therefore, move
the bit syntax optimizations out of beam_block into a separate
compiler pass called beam_bs.
When matching tuples, the pattern matching compiler would generate
code that would fetch all elements of the tuple that will ultimately
be used, *before* testing that (for example) the first element is the
correct record tag. For example:
  is_tuple Fail {x,0}
  test_arity Fail {x,0} 3
  get_tuple_element {x,0} 0 {x,1}
  get_tuple_element {x,0} 1 {x,2}
  get_tuple_element {x,0} 2 {x,3}
  is_eq_exact Fail {x,1} some_tag
If {x,2} and {x,3} are not used at label Fail, we can re-arrange the
code like this:
  is_tuple Fail {x,0}
  test_arity Fail {x,0} 3
  get_tuple_element {x,0} 0 {x,1}
  is_eq_exact Fail {x,1} some_tag
  get_tuple_element {x,0} 1 {x,2}
  get_tuple_element {x,0} 2 {x,3}
Doing that may be beneficial in two ways.
If the branch is taken, we have eliminated the execution of two
unnecessary instructions.
Even if the branch is never or rarely taken, there is the possibility
for more optimizations following the is_eq_exact instruction.
For example, imagine that the code looks like this:
  get_tuple_element {x,0} 1 {x,2}
  get_tuple_element {x,0} 2 {x,3}
  move {x,2} {y,0}
  move {x,3} {y,1}
Assuming that {x,2} and {x,3} have no further uses in the code
that follows, that can be rewritten to:
  get_tuple_element {x,0} 1 {y,0}
  get_tuple_element {x,0} 2 {y,1}
When should we perform this optimization?
At the very latest, it must be done before opt_blocks/1 in
beam_block, which eliminates unnecessary moves.
Actually, we want to do the optimization before the blocks have
been established, since moving instructions out of one block
into another is cumbersome.
Therefore, we will do the optimization in a new pass that is
run before beam_block. A new pass will make debugging easier,
and beam_block already has a fair number of sub-passes.
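To make the idea concrete, here is a minimal Erlang sketch of the
reordering step (an illustration, not the actual compiler pass): a
get_tuple_element instruction is swapped with a following test when
its destination register is unused at the failure label. The
instruction shapes are simplified, and the UnusedAtFail list is
assumed to be precomputed, which the real pass would have to derive
from the code itself.

  %% Swap get_tuple_element with a following test instruction when
  %% the fetched value is not needed on the failure path.
  reorder([{get_tuple_element,_Src,_El,Dst}=Gte,
           {test,_Op,{f,_Fail},_Args}=Test|Is], UnusedAtFail) ->
      case lists:member(Dst, UnusedAtFail) of
          true ->
              %% Safe to delay the fetch until after the test.
              [Test|reorder([Gte|Is], UnusedAtFail)];
          false ->
              [Gte|reorder([Test|Is], UnusedAtFail)]
      end;
  reorder([I|Is], UnusedAtFail) ->
      [I|reorder(Is, UnusedAtFail)];
  reorder([], _UnusedAtFail) ->
      [].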
The misc_SUITE:integer_encoding/1 test case is annoyingly slow.
Rewrite the encoding of integers in beam_asm to use the
binary:encode_unsigned/1 BIF.
Also tweak the test case itself. Scale down the maximum size of the
numbers being generated, but also add tests of numbers around the
boundaries of powers of two (the numbers most likely to expose bugs
in the encoding).
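For reference, binary:encode_unsigned/1 returns the shortest
big-endian binary representation of an unsigned integer, which makes
it a natural building block for such an encoder. A few examples from
the shell, including the power-of-two boundaries mentioned above:

  1> binary:encode_unsigned(1).
  <<1>>
  2> binary:encode_unsigned((1 bsl 16) - 1).
  <<255,255>>
  3> binary:encode_unsigned(1 bsl 16).
  <<1,0,0>>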
The beam_validator catches all exceptions and collects them.
It makes more sense not to catch 'error' and 'exit' exceptions,
but to just print out the name of the current function and pass on
the exception, just as all other compilation passes do. Those kinds
of exceptions are the symptoms of the kind of severe but easily
caught bugs that occur during development.
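A minimal sketch of the intended behaviour (the function and error
shapes here are illustrative, not the actual beam_validator code):
thrown validation failures are collected, while 'error' and 'exit'
exceptions are reported and then re-raised.

  %% Sketch only; real instruction checking is elided.
  validate_code(_Code) -> ok.

  validate_function({function,Name,Arity,_Entry,Code}) ->
      try
          validate_code(Code)
      catch
          throw:Reason ->
              %% Expected validation failures are collected.
              {error,{Name,Arity,Reason}};
          Class:Reason:Stack when Class =:= error; Class =:= exit ->
              %% Symptoms of compiler bugs: say where we were,
              %% then pass the exception on.
              io:format("Function: ~w/~w\n", [Name,Arity]),
              erlang:raise(Class, Reason, Stack)
      end.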
I have spent too much time lately waiting for 'cover' to finish,
so now it's time to optimize the running time of the test suites
in coverage mode.
Basically, when 'cover' is running, the test suites would not run
any tests in parallel. The reason is that using too many parallel
processes when running 'cover' used to be slower than running them
sequentially. But those measurements were made several years ago,
and many improvements have since been made to the parallelism of
the run-time system.
Experimenting with the test_lib:p_run/2 function, I found that
increasing the number of parallel processes would speed up the
self_compile test cases in compilation_SUITE. The difference
between using 3 processes or 4 processes was slight, though, so it
seems that we should not use more than 4 processes when running
'cover'.
We don't want to change test_lib:parallel/0, because there is no
way to limit the number of test cases that common_test will run in
parallel. However, there are test suites (such as andor_SUITE) that
don't invoke the compiler at run-time. We can run the cases in such
test suites in parallel even if 'cover' is running.
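A sketch of what the sizing in test_lib:p_run/2 could look like,
based on the numbers above (the exact structure is an assumption):

  %% Cap the number of worker processes at 4 when 'cover' is
  %% running; otherwise let the scheduler count decide.
  num_workers() ->
      N = erlang:system_info(schedulers),
      case test_server:is_cover() of
          true -> min(N, 4);
          false -> N
      end.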
Amend the test suite to call beam_dead as originally intended (and not
beam_block), and modify the input data so that the exception will
occur within the try ... catch block in function/2.
Running test cases in parallel will make the test suite run slightly
faster. Another reason for this change is that we want more testing
of the parallel test case support in common_test.
Introduce the mandatory beam_a pass that will be run directly after
code generation, and the mandatory beam_z pass that will be run just
before beam_asm. Since these passes surround the optimizations,
beam_a can (for example) do instruction renaming to simplify the
optimization passes, and beam_z can undo those renamings.
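A sketch of the renaming idea (the bs_put_integer example is
illustrative; the actual set of renamed instructions is not spelled
out here):

  %% In beam_a: fold a family of instructions into one generic
  %% form, so the optimization passes have fewer shapes to handle.
  rename({bs_put_integer,Fail,Sz,Unit,Flags,Src}) ->
      {bs_put,Fail,{bs_put_integer,Unit,Flags},[Sz,Src]};
  rename(I) -> I.

  %% In beam_z: undo the renaming just before beam_asm runs.
  undo_rename({bs_put,Fail,{bs_put_integer,Unit,Flags},[Sz,Src]}) ->
      {bs_put_integer,Fail,Sz,Unit,Flags,Src};
  undo_rename(I) -> I.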
In order to save space, rewrite suitable calls to erlang:error/{1,2}
to special BEAM instructions.
This code is probably longer than the code taken out of v3_life and
v3_codegen in the previous commit, but it is much easier to
understand and maintain since the BEAM assembler format is better
understood than the v3_life format.
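As an illustration of the kind of rewrite this enables (a simplified
example in the same notation as above; the exact sequences emitted
are not shown here), a failed match that builds a {badmatch,Value}
tuple and calls erlang:error/1:

  put_tuple 2 {x,1}
  put {atom,badmatch}
  put {x,0}
  move {x,1} {x,0}
  call_ext 1 erlang:error/1

can be replaced by the single dedicated instruction:

  badmatch {x,0}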
In 3d0f4a3085f11389e5b22d10f96f0cbf08c9337f (an update to conform
with common_test), ?MODULE was changed to the actual name of the
module in all test_lib:recompile(?MODULE) calls. That would cause
test_lib:recompile/1 to compile cloned modules such as
record_no_opt_SUITE with the incorrect compiler options, causing
worse coverage.
-opaque declarations should not be retained in the attributes
(they would be loaded along with the code, but serve no purpose
there).
While at it, filter away those Dialyzer attributes as early as
possible - in v3_kernel.
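A sketch of what such early filtering can look like (modeled on the
description above; not necessarily the exact v3_kernel code):

  %% Attributes kept here will be loaded along with the code;
  %% purely type-level attributes are of no use at run-time.
  include_attribute(type) -> false;
  include_attribute(opaque) -> false;
  include_attribute(spec) -> false;
  include_attribute(_) -> true.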
* bg/opt-receive:
Test that gen_server:call/2,3 are fast even with a huge message queue
erts: Add tests for the receive optimization
Update primary bootstrap
erts: Implement recv_mark/1 and recv_set/1 for real
compiler tests: Cover the error handling code in beam_receive
compiler test: Test optimization of receive statements
Optimize selective receives in the presence of a large message queue
Introduce the new recv_mark/1 and recv_set/1 instructions
Compile tests that communicate with R12 nodes with the r12 option
Move p_run/2 to test_lib
gen: Inline wait_resp_mon/2 to help the compiler optimize
OTP-8623 bg/opt-receive
Receive statements that can only read out a newly created reference
are now specially optimized so that they will execute in constant
time regardless of the number of messages in the message queue of
the process. That optimization will benefit calls to
gen_server:call(). (See gen:do_call/4 for an example of a receive
statement that will be optimized.)
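For illustration, here is a receive of the shape that the
optimization targets, modeled on gen:do_call/4 but simplified: the
reference is created immediately before the receive, and every
clause matches on it, so messages that arrived before the reference
was created can never match and need not be inspected.

  call(Process, Request) ->
      Mref = erlang:monitor(process, Process),
      Process ! {call,self(),Mref,Request},
      receive
          {Mref,Reply} ->
              erlang:demonitor(Mref, [flush]),
              {ok,Reply};
          {'DOWN',Mref,_,_,Reason} ->
              exit(Reason)
      end.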