Age | Commit message (Collapse) | Author |
|
The halfword emulator used to require special handling,
but no longer does.
|
|
It is not necessary to decrement I, because an exception is
about to be generated. Furthermore, I pointing *before* the
instruction that caused the exception may cause problems in
the future.
|
|
Frequency counts show that
move Const x(1)
move Const x(2)
are very common.
|
|
That will save one word and small amount of time for
each occurrence.
|
|
The is_eq_exact/3 and is_ne_exact/3 instructions are commonly used
with one immediate or literal operand.
Introduce three new specialized instructions:
i_is_eq_exact_literal/3
i_is_ne_exact_immed/3
i_is_ne_exact_literal/3
The i_is_ne_exact_literal/3 instruction is not very frequently
used, but its existence is justified because we removed in a
a previous commit the special instruction for matching bignums
and we now use i_is_ne_exact_literal/3 instead.
For consistency, rename the existing is_eq_immed/3 instruction to
is_eq_exact_immed/3.
While at it, remove the optimization of an is_eq/3 instruction
with an immediate operand because that optimization is already
done by the compiler.
|
|
Introduce a new i_increment/4 to optimize the addition of
a register and a small integer. This instruction saves two
instruction words compared to the standard instructions
(an i_fetch/2 instruction followed by a i_plus/3 instruction)
and will also be slightly faster.
|
|
The new instruction will save one word (because no size operand
is needed), and is slightly faster.
Handle select_tuple_arity in the same way.
|
|
Create separate instructions for each register type. The "badmatch x(0)"
and "case_end x(0)" (which are very common) will only require a single
word each, compared to two words when GetArg1() is used.
|
|
Use separate instructions for each register type.
|
|
Use separate instructions for each register type.
|
|
|
|
Instead of having one i_select_val_sfI instruction that uses
the GetArg1() macro to fetch the controlling expression, use
three separate instructions for each of the register types.
That will save one word when selecting on the {x,0} register.
It should also be slightly faster since a conditional branch
is eliminated.
Although it seems that the BEAM compiler will never generate
a constant controlling expression (even with optimizations
turned off), we still make sure that they will work by
evaluating the select_val instruction at load time.
Handle the select_tuple_arity instruction in the same way.
|
|
The tmp_arg1 and tmp_arg2 variables are intended for transferring
values from the fetch/2 instructions to instructions such as
i_plus/3. In many places, however, tmp_arg1 and tmp_arg2 are used
as general temporary variables within a single instruction.
Improve the code generation by replacing sloppy use of tmp_arg1
and tmp_arg2 with block-local variables. In most cases, that will
allow the temporary values to be kept in registers.
|
|
By default, GCC will inline calls to helper functions. Since
process_main() is already huge, there is no reason to inline
the helper functions (and some of them are used very seldom).
|
|
There were two separate functions (call_error_handler() and
call_breakpoint_handler()) that were identical except for
the name of the function in the error_handler module being
called. Generalize call_error_handler() by adding a function
name argument so that it can be used for both purposes.
Also let the call_error_handler() return the new program
counter instead of passing it in c_p->i. That slightly decrease
the code size at the call site.
There is also no need to use the Dispatch() macro to yet again
decrease the reduction counter, because that has just been done by
the call instruction that caused the execution of the
call_error_handler or i_debug_breakpoint instruction.
|
|
Combine the put_tuple/2 and all following put/1 instructions
to one i_put_tuple/2 instruction. In general, that will reduce
the number of instruction words by 50 percent.
Measurements seem to indicate that the speed is about the same.
|
|
|
|
In many (not all) cases, the value for the 'I' type will
fit into 32 bits.
|
|
|
|
Introduce a new 'Q' type, similar to 'P' except that it
can be packed.
|
|
There was a version of the BEAM loader and emulator that
had two versions of the fmove/2 instruction, one version
that allocated heap space internally and a newer version that
assumed that a previous test_heap/2 instruction had already
allocated the heap space.
Though the allocating fmove/2 instruction is no longer
supported, some vestiges of it still remains.
|
|
* pan/ets_binary_overhead/OTP-8762:
Remove binary overhead counter from ets objects
|
|
As the overhead counter got larger and never really was needed in ets objects,
I removed them.
A few stray comments of XXX:PaN type from halfword dev removed in the process.
|
|
This will reduce the risk of integer wrapping in bin vheap counting.
The vheap size series will now use the golden ratio instead of doubling
and fibonacci sequences.
OTP #8730
|
|
|
|
Merging the three off-heap lists (binaries, funs and externals) into
one list. This reduces memory consumption by two words (pointers) per
ETS object.
|
|
|
|
Call count previously used a global lock for accessing and writing
its counter in the breakpoint. This is now changed to atomics instead.
The change will let call count tracing and cprof to scale better
when increasing the number of schedulers.
|
|
To solve the issue of multiple schedulers constantly updating the
head pointer to the bp data wheel, each scheduler now has its own
entrypoint to the wheel. This head pointer can be updated without
a locking being taken. Previously there were no lock ...
|
|
|
|
call_time trace will use instruction pointers instead of
breakpoint data pointers. More costly lookup but the bdt
structure might be deallocated, we do not want that.
Remove unnecessary pattern lock.
|
|
|
|
Initial commit with a new breakpoint instruction and PSD areas
for temporary time storage during tracing.
|
|
The last compiler to generate code that uses the bs_bits_to_bytes/3
instruction was the R11 compiler. Since we don't support loading
R11 *.beam files in R14, removing the remaining support for the
instruction.
|
|
Since R12B, empty tuples are literals. Thus the compiler will no
longer generate the instruction:
put_tuple 0 Destination
for creating an empty tuple. It is now time to stop supporting
that instruction in the run-time system.
While we are at it, correct a typo.
|
|
Add the gc_bif's to the VM.
Add infrastructure for gc_bif's (guard bifs that can gc) with two and.
three arguments in VM (loader and VM).
Add compiler support for gc_bif with three arguments.
Add compiler (and interpreter) support for new guard BIFs.
Add testcases for new guard BIFs in compiler and emulator.
|
|
The recv_mark/1 instruction will both save the current
position in the message queue and a mark (the address of the
loop_rec/2 instruction just following the recv_set/1
instruction). The recv_mark/1 instruction will only
use the saved position if the mark is correct.
The reason for saving and verifying the mark is that
the compiler does not need to guarantee that no other
receive instruction can be executed in between the
recv_mark/1 and recv_set/1 instructions (the mark will
be cleared by the remove_message/0 instruction when a message
is removed from the message queue). That means that arbitrary
function calls in between those instruction can be allowed.
|
|
|
|
New NIF API function enif_make_new_binary
|
|
Since R14 does not need to load code that can also be loaded
in an R11 run-time system, support for the put_string/3
instruction can be removed.
|
|
Fix safe_mul in the loader, which caused failures in the bit
syntax test cases.
Fix yet another Uint in erl_alloc.h (ERTS_CACHE_LINE_SIZE) causing
segmentation fault when we have many schedulers (why only in that
situation?).
Clean up erl_mseg (remove old code for the Linux 32-bit mmap flag).
While at it, also remove compilation warnings.
|
|
The following test suites now work:
send_term_SUITE
trace_nif_SUITE
binary_SUITE
match_spec_SUITE
node_container_SUITE
beam_literals_SUITE
Also add a testcases for system_info({wordsize,internal|external}).
|
|
Rewrite trace code and external coding. Also slightly correct
the interface to the match-spec engine to make tracing work.
That will make the test suites runnable.
|
|
For cleanliness, use BeamInstr instead of the UWord
data type to any machine-sized words that are used
for BEAM instructions. Only use UWord for untyped
words in general.
|
|
Store Erlang terms in 32-bit entities on the heap, expanding the
pointers to 64-bit when needed. This works because all terms are stored
on addresses in the 32-bit address range (the 32 most significant bits
of pointers to term data are always 0).
Introduce a new datatype called UWord (along with its companion SWord),
which is an integer having the exact same size as the machine word
(a void *), but might be larger than Eterm/Uint.
Store code as machine words, as the instructions are pointers to
executable code which might reside outside the 32-bit address range.
Continuation pointers are stored on the 32-bit stack and hence must
point to addresses in the low range, which means that loaded beam code
much be placed in the low 32-bit address range (but, as said earlier,
the instructions themselves are full words).
No Erlang term data can be stored on C stacks (enforced by an
earlier commit).
This version gives a prompt, but test cases still fail (and dump core).
The loader (and emulator loop) has instruction packing disabled.
The main issues has been in rewriting loader and actual virtual
machine. Subsystems (like distribution) does not work yet.
|
|
This is the first step in the implementation of the half-word emulator,
a 64-bit emulator where all pointers to heap data will be stored
in 32-bit words. Code specific for this emulator variant is
conditionally compiled when the HALFWORD_HEAP define has
a non-zero value.
First force all pointers to heap data to fall into a single 32-bit range,
but still store them in 64-bit words.
Temporary term data stored on C stack is moved into scheduler specific
storage (allocated as heaps) and macros are added to make this
happen only in emulators where this is needed. For a vanilla VM the
temporary terms are still stored on the C stack.
|
|
|
|
fragments was created. This will mainly benefit NIFs that return
large compound terms.
|
|
NIF function prototypes in order to allow more than 3 function
arguments. Also an incompatible change in the return value of
erlang:load_nif/2. Added support for references, floats and term
comparison in NIFs. Read more in the documentation of erl_nif and
erlang:load_nif/2.
|
|
|