Age | Commit message (Collapse) | Author |
|
I did not find any legitimate use of "can not"; however, I skipped
changing e.g. RFCs archived in the source tree.
|
|
For a long time, there have been two macros, IS_SSMALL() and
MY_IS_SSMALL(), that do exactly the same thing.
There should only be one, and it should be called IS_SSMALL().
However, we must decide which implementation to use. When
MY_IS_SSMALL() was introduced a long time ago, it was the most
efficient. In a modern C compiler, there might not be any
difference.
To find out, I used the following small C program to examine
the code generation:
#include <stdio.h>
typedef unsigned int Uint32;
typedef unsigned long Uint64;
typedef long Sint;
#define SWORD_CONSTANT(Const) Const##L
#define SMALL_BITS (64-4)
#define MAX_SMALL ((SWORD_CONSTANT(1) << (SMALL_BITS-1))-1)
#define MIN_SMALL (-(SWORD_CONSTANT(1) << (SMALL_BITS-1)))
#define MY_IS_SSMALL32(x) (((Uint32) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL64(x) (((Uint64) ((((x)) >> (SMALL_BITS-1)) + 1)) < 2)
#define MY_IS_SSMALL(x) (sizeof(x) == sizeof(Uint32) ? MY_IS_SSMALL32(x) : MY_IS_SSMALL64(x))
#define IS_SSMALL(x) (((x) >= MIN_SMALL) && ((x) <= MAX_SMALL))
void original(Sint n)
{
    if (IS_SSMALL(n)) {
        printf("yes\n");
    }
}
void enhanced(Sint n)
{
    if (MY_IS_SSMALL(n)) {
        printf("yes\n");
    }
}
gcc 7.2 produced the following code for the original() function:
.LC0:
.string "yes"
original(long):
movabs rax, 576460752303423488
add rdi, rax
movabs rax, 1152921504606846975
cmp rdi, rax
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang 5.0.0 produced the following code which is slightly better:
original(long):
movabs rax, 576460752303423488
add rax, rdi
shr rax, 60
jne .LBB0_1
mov edi, .Lstr
jmp puts # TAILCALL
.LBB0_1:
ret
.Lstr:
.asciz "yes"
However, in the context of beam_emu.c, clang could produce code
similar to what gcc produced.
gcc 7.2 produced the following code when MY_IS_SSMALL() was used:
.LC0:
.string "yes"
enhanced(long):
sar rdi, 59
add rdi, 1
cmp rdi, 1
jbe .L4
rep ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
clang produced similar code.
This code seems to be the cheapest. There are four instructions, and
there is no loading of huge integer constants.
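Since the choice of implementation only makes sense if both checks accept exactly the same values, a small boundary test is a useful companion to the code-generation comparison. The following is a minimal sketch, assuming 64-bit words and the same SMALL_BITS as in the test program above; the function names is_ssmall_cmp(), is_ssmall_shift() and checks_agree() are hypothetical and not part of the source tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit words with 4 tag bits, as in the test program above. */
#define SMALL_BITS (64 - 4)
#define MAX_SMALL ((INT64_C(1) << (SMALL_BITS - 1)) - 1)
#define MIN_SMALL (-(INT64_C(1) << (SMALL_BITS - 1)))

/* Range check written as two comparisons (the IS_SSMALL() style). */
static int is_ssmall_cmp(int64_t x)
{
    return x >= MIN_SMALL && x <= MAX_SMALL;
}

/*
 * Shift-based check (the MY_IS_SSMALL64() style). It relies on an
 * arithmetic right shift of negative values, which is
 * implementation-defined in C but is what gcc and clang provide.
 */
static int is_ssmall_shift(int64_t x)
{
    return (uint64_t)((x >> (SMALL_BITS - 1)) + 1) < 2;
}

/* Hypothetical helper: returns 1 if both checks agree on boundary values. */
static int checks_agree(void)
{
    const int64_t samples[] = {
        0, 1, -1,
        MAX_SMALL, MAX_SMALL + 1,
        MIN_SMALL, MIN_SMALL - 1,
        INT64_MAX, INT64_MIN
    };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (is_ssmall_cmp(samples[i]) != is_ssmall_shift(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

Calling checks_agree() exercises both sides of each boundary; it does not prove equivalence for every input, it only guards against an off-by-one in the shift count.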
|
|
|
|
The functions have been found using: https://github.com/caolanm/callcatcher
|
|
|
|
|
|
|
|
Just mask away the high bits to get a more tolerant erlang:halt
that behaves the same on 32 and 64 bit architectures.
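The masking idea can be sketched as follows. This is an illustration only: the helper name halt_status_from_word() is hypothetical, and the choice of INT_MAX as the mask is an assumption, since the commit message does not say which bits are kept.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the actual ERTS code: reduce an arbitrary
 * unsigned word to a non-negative int by masking away the high bits.
 * Because only the low bits survive, a 32-bit and a 64-bit build map
 * the same low-order bits to the same status.
 * Assumption: INT_MAX is used as the mask.
 */
static int halt_status_from_word(uint64_t word)
{
    return (int)(word & (uint64_t)INT_MAX);
}
```

With this approach an out-of-range value is silently truncated rather than rejected, which is what makes the call more tolerant.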
|
|
Now tries to use the whole width of a signed long (Sint), which halves the number
of multiplications needed to parse long integers. The new code is 2-3 times faster
than the old code for large inputs (tens and hundreds of digits); behavior
should not change for small inputs.
Test ran 10k times with GC forced between attempts.
Was (R17):
720 el base 10: 0.14682 sec; base 16: 0.192722 sec; base 36: 0.337118 sec.
2800 el base 10: 1.794133 sec; base 16: 2.735106 sec; base 36: 4.761108 sec.
6500 el base 10: 9.316434 sec; base 16: 14.109469 sec; base 36: 25.319263 sec.
Now (R19 Dev)
720 el base 10: 0.10265 sec; base 16: 0.10851 sec; base 36: 0.160478 sec.
2800 el base 10: 1.002793 sec; base 16: 1.360649 sec; base 36: 2.174309 sec.
6500 el base 10: 4.722197 sec; base 16: 6.60522 sec; base 36: 10.552795 sec.
Added a test for corner cases and sign bit corruption. Replaced macros with
inline functions and hid them inside the C file to avoid polluting the global
namespace.
Fixed an old bug in #define LG2_LOOKUP: replaced it with an inline function and
recalculated the table for all bases 2 to 36 (was 2 to 64).
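One way to see why a wider accumulator reduces the number of bignum multiplications is to count how many digits can be gathered in a single machine word before the bignum has to be touched. The sketch below is not the parsing code from this commit; digits_per_word() is a hypothetical helper, and it uses the conservative condition base^digits <= maximum signed value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of base-`base` digits that can be
 * accumulated in a signed word of `bits` bits without overflow.
 * Assumptions: 2 <= base <= 36 and 2 <= bits <= 64.
 */
static int digits_per_word(unsigned base, unsigned bits)
{
    /* Largest value representable in a signed word of `bits` bits. */
    const uint64_t max = (UINT64_C(1) << (bits - 1)) - 1;
    uint64_t power = 1;   /* base^digits */
    int digits = 0;

    /* power <= max / base holds exactly when power * base <= max. */
    while (power <= max / base) {
        power *= base;
        digits++;
    }
    return digits;
}
```

For base 10 this gives 9 digits for a 32-bit word and 18 digits for a 64-bit word, so a parser that fills a full 64-bit word needs about half as many multiply-and-add steps on the bignum as one that stops at 32 bits.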
|
|
|
|
|
|
|
|
by adding a dynamic heap factory.
"binary_to_term" is now a hybrid solution with both
a call to decoded_size() to calculate needed heap space
AND possible dynamic allocation of more heap space
if needed for big maps.
The heap size returned from decoded_size() is guaranteed
to be sufficient for all term heap data except for hashmap
nodes. All hashmap nodes are created at the end of dec_term()
by invoking the heap factory interface that may allocate more
heap space on process heap or in fragments.
With this commit it is no longer guaranteed that a message
is confined to only one heap fragment.
|
|
The old time API is based on erlang:now/0. The major issue with
erlang:now/0 is that it was intended to be used for so many
unrelated things. This tied these unrelated operations together
and unnecessarily caused performance, scalability, accuracy,
and precision issues for operations that do not need
to have such issues. The new API spreads different functionality
over multiple functions in order to improve on this.
The new API consists of a number of new BIFs:
- erlang:convert_time_unit/3
- erlang:monotonic_time/0
- erlang:monotonic_time/1
- erlang:system_time/0
- erlang:system_time/1
- erlang:time_offset/0
- erlang:time_offset/1
- erlang:timestamp/0
- erlang:unique_integer/0
- erlang:unique_integer/1
- os:system_time/0
- os:system_time/1
and a number of extensions of existing BIFs:
- erlang:monitor(time_offset, clock_service)
- erlang:system_flag(time_offset, finalize)
- erlang:system_info(os_monotonic_time_source)
- erlang:system_info(time_offset)
- erlang:system_info(time_warp_mode)
- erlang:system_info(time_correction)
- erlang:system_info(start_time)
See the "Time and Time Correction in Erlang" chapter of the
ERTS User's Guide for more information.
|
|
When INT64_MIN is the value of a Sint64, we have to first cast it to
a Uint64 before negating it. Otherwise we get a signed integer overflow,
which is undefined behaviour; with gcc 4.9 this results in -0 instead
of the -9223372036854775808 produced with gcc 4.8.
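The cast-before-negate rule can be illustrated with a small stand-alone function. This is a sketch using the standard int64_t and uint64_t types instead of Sint64 and Uint64; the name magnitude_i64() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: absolute value of a signed 64-bit integer,
 * returned as an unsigned value so that INT64_MIN is representable.
 */
static uint64_t magnitude_i64(int64_t v)
{
    if (v < 0) {
        /*
         * Cast first, then negate. Unsigned arithmetic wraps modulo
         * 2^64, so this is well defined even for INT64_MIN, whereas
         * -v on the signed value would overflow.
         */
        return UINT64_C(0) - (uint64_t)v;
    }
    return (uint64_t)v;
}
```

For INT64_MIN the result is 9223372036854775808, that is 2^63, which cannot be represented in a signed 64-bit integer.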
|
|
big_buf was one word too short on 32-bit emulators causing
memory corruption.
Seems like this did not cause a problem before the ESTACK memory layout
was changed in 172ebf11dc455e22b87f.
|
|
|
|
Added: binary_to_integer/1,2, integer_to_binary/1,2
|
|
Still cannot set up -a, but cerl works.
|
|
For floating point values which are greater than 9007199254740990.0 or
smaller than -9007199254740990.0, the floating point numbers are now
converted to integers during comparison with an integer. This makes
number comparisons transitive for large floating point numbers.
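The comparison rule can be sketched for the simplified case of a 64-bit integer against a double. The real emulator also has to handle bignums, so this is an illustration of the idea only; cmp_int_double() and FLOAT_INT_LIMIT are hypothetical names, and the float is assumed not to be NaN.

```c
#include <stdint.h>

/* Threshold taken from the commit message. */
#define FLOAT_INT_LIMIT 9007199254740990.0

/*
 * Hypothetical helper: returns -1, 0 or 1 if i is less than, equal to
 * or greater than f. Assumption: f is not NaN.
 */
static int cmp_int_double(int64_t i, double f)
{
    if (f > FLOAT_INT_LIMIT || f < -FLOAT_INT_LIMIT) {
        /* Doubles of this magnitude are always whole numbers. */
        if (f >= 9223372036854775808.0) {
            return -1;              /* f is above the int64 range */
        }
        if (f < -9223372036854775808.0) {
            return 1;               /* f is below the int64 range */
        }
        {
            /* Exact conversion: compare as integers. */
            int64_t fi = (int64_t)f;
            return (i > fi) - (i < fi);
        }
    } else {
        /* Small float: compare as floating point numbers. */
        double d = (double)i;
        return (d > f) - (d < f);
    }
}
```

The case that motivates the rule is an integer such as 9007199254740993, which rounds to 9007199254740992.0 when converted to a double. Comparing as floats would report it equal to that float, while comparing as integers correctly reports it as greater.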
|
|
Conflicts:
erts/emulator/beam/erl_printf_term.c
|
|
|
|
In the halfword emulator, make ETS use a variant of the internal term
format that uses relative offsets instead of absolute pointers. This
will allow storage in high memory (>4G). Preprocessor macros (like
list_val_rel(TERM,BASE)) are used to keep the normal (fullword) emulator
almost completely unchanged while still reusing most of the code.
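The relative-offset idea can be sketched independently of the term format. The code below is not the ETS implementation: RelOffset, ptr_to_rel() and rel_to_ptr() are hypothetical names, and the sketch assumes that the pointer lies at or above the base and within 4 GB of it.

```c
#include <stdint.h>

/* Hypothetical 32-bit reference: a byte offset from a base address. */
typedef uint32_t RelOffset;

/*
 * Encode an absolute pointer as an offset from `base`.
 * Assumption: base <= ptr and the distance fits in 32 bits.
 */
static RelOffset ptr_to_rel(const void *ptr, const void *base)
{
    return (RelOffset)((const char *)ptr - (const char *)base);
}

/* Decode an offset back into an absolute pointer. */
static void *rel_to_ptr(RelOffset off, void *base)
{
    return (char *)base + off;
}
```

Because only the offset is stored, the data itself may live anywhere in the 64-bit address space as long as the base is known when the reference is followed, which is why every accessor needs the base as an extra argument.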
|
|
|
|
* pan/otp_8332_halfword:
Teach testcase in driver_suite the new prototype for driver_async
wx: Correct usage of driver callbacks from wx thread
Adopt the new (R13B04) Nif functionality to the halfword codebase
Support monitoring and demonitoring from driver threads
Fix further test-suite problems
Correct the VM to work for more test suites
Teach {wordsize,internal|external} to system_info/1
Make tracing and distribution work
Turn on instruction packing in the loader and virtual machine
Add the BeamInstr data type for loaded BEAM code
Fix the BEAM disassembler for the half-word emulator
Store pointers to heap data in 32-bit words
Add a custom mmap wrapper to force heaps into the lower address range
Fit all heap data into the 32-bit address range
|
|
The following test suites now work:
send_term_SUITE
trace_nif_SUITE
binary_SUITE
match_spec_SUITE
node_container_SUITE
beam_literals_SUITE
Also add test cases for system_info({wordsize,internal|external}).
|
|
|
|
Store Erlang terms in 32-bit entities on the heap, expanding the
pointers to 64-bit when needed. This works because all terms are stored
on addresses in the 32-bit address range (the 32 most significant bits
of pointers to term data are always 0).
Introduce a new datatype called UWord (along with its companion SWord),
which is an integer having the exact same size as the machine word
(a void *), but might be larger than Eterm/Uint.
Store code as machine words, as the instructions are pointers to
executable code which might reside outside the 32-bit address range.
Continuation pointers are stored on the 32-bit stack and hence must
point to addresses in the low range, which means that loaded beam code
must be placed in the low 32-bit address range (but, as said earlier,
the instructions themselves are full words).
No Erlang term data can be stored on C stacks (enforced by an
earlier commit).
This version gives a prompt, but test cases still fail (and dump core).
The loader (and emulator loop) has instruction packing disabled.
The main issues have been in rewriting the loader and the actual virtual
machine. Subsystems (like distribution) do not work yet.
|
|
|