aboutsummaryrefslogtreecommitdiffstats
path: root/erts/emulator/beam/beam_load.c
AgeCommit message (Collapse)Author
2016-04-07Don't let the loader do the compiler's jobBjörn Gustavsson
Optimizations that are possible to do by the compiler should be done by the compiler and not by the loader. If the compiler has done its job correctly, attempting to do the two transformations only wastes time.
2016-04-07Avoid rebuilding unchanged instructionsBjörn Gustavsson
In transformations such as: move S X0=x==0 | line Loc | call_ext Ar Func => \ line Loc | move S X0 | call_ext Ar Func we can avoid rebuilding the last instruction in the sequence by introducing a 'keep' instruction. Currently, there are only 13 transformations that are hit by this optimization, but most of them are frequently used.
2016-04-07Introduce a 'rename' instructionBjörn Gustavsson
Introduce a 'rename' instruction that can be used to optimize simple renaming with unchanged operands such as: get_tuple_element Reg P Dst => i_get_tuple_element Reg P Dst By allowing it to lower the arity of instruction, transformations such as the following can be handled: trim N Remaining => i_trim N All in all, currently 67 transformations can be optimized in this way, including some commonly used ones.
2016-04-07Simplify window management for the transformation engineBjörn Gustavsson
Generic instructions have a min_window field. Its purpose is to avoid calling transform_engine() when there are too few instructions in the current "transformation window" for a transformation to succeed. Currently it does not do much good since the window size will be decremented by one before being used. The reason for the subtraction is probably that in some circumstances in the past, the loader could read past the end of the BEAM module while attempting to fetch instructions to increase the window size. Therefore, it would not be safe to just remove the subtraction by one. The simplest and safest solution seems to always ensure that there are always at least TWO instructions when calling transform_engine(). That will be safe, as long as a BEAM module is always finished with an int_code_end/0 that is not involved in any transformation.
2016-04-07Eliminate allocation of variables in transform_engine()Björn Gustavsson
When an instruction with a variable number operands (such as select_val) is seen of the left side of a transformation, the 'next_arg' instruction will allocate a buffer to fit all variables and all operands will be copied into the buffer. Very often, the 'commit' instruction will never be reached because of a test or predicate failing or because of a short window; in that case, the variable buffer will be deallocated. Note that originally there were only few instructions with a variable number of operands, but now common operations such as tuple building also have a variable number of operands. To avoid those frequent allocations and deallocations, modify the 'next_arg' instruction to only save a pointer to the first of the "rest" arguments. Also move the deallocation of the instructions on the left side from the 'commit' instruction to the 'end' instruction to ensure that 'store_rest_args' will still work.
2016-04-06Refactor calls to transform_engine()Björn Gustavsson
We used to set last_op_next and last_op to NULL just in case. Setting last_op_next to causes a rescan of the instructions to find the last instruction in the chain, so we would want to avoid that unless really necessary.
2016-03-31Fix unsafe transformation of apply/3 with fixed argumentsBjörn Gustavsson
62473daf introduced an unsafe optimization in the loader. See the comments in the test case for an explanation of the problem.
2016-03-29erts: Improve enif_binary_to_termSverker Eriksson
* Accept a raw data buffer instead of ErlNifBinary * Accept option ERL_NIF_BIN2TERM_SAFE * Return number of read bytes
2016-02-20beam_load.c: Add a function to check for an on_load functionBjörn Gustavsson
We will need a way to check whether an prepared BEAM modules has an on_load function.
2016-02-18Merge branch 'sverk/fix-list-length-int/OTP-13288'Sverker Eriksson
* sverk/fix-list-length-int/OTP-13288: erts: Fix error cases in enif_get_list_length erts: Use Sint instead of int for list lengths
2016-02-08erts: Use Sint instead of int for list lengthsRichard Carlsson
This avoids potential integer arithmetic overflow for very large lists.
2016-01-28Merge branch 'master' into sverk/hipe-line-table-bug/master/OTP-13282Sverker Eriksson
2016-01-28erts: Fix bug concerning line information for hipe modulesSverker Eriksson
Line table was left uninitialized for hipe (stub) modules causing process_info(OtherPid, current_location) to crash.
2015-12-09Merge branch 'egil/pd-opt-get/OTP-13167'Björn-Egil Dahlberg
* egil/pd-opt-get/OTP-13167: erts: Add i_get_hash instruction erts: Use internal hash for process dictionaries
2015-12-07erts: Add i_get_hash instructionBjörn-Egil Dahlberg
Calculate hashvalue in load-time for constant process dictionary gets.
2015-11-12Fragmented young heap generation and off_heap_message_queue optionRickard Green
* The youngest generation of the heap can now consist of multiple blocks. Heap fragments and message fragments are added to the youngest generation when needed without triggering a GC. After a GC the youngest generation is contained in one single block. * The off_heap_message_queue process flag has been added. When enabled all message data in the queue is kept off heap. When a message is selected from the queue, the message fragment (or heap fragment) containing the actual message is attached to the youngest generation. Messages stored off heap is not part of GC.
2015-11-12Introduce literal tagRickard Green
2015-11-12erts: Refactor line table in loaded beam codeSverker Eriksson
to use real C struct with correct types
2015-11-12erts: Refactor header of loaded beam codeSverker Eriksson
to use a real C struct instead of array.
2015-11-12erts: Add support for fast erts_is_literal()Sverker Eriksson
2015-09-11erts: Add new allocator LITERALSverker Eriksson
2015-07-03Use a cheaper tag scheme for 'd' operandsBjörn Gustavsson
Since 'd' operands can only either an X register or an Y register, we only need a single bit to distinguish them. Furthermore, we can pre-multiply the register number with the word size to speed up address calculation.
2015-07-03Introduce swap_temp/3 and swap/2Björn Gustavsson
Sequences of three move instructionst that effectively swap the contents of two registers are fairly common. We can replace them with a swap_temp/3 instruction. The third operand is the temporary register to be used for swapping, since the temporary register may actually be used. If swap_temp/3 instruction is followed by a call, the temporary register will often (but not always) be killed by the call. If it is killed, we can replace the swap_temp/3 instruction with a slightly cheaper swap/2 instruction.
2015-07-03Eliminate use of i_fetch for bit syntax instructionsBjörn Gustavsson
2015-07-03Eliminate the use of i_fetch for BIF instructionsBjörn Gustavsson
2015-07-03Make the 'r' operand type optionalBjörn Gustavsson
The 'r' type is now mandatory. That means in order to handle both of the following instructions: move x(0) y(7) move x(1) y(7) we would need to define two specific operations in ops.tab: move r y move x y We want to make 'r' operands optional. That is, if we have only this specific instruction: move x y it will match both of the following instructions: move x(0) y(7) move x(1) y(7) Make 'r' optional allows us to save code space when we don't want to make handling of x(0) a special case, but we can still use 'r' to optimize commonly used instructions.
2015-07-03Allow X and Y registers to be overloaded with any literalBjörn Gustavsson
Consider the try_case_end instruction: try_case_end s The 's' operand type means that the operand can either be a literal of one of the types atom, integer, or empty list, or a register. That worked well before R12. In R12 additional types of literals where introduced. Because of way the overloading was done, an 's' operand cannot handle the new types of literals. Therefore, code such as the following is necessary in ops.tab to avoid giving an 's' operand a literal: try_case_end Literal=q => move Literal x | try_case_end x While this work, it is error-prone in that it is easy to forget to add that kind of rule. It would also be complicated in case we wanted to introduce a new kind of addition operator such as: i_plus jssd Since there are two 's' operands, two scratch registers and two 'move' instructions would be needed. Therefore, we'll need to find a smarter way to find tag register operands. We will overload the pid and port tags for X and Y register, respectively. That works because pids and port are immediate values (fit in one word), and there are no literals for pids and ports.
2015-07-03Eliminate R_REG_DEFBjörn Gustavsson
2015-07-03Change the meaning of 'x' in a transformationBjörn Gustavsson
The purpose of this series of commits is to improve code generation for the Clang compiler. As a first step we want to change the meaning of 'x' in a transformation such as: operation Literal=q => move Literal x | operation x Currently, a plain 'x' means reg[0] or x(0), which is the first element in the X register array. That element is distinct from r(0) which is a variable in process_main(). Therefore, since r(0) and x(0) are currently distinct it is fine to use x(0) as a scratch register. However, in the next commit we will eliminate the separate variable for storing the contents of X register zero (thus, x(0) and r(0) will point to the same location in the X register array). Therefore, we must use another scratch register in transformation. Redefine a plain 'x' in a transformation to mean x(1023). Also define SCRATCH_X_REG so that we can refer to the register by name from C code.
2015-06-24erts: Remove HALFWORD_HEAP definitionBjörn-Egil Dahlberg
2015-06-18Change license text to APLv2Bruce Yinhe
2015-06-15Merge branch 'hamt_bin2term'Sverker Eriksson
* hamt_bin2term: erts: Add erts_factory_trim_and_close erts: Optimize driver_deliver_term erts: Remove hashmap probabilistic heap overestimation Conflicts: erts/emulator/beam/beam_load.c
2015-06-15erts: Remove hashmap probabilistic heap overestimationSverker Eriksson
by adding a dynamic heap factory. "binary_to_term" is now a hybrid solution with both a call to decoded_size() to calculate needed heap space AND possible dynamic allocation of more heap space if needed for big maps. The heap size returned from decoded_size() is guaranteed to be sufficient for all term heap data except for hashmap nodes. All hashmap nodes are created at the end of dec_term() by invoking the heap factory interface that may allocate more heap space on process heap or in fragments. With this commit it is no longer guaranteed that a message is confined to only one heap fragment.
2015-06-10Fix segfault in module_info for deleted modulesRichard Carlsson
Add a check to protect from segfault when erlang:get_module_info/1/2 is called on a deleted module (i.e. with no current code). Also refactor erts_module_info_0/1 to avoid repeated calls to erts_active_code_ix() and remove some obsolete comments. Add test for module_info on deleted modules.
2015-05-07Set module_info md5 for native modules properlyRichard Carlsson
Use the md5 of the native code chunk instead of the Beam code md5.
2015-05-07Add module_info entry for native codeRichard Carlsson
2015-05-07Gracefully handle empty md5 field in module_infoRichard Carlsson
2015-04-23erts: Fix loader increment from minus instructionBjörn-Egil Dahlberg
A type error caused the optimization to never kick in.
2015-04-13Pre-compute hash values for the general get_map_elements instructionBjörn Gustavsson
See the previous commit for justification and use cases.
2015-04-13Teach the loader to pre-compute the hash value for single-key lookupsBjörn Gustavsson
Let the loader pre-compute the hash value when a single, literal key is matched as in: #{<<"some_key">>:=V} = Map In my measurements, this optimization resulted in a 30 percent speedup for short binary keys. Unfortunately, this optimizization makes no difference for small maps with less than 32 keys, since the hash value is not used. Still, there are the following use cases: * A map used instead of a record with more than 32 entries. I have seen some applications with huge records. * Lookup in JSON dictionaries represented as maps. The hash value will only be used when the map is a hash map (currently, that means at least 32 entries).
2015-04-13Sort maps keys in the loaderBjörn Gustavsson
The map instructions require that the keys in the instructions are sorted (for flatmaps). But that is an implementation detail that should not exposed outside of the BEAM virtual machine. Therefore, make the sorting of the keys the responsibility of the loader and not the compiler. Also note that the sort order for maps with numeric keys or keys with numeric components has changed in OTP 18. That means that code compiled for OTP 17 that operated on maps with map keys might not work in OTP 18 without the sorting in the loader (although it is unlikely to be an issue in practice).
2015-04-13De-optimize the has_map_fields instructionsBjörn Gustavsson
The has_map_fields instruction is infrequently used. Thus there is no need to have the fastest possible implementation; it is better to have an implementation that reduces the code size in the already big process_main() function. We can transform has_map_fields to a get_map_elements instruction, targeting the same unused x[0] register for all keys. That instruction will only be marginally slower than existing implementation.
2015-04-13Fully evaluate is_map/1 for literals at load-timeBjörn Gustavsson
The compiler will only emit is_map/1 instructions with literal argument if optimization is turned off. Therefore, the only reason for this commit is cleanliness.
2015-04-13Correct transformation of put_map_assoc to new_mapBjörn Gustavsson
A put_map_assoc instruction with an empty source map should be converted to a simpler new_map instruction. The transformation didn't happen because an empty source map is no longer represented as a NIL term (as it was in the beginning before map literals were implemented).
2015-03-12erts: Fix beam_load assertBjörn-Egil Dahlberg
2015-03-12Merge branch 'maint'Henrik Nord
Conflicts: erts/emulator/hipe/hipe_bif0.c
2015-02-04don't create oversize bignums in binary matchingMikael Pettersson
Bignums are artifically restricted in size. Arithmetic and logical operations check the sizes of resulting bignums, and turn oversize results into system_limit exceptions. However, this check is not performed when bignums are constructed by binary matching. The consequence is that such matchings can construct oversize bignums that satisfy is_integer/1 yet don't work. Performing arithmetic such as Term - 0 fails with a system_limit exception. Worse, performing a logical operation such as Term band Term results in []. The latter occurs because the size checking (e.g. in erts_band()) is a simple ASSERT(is_not_nil(...)) on the result of the bignum operation, which internally is [] (NIL) in the case of oversize results. However, ASSERT is a no-op in release builds, so the error goes unnoticed and [] is returned as the result of the band/2. This patch addresses this by preventing oversize bignums from entering the VM via binary matching: - the internal bytes_to_big() procedure is augmented to return NIL for oversize results, just like big_norm() - callers of bytes_to_big() are augmented to check for NIL returns and signal errors in those cases - erts_bs_get_integer_2() can only fail with badmatch, so that is the Erlang-level result of oversize bignums from binary matches - big_SUITE.erl is extended with a test case that fails without this fix (no error signalled) and passes with it (badmatch occurs) Credit goes to Nico Kruber for the initial bug report.
2014-12-05erts: Use linear search for small select_val arraysBjörn-Egil Dahlberg
For searching a key in an array we use linear search in arrays up to 10 elements. Selecting on tuple arity will always use linear search. Instead of using two different instructions we assume selecting on different tuple arities are always few in numbers.
2014-08-28Merge branch 'maint'Rickard Green
* maint: add enif_schedule_nif() to NIF API
2014-08-28add enif_schedule_nif() to NIF APISteve Vinoski
In the #erlang IRC channel Anthony Ramine once mentioned the idea of allowing a NIF to use an emulator trap, similar to a BIF trap, to schedule another NIF for execution. This is exactly how dirty NIFs were implemented for Erlang/OTP 17.0, so this commit refactors and generalizes that dirty NIF code to support a new enif_schedule_nif() API function. The enif_schedule_nif() function allows a long-running NIF to be broken into separate NIF invocations. The NIF first executes part of the long-running task, then calls enif_schedule_nif() to schedule a NIF for later execution to continue the task. Any number of NIFs can be scheduled in this manner, one after another. Since the emulator regains control between invocations, this helps avoid problems caused by native code tying up scheduler threads for too long. The enif_schedule_nif() function also replaces the original experimental dirty NIF API. The function takes a flags parameter that a caller can use to indicate the NIF should be scheduled onto either a dirty CPU scheduler thread, a dirty I/O scheduler thread, or scheduled as a regular NIF on a regular scheduler thread. With this change, the original experimental enif_schedule_dirty_nif(), enif_schedule_dirty_nif_finalizer() and enif_dirty_nif_finalizer() API functions are no longer needed and have been removed. Explicit scheduling of a dirty NIF finalization function is no longer necessary; if an application wants similar functionality, it can have a dirty NIF just invoke enif_schedule_nif() to schedule a non-dirty NIF to complete its task. Lift the restriction that dirty NIFs can't call enif_make_badarg() to raise an exception. This was a problem with the original dirty NIF API because it forced developers to get and check all incoming arguments in a regular NIF, and then schedule the dirty NIF which then had to get all the arguments again. Now, the argument checking can be done in the dirty NIF and it can call enif_make_badarg() itself to flag incorrect arguments. Extend the ErlNifFunc struct with a new flags field that allows NIFs to be declared as dirty. The default value for this field is 0, indicating a regular NIF, so it's backwards compatible with all existing statically initialized ErlNifFunc struct instances, and so such instances require no code changes. Defining the flags field with a value of ERL_NIF_DIRTY_JOB_CPU_BOUND indicates that the NIF should execute on a dirty CPU scheduler thread, or defining it with a value of ERL_NIF_DIRTY_JOB_IO_BOUND indicates that the NIF should execute on a dirty I/O scheduler thread. Any other flags field value causes a NIF library loading error. Extend the ErlNifEntry struct with a new options field that indicates whether a NIF library was built with support for optional features such as dirty NIFs. When a NIF library is loaded, the runtime checks the options field to ensure compatibility. If a NIF library built with dirty NIF support is loaded into a runtime that does not support dirty NIFs, and the library defines one or more ErlNifFunc entries with non-zero flags fields indicating dirty NIFs, a NIF library loading error results. There is no error if a NIF library built with dirty NIF support is loaded into a runtime that does not support dirty NIFs but the library does not have any dirty NIFs. It is also not an error if a library without dirty NIF support is loaded into a runtime built with dirty NIF support. Add documentation and tests for enif_schedule_nif().