From 773ded1a02898c4a1f9f395d8df1b25444631706 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Gustavsson?= <bjorn@erlang.org>
Date: Fri, 27 Oct 2017 13:07:13 +0200
Subject: Document beam_makeops

---
 erts/emulator/internal_doc/beam_makeops.md | 1846 ++++++++++++++++++++++++++++
 1 file changed, 1846 insertions(+)
 create mode 100644 erts/emulator/internal_doc/beam_makeops.md

(limited to 'erts/emulator/internal_doc/beam_makeops.md')

diff --git a/erts/emulator/internal_doc/beam_makeops.md b/erts/emulator/internal_doc/beam_makeops.md
new file mode 100644
index 0000000000..1da8d2ab05
--- /dev/null
+++ b/erts/emulator/internal_doc/beam_makeops.md
@@ -0,0 +1,1846 @@
+The beam\_makeops script
+=======================
+
+This document describes the **beam\_makeops** script.
+
+Introduction
+------------
+
+The **beam\_makeops** Perl script is used at build-time by both the
+compiler and runtime system.  Given a number of input files (all with
+the extension `.tab`), it will generate source files used by the
+Erlang compiler and by the runtime system to load and execute BEAM
+instructions.
+
+Essentially those `.tab` files define:
+
+* External generic BEAM instructions.  They are the instructions that
+are known to both the compiler and the runtime system.  Generic
+instructions are stable between releases.  New generic instructions
+with high numbers than previous instructions can be added in major
+releases.  The OTP 20 release has 159 external generic instructions.
+
+* Internal generic instructions.  They are known only to the runtime
+system and can be changed at any time without compatibility issues.
+They are created by transformation rules (described next).
+
+* Rules for transforming one or more generic instructions to other
+generic instructions.  The transformation rules allow combining,
+splitting, and removal of instructions, as well as shuffling operands.
+Because of the transformation rules, the runtime can have many
+internal generic instructions that are only known to runtime system.
+
+* Specific BEAM instructions.  The specific instructions are the
+instructions that are actually executed by the runtime system.  They
+can be changed at any time without causing compatibility issues.
+The loader translates generic instructions to specific instructions.
+In general, for each generic instruction, there exists a family of
+specific instructions.  The OTP 20 release has 389 specific
+instructions.
+
+* The implementation of specific instructions.
+
+Generic instructions have typed operands. Here are a few examples of
+operands for `move/2`:
+
+    {move,{atom,id},{x,5}}.
+    {move,{x,3},{x,0}}.
+    {move,{x,2},{y,1}}.
+
+When those instructions are loaded, the loader rewrites them
+to specific instructions:
+
+    move_cx id 5
+    move_xx 3 0
+    move_xy 2 1
+
+Corresponding to each generic instruction, there is a family of
+specific instructions.  The types that an instance of a specific
+instruction can handle are encoded in the instruction names.  For
+example, `move_xy` takes an X register number as the first operand and
+a Y register number as the second operand.  `move_cx` takes a tagged
+Erlang term as the first operand and an X register number as the
+second operand.
+
+An example: the move instruction
+--------------------------------
+
+Using the `move` instruction as an example, we will give a quick
+tour to show the main features of **beam\_makeops**.
+
+In the `compiler` application, in the file `genop.tab`, there is the
+following line:
+
+    64: move/2
+
+This is a definition of an external generic BEAM instruction. Most
+importantly it specifices that the opcode is 64.  It also defines that
+it has two operands.  The BEAM assembler will use the opcode when
+creating `.beam` files.  The compiler does not really need the arity,
+but it will use it as an internal sanity check when assembling the
+BEAM code.
+
+Let's have a look at `ops.tab` in `erts/emulator/beam`, where the
+specific `move` instructions are defined.  Here are a few of them:
+
+    move x x
+    move x y
+    move c x
+
+Each specific instructions is defined by following the name of the
+instruction with the types for each operand.  An operand type is a
+single letter.  For example, `x` means an X register, `y`
+means a Y register, and `c` is a "constant" (a tagged term such as
+an integer, an atom, or a literal).
+
+Now let's look at the implementation of the `move` instruction.  There
+are multiple files containing implementations of instructions in the
+`erts/emulator/beam` directory.  The `move` instruction is defined in
+`instrs.tab`.  It looks like this:
+
+    move(Src, Dst) {
+        $Dst = $Src;
+    }
+
+The implementation for an instruction largely follows the C syntax,
+except that the variables in the function head don't have any types.
+The `$` before an identifier denotes a macro expansion.  Thus,
+`$Src` will expand to the code to pick up the source operand for
+the instruction and `$Dst` to the code for the destination register.
+
+We will look at the code for each specific instruction in turn.  To
+make the code easier to understand, let's first look at the memory
+layout for the instruction `{move,{atom,id},{x,5}}`:
+
+         +--------------------+--------------------+
+    I -> |                 40 |       &&lb_move_cx |
+         +--------------------+--------------------+
+         |                        Tagged atom 'id' |
+         +--------------------+--------------------+
+
+This example and all other examples in the document assumes a 64-bit
+archictecture, and furthermore that pointers to C code fit in 32 bits.
+
+`I` in the BEAM virtual machine is the instruction pointer.  When BEAM
+executes an instruction, `I` points to the first word of the
+instruction.
+
+`&&lb_move_cx` is the address to C code that implements `move_cx`.  It
+is stored in the lower 32 bits of the word.  In the upper 32 bits is
+the byte offset to the X register; the register number 5 has been
+multiplied by the word size size 8.
+
+In the next word the tagged atom `id` is stored.
+
+With that background, we can look at the generated code for `move_cx`
+in `beam_hot.h`:
+
+    OpCase(move_cx):
+    {
+      BeamInstr next_pf = BeamCodeAddr(I[2]);
+      xb(BeamExtraData(I[0])) = I[1];
+      I += 2;
+      ASSERT(VALID_INSTR(next_pf));
+      GotoPF(next_pf);
+    }
+
+We will go through each line in turn.
+
+* `OpCase(move_cx):` defines a label for the instruction.  The
+`OpCase()` macro is defined in `beam_emu.c`.  It will expand this line
+to `lb_move_cx:`.
+
+* `BeamInstr next_pf = BeamCodeAddr(I[2]);` fetches the pointer to
+code for the next instruction to be executed.  The `BeamCodeAddr()`
+macro extracts the pointer from the lower 32 bits of the instruction
+word.
+
+* `xb(BeamExtraData(I[0])) = I[1];` is the expansion of `$Dst = $Src`.
+`BeamExtraData()` is a macro that will extract the upper 32 bits from
+the instruction word.  In this example, it will return 40 which is the
+byte offset for X register 5.  The `xb()` macro will cast a byte
+pointer to an `Eterm` pointer and dereference it.  The `I[1]` on
+the right side of the `=` fetches an Erlang term (the atom `id` in
+this case).
+
+* `I += 2` advances the instruction pointer to the next
+instruction.
+
+* In a debug-compiled emulator, `ASSERT(VALID_INSTR(next_pf));` makes
+sure that `next_pf` is a valid instruction (that is, that it points
+within the `process_main()` function in `beam_emu.c`).
+
+* `GotoPF(next_pf);` transfers control to the next instruction.
+
+Now let's look at the implementation of `move_xx`:
+
+    OpCase(move_xx):
+    {
+      Eterm tmp_packed1 = BeamExtraData(I[0]);
+      BeamInstr next_pf = BeamCodeAddr(I[1]);
+      xb((tmp_packed1>>BEAM_TIGHT_SHIFT)) = xb(tmp_packed1&BEAM_TIGHT_MASK);
+      I += 1;
+      ASSERT(VALID_INSTR(next_pf));
+      GotoPF(next_pf);
+    }
+
+We will go through the lines that are new or have changed compared to
+`move_cx`.
+
+* `Eterm tmp_packed1 = BeamExtraData(I[0]);` picks up both X register
+numbers packed into the upper 32 bits of the instruction word.
+
+* `BeamInstr next_pf = BeamCodeAddr(I[1]);` pre-fetches the address of
+the next instruction. Note that because both X registers operands fits
+into the instruction word, the next instruction is in the very next
+word.
+
+* `xb((tmp_packed1>>BEAM_TIGHT_SHIFT)) = xb(tmp_packed1&BEAM_TIGHT_MASK);`
+copies the source to the destination.  (For a 64-bit architecture,
+`BEAM_TIGHT_SHIFT` is 16 and `BEAM_TIGHT_MASK` is `0xFFFF`.)
+
+* `I += 1;` advances the instruction pointer to the next instruction.
+
+`move_xy` is almost identical to `move_xx`.  The only difference is
+the use of the `yb()` macro instead of `xb()` to reference the
+destination register:
+
+    OpCase(move_xy):
+    {
+      Eterm tmp_packed1 = BeamExtraData(I[0]);
+      BeamInstr next_pf = BeamCodeAddr(I[1]);
+      yb((tmp_packed1>>BEAM_TIGHT_SHIFT)) = xb(tmp_packed1&BEAM_TIGHT_MASK);
+      I += 1;
+      ASSERT(VALID_INSTR(next_pf));
+      GotoPF(next_pf);
+    }
+
+### Transformation rules ###
+
+Next let's look at how we can do some optimizations using transformation
+rules.  For simple instructions such as `move/2`, the instruction dispatch
+overhead can be substantial.  A simple optimization is to combine common
+instructions sequences to a single instruction.  One such common sequence
+is multiple `move` instructions moving X registers to Y registers.
+
+Using the following rule we can combine two `move` instructions
+to a `move2` instruction:
+
+    move X1=x Y1=y | move X2=x Y2=y => move2 X1 Y1 X2 Y2
+
+The left side of the arrow (`=>`) is a pattern.  If the pattern
+matches, the matching instructions will be replaced by the
+instructions on the right side.  Variables in a pattern must start
+with an uppercase letter just as in Erlang.  A pattern variable may be
+followed `=` and one or more type letters to constrain the match to
+one of those types.  The variables that are bound on the left side can
+be used on the right side.
+
+We will also need to define a specific instruction and an implementation:
+
+    # In ops.tab
+    move2 x y x y
+
+    // In instrs.tab
+    move2(S1, D1, S2, D2) {
+        Eterm V1, V2;
+        V1 = $S1;
+        V2 = $S2;
+        $D1 = V1;
+        $D2 = V2;
+    }
+
+When the loader has found a match and replaced the matched instructions,
+it will match the new instructions against the transformation rules.
+Because of that, we can define the rule for a `move3/6` instruction
+as follows:
+
+    move2 X1=x Y1=y X2=x Y2=y | move X3=x Y3=y => \
+          move3 X1 Y1 X2 Y2 X3 Y3
+
+(A `\` before a newline can be used to break a long line for readability.)
+
+It would also be possible to define it like this:
+
+    move X1=x Y1=y | move X2=x Y2=y | move X3=x Y3=y => \
+         move3 X1 Y1 X2 Y2 X3 Y3
+
+but in that case it must be defined before the rule for `move2/4`
+because the first matching rule will be applied.
+
+One must be careful not to create infinite loops.  For example, if we
+for some reason would want to reverse the operand order for the `move`
+instruction, we must not do like this:
+
+    move Src Dst => move Dst Src
+
+The loader would swap the operands forever.  To avoid the loop, we must
+rename the instruction.  For example:
+
+    move Src Dst => assign Dst Src
+
+This concludes the quick tour of the features of **beam\_makeops**.
+
+Short overview of instruction loading
+-------------------------------------
+
+To give some background to the rest of this document, here follows a
+quick overview of how instructions are loaded.
+
+* The loader reads and decodes one instruction at a time from the BEAM
+code and creates a generic instruction.  Many transformation rules
+must look at multiple instructions, so the loader will
+keep multiple generic instructions in a linked list.
+
+* The loader tries to apply transformation rules against the
+generic instructions in the linked list.  If a rule matches, the
+matched instructions will be removed and replaced with new
+generic instructions constructed from the right side of the
+transformation.
+
+* If a transformation rule matched, the loader applies the
+transformation rules again.
+
+* If no transformation rule match, the loader will begin rewriting
+the first of generic instructions to a specific instruction.
+
+* First the loader will search for a specific operation where the
+types for all operands match the type for the generic instruction.
+The first matching instruction will be selected.  **beam\_makeops**
+has ordered the specific instructions so that instructions with more
+specific operands comes before instructions with less specific
+operands.  For example, `move_nx` is more specific than `move_cx`.  If
+the first operand is `[]` (NIL), `move_nx` will be selected.
+
+* Given the opcode for the selected specific instruction, the loader
+looks up the pointer to the C code for the instruction and stores
+in the code area for the module being loaded.
+
+* The loader translates each operand to a machine word and stores it
+in the code area.  The operand type for the selected specific
+instruction guides the translation.  For example, if the type is `e`,
+the value of the operand is an index into an arry of external
+functions and will be translated to a pointer to the export entry for
+the function to call.  If the type is `x`, the number of the X
+register will be multiplied by the word size to produce a byte offset.
+
+* The loader runs the packing engine to pack multiple operands into a
+single word.  The packing engine is controlled by a small program,
+which is a string where each character is an instruction.  For
+example, the code to pack the operands for `move_xy` is `"22#"` (on a
+64-bit machine).  That program will pack the byte offsets for both
+registers into the same word as the pointer to C code.
+
+Running beam_makeops
+--------------------
+
+**beam\_makeops** is found in `$ERL_TOP/erts/emulator/utils`.  Options
+start with a hyphen (`-`).  The options are followed by the name of
+the input files.  By convention, all input files have the extension
+`.tab`, but is not enforced by **beam\_makeops**.
+
+### The -outdir option ###
+
+The option `-outdir Directory` specifies the output directory for
+the generated files.  Default is the current working directory.
+
+### Running beam_makeops for the compiler ###
+
+Give the option `-compiler` to produce output files for the compiler.
+The following files will be written to the output directory:
+
+* `beam_opcodes.erl` - Used primarily by `beam_asm` and `beam_diasm`.
+
+* `beam_opcode.hrl` - Used by `beam_asm`.  It contains tag definitions
+used for encoding instruction operands.
+
+The input file should only contain the definition of BEAM_FORMAT_NUMBER
+and external generic instructions.  (Everything else would be ignored.)
+
+### Running beam_makeops for the emulator ###
+
+Give the option `-emulator` to produce output files for the emulator.
+The following output files will be generated in the output directory.
+
+* `beam_hot.h`, `beam_warm.h`, `beam_cold.`h - Implementation of
+instructions.  Included inside the `process_main()` function in
+`beam_emu.c`.
+
+* `beam_opcodes.c` - Defines static data used by the loader
+(`beam_load.c`).  Data about generic instructions, specific
+instructions (including how to pack their operands), and
+transformation rules are all part of this file.
+
+* `beam_opcodes.h` - Miscellanous preprocessor definitions, mainly
+used by `beam_load.c` but also by `beam_{hot,warm,cold}.h`.
+
+* `beam_pred_funcs.h` - Included by `beam_load.c`.  Contains defines
+needed to call guard constraints in transformation rules.
+
+* `beam_tr_funcs.h` - Included by `beam_load.c`.  Contains defines
+needed to call a C function to the right of a transformation rule.
+
+The following options can be given:
+
+* `wordsize 32|64` - Defines the word size.  Default is 32.
+
+* `code-model Model` - The code model as given to `-mcmodel` option
+for GCC.  Default is `unknown`.  If the code model is `small` (and
+the word size is 64 bits), **beam\_makeops** will pack operands
+into the upper 32 bits of the instruction word.
+
+* `DSymbol=0|1` - Defines the value for a symbol.  The symbol can be
+used in `%if` and `%unless` directives.
+
+Syntax of .tab files
+--------------------
+
+### Comments ###
+
+Any line starting with `#` is a comment and is ignored.
+
+A line with `//` is also a comment.  It is recommended to only
+use this style of comments in files that define implementations of
+instructions.
+
+A long line can be broken into shorter lines by a placing a`\` before
+the newline.
+
+### Variable definitions ###
+
+A variable definition binds a variable to a Perl variable.  It is only
+meaningful to add a new definition if **beam\_makeops** is updated
+at the same time to use the variable.  A variable definition looks this:
+
+*name*=*value*[;]
+
+where *name* is the name of a Perl variable in **beam\_makeops**,
+and *value* is the value to be given to the variable.  The line
+can optionally end with a `;` (to avoid messing up the
+C indentation mode in Emacs).
+
+Here follows a description of the variables that are defined.
+
+#### BEAM\_FORMAT\_NUMBER ####
+
+`genop.tab` has the following definition:
+
+    BEAM_FORMAT_NUMBER=0
+
+It defines the version of the instruction set (which will be
+included in the code header in the BEAM code).  Theoretically,
+the version could be bumped, and all instructions changed.
+In practice, we would have two support two instruction sets
+in the runtime system for at least two releases, so it will
+probably never happen in practice.
+
+#### GC\_REGEXP ####
+
+In `macros.tab`, there is a definition of `GC_REGEXP`.
+It will be described in [a later section](#the-gc_regexp-definition).
+
+### Directives ###
+
+There are directives to classify specific instructions depending
+on how frequently used they are:
+
+* `%hot` - Implementation will be placed in `beam_hot.h`. Frequently
+executed instructions.
+
+* `%warm` - Implementation will be placed in `beam_warm.h`.  Binary
+syntax instructions.
+
+* `%cold` - Implementation will be placed in `beam_cold.h`. Trace
+instructions and infrequently used instructions.
+
+Default is `%hot`.  The directives will be applied to declarations
+of the specific instruction that follow.  Here is an example:
+
+    %cold
+    is_number f? xy
+    %hot
+
+#### Conditional compilation directives ####
+
+The `%if` directive includes a range of lines if a condition is
+true.  For example:
+
+    %if ARCH_64
+    i_bs_get_integer_32 x f? x
+    %endif
+
+The specific instruction `i_bs_get_integer_32` will only be defined
+on a 64-bit machine.
+
+The condition can be inverted by using `%unless` instead of `%if`:
+
+    %unless NO_FPE_SIGNALS
+    fcheckerror p => i_fcheckerror
+    i_fcheckerror
+    fclearerror
+    %endif
+
+It is also possible to add an `%else` clause:
+
+    %if ARCH_64
+    BS_SAFE_MUL(A, B, Fail, Dst) {
+        Uint64 res = ($A) * ($B);
+        if (res / $B != $A) {
+            $Fail;
+        }
+        $Dst = res;
+    }
+    %else
+    BS_SAFE_MUL(A, B, Fail, Dst) {
+        Uint64 res = (Uint64)($A) * (Uint64)($B);
+        if ((res >> (8*sizeof(Uint))) != 0) {
+            $Fail;
+        }
+        $Dst = res;
+    }
+    %endif
+
+#### Symbols that are defined in directives ####
+
+The following symbols are always defined.
+
+* `ARCH_64` - is 1 for a 64-bit machine, and 0 otherwise.
+* `ARCH_32` - is 1 for 32-bit machine, and 1 otherwise.
+
+The `Makefile` for building the emulator currently defines the
+following symbols by using the `-D` option on the command line for
+**beam\_makeops**.
+
+* `NO_FPE_SIGNALS` - 1 if FPE signals are not enable in runtime system,
+0 otherwise.
+* `USE_VM_PROBES` - 1 if the runtime system is compiled to use VM probes (support for dtrace or systemtap), 0 otherwise.
+
+### Defining external generic instructions ###
+
+External generic BEAM instructions are known to both the compiler and
+the runtime system.  They remain stable between releases.  A new major
+release may add more external generic instructions, but must not change
+the semantics for a previously defined instruction.
+
+The syntax for an external generic instruction is as follows:
+
+*opcode*: [-]*name*/*arity*
+
+*opcode* is an integer greater than or equal to 1.
+
+*name* is an identifier starting with a lowercase letter.  *arity* is
+an integer denoting the number of operands.
+
+*name* can optionally be preceded by `-` to indicate that it has been
+obsoleted.  The compiler is not allowed to generate BEAM files that
+use obsolete instructions and the loader will refuse to load BEAM
+files that use obsolete instructions.
+
+It only makes sense to define external generic instructions in the
+file `genop.tab` in `lib/compiler/src`, because the compiler must
+know about them in order to use them.
+
+New instructions must be added at the end of the file, with higher
+numbers than the previous instructions.
+
+### Defining internal generic instructions ###
+
+Internal generic instructions are known only to the runtime
+system and can be changed at any time without compatibility issues.
+
+There are two ways to define internal generic instructions:
+
+* Implicitly when a specific instruction is defined.  This is by far
+the most common way.  Whenever a specific instruction is created,
+**beam\_makeops** automatically creates an internal generic instruction
+if it does not previously exist.
+
+* Explicitly.  This is necessary only when a generic instruction does
+not have any corresponding specific instruction.
+
+The syntax for an internal generic instruction is as follows:
+
+*name*/*arity*
+
+*name* is an identifier starting with a lowercase letter.  *arity* is
+an integer denoting the number of operands.
+
+### About generic instructions in general ###
+
+Each generic instruction has an opcode.  The opcode is an integer,
+greater than or equal to 1. For an external generic instruction, it
+must be explicitly given `genop.tab`, while internal generic
+instructions are automatically numbered by **beam\_makeops**.
+
+The identity of a generic instruction is its name combined with its
+arity.  That means that it is allowed to define two distinct generic
+instructions having the same name but with different arities.  For
+example:
+
+    move_window/5
+    move_window/6
+
+Each operand of a generic instruction is tagged with its type.  A generic
+instruction can have one of the following types:
+
+* `x` - X register.
+
+* `y` - Y register.
+
+* `l` - Floating point register number.
+
+* `i` - Tagged literal integer.
+
+* `a` - Tagged literal atom.
+
+* `n` - NIL (`[]`, the empty list).
+
+* `q` - Literal that don't fit in a word, that is an object stored on
+the heap such as a list or tuple.  Any heap object type is supported,
+even types that don't have real literals such as external references.
+
+* `f` - Non-zero failure label.
+
+* `p` - Zero failure label.
+
+* `u` - Untagged integer that fits in a machine word.  It is used for many
+different purposes, such as the number of live registers in `test_heap/2`,
+as a reference to the export for `call_ext/2`, and as the flags operand for
+binary syntax instructions.  When the generic instruction is translated to a
+specific instruction, the type for the operand in the specific operation will
+tell the loader how to treat the operand.
+
+* `o` - Overflow.  If the value for an `u` operand does not fit in a machine
+word, the type of the operand will be changed to `o` (with no associated
+value).  Currently only used internally in the loader in the guard constraint
+function `binary_too_big()`.
+
+* `v` - Arity value.  Only used internally in the loader.
+
+
+### Defining specific instructions ###
+
+The specific instructions are known only to the runtime system and
+are the instructions that are actually executed.  They can be changed
+at any time without causing compatibility issues.
+
+A specific instruction can have at most 6 operands.
+
+A specific instruction is defined by first giving its name followed by
+the types for each operand.  For example:
+
+     move x y
+
+Internally, for example in the generated code and in the output from
+the BEAM disassembler, the instruction `move x y` will be called `move_xy`.
+
+The name for a specific instruction is an identifier starting with a
+lowercase letter.  A type is an lowercase or uppercase letter.
+
+All specific instructions with a given name must have the same number
+of operands. That is, the following is **not** allowed:
+
+     move x x
+     move x y x y
+
+Here follows the type letters that more or less directly corresponds
+to the types for generic instructions.
+
+* `x` - X register.  Will be loaded as a byte offset to the X register
+relative to the base of X register array.  (Can be packed with other
+operands.)
+
+* `y` - Y register.  Will be loaded as a byte offset to the Y register
+relative to the stack frame. (Can be packed with other operands.)
+
+* `r` - X register 0.  An implicit operand that will not be stored in
+the loaded code.
+
+* `l` - Floating point register number.  (Can be packed with other
+operands.)
+
+* `i` - Tagged literal integer (a SMALL that will fit in one word).
+
+* `a` - Tagged atom.
+
+* `n` - NIL or the empty list.  (Will not be stored in the loaded code.)
+
+* `q` - Tagged CONS or BOXED pointer.  That is, a term such as a list
+or tuple.  Any heap object type is supported, even types that don't
+have real literals such as external references.
+
+* `f` - Failure label (non-zero).  The target for a branch
+or call instruction.
+
+* `p` - The 0 failure label, meaning that an exception should be raised
+if the instruction fails.  (Will not be stored in the loaded code.)
+
+* `c` - Any literal term; that is, immediate literals such as SMALL,
+and CONS or BOXED pointers to literals.  (Can be used where the
+operand in the generic instruction has one of the types `i`, `a`, `n`,
+or `q`.)
+
+The types that follow do a type test of the operand at runtime; thus,
+they are generally more expensive in terms of runtime than the types
+described earlier.  However, those operand types are needed to avoid a
+combinatorial explosion in the number of specific instructions and
+overall code size of `process_main()`.
+
+* `s` - Tagged source: X register, Y register, or a literal term.  The
+tag will be tested at runtime to retrieve the value from an X
+register, a Y register, or simply use the value as a tagged Erlang
+term.  (Implementation note: An X register is tagged as a pid, and a Y
+register as a port.  Therefore the literal term must not contain a
+port or pid.)
+
+* `S` - Tagged source register (X or Y).  The tag will be tested at
+runtime to retrieve the value from an X register or a Y register.  Slighly
+cheaper than `s`.
+
+* `d` - Tagged destination register (X or Y).  The tag will be tested
+at runtime to set up a pointer to the destination register.  If the
+instrution performs a garbarge collection, it must use the
+`$REFRESH_GEN_DEST()` macro to refresh the pointer before storing to
+it (there are more details about that in a later section).
+
+* `j` - A failure label (combination of `f` and `p`).  If the branch target 0,
+an exception will be raised if instruction fails, otherwise control will be
+transfered to the target address.
+
+The types that follows are all applied to an operand that has the `u`
+type.
+
+* `t` - An untagged integer that will fit in 12 bits (0-4096).  It can be
+packed with other operands in a word.  Most often used as the number
+of live registers in instructions such as `test_heap`.
+
+* `I` - An untagged integer that will fit in 32 bits.  It can be
+packed with other operands in a word on a 64-bit system.
+
+* `W` - Untagged integer or pointer.  Not possible to pack with other
+operands.
+
+* `e` - Pointer to an export entry.  Use by call instructions that call
+other modules, such as `call_ext`.
+
+* `L` - A label.  Only used by the `label/1` instruction.
+
+* `b` - Pointer to BIF.  Used by instructions that BIFs, such as
+`call_bif`.
+
+* `A` - A tagged arityvalue.  Used in instructions that test the arity
+of a tuple.
+
+* `P` - A byte offset into a tuple.
+
+* `Q` - A byte offset into the stack.  Used for updating the frame
+pointer register.  Can be packed with other operands.
+
+When the loader translates a generic instruction a specific
+instruction, it will choose the most specific instruction that will
+fit the types.  Consider the following two instructions:
+
+    move c x
+    move n x
+
+The `c` operand can encode any literal value, including NIL.  The
+`n` operand only works for NIL.  If we have the generic instruction
+`{move,nil,{x,1}}`, the loader will translate it to `move_nx 1`
+because `move n x` is more specific.  `move_nx` could be slightly
+faster or smaller (depending on the architecture), because the `[]`
+is not stored explicitly as an operand.
+
+#### Syntactic sugar for specific instructions ####
+
+It is possible to specify more than one type letter for each operand.
+Here is an example:
+
+    move cxy xy
+
+This is syntactic sugar for:
+
+    move c x
+    move c y
+    move x x
+    move x y
+    move y x
+    move y y
+
+Note the difference between `move c xy` and `move c d`.  Note that `move c xy`
+is equivalent to the following two definitions:
+
+    move c x
+    move c y
+
+On the other hand, `move c d` is a single instruction.  At runtime,
+the `d` operand will be tested to see whether it refers to an X
+register or a Y register, and a pointer to the register will be set
+up.
+
+#### The '?' type modifier ####
+
+The character `?` can be added to the end of an operand to indicate
+that the operand will not be used every time the instruction is executed.
+For example:
+
+    allocate_heap t I t?
+    is_eq_exact f? x xy
+
+In `allocate_heap`, the last operand is the number of live registers.
+It will only be used if there is not enough heap space and a garbage
+collection must be performed.
+
+In `is_eq_exact`, the failure address (the first operand) will only be
+used if the two register operands are not equal.
+
+Knowing that an operand is not always used can improve how packing
+is done for some instructions.
+
+For the `allocate_heap` instruction, without the `?` the packing would
+be done like this:
+
+         +--------------------+--------------------+
+    I -> |       Stack needed | &&lb_allocate_heap +
+         +--------------------+--------------------+
+         |        Heap needed | Live registers     +
+         +--------------------+--------------------+
+
+"Stack needed" and "Heap needed" are always used, but they are in
+different words.  Thus, at runtime the `allocate_heap` instruction
+must read both words from memory even though it will not always use
+"Live registers".
+
+With the `?`, the operands will be packed like this:
+
+         +--------------------+--------------------+
+    I -> |     Live registers | &&lb_allocate_heap +
+         +--------------------+--------------------+
+         |        Heap needed |       Stack needed +
+         +--------------------+--------------------+
+
+Now "Stack needed" and "Heap needed" are in the same word.
+
+### Defining transformation rules ###
+
+Transformation rules are used to rewrite generic instructions to other
+generic instructions.  The transformations rules are applied
+repeatedly until no rule match.  At that point, the first instruction
+in the resulting instruction sequence will be converted to a specific
+instruction and added to the code for the module being loaded.  Then
+the transformation rules for the remaining instructions are run in the
+same way.
+
+A rule is recognized by its right-pointer arrow: `=>`.  To the left of
+the arrow is one or more instruction patterns, separated by `|`.  To
+the right of the arrow is zero or more instructions, separated by `|`.
+If the instructions from the BEAM code matches the instruction
+patterns on the left side, they will be replaced with instructions on
+the right side (or removed if there are no instructions on the right).
+
+#### Defining instruction patterns ####
+
+We will start looking at the patterns on the left side of the arrow.
+
+A pattern for an instruction consists of its name, followed by a pattern
+for each of its operands.  The operand patterns are separated by spaces.
+
+The simplest possible pattern is a variable.  Just like in Erlang,
+a variable must begin with an uppercase letter.  If the same variable is
+used in multiple operands, the pattern will only match if the operands
+are equal.  For example:
+
+    move Same Same =>
+
+This pattern will match if the operands for `move` are the same.  If
+the pattern match, the instruction will be removed.  (That used to be an
+actual rule a long time ago when the compiler would occasionally produce
+instructions such as `{move,{x,2},{x,2}}`.)
+
+Variables that have been bound on the left side can be used on the
+right side.  For example, this rule will rewrite all `move` instructions
+to `assign` instructions with the operands swapped:
+
+    move Src Dst => assign Dst Src
+
+If we only want to match operands of a certain type, we can
+use a type constraint.  A type constraint consists of one or more
+lowercase letters, each specifying a type.  For example:
+
+    is_integer Fail an => jump Fail
+
+The second operand pattern, `an`, will match if the second operand is
+either an atom or NIL (the empty list).  In case of a match, the
+`is_integer/2` instruction will be replaced with a `jump/1`
+instruction.
+
+An operand pattern can bind a variable and constrain the type at the
+same time by following the variable with a `=` and the constraint.
+For example:
+
+    is_eq_exact Fail=f R=xy C=q => i_is_eq_exact_literal Fail R C
+
+Here the `is_eq_exact` instruction is replaced with a specialized instruction
+that only compares literals, but only if the first operand is a register and
+the second operand is a literal.
+
+#### Further constraining patterns ####
+
+In addition to specifying a type letter, the actual value for the type can
+be specified.  For example:
+
+    move C=c x==1 => move_x1 C
+
+Here the second operand of `move` is constrained to be X register 1.
+
+When specifying an atom constraint, the atom is written as it would be
+in the C source code.  That is, it needs an `am_` prefix, and it must
+be listed in `atom.names`.  For example:
+
+    is_boolean Fail=f a==am_true =>
+    is_boolean Fail=f a==am_false =>
+
+There are several constraints available for testing whether a call is to a BIF
+or a function.
+
+The constraint `u$is_bif` will test whether the given operand refers to a BIF.
+For example:
+
+    call_ext u Bif=u$is_bif => call_bif Bif
+    call_ext u Func         => i_call_ext Func
+
+The `call_ext` instruction can be used to call functions written in
+Erlang as well as BIFs (or more properly called SNIFs).  The
+`u$is_bif` constraint will match if the operand refers to a BIF (that
+is, if it is listed in the file `bif.tab`).  Note that `u$is_bif`
+should only be applied to operands that are known to contain an index
+to the import table chunk in the BEAM file (such operands have the
+type `b` or `e` in the corresponding specific instruction).  If
+applied to other `u` operands, it will at best return a nonsense
+result.
+
+The `u$is_not_bif` constraint matches if the operand does not refer to
+a BIF (not listed in `bif.tab`).  For example:
+
+    move S X0=x==0 | line Loc | call_ext_last Ar Func=u$is_not_bif D => \
+         move S X0 | call_ext_last Ar Func D
+
+The `u$bif:Module:Name/Arity` constraint tests whether the given
+operand refers to a specific BIF.  Note that `Module:Name/Arity`
+**must** be an existing BIF defined in `bif.tab`, or there will
+be a compilation error.  It is useful when a call to a specific BIF
+should be replaced with an instruction as in this example:
+
+    gc_bif2 Fail Live u$bif:erlang:splus/2 S1 S2 Dst => \
+         gen_plus Fail Live S1 S2 Dst
+
+Here the call to the GC BIF `'+'/2` will be replaced with the instruction
+`gen_plus/5`.  Note that the same name as used in the C source code must be
+used for the BIF, which in this case is `splus`.  It is defined like this
+in `bit.tab`:
+
+    ubif erlang:'+'/2 splus_2
+
+The `u$func:Module:Name/Arity` will test whether the given operand is a
+a specific function.  Here is an example:
+
+    bif1 Fail u$func:erlang:is_constant/1 Src Dst => too_old_compiler
+
+`is_constant/1` used to be a BIF a long time ago.  The transformation
+replaces the call with the `too_old_compiler` instruction which will produce
+a nicer error message than the default error would be for a missing guard BIF.
+
+#### Type constraints allowed in patterns ####
+
+Here are all type letters that are allowed on the left side of a transformation
+rule.
+
+* `u` - An untagged integer that fits in a machine word.
+
+* `x` - X register.
+
+* `y` - Y register.
+
+* `l` - Floating point register number.
+
+* `i` - Tagged literal integer.
+
+* `a` - Tagged literal atom.
+
+* `n` - NIL (`[]`, the empty list).
+
+* `q` - Literals that don't fit in a word, such as list or tuples.
+
+* `f` - Non-zero failure label.
+
+* `p` - The zero failure label.
+
+* `j` - Any label.  Equivalent to `fp`.
+
+* `c` - Any literal term.  Equivalent to `ainq`.
+
+* `s` - X register, Y register, or any literal term.  Equivalent to `xyc`.
+
+* `d` - X or Y register.  Equivalent to `xy`.  (In a pattern `d` will
+match both source and destination registers.  As an operand in a specific
+instruction, it must only be used for a destination register.)
+
+* `o` - Overflow.  An untagged integer that does not fit in a machine word.
+
+#### Guard constraints ####
+
+If the constraints described so far is not enough, additional
+constraints can be written in C in `beam_load.c` and be called as a
+guard function on the left side of the transformation.  If the guard
+function returns a non-zero value, the matching of the rule will
+continue, otherwise the match will fail.  For example:
+
+    ensure_map Lit=q | literal_is_map(Lit) =>
+
+The guard test `literal_is_map/1` tests whether the given literal is a map.
+If the literal is a map, the instruction is unnecessary and can be removed.
+
+It is outside the scope for this document to describe in detail how such
+guard functions are written, but for the curious here is the implementation
+of `literal_is_map()`:
+
+    static int
+    literal_is_map(LoaderState* stp, GenOpArg Lit)
+    {
+        Eterm term;
+
+        ASSERT(Lit.type == TAG_q);
+        term = stp->literals[Lit.val].term;
+        return is_map(term);
+    }
+
+#### Handling instruction with variable number of operands ####
+
+Some instructions, such as `select_val/3`, essentially has a variable
+number of operands.  Such instructions have a `{list,[...]}` operand
+as their last operand in the BEAM assembly code.  For example:
+
+    {select_val,{x,0},
+                {f,1},
+                {list,[{atom,b},{f,4},{atom,a},{f,5}]}}.
+
+The loader will convert a `{list,[...]}` operand to an `u` operand whose
+value is the number of elements in the list, followed by each element in
+the list.  The instruction above would be translated to the following
+generic instruction:
+
+    {select_val,{x,0},{f,1},{u,4},{atom,b},{f,4},{atom,a},{f,5}}
+
+To match a variable number of arguments we need to use the special
+operand type `*` like this:
+
+    select_val Src=aiq Fail=f Size=u List=* => \
+        i_const_select_val Src Fail Size List
+
+This transformation renames a `select_val/3` instruction
+with a constant source operand to `i_const_select_val/3`.
+
+#### Constructing new instructions on the right side ####
+
+The most common operand on the right side is a variable that was bound while
+matching the left side.  For example:
+
+    trim N Remaining => i_trim N
+
+An operand can also be a type letter to construct an operand of that type.
+Each type has a default value.  For example, the type `x` has the default
+value 1023, which is the highest X register.  That makes `x` on the right
+side a convenient shortcut for a temporary X register.  For example:
+
+    is_number Fail Literal=q => move Literal x | is_number Fail x
+
+If the second operand for `is_number/2` is a literal, it will be moved to
+X register 1023.  Then `is_number/2` will test whether the value stored in
+X register 1023 is a number.
+
+This kind of transformation is useful when it is rare that an operand can
+be anything else but a register.  In the case of `is_number/2`, the second
+operand is always a register unless the compiler optimizations have been
+disabled.
+
+If the default value is not suitable, the type letter can be followed
+by `=` and a value.  Most types take an integer value.  The value for
+an atom is written the same way as in the C source code.  For example,
+the atom `false` is written as `am_false`.  The atom must be listed in
+`atom.names`.
+
+Here is an example showing how values can be specified:
+
+    bs_put_utf32 Fail=j Flags=u Src=s => \
+        i_bs_validate_unicode Fail Src | \
+        bs_put_integer Fail i=32 u=1 Flags Src
+
+#### Type letters on the right side ####
+
+Here follows all types that are allowed to be used in operands for
+instructions being constructed on the right side of a transformation
+rule.
+
+* `u` - Construct an untagged integer.  The default value is 0.
+
+* `x` - X register.  The default value is 1023.  That makes `x` convenient to
+use as a temporary X register.
+
+* `y` - Y register.  The default value is 0.
+
+* `l` - Foating point register number.  The default value is 0.
+
+* `i` - Tagged literal integer.  The default value is 0.
+
+* `a` - Tagged atom.  The default value is the empty atom (`am_Empty`).
+
+* `n` - NIL (`[]`, the empty list).
+
+#### Function call on the right side ####
+
+Transformations that are not possible to describe with the rule
+language as described here can be written as a C function in
+`beam_load.c` and called from the right side of a transformation.  The
+left side of the transformation will perform the match and bind
+operands to variables.  The variables can then be passed to a
+generator function on the right side.  For example:
+
+    bif2 Fail=j u$bif:erlang:element/2 Index=s Tuple=xy Dst=d => \
+        gen_element(Jump, Index, Tuple, Dst)
+
+This transformation rule matches a call to the BIF `element/2`.
+The operands will be captured and the function `gen_element()` will
+be called.
+
+`gen_element()` will produce one of two instructions depending
+on `Index`.  If `Index` is an integer in the range from 1 up to
+the maximum tuple size, the instruction `i_fast_element/2` will
+be produced, otherwise the instruction `i_element/4` will be
+produced.  The corresponding specific instructions are:
+
+    i_fast_element xy j? I d
+    i_element xy j? s d
+
+The `i_fast_element/2` instruction is faster because the tuple is
+already an untagged integer.  It also knows that the index is at least
+1, so it does not have to test for that.  The `i_element/4`
+instruction will have to fetch the index from a register, test that it
+is an integer, and untag the integer.
+
+It is outside the scope of this document to describe in detail how
+generator functions are written, but for the curious, here is the
+implementation of `gen_element()`:
+
+    static GenOp*
+    gen_element(LoaderState* stp, GenOpArg Fail,
+       GenOpArg Index, GenOpArg Tuple, GenOpArg Dst)
+    {
+        GenOp* op;
+
+        NEW_GENOP(stp, op);
+        op->arity = 4;
+        op->next = NULL;
+
+        if (Index.type == TAG_i && Index.val > 0 &&
+           Index.val <= ERTS_MAX_TUPLE_SIZE &&
+           (Tuple.type == TAG_x || Tuple.type == TAG_y)) {
+            op->op = genop_i_fast_element_4;
+            op->a[0] = Tuple;
+            op->a[1] = Fail;
+            op->a[2].type = TAG_u;
+            op->a[2].val = Index.val;
+            op->a[3] = Dst;
+        } else {
+            op->op = genop_i_element_4;
+            op->a[0] = Tuple;
+            op->a[1] = Fail;
+            op->a[2] = Index;
+            op->a[3] = Dst;
+        }
+
+        return op;
+    }
+}
+
+### Defining the implementation ###
+
+The actual implementation of instructions are also defined in `.tab`
+files processed by **beam\_makeops**.  For practical reasons,
+instruction definitions are stored in several files, at the time of
+writing in the following files:
+
+    bif_instrs.tab
+    arith_instrs.tab
+    bs_instrs.tab
+    float_instrs.tab
+    instrs.tab
+    map_instrs.tab
+    msg_instrs.tab
+    select_instrs.tab
+    trace_instrs.tab
+
+There is also a file that only contains macro definitions:
+
+    macros.tab
+
+The syntax of each file is similar to C code.  In fact, most of
+the contents *is* C code, interspersed with macro invocations.
+
+To allow Emacs to auto-indent the code, each file starts with the
+following line:
+
+    // -*- c -*-
+
+To avoid messing up the indentation, all comments are written
+as C++ style comments (`//`) instead of `#`.  Note that a comment
+must start at the beginning of a line.
+
+The meat of an instruction definition file are macro definitions.
+We have seen this macro definition before:
+
+    move(Src, Dst) {
+        $Dst = $Src;
+    }
+
+A macro definitions must start at the beginning of the line (no spaces
+allowed), the opening curly bracket must be on the same line, and the
+finishing curly bracket must be at the beginning of a line.  It is
+recommended that the macro body is properly indented.
+
+As a convention, the macro arguments in the head all start with an
+uppercase letter.  In the body, the macro arguments can be expanded
+by preceding them with `$`.
+
+A macro definition whose name and arity matches a family of
+specific instructions is assumed to be the implementation of that
+instruction.
+
+A macro can also be invoked from within another macro.  For example,
+`move_deallocate_return/2` avoids repeating code by invoking
+`$deallocate_return()` as a macro:
+
+    move_deallocate_return(Src, Deallocate) {
+        x(0) = $Src;
+        $deallocate_return($Deallocate);
+    }
+
+Here is the definition of `deallocate_return/1`:
+
+    deallocate_return(Deallocate) {
+        //| -no_next
+        int words_to_pop = $Deallocate;
+        SET_I((BeamInstr *) cp_val(*E));
+        E = ADD_BYTE_OFFSET(E, words_to_pop);
+        CHECK_TERM(x(0));
+        DispatchReturn;
+    }
+
+The expanded code for `move_deallocate_return` will look this:
+
+    OpCase(move_deallocate_return_cQ):
+    {
+      x(0) = I[1];
+      do {
+        int words_to_pop = Qb(BeamExtraData(I[0]));
+        SET_I((BeamInstr *) cp_val(*E));
+        E = ADD_BYTE_OFFSET(E, words_to_pop);
+        CHECK_TERM(x(0));
+        DispatchReturn;
+      } while (0);
+    }
+
+When expanding macros, **beam\_makeops** wraps the expansion in a
+`do`/`while` wrapper unless **beam\_makeops** can clearly see that no
+wrapper is needed.  In this case, the wrapper is needed.
+
+Note that arguments for macros cannot be complex expressions, because
+the arguments are split on `,`.  For example, the following would
+not work because **beam\_makeops** would split the expression into
+two arguments:
+
+    $deallocate_return(get_deallocation(y, $Deallocate));
+
+#### Code generation directives ####
+
+Within macro definitions, `//` comments are in general not treated
+specially.  They will be copied to the file with the generated code
+along with the rest of code in the body.
+
+However, there is an exception. Within a macro definition, a line that
+starts with whitespace followed by `//|` is treated specially.  The
+rest of the line is assumed to contain directives to control code
+generation.
+
+Currently, two code generation directives are recognized:
+
+* `-no_prefetch`
+* `-no_next`
+
+##### The -no_prefetch directive #####
+
+To see what `-no_prefetch` does, let's first look at the default code
+generation.  Here is the code generated for `move_cx`:
+
+    OpCase(move_cx):
+    {
+      BeamInstr next_pf = BeamCodeAddr(I[2]);
+      xb(BeamExtraData(I[0])) = I[1];
+      I += 2;
+      ASSERT(VALID_INSTR(next_pf));
+      GotoPF(next_pf);
+    }
+
+Note that the very first thing done is to fetch the address to the
+next instruction.  The reason is that it usually improves performance.
+
+Just as a demonstration, we can add a `-no_prefetch` directive to
+the `move/2` instruction:
+
+    move(Src, Dst) {
+        //| -no_prefetch
+        $Dst = $Src;
+    }
+
+We can see that the prefetch is no longer done:
+
+    OpCase(move_cx):
+    {
+      xb(BeamExtraData(I[0])) = I[1];
+      I += 2;
+      ASSERT(VALID_INSTR(*I));
+      Goto(*I);
+    }
+
+When would we want to turn off the prefetch in practice?
+
+In instructions that will not always execute the next instruction.
+For example:
+
+    is_atom(Fail, Src) {
+        if (is_not_atom($Src)) {
+            $FAIL($Fail);
+        }
+    }
+
+    // From macros.tab
+    FAIL(Fail) {
+        //| -no_prefetch
+        $SET_I_REL($Fail);
+        Goto(*I);
+    }
+
+`is_atom/2` may either execute the next instruction (if the second
+operand is an atom) or branch to the failure label.
+
+The generated code looks like this:
+
+    OpCase(is_atom_fx):
+    {
+      if (is_not_atom(xb(I[1]))) {
+        ASSERT(VALID_INSTR(*(I + (fb(BeamExtraData(I[0]))) + 0)));
+        I += fb(BeamExtraData(I[0])) + 0;;
+        Goto(*I);;
+      }
+      I += 2;
+      ASSERT(VALID_INSTR(*I));
+      Goto(*I);
+    }
+
+##### The -no_next directive #####
+
+Next we will look at when the `-no_next` directive can be used.  Here
+is the `jump/1` instruction:
+
+    jump(Fail) {
+        $JUMP($Fail);
+    }
+
+    // From macros.tab
+    JUMP(Fail) {
+        //| -no_next
+        $SET_I_REL($Fail);
+        Goto(*I);
+    }
+
+The generated code looks like this:
+
+    OpCase(jump_f):
+    {
+      ASSERT(VALID_INSTR(*(I + (fb(BeamExtraData(I[0]))) + 0)));
+      I += fb(BeamExtraData(I[0])) + 0;;
+      Goto(*I);;
+    }
+
+If we remove the `-no_next` directive, the code would look like this:
+
+    OpCase(jump_f):
+    {
+      BeamInstr next_pf = BeamCodeAddr(I[1]);
+      ASSERT(VALID_INSTR(*(I + (fb(BeamExtraData(I[0]))) + 0)));
+      I += fb(BeamExtraData(I[0])) + 0;;
+      Goto(*I);;
+      I += 1;
+      ASSERT(VALID_INSTR(next_pf));
+      GotoPF(next_pf);
+    }
+
+In the end, the C compiler will probably optimize this code to the
+same native code as the first version, but the first version is certainly
+much easier to read for human readers.
+
+#### Macros in the macros.tab file ####
+
+The file `macros.tab` contains many useful macros.  When implementing
+new instructions it is good practice to look through `macros.tab` to
+see if any of existing macros can be used rather than re-inventing
+the wheel.
+
+We will describe a few of the most useful macros here.
+
+##### The GC_REGEXP definition #####
+
+The following line defines a regular expression that will recognize
+a call to a function that does a garbage collection:
+
+     GC_REGEXP=erts_garbage_collect|erts_gc|GcBifFunction;
+
+The purpose is that **beam\_makeops** can verify that an instruction
+that does a garbage collection and has an `d` operand uses the
+`$REFRESH_GEN_DEST()` macro.
+
+If you need to define a new function that does garbage collection,
+you should give it the prefix `erts_gc_`.  If that is not possible
+you should update the regular expression so that it will match your
+new function.
+
+##### FAIL(Fail) #####
+
+Branch to `$Fail`.  Will suppress prefetch (`-no_prefetch`).  Typical use:
+
+    is_nonempty_list(Fail, Src) {
+        if (is_not_list($Src)) {
+            $FAIL($Fail);
+        }
+    }
+
+##### JUMP(Fail) #####
+
+Branch to `$Fail`.  Suppresses generation of dispatch of the next
+instruction (`-no_next`).  Typical use:
+
+    jump(Fail) {
+        $JUMP($Fail);
+    }
+
+##### GC_TEST(NeedStack, NeedHeap, Live) #####
+
+`$GC_TEST(NeedStack, NeedHeap, Live)` tests that given amount of
+stack space and heap space is available.  If not it will do a
+garbage collection.  Typical use:
+
+    test_heap(Nh, Live) {
+        $GC_TEST(0, $Nh, $Live);
+    }
+
+##### AH(NeedStack, NeedHeap, Live) #####
+
+`AH(NeedStack, NeedHeap, Live)` allocates a stack frame and
+optionally additional heap space.
+
+#### Pre-defined macros and variables ####
+
+**beam\_makeops** defines several built-in macros and pre-bound variables.
+
+##### The NEXT_INSTRUCTION pre-bound variable #####
+
+The NEXT_INSTRUCTION is a pre-bound variable that is available in
+all instructions.  It expands to the address of the next instruction.
+
+Here is an example:
+
+    i_call(CallDest) {
+        SET_CP(c_p, $NEXT_INSTRUCTION);
+        $DISPATCH_REL($CallDest);
+    }
+
+When calling a function, the return address is first stored in `c_p->cp`
+(using the `SET_CP()` macro defined in `beam_emu.c`), and then control is
+transferred to the callee.  Here is the generated code:
+
+    OpCase(i_call_f):
+    {
+      SET_CP(c_p, I+1);
+      ASSERT(VALID_INSTR(*(I + (fb(BeamExtraData(I[0]))) + 0)));
+      I += fb(BeamExtraData(I[0])) + 0;;
+      DTRACE_LOCAL_CALL(c_p, erts_code_to_codemfa(I));
+      Dispatch();;
+    }
+
+We can see that that `$NEXT_INSTRUCTION` has been expanded to `I+1`.
+That makes sense since the size of the `i_call_f/1` instruction is
+one word.
+
+##### The IP_ADJUSTMENT pre-bound variable #####
+
+`$IP_ADJUSTMENT` is usually 0.  In a few combined instructions
+(described below) it can be non-zero.  It is used like this
+in `macros.tab`:
+
+    SET_I_REL(Offset) {
+        ASSERT(VALID_INSTR(*(I + ($Offset) + $IP_ADJUSTMENT)));
+        I += $Offset + $IP_ADJUSTMENT;
+    }
+
+Avoid using `IP_ADJUSTMENT` directly.  Use `SET_I_REL()` or
+one of the macros that invoke such as `FAIL()` or `JUMP()`
+defined in `macros.tab`.
+
+#### Pre-defined macro functions ####
+
+##### The IF() macro #####
+
+`$IF(Expr, IfTrue, IfFalse)` evaluates `Expr`, which must be a valid
+Perl expression (which for simple numeric expressions have the same
+syntax as C).  If `Expr` evaluates to 0, the entire `IF()` expression will be
+replaced with `IfFalse`, otherwise it will be replaced with `IfTrue`.
+
+See the description of `OPERAND_POSITION()` for an example.
+
+##### The OPERAND\_POSITION() macro #####
+
+`$OPERAND_POSITION(Expr)` returns the position for `Expr`, if
+`Expr` is an operand that is not packed.  The first operand is
+at position 1.
+
+Returns 0 otherwise.
+
+This macro could be used like this in order to share code:
+
+    FAIL(Fail) {
+        //| -no_prefetch
+        $IF($OPERAND_POSITION($Fail) == 1 && $IP_ADJUSTMENT == 0,
+            goto common_jump,
+            $DO_JUMP($Fail));
+    }
+
+    DO_JUMP(Fail) {
+        $SET_I_REL($Fail);
+        Goto(*I));
+    }
+
+    // In beam_emu.c:
+    common_jump:
+       I += I[1];
+       Goto(*I));
+
+
+#### The $REFRESH\_GEN\_DEST() macro ####
+
+When a specific instruction has a `d` operand, early during execution
+of the instruction, a pointer will be initialized to point to the X or
+Y register in question.
+
+If there is a garbage collection before the result is stored,
+the stack will move and if the `d` operand refered to a Y
+register, the pointer will no longer be valid.  (Y registers are
+stored on the stack.)
+
+In those circumstances, `$REFRESH_GEN_DEST()` must be invoked
+to set up the pointer again.  **beam\_makeops** will notice
+if there is a call to a function that does a garbage collection and
+`$REFRESH_GEN_DEST()` is not called.
+
+Here is a complete example.  The `new_map` instruction is defined
+like this:
+
+    new_map d t I
+
+It is implemented like this:
+
+    new_map(Dst, Live, N) {
+        Eterm res;
+
+        HEAVY_SWAPOUT;
+        res = erts_gc_new_map(c_p, reg, $Live, $N, $NEXT_INSTRUCTION);
+        HEAVY_SWAPIN;
+        $REFRESH_GEN_DEST();
+        $Dst = res;
+        $NEXT($NEXT_INSTRUCTION+$N);
+    }
+
+If we have forgotten the `$REFRESH_GEN_DEST()` there would be a message
+similar to this:
+
+    pointer to destination register is invalid after GC -- use $REFRESH_GEN_DEST()
+    ... from the body of new_map at beam/map_instrs.tab(30)
+
+#### Combined instructions ####
+
+**Problem**: For frequently executed instructions we want to use
+"fast" operands types such as `x` and `y`, as opposed to `s` or `S`.
+To avoid an explosion in code size, we want to share most of the
+implementation between the instructions.  Here are the specific
+instructions for `i_increment/5`:
+
+    i_increment r W t d
+    i_increment x W t d
+    i_increment y W t d
+
+The `i_increment` instruction is implemented like this:
+
+    i_increment(Source, IncrementVal, Live, Dst) {
+        Eterm increment_reg_source = $Source;
+        Eterm increment_val = $IncrementVal;
+        Uint live;
+        Eterm result;
+
+        if (ERTS_LIKELY(is_small(increment_reg_val))) {
+            Sint i = signed_val(increment_reg_val) + increment_val;
+            if (ERTS_LIKELY(IS_SSMALL(i))) {
+                $Dst = make_small(i);
+                $NEXT0();
+            }
+        }
+        live = $Live;
+        HEAVY_SWAPOUT;
+        reg[live] = increment_reg_val;
+        reg[live+1] = make_small(increment_val);
+        result = erts_gc_mixed_plus(c_p, reg, live);
+        HEAVY_SWAPIN;
+        ERTS_HOLE_CHECK(c_p);
+        if (ERTS_LIKELY(is_value(result))) {
+            $REFRESH_GEN_DEST();
+            $Dst = result;
+            $NEXT0();
+        }
+        ASSERT(c_p->freason != BADMATCH || is_value(c_p->fvalue));
+        goto find_func_info;
+    }
+
+There will be three almost identical copies of the code.  Given the
+size of the code, that could be too high cost to pay.
+
+To avoid the three copies of the code, we could use only one specific
+instruction:
+
+    i_increment S W t d
+
+(The same implementation as above will work.)
+
+That reduces the code size, but is slower because `S` means that
+there will be extra code to test whether the operand refers to an X
+register or a Y register.
+
+**Solution**: We can use "combined instructions".  Combined
+instructions are combined from instruction fragments.  The
+bulk of the code can be shared.
+
+Here we will show how `i_increment` can be implemented as a combined
+instruction.  We will show each individual fragment first, and then
+show how to connect them together.  First we will need a variable that
+we can store the value fetched from the register in:
+
+    increment.head() {
+        Eterm increment_reg_val;
+    }
+
+The name `increment` is the name of the group that the fragment
+belongs to.  Note that it does not need to have the same
+name as the instruction.  The group name is followed by `.` and
+the name of the fragment.  The name `head` is pre-defined.
+The code in it will be placed at the beginning of a block, so
+that all fragments in the group can access it.
+
+Next we define the fragment that will pick up the value from the
+register from the first operand:
+
+    increment.fetch(Src) {
+        increment_reg_val = $Src;
+    }
+
+We call this fragment `fetch`.  This fragment will be duplicated three
+times, one for each value of the first operand (`r`, `x`, and `y`).
+
+Next we define the main part of the code that do the actual incrementing.
+
+    increment.execute(IncrementVal, Live, Dst) {
+        Eterm increment_val = $IncrementVal;
+        Uint live;
+        Eterm result;
+
+        if (ERTS_LIKELY(is_small(increment_reg_val))) {
+            Sint i = signed_val(increment_reg_val) + increment_val;
+            if (ERTS_LIKELY(IS_SSMALL(i))) {
+                $Dst = make_small(i);
+                $NEXT0();
+            }
+        }
+        live = $Live;
+        HEAVY_SWAPOUT;
+        reg[live] = increment_reg_val;
+        reg[live+1] = make_small(increment_val);
+        result = erts_gc_mixed_plus(c_p, reg, live);
+        HEAVY_SWAPIN;
+        ERTS_HOLE_CHECK(c_p);
+        if (ERTS_LIKELY(is_value(result))) {
+            $REFRESH_GEN_DEST();
+            $Dst = result;
+            $NEXT0();
+        }
+        ASSERT(c_p->freason != BADMATCH || is_value(c_p->fvalue));
+        goto find_func_info;
+    }
+
+We call this fragment `execute`.  It will handle the three remaining
+operands (`W t d`).  There will only be one copy of this fragment.
+
+Now that we have defined the fragments, we need to inform
+**beam\_makeops** how they should be connected:
+
+    i_increment := increment.fetch.execute;
+
+To the left of the `:=` is the name of the specific instruction that
+should be implemented by the fragments, in this case `i_increment`.
+To the right of `:=` is the name of the group with the fragments,
+followed by a `.`.  Then the name of the fragments in the group are
+listed in the order they should be executed.  Note that the `head`
+fragment is not listed.
+
+The line ends in `;` (to avoid messing up the indentation in Emacs).
+
+(Note that in practice the `:=` line is usually placed before the
+fragments.)
+
+The generated code looks like this:
+
+    {
+      Eterm increment_reg_val;
+      OpCase(i_increment_rWtd):
+      {
+        increment_reg_val = r(0);
+      }
+      goto increment__execute;
+
+      OpCase(i_increment_xWtd):
+      {
+        increment_reg_val = xb(BeamExtraData(I[0]));
+      }
+      goto increment__execute;
+
+      OpCase(i_increment_yWtd):
+      {
+        increment_reg_val = yb(BeamExtraData(I[0]));
+      }
+      goto increment__execute;
+
+      increment__execute:
+      {
+        // Here follows the code from increment.execute()
+        .
+        .
+        .
+    }
+
+##### Some notes about combined instructions #####
+
+The operands that are different must be at
+the beginning of the instruction.  All operands in the last
+fragment must have the same operands in all variants of
+the specific instruction.
+
+As an example, the following specific instructions cannot be
+implemented as a combined instruction:
+
+    i_times j? t x x d
+    i_times j? t x y d
+    i_times j? t s s d
+
+We would have to change the order of the operands so that the
+two operands that are different are placed first:
+
+    i_times x x j? t d
+    i_times x y j? t d
+    i_times s s j? t d
+
+We can then define:
+
+    i_times := times.fetch.execute;
+
+    times.head {
+        Eterm op1, op2;
+    }
+
+    times.fetch(Src1, Src2) {
+        op1 = $Src1;
+        op2 = $Src2;
+    }
+
+    times.execute(Fail, Live, Dst) {
+        // Multiply op1 and op2.
+        .
+        .
+        .
+    }
+
+Several instructions can share a group.  As an example, the following
+instructions have different names, but in the end they all create a
+binary.  The last two operands are common for all of them:
+
+    i_bs_init_fail       xy j? t? x
+    i_bs_init_fail_heap s I j? t? x
+    i_bs_init                W t? x
+    i_bs_init_heap         W I t? x
+
+The instructions are defined like this (formatted with extra
+spaces for clarity):
+
+    i_bs_init_fail_heap := bs_init . fail_heap . verify . execute;
+    i_bs_init_fail      := bs_init . fail      . verify . execute;
+    i_bs_init           := bs_init .           .  plain . execute;
+    i_bs_init_heap      := bs_init .               heap . execute;
+
+Note that the first two instruction have three fragments, while the
+other two only have two fragments.  Here are the fragments:
+
+    bs_init_bits.head() {
+        Eterm num_bits_term;
+        Uint num_bits;
+        Uint alloc;
+    }
+
+    bs_init_bits.plain(NumBits) {
+        num_bits = $NumBits;
+        alloc = 0;
+    }
+
+    bs_init_bits.heap(NumBits, Alloc) {
+        num_bits = $NumBits;
+        alloc = $Alloc;
+    }
+
+    bs_init_bits.fail(NumBitsTerm) {
+        num_bits_term = $NumBitsTerm;
+        alloc = 0;
+    }
+
+    bs_init_bits.fail_heap(NumBitsTerm, Alloc) {
+        num_bits_term = $NumBitsTerm;
+        alloc = $Alloc;
+    }
+
+    bs_init_bits.verify(Fail) {
+        // Verify the num_bits_term, fail using $FAIL
+        // if there is a problem.
+	.
+	.
+	.
+    }
+
+    bs_init_bits.execute(Live, Dst) {
+       // Long complicated code to a create a binary.
+       .
+       .
+       .
+    }
+
+The full definitions of those instructions can be found in `bs_instrs.tab`.
+The generated code can be found in `beam_warm.h`.
-- 
cgit v1.2.3