aboutsummaryrefslogtreecommitdiffstats
path: root/erts/emulator/hipe/hipe_amd64_abi.txt
diff options
context:
space:
mode:
Diffstat (limited to 'erts/emulator/hipe/hipe_amd64_abi.txt')
-rw-r--r--erts/emulator/hipe/hipe_amd64_abi.txt150
1 files changed, 150 insertions, 0 deletions
diff --git a/erts/emulator/hipe/hipe_amd64_abi.txt b/erts/emulator/hipe/hipe_amd64_abi.txt
new file mode 100644
index 0000000000..27beff4ea2
--- /dev/null
+++ b/erts/emulator/hipe/hipe_amd64_abi.txt
@@ -0,0 +1,150 @@
+
+ %CopyrightBegin%
+ %CopyrightEnd%
+
+$Id$
+
+HiPE AMD64 ABI
+==============
+This document describes aspects of HiPE's runtime system
+that are specific for the AMD64 (x86-64) architecture.
+
+Register Usage
+--------------
+%rsp and %rbp are fixed and must be preserved by calls (callee-save).
+%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %r8, %r9, %r10, %r11, %r12, %r13, %r14
+are clobbered by calls (caller-save).
+%r15 is a fixed global register (unallocatable).
+
+%rsp is the native code stack pointer, growing towards lower addresses.
+%rbp (aka P) is the current process' "Process*".
+%r15 (aka HP) is the current process' heap pointer. (If HP_IN_R15 is true.)
+
+Notes:
+- C/AMD64 16-byte aligns %rsp, presumably for SSE and signal handling.
+ HiPE/AMD64 does not need that, so our %rsp is only 8-byte aligned.
+- HiPE/x86 uses %esi for HP, but C/AMD64 uses %rsi for parameter passing,
+ so HiPE/AMD64 should not use %rsi for HP.
+- Using %r15 for HP requires a REX instruction prefix, but performing
+ 64-bit stores needs one anyway, so the only REX-prefix overhead
+ occurs when incrementing or copying HP [not true (we need REX for 64
+ bit add and mov too);�only overhead is when accessing floats on the
+ heap /Luna].
+- XXX: HiPE/x86 could just as easily use %ebx for HP. HiPE/AMD64 could use
+ %rbx, but the performance impact is probably minor. Try&measure?
+- XXX: Cache SP_LIMIT, HP_LIMIT, and FCALLS in registers? Try&measure.
+
+Calling Convention
+------------------
+Same as in the HiPE/x86 ABI, with the following adjustments:
+
+The first NR_ARG_REGS (a tunable parameter between 0 and 6, inclusive)
+parameters are passed in %rsi, %rdx, %rcx, %r8, %r9, and %rdi.
+
+The first return value from a function is placed in %rax, the second
+(if any) is placed in %rdx.
+
+Notes:
+- Currently, NR_ARG_REGS==0.
+- C BIFs expect P in C parameter register 1: %rdi. By making Erlang
+ parameter registers 1-5 coincide with C parameter registers 2-6,
+ our BIF wrappers can simply move P to %rdi without having to shift
+ the remaining parameter registers.
+- A few primop calls target C functions that do not take a P parameter.
+ For these, the code generator should have a "ccall" instruction which
+ passes parameters starting with %rdi instead of %rsi.
+- %rdi can still be used for Erlang parameter passing. The BIF wrappers
+ will push it to the C stack, but \emph{parameter \#6 would have been
+ pushed anyway}, so there is no additional overhead.
+- We could pass more parameters in %rax, %rbx, %r10, %r11, %r12, %r13,
+ and %r14. However:
+ * we may need a scratch register for distant call trampolines
+ * using >6 argument registers complicates the mode-switch interface
+ (needs hacks and special-case optimisations)
+ * it is questionable whether using more than 6 improves performance;
+ it may be better to just cache more P state in registers
+
+Instruction Encoding / Code Model
+---------------------------------
+AMD64 maintains x86's limit of <= 32 bits for PC-relative offsets
+in call and jmp instructions. HiPE/AMD64 handles this as follows:
+- The compiler emits ordinary call/jmp instructions for
+ recursive calls and tailcalls.
+- The runtime system code is loaded into the low 32 bits of the
+ address space. (C/AMD64 small or medium code model.) By using mmap()
+ with the MAP_32BIT flag when allocating memory for code, all
+ code will be in the low 32 bits of the address space, and hence
+ no trampolines will be necessary.
+
+When generating code for non-immediate literals (boxed objects in
+the constants pool), the code generator should use AMD64's new
+instruction for loading a 64-bit immediate into a register:
+mov reg,imm with a rex prefix.
+
+Notes:
+- The loader/linker could redirect a distant call (where the offset
+ does not fit in a 32-bit signed immediate) to a linker-generated
+ trampoline. However, managing trampolines requires changes in the
+ loaders and possibly also the object code format, since the trampoline
+ must be close to the call site, which implies that code and its
+ trampolines must be created as a unit. This is the better long-term
+ solution, not just for AMD64 but also for SPARC32 and PowerPC,
+ both of which have similar problems.
+- The constants pool could also be restricted to the low 32 bits of
+ the address space. However:
+ * We want to move away from a single constants pool. With multiple
+ areas, the address space restriction may be unrealistic.
+ * Creating the address of a literal is an infrequent operation, so
+ the performance impact of using 64-bit immediates should be minor.
+
+Stack Frame Layout
+Garbage Collection Interface
+BIFs
+Stacks and Unix Signal Handlers
+-------------------------------
+Same as in the HiPE/x86 ABI.
+
+
+Standard C/AMD64 Calling Conventions
+====================================
+See <http://www.x86-64.org/abi.pdf>.
+
+%rax, %rdx, %rcx, %rsi, %rdi, %r8, %r9, %r10, %r11 are clobbered by calls (caller-save)
+%rsp, %rbp, %rbx, %r12, %r13, %r14, %r15 are preserved by calls (callee-save)
+[note: %rsi and %rdi are calleR-save, not calleE-save as in the x86 ABI]
+%rsp is the stack pointer (fixed). It is required that ((%rsp+8) & 15) == 0
+when a function is entered. (Section 3.2.2 in the ABI document.)
+%rbp is optional frame pointer or local variable
+The first six integer parameters are passed in %rdi, %rsi, %rdx, %rcx, %r8, and %r9.
+Remaining integer parameters are pushed right-to-left on the stack.
+When calling a variadic function, %rax (%al actually) must contain an upper
+bound on the number of SSE parameter registers, 0-8 inclusive.
+%r10 is used for passing a function's static chain pointer.
+%r11 is available for PLT code when computing the target address.
+The first integer return value is put in %rax, the second (for __int128) in %rdx.
+A memory return value (exact definition is complicated, but basically "large struct"),
+is implemented as follows: the caller passes a pointer in %rdi as a hidden first
+parameter, the callee stores the result there and returns this pointer in %rax.
+The caller deallocates stacked parameters after return (addq $N, %rsp).
+
+Windows 64-bit C Calling Conventions
+====================================
+See "Calling Convention for x64 64-Bit Environments" in msdn.
+
+%rax, %rcx, %rdx, %r8, %r9, %r10, %r11 are clobbered by calls (caller-save).
+%rsp, %rbp, %rbx, %rsi, %rdi, %r12, %r13, %r14, %r15 are preserved
+by calls (callee-save).
+[Note: %rsi and %rdi are calleE-save not calleR-save as in the Linux/Solaris ABI]
+%rsp is the stack pointer (fixed). %rsp & 15 should be 0 at all times,
+except at the start of a function's prologue when ((%rsp+8) & 15) == 0.
+Leaf functions may leave (%rsp & 15) != 0.
+The first four integer parameters are passed in %rcx, %rdx, %r8, and %r9.
+Remaining integer parameters are pushed right-to-left on the stack,
+starting at the fifth slot above the caller's stack pointer.
+The bottom of the caller's frame must contain 4 slots where the callee
+can save the four integer parameter registers, even if fewer than 4
+parameters are passed in registers.
+An integer return value is put in %rax. Large integers (_m128), floats,
+and doubles are returned in %xmm0. Larger return values cause the caller
+to pass a pointer to a result buffer in %rcx as a hidden first parameter.
+The caller may deallocate stacked parameters after return (addq $N, %rsp).