%CopyrightBegin%
 %CopyrightEnd%

$Id$

HiPE ARM ABI
================
This document describes aspects of HiPE's runtime system
that are specific for the ARM architecture.

Register Usage
--------------
r13 is reserved for the C runtime system.
XXX: r10 should be reserved too if stack checking is enabled

r9-r11 and r15 are fixed (unallocatable).
r9 (HP) is the current process' heap pointer.
r10 (NSP) is the current process' native stack pointer.
r11 (P) is the current process' "Process" pointer.
r15 (pc) is the program counter.

r0-r8, r12, and r14 (lr) are caller-save. They are used as temporary
scratch registers and for function call parameters and results.

The runtime system uses temporaries in specific contexts:
r8 (TEMP_LR) is used to preserve lr around BIF calls,
and to pass the callee address in native-to-BEAM traps.
r7 (TEMP_ARG0) is used to preserve the return value in nbif_stack_trap_ra,
and lr in hipe_arm_inc_stack (the caller saved its lr in TEMP_LR).
r1 (ARG0) is used for MBUF-after-BIF checks, for storing the
arity if a BIF that throws an exception or does GC due to MBUF,
and for checking P->flags for pending timeout.
r0 is used to inspect the type of a thrown exception, return a
result token from glue.S back to hipe_mode_switch(), and to pass
the callee arity in native-to-BEAM traps.

Calling Convention
------------------
The first NR_ARG_REGS parameters (a tunable parameter between 0 and 6,
inclusive) are passed in r1-r6.

r0 is not used for parameter passing. This allows the BIF wrappers to
simply move P to r0 without shifting the remaining parameter registers.

r12 is not used for parameter passing since it may be modified
during function linkage.

r14 contains the return address during function calls.

The return value from a function is placed in r0.

Notes:
- We could pass more parameters in r7, r8, r0, and r12. However:
  * distant call and trap-to-BEAM trampolines may need scratch registers
  * using >6 argument registers complicates the mode-switch interface
    (needs hacks and special-case optimisations)
  * it is questionable whether using more than 6 improves performance;
    it may be better to just cache more P state in registers

Stack Frame Layout
------------------
[From top to bottom: formals in left-to-right order, incoming return
address, fixed-size chunk for locals & spills, variable-size area
for actuals, outgoing return address. NSP normally points at the
bottom of the fixed-size chunk, except during a recursive call.
The callee pops the actuals, so no NSP adjustment at return.]

Stack Descriptors
-----------------
sdesc_fsize() is the frame size excluding the return address word.

Standard Linux ARM Calling Conventions
======================================

Reg		Status		Role
---		------		----
r0-r3		calleR-save	Argument/result/scratch registers.
r4-r8		calleE-save	Local variables.
r9		calleE-save	PIC base if PIC and stack checking are both enabled.
				Otherwise a local variable.
r10		calleE-save	(sl) Stack limit (fixed) if stack checking is enabled.
				PIC base if PIC is enabled and stack checking is not.
				Otherwise a local variable.
r11		calleE-save	(fp) Local variable or frame pointer.
r12		calleR-save	(ip) Scratch register, may be modified during
				function linkage.
r13		calleE-save	(sp) Stack pointer (fixed). Must be 4-byte aligned
				at all times. Must be 8-byte aligned during transfers
				to/from functions.
r14		calleR-save	(lr) Link register or scratch variable.
r15		fixed		(pc) Program counter.

The stack grows from high to low addresses.
Excess parameters are stored on the stack, at SP+0 and up.