SysV ABI (AMD64)
notes, based on
System V Application Binary Interface AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models), Version 1.0, May 23, 2023, H.J. Lu et al.
Numbers such as `Sec. 3.1.2` refer to the sections of the above specification.
Important places:
- Table 3.1 - Micro-Arch. Levels (ISA features wrt. CPUID)
- Figure 3.1 - Scalar (C) Types, sizeof, alignment and AMD64 arch. fundamental types
- Figure 3.3 - Stack Frame structure w. Base pointer (%rbp)
- Figure 3.4 - Register usage.
- Table 3.2 - Hardware exception and signals
- Figure 3.9 - Initial Process Stack
- Figure 3.10 and 3.11 - Aux vector for process initialization
TABLE OF CONTENTS
- Architecture
- Data models
- Registers (Sec 3.2.1)
- Functions, Stack Frame, Parameter passing (Sec 3.2.2 - 3.2.3)
- OS Interface (Sec. 3.3)
- Code Models {small, kernel, medium, large, PIC etc.}
- GOT, PLT and PIC
Architecture
AMD64 is an extension of the x86 arch. Unless otherwise stated, the AMD64 ABI follows the conventions described in the Intel386 ABI. (i386 + AMD64 = x86_64)
Data models
https://www.ibm.com/docs/en/zos/2.4.0?topic=options-lp64-ilp32
Data types & sizes
C Type | ILP32 | LP64 |
---|---|---|
char | 8 | 8 |
short | 16 | 16 |
int | 32 | 32 |
long | 32 | 64 |
long long | 64 | 64 |
pointer | 32 | 64 |
size_t | 32 | 64 |
I means “int”, L means “long”, P means “pointer”.
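
To make the naming scheme concrete, here is a minimal sketch that recovers the model name from the bit widths in the table above. The `data_model` helper and its parameter names are made up for illustration and only cover the two models listed.

```python
def data_model(int_bits, long_bits, pointer_bits):
    """Name the data model from the widths (in bits) of int, long and pointer."""
    widths = (int_bits, long_bits, pointer_bits)
    if widths == (32, 32, 32):
        return "ILP32"  # Int, Long and Pointer are all 32 bits
    if widths == (32, 64, 64):
        return "LP64"   # Long and Pointer are 64 bits, int stays 32 bits
    return "unknown"    # anything else is outside the table above


print(data_model(32, 32, 32))
print(data_model(32, 64, 64))
```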
“shorthand” names
AMD64 | i386 (and aarch64) | also i386, most IA-32 docs (fuck it) | Size (nBytes) |
---|---|---|---|
byte | byte | byte | 1 |
twobyte | halfword | word | 2 |
fourbyte | word | doubleword | 4 |
eightbyte | doubleword | quadword | 8 |
sixteenbyte | quadword(?) | double quadword | 16 |
Addressing mode
In the z/OS terminology of the IBM page above, LP64 is AMODE64 and ILP32 is AMODE31: in ILP32 only 31 bits within the pointer are taken to form the address.
Registers (Sec 3.2.1)
- BASE: 16 x 64-bit GPRs
- SSE: 16 x 128-bit SSE regs (`%xmm0 - %xmm15`)
- x87: 8 x 80-bit x87 FP regs
- AVX (Advanced Vector Extensions): 16 x 256-bit AVX regs: `%ymm0 - %ymm15`, the lower 128 bits of `%ymm0 - %ymm15` are aliased to the respective 128-bit SSE regs (`%xmm0 - %xmm15`)
- AVX-512: 32 x 512-bit SIMD regs (`%zmm0 - %zmm31`), the lower 128 and 256 bits are aliased to the respective SSE regs `%xmm0 - %xmm31` and AVX regs `%ymm0 - %ymm31`
- AVX-512: 8 x 64-bit vector mask registers `%k0 - %k7`
- “Vector Register”: refers to either SSE, AVX or AVX-512 regs.
For parameter passing and function return, `%xmmN`, `%ymmN` and `%zmmN` refer to the same register; only one of them can be used at the same time. (Sec. 3.2.1)
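
The aliasing can be pictured by treating one 512-bit register as a big integer and masking off its low bits. This is only a sketch of the idea: `xmm_view` and `ymm_view` are hypothetical helper names, and real hardware has additional rules about what happens to the upper bits on writes that are not modeled here.

```python
MASK_128 = (1 << 128) - 1
MASK_256 = (1 << 256) - 1


def xmm_view(zmm_value):
    """Low 128 bits of a 512-bit register value (what %xmmN names)."""
    return zmm_value & MASK_128


def ymm_view(zmm_value):
    """Low 256 bits of a 512-bit register value (what %ymmN names)."""
    return zmm_value & MASK_256


# A 512-bit value with distinct markers in bits 0.., 128.. and 256..
zmm0 = (0xAAAA << 256) | (0xBBBB << 128) | 0xCCCC
print(hex(xmm_view(zmm0)))
print(hex(ymm_view(zmm0)))
```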
Functions, Stack Frame, Parameter passing (Sec 3.2.2 - 3.2.3)
- callee-saved registers: `%rbx, %rbp, %r12 - %r15`: the called function must preserve these.
- caller-saved registers: the others: a calling function must save them in its local stack frame, otherwise the values may be lost during a subroutine call.
- the stack needs to be aligned on a 16 (32 or 64, if `__m256` or `__m512` is passed on stack) byte boundary before the `call` instruction is executed.
- “red zone”: the 128-byte area beyond the location pointed to by `%rsp` is considered to be reserved and shall not be modified by signal or interrupt handlers. Functions may use this area for temporary data that is not needed across function calls.
Position Contents
+--------------------------------------------+---------------+ HIGH ADDRESS
|8n+16 (%rbp) Memory argument eightbyte n | |
|... ... | previous frame|
|16 (%rbp) Memory argument eightbyte 0 | |
+--------------------------------------------+---------------+
|8 (%rbp) return address | |
+--------------------------------------------+ |
|0 (%rbp) previous %rbp value | | <- current rbp
+--------------------------------------------+ |
|-8 (%rbp) unspecified (variable size) | current frame |
| ... local variables etc. | |
|0 (%rsp) ... | | <- current rsp
+--------------------------------------------+---------------+
|-128 (%rsp) RED ZONE | |
+--------------------------------------------+---------------+ LOW ADDRESS
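
The alignment rule and the red zone boil down to a bit of address arithmetic. Below is a minimal sketch with made-up helper names (`align_down`, `rsp_at_entry`, `red_zone`); it assumes the stack grows towards lower addresses, as in the diagram above, and that `call` pushes an 8-byte return address.

```python
RED_ZONE_SIZE = 128  # bytes below %rsp that leaf code may use


def align_down(addr, alignment=16):
    """Round addr down to a multiple of alignment (must be a power of two)."""
    return addr & ~(alignment - 1)


def rsp_at_entry(rsp_before_call):
    """%rsp as seen by the callee: call has pushed the 8-byte return address."""
    return rsp_before_call - 8


def red_zone(rsp):
    """Half-open address range [low, high) of the red zone below rsp."""
    return (rsp - RED_ZONE_SIZE, rsp)


rsp = align_down(0x7FFD12345678)  # 16-byte aligned before the call
print(hex(rsp), rsp_at_entry(rsp) % 16)
```

In this model the callee always starts with `%rsp % 16 == 8`, which is why a prologue that pushes `%rbp` ends up 16-byte aligned again.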
Argument Classes and register classes
- INTEGER: -> one of the general purpose registers in the order of `rdi, rsi, rdx, rcx, r8, r9` (up to 6)
- SSE: -> a vector register, in the order of `xmm0` to `xmm7`
- SSEUP -> vector register upper bytes
- X87, X87UP, COMPLEX_X87 -> x87 FPU, passed in memory
- NO_CLASS -> used for padding/empty structs/unions
- MEMORY -> passed and returned in memory via the stack
Fitting C Types into classes for parameter passing:
- `{_Bool, char, short, int, long, long long, ptr*}`: INTEGER class
- `{_Float16, float, double, _Decimal32, _Decimal64, __m64}`: SSE class
- `{__float128, _Decimal128, __m128}`: split into two halves, LSBs -> SSE class; MSBs -> SSEUP class.
- `{long double}`: 64-bit mantissa -> X87; 16-bit exponent + 6 bytes padding -> X87UP.
- `{__m256, __m512}`: split into four/eight eightbyte chunks, the lowest one in SSE, the upper 3/7 in SSEUP.
- `__int128`: uses 2 x GPRs, or is 16-byte aligned in memory
- MISC: (see 3.2.3): `{_BitInt(N), complex T}`
Fitting aggregate (structures / arrays) and union types
- if object size > 8 x eightbytes (64 bytes) or it contains unaligned fields, it has class MEMORY
- C++ object that is non-trivial for the purpose of calls: passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER).
- MISC: classification algorithm: deriving, recursive, merger cleanups, see 3.2.3
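
A heavily simplified sketch of the per-eightbyte classification, to show how the merge step works. Assumptions (not from the spec text quoted here): only flat structs of naturally aligned scalar fields are handled, the type names and the `classify_struct` helper are made up, and anything larger than two eightbytes is sent to MEMORY, which is what the post-merger cleanup amounts to when no SSEUP eightbytes are involved. See Sec 3.2.3 for the real algorithm.

```python
SIZES = {"char": 1, "short": 2, "int": 4, "long": 8, "pointer": 8,
         "float": 4, "double": 8}
CLASS = {"char": "INTEGER", "short": "INTEGER", "int": "INTEGER",
         "long": "INTEGER", "pointer": "INTEGER",
         "float": "SSE", "double": "SSE"}


def merge(a, b):
    """Merge the classes of two fields that share one eightbyte."""
    if a == b:
        return a
    if a == "NO_CLASS":
        return b
    if b == "NO_CLASS":
        return a
    if "MEMORY" in (a, b):
        return "MEMORY"
    if "INTEGER" in (a, b):
        return "INTEGER"
    return "SSE"


def classify_struct(fields):
    """Return one class per eightbyte for a struct of scalar fields."""
    offset, max_align, placed = 0, 1, []
    for t in fields:
        size = SIZES[t]  # natural alignment equals size for these scalars
        offset = (offset + size - 1) // size * size
        placed.append((offset, t))
        offset += size
        max_align = max(max_align, size)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    if total > 16:  # more than two eightbytes, no vector types here
        return ["MEMORY"]
    classes = ["NO_CLASS"] * ((total + 7) // 8)
    for off, t in placed:
        classes[off // 8] = merge(classes[off // 8], CLASS[t])
    return classes


print(classify_struct(["int", "float"]))
print(classify_struct(["double", "long"]))
```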
Passing arguments
If there are no registers available for any eightbyte of an argument, the whole argument is passed on the stack.
For varargs or stdargs (prototype-less calls or calls to functions containing an ellipsis (…)), `%al` is used as a hidden argument to specify (an upper bound on) the number of vector registers used.
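
The register assignment and the `%al` rule above can be sketched as follows. This is a toy model, not the spec's wording: `assign_args` is a hypothetical helper, each argument is given as a list of already-computed eightbyte classes, and only INTEGER and SSE eightbytes are placed in registers (everything else goes to the stack).

```python
GPRS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
VECS = ["xmm%d" % i for i in range(8)]


def assign_args(args):
    """args: one list of eightbyte classes per argument.

    Returns (locations, al): per argument a list of registers or ["stack"],
    plus the number of vector registers used (what a variadic call puts in %al).
    """
    next_gpr = next_vec = 0
    locations = []
    for classes in args:
        need_gpr = classes.count("INTEGER")
        need_vec = classes.count("SSE")
        in_regs = (
            all(c in ("INTEGER", "SSE") for c in classes)
            and next_gpr + need_gpr <= len(GPRS)
            and next_vec + need_vec <= len(VECS)
        )
        if not in_regs:
            # Whole argument goes to the stack; no registers are consumed.
            locations.append(["stack"])
            continue
        regs = []
        for c in classes:
            if c == "INTEGER":
                regs.append(GPRS[next_gpr])
                next_gpr += 1
            else:
                regs.append(VECS[next_vec])
                next_vec += 1
        locations.append(regs)
    return locations, next_vec


# Five ints, then a two-eightbyte struct, then one more int.
locs, al = assign_args([["INTEGER"]] * 5 + [["INTEGER", "INTEGER"], ["INTEGER"]])
print(locs, al)
```

Note how the struct does not fit into the single remaining GPR and goes to the stack as a whole, while the last `int` still gets `r9`.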
Returning of values depending on the return type w. classification algorithm (Sec 3.2.3)
- MEMORY: the caller provides space for the return value and passes the address of this storage in %rdi (as if it were the 1st argument). This storage must not overlap any data visible to the callee through other names than this argument.
- INTEGER: the next available register of order `%rax, %rdx` is used.
- SSE: the next available register of order `xmm0, xmm1` is used.
- SSEUP: the eightbyte is returned in the next available eightbyte chunk of the last used vector register. (??)
- X87: returned on the x87 stack in `st0` as 80-bit x87 number (??)
- COMPLEX_X87: real part in `st0`, imag. part in `st1`
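
A small sketch of the return-value rules for the common cases. `return_location` is a hypothetical helper; it takes the eightbyte classes of the return type and only models MEMORY, INTEGER, SSE and X87 (SSEUP and COMPLEX_X87 are left out on purpose).

```python
def return_location(classes):
    """Map the eightbyte classes of a return type to where the value ends up."""
    if "MEMORY" in classes:
        # Caller-provided buffer, address passed as hidden first argument.
        return ["memory (address passed in %rdi)"]
    gprs = iter(["%rax", "%rdx"])
    vecs = iter(["%xmm0", "%xmm1"])
    locations = []
    for c in classes:
        if c == "INTEGER":
            locations.append(next(gprs))   # next free of %rax, %rdx
        elif c == "SSE":
            locations.append(next(vecs))   # next free of %xmm0, %xmm1
        elif c == "X87":
            locations.append("%st0")
        else:
            raise ValueError("class not modeled in this sketch: " + c)
    return locations


print(return_location(["INTEGER", "SSE"]))
print(return_location(["SSE", "SSE"]))
```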
OS Interface (Sec. 3.3)
NOTE: should refer to other literature (such as OS-specific docs); here I’m only noting SysV-specific stuff.
Exceptions: {faults, traps, aborts} (i386 ABI, TODO).
x86 HW Exception / Interrupt numbers
Virtual Address Space
Implementations are only required to handle 48-bit (virtual) addresses (`0x00000000_00000000` to `0x00007fff_ffffffff`).
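
The range above as a one-line check. A sketch only: `is_conforming_user_address` is a made-up name, and the bounds are simply the two constants quoted above.

```python
USER_VA_MIN = 0x0000_0000_0000_0000
USER_VA_MAX = 0x0000_7FFF_FFFF_FFFF


def is_conforming_user_address(addr):
    """True if addr lies in the range a conforming process may use."""
    return USER_VA_MIN <= addr <= USER_VA_MAX


print(is_conforming_user_address(0x0000_7FFF_FFFF_FFFF))
print(is_conforming_user_address(0x0000_8000_0000_0000))
```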
Systems can use any page size between 4KB and 64KB inclusive.
: I don’t think this is practically true… because there are huge pages (like 1GB) and the MMU can’t handle all granules
Process initialization (Sec 3.4)
- register state: SSE2, x87, `rFLAGS` …
- stack state (`exec`): see Figure 3.9.
- Thread FP State: new threads inherit the FP state of the parent thread and the state is private to the thread thereafter.
- AUX vector: Figure 3.10 and Figure 3.11
Code Models {small, kernel, medium, large, PIC etc.}
Sec 3.5.1
- For the {S/K/M/L} (small, kernel, medium, large) code models, the VAs of instructions and data are known at link time.
- For the {S/M/L} PIC models, the VAs of instructions and data are unknown until dynamic link time.
- ILP32 binaries only use the small code model and the small PIC model.
Small Code Model:
All symbols are known to be located in VA `0x0000_0000` to `0x7eff_ffff`. Symbolic references can be encoded with imm operands + instruction relative addressing. This is the fastest code model.
Kernel Code Model:
uses only the “higher half” of the address space, within its topmost 2GB: `0xffff_ffff_8000_0000` to `0xffff_ffff_ff00_0000`
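
A sketch of the two address ranges as plain arithmetic. The helper names are hypothetical, and treating both bounds as inclusive is an assumption of this sketch. The last function shows why kernel-model addresses still fit into 32-bit immediates: they are exactly what you get by sign-extending a negative 32-bit value.

```python
SMALL_MAX = 0x7EFF_FFFF
KERNEL_MIN = 0xFFFF_FFFF_8000_0000
KERNEL_MAX = 0xFFFF_FFFF_FF00_0000


def fits_small(addr):
    """Address usable under the small code model."""
    return 0 <= addr <= SMALL_MAX


def fits_kernel(addr):
    """Address usable under the kernel code model."""
    return KERNEL_MIN <= addr <= KERNEL_MAX


def sign_extend_32(imm32):
    """Sign-extend a 32-bit immediate to a 64-bit (unsigned) address."""
    imm32 &= 0xFFFF_FFFF
    if imm32 & 0x8000_0000:
        return imm32 | 0xFFFF_FFFF_0000_0000
    return imm32


print(hex(sign_extend_32(0x8000_0000)))
```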
Medium Code Model:
data section is split into two parts: `{.data, .rodata, .bss etc}` and `{.ldata, .lrodata, .lbss etc}`. The first parts are limited in the same way as in the small code model. The large sections have no limits except the available address space. The large sections must come after the `text` and `data` sections. Requires `movabs` instrs to access large static data and to load addresses into regs.
Large Code Model:
makes no assumptions about addresses and sizes of sections. The compiler is required to use the `movabs` instr even for addresses inside the `text` section (there are hacks though). Indirect branches(??) are needed when branching to addresses whose offset from the current instruction pointer is unknown.
{Small, Medium, Large} Position independent code (PIC) model: virtual addresses of instructions and data are not known until dynamic link time. All addresses have to be relative to the instruction pointer.
Small PIC: the max distance between a symbol and the end of an instruction is limited to `0x7eff_ffff`
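
The distance limit can be sketched as a displacement check. `rip_relative_disp` is a made-up helper; it assumes the displacement is measured from the end of the instruction (i.e. the address of the next instruction) and applies the limit above symmetrically in both directions, which is a simplification.

```python
MAX_DISTANCE = 0x7EFF_FFFF


def rip_relative_disp(symbol_addr, next_insn_addr):
    """Displacement from the end of the instruction to the symbol.

    Returns None if the symbol is too far away for the small PIC model.
    """
    disp = symbol_addr - next_insn_addr
    if abs(disp) > MAX_DISTANCE:
        return None
    return disp


# The displacement does not depend on where the object gets loaded.
for base in (0x0, 0x5555_0000_0000):
    print(hex(rip_relative_disp(base + 0x5000, base + 0x1000)))
```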
Medium PIC: like small PIC but adds large data sections at the end of object files. For large static data an unwind sequence of `movabs`, `lea` and `add` needs to be used.
Large PIC: makes no assumptions about the distance of symbols. The same limitation on static data as in Medium PIC applies. Also, references to the GOT and to the procedure linkage table and branch destinations need to be calculated in a similar way. The text segment is allowed to be up to 16EB in size.