This is work in progress
- The ELF Format
- Position-Independent Code/Executable (PIC/PIE)
- OS Kernel loading an executable
- dynamic linking::an overview
- dynamic linking::relocation and loading shared objects
- dynamic linking::structures
- dynamic linking::functions
- dynamic linking::resolving symbols at runtime
- symbol table: get a taste
- Beyond ELF: a.out format
- Beyond ELF: Object oriented, overloading and dispatch tables …
- other readings:
NOTE: the terms function, (sub)routine and procedure are used interchangeably.
DISCLAIMER: TEXT BASED ON AND/OR HEAVILY COPIED FROM SEVERAL ARTICLES 1 2, which I hold no copyright of. This post is just me parsing and understanding the materials I read, and serves only as my own notes. This post is shared under CC BY-SA 4.0.
# The ELF Format
Some details here; see also the TIS specs 3 for completeness.
Basically, the ELF format contains:
- a magic number (included in the file header) identifying the file format: hex `7f 45 4c 46`, or `.ELF` in ASCII (the leading `7f` byte is not printable).
- an ELF file header at the beginning of the file, describing the binary file itself, e.g. architecture, OS ABI, type (REL/EXEC/DYN …), entry point, and the offsets of the other parts (program header table, section header table …).
[+] click to expand elf file header example
$ readelf -h /usr/bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x5130
  Start of program headers:          64 (bytes into file)
  Start of section headers:          136128 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 27
- a program header table specifying how the program should be loaded. The important entries are:
  - `PT_LOAD`: describes a segment of the program to be loaded into memory.
  - `PT_INTERP`: if present, identifies the required dynamic linker.
  - `PT_GNU_STACK`: if present, indicates whether the program's stack should be executable.

  Note that multiple sections can be combined into one segment if they share the same properties.
[+] click to expand example of program header table
$ readelf -l /usr/bin/ls
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x5130
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000021a8 0x00000000000021a8  R      0x1000
  LOAD           0x0000000000003000 0x0000000000003000 0x0000000000003000
                 0x0000000000014931 0x0000000000014931  R E    0x1000
  LOAD           0x0000000000018000 0x0000000000018000 0x0000000000018000
                 0x0000000000007058 0x0000000000007058  R      0x1000
  LOAD           0x000000000001ff10 0x0000000000020f10 0x0000000000020f10
                 0x0000000000001368 0x0000000000002630  RW     0x1000
  DYNAMIC        0x0000000000020a50 0x0000000000021a50 0x0000000000021a50
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000050 0x0000000000000050  R      0x8
  NOTE           0x0000000000000388 0x0000000000000388 0x0000000000000388
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000050 0x0000000000000050  R      0x8
  GNU_EH_FRAME   0x000000000001d190 0x000000000001d190 0x000000000001d190
                 0x000000000000059c 0x000000000000059c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000001ff10 0x0000000000020f10 0x0000000000020f10
                 0x00000000000010f0 0x00000000000010f0  R      0x1
- information used for dynamic linking, such as the PLT, the GOT and the symbol tables (discussed later).

There are other optional parts such as the section header table, but they are not required for running an ELF binary.
# Position-Independent Code/Executable (PIC/PIE)
Position-independent code executes properly regardless of its memory address, i.e. it can be loaded and run at any address without modification. PIC is commonly used for shared libraries, so that the same library code can be loaded at an arbitrary address in each program's address space 4.
# OS Kernel loading an executable
Which file type?
- starts with `#!<interpreter>`: a script. The kernel invokes the interpreter (like python) and passes it the path of the script.
- starts with `\x7fELF`: a (native) executable. `PT_INTERP` in the program header table indicates that the program is dynamically linked and specifies which dynamic linker to use, e.g.
  INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318 0x000000000000001c 0x000000000000001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

The kernel also checks the binfmt_misc settings, which allow users to register custom program interpreters; this is often used for running non-native binaries with e.g. qemu.
# Loading a (static) ELF binary
- clear up the state of the previous program (i.e. the caller of `exec()`): kill the other threads of the old program, clear old pending timers, update the exe file (`/proc/pid/exe`), release old virtual memory mappings, kill pending async I/Os, free uprobes, update the personality, etc.
- set up virtual memory for the new program: loop through the `PT_LOAD` segments and map them into the process's address space; set up zero-filled pages (e.g. for `.bss`); map special pages, e.g. the vDSO; set up credentials (Linux Security Module, I don't know this).
- set up the stack (auxv, env, argv …… not gonna discuss here)
- switch to the new program in userspace (via pushing a new `pt_regs` frame onto the kernel stack).
# dynamic linking::an overview
Plainly spoken: the code uses `extern` symbols (functions or variables) that are NOT known at link time, i.e. it uses shared libraries. When the program is executed, the OS first invokes the dynamic linker, which then locates the shared libraries, loads them and resolves the previously unknown symbol addresses.
The big picture looks like this:
- the program code is loaded into memory the same way as for a static binary.
- the kernel sets up the initial address space for the dynamic linker and loads it the same way as a static ELF binary.
- instead of starting at the program's entry point, control is passed to the interpreter, which also gets a file descriptor of the to-be-executed ELF binary.
- the dynamic linker (interpreter) figures out the linkage and resolves the symbols.
- the new program executes.
NOTES
- the dynamic linker itself CANNOT be dynamically linked.
# dynamic linking::relocation and loading shared objects
Definition: Position-independent executables contain a list of relocations: specific places in the binary that the dynamic linker needs to patch with the actual addresses of various components in memory. This mostly happens in the GOT (see below) 1.
- The first major task of the dynamic linker is to read its own program header and apply relocations to itself. (?? needs more reading).
- set up the link map: used by `dlinfo()`.
- figure out how to deal with the vDSO (also a shared object) and place the vDSO in the link map.
- (optionally) the user can override functions at run time by specifying shared objects in the `LD_PRELOAD` env variable.
- the linker looks for `DT_NEEDED` declarations recursively, listing all needed shared objects.
- for each shared object, the linker resolves and opens the file, loads it into a newly allocated (+ASLR) location in the address space, performs the relocations (excluding the linker itself) listed in the shared object's header, and adds the shared object to the link map.
- in the end the linker applies relocations to itself one final time, so that it now refers to the overridden version (`LD_PRELOAD`) of any functions it uses.
- set up thread local storage (TLS) and perform the initialization required by the C library.
- proceed to the actual program.
[+] expand example: read the dynamic relocation records
$ objdump -R /usr/bin/ls
/usr/bin/ls: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
0000000000021c58 R_X86_64_GLOB_DAT __ctype_toupper_loc@GLIBC_2.3
0000000000021c60 R_X86_64_GLOB_DAT getenv@GLIBC_2.2.5
0000000000021c68 R_X86_64_GLOB_DAT cap_to_text@Base
0000000000021c70 R_X86_64_GLOB_DAT sigprocmask@GLIBC_2.2.5
[SNIP]
# dynamic linking::structures
sample code:
# Global Offset Table (GOT) - how to find stuff that finds stuff
https://en.wikipedia.org/wiki/Global_Offset_Table
https://maskray.me/blog/2021-08-29-all-about-global-offset-table
The GOT contains pointers to global variables, loaded sections and link-time constants.
- `.got.plt` holds the symbol addresses used by PLT entries.
- `.got` holds everything else.
$ readelf -S main
[22] .got PROGBITS 0000000000003fc0 00002fc0
0000000000000028 0000000000000008 WA 0 0 8
[23] .got.plt PROGBITS 0000000000003fe8 00002fe8
0000000000000020 0000000000000008 WA 0 0 8
and the global offset table looks like this:
# Procedure Linkage Table (PLT) - how to actually get extern stuff
To call an extern function, e.g. `puts()` (here the call to `puts()` stands in for a `printf()` in the source):
$ objdump -d main
0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 8d 05 c0 0e 00 00 lea 0xec0(%rip),%rax
1144: 48 89 c7 mov %rax,%rdi
1147: e8 e4 fe ff ff call 1030 <puts@plt>
114c: b8 00 00 00 00 mov $0x0,%eax
1151: 5d pop %rbp
1152: c3 ret
and the `.plt` section looks like this. The `puts@plt` entry is just another simple jump; how does it lead to the correct library function?
Disassembly of section .plt:
0000000000001030 <puts@plt>:
1030: ff 25 ca 2f 00 00 jmp *0x2fca(%rip) # 4000 <puts@GLIBC_2.2.5>
1036: 68 00 00 00 00 push $0x0
103b: e9 e0 ff ff ff jmp 1020 <_init+0x20>
[SNIP]
Instead of calling into `puts()`, whose address is unknown at link time, the instruction at `0x1147` calls a small stub, `puts@plt`, in the PLT. The dynamic linker fills in the address that this stub jumps through, either when the program is loaded or on the first call.
The PLT is a section that contains an indirection for each externally defined function.
`.got.plt` is a secondary indirection on top of the PLT: in the example above, `puts@plt` actually jumps through a pointer stored in `.got.plt`, which is patched (by a relocation) to point to the function in the actual shared object. It's also possible to apply relocations directly to the PLT, but this is less common.
INSIGHT: it's slow to simply use relocations to directly patch `CALL` instructions 1:
Firstly, since the number of relocations required would depend on the number of calls to the given function (which may be large), the initial application of those relocations to a shared object can be slow. Secondly, since text relocations involve dirtying the pages of memory containing a program’s executable code, different processes running the same program can no longer share the same underlying memory, increasing the memory usage of the program.
INSIGHT: a secondary indirection (`.got.plt` over `.plt`) enables lazy linking: the GOT entry is not patched until the corresponding PLT trampoline is called.
# PT_DYNAMIC program header
`DT_NEEDED` declarations in the `PT_DYNAMIC` segment indicate that the program depends on another shared object.
[+] expand full example of dynamic section
$ readelf -d /usr/bin/ls
Dynamic section at offset 0x20a50 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libcap.so.2]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x3000
0x000000000000000d (FINI) 0x17924
0x0000000000000019 (INIT_ARRAY) 0x20f10
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x20f18
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3d0
0x0000000000000005 (STRTAB) 0xf20
0x0000000000000006 (SYMTAB) 0x3f8
0x000000000000000a (STRSZ) 1439 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000007 (RELA) 0x1690
0x0000000000000008 (RELASZ) 2760 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000000000001e (FLAGS) BIND_NOW
0x000000006ffffffb (FLAGS_1) Flags: NOW PIE
0x000000006ffffffe (VERNEED) 0x15b0
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x14c0
0x0000000000000024 (RELR) 0x2158
0x0000000000000023 (RELRSZ) 80 (bytes)
0x0000000000000025 (RELRENT) 8 (bytes)
0x0000000000000000 (NULL) 0x0
https://docs.oracle.com/cd/E19683-01/817-3677/chapter6-42444/index.html
# the link map
# dynamic linking::functions
dlopen()
dlinfo()
# dynamic linking::resolving symbols at runtime
Procedure Linkage Table, used to indirectly call extern functions.
# symbol table: get a taste
Example code:
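// test.c
#include <stdio.h>
extern const char global_c; // defined in ext.c; const to match the definition
int test(){
printf("%d\n",global_c);
return 0; // test() is declared int, so return a value
}
// ext.c
const char global_c = 42;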
Build and readelf
# compile and read the symbol table of each object; the -c flag builds linkable objects
$ gcc -c -static *.c
$ readelf -s *.o
# link them all (main.c simply calls test() as an extern function)
$ gcc -o main main.c test.o ext.o
[+] expand outputs: symtab of test.o and ext.o
(test.o) Symbol table '.symtab' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 54 FUNC GLOBAL DEFAULT 1 test
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND global_c
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
(ext.o) Symbol table '.symtab' contains 3 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS ext.c
2: 0000000000000000 1 OBJECT GLOBAL DEFAULT 4 global_c
[+] expand outputs: symtab of the combined binary
Symbol table '.dynsym' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@GLIBC_2.34 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (3)
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (3)
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
6: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
7: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (3)
Symbol table '.symtab' contains 29 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
2: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS ext.c
4: 0000000000000000 0 FILE LOCAL DEFAULT ABS
5: 0000000000003de0 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
6: 0000000000002014 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR
... ... other stuffs from libc....
19: 0000000000002012 1 OBJECT GLOBAL DEFAULT 16 global_c
20: 0000000000004028 0 NOTYPE GLOBAL DEFAULT 25 _end
21: 0000000000001050 38 FUNC GLOBAL DEFAULT 14 _start
22: 0000000000004020 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
23: 0000000000001149 21 FUNC GLOBAL DEFAULT 14 main
27: 0000000000001000 0 FUNC GLOBAL HIDDEN 12 _init
28: 000000000000115e 54 FUNC GLOBAL DEFAULT 14 test
# Beyond ELF: a.out format
# Beyond ELF: Object oriented, overloading and dispatch tables …
# other readings:
- https://www.redhat.com/en/blog/hardening-elf-binaries-using-relocation-read-only-relro
- https://reverseengineering.stackexchange.com/questions/1992/what-is-plt-got
- https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html
- https://maskray.me/blog/2021-08-29-all-about-global-offset-table
1. A look at dynamic linking, Daroc Alden, LWN, https://lwn.net/Articles/961117/
2. How programs get run: ELF binaries, David Drysdale, LWN, https://lwn.net/Articles/631631/
3. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification, TIS Committee.