explain/rtld.md
... ...
@@ -0,0 +1,87 @@
1
+## Dynamic linking
2
+
3
+Dynamic linking is the process of loading code and data from shared libraries.
4
+For an overview, see [this
5
+](https://0x00sec.org/t/linux-internals-dynamic-linking-wizardry/1082) and [
6
+this page](https://0x00sec.org/t/linux-internals-the-art-of-symbol-resolution/1488).
7
+For a series of very in-depth articles, see [this page
8
+](https://www.airs.com/blog/archives/38) and its follow-up posts.
9
+
10
+### The interesting bits
11
+
12
+Or at least, to us.
13
+
14
+The kernel recognises a dynamic ELF executable by the fact it has a `PT_INTERP`
15
+phdr which points to the dynamic linker. See [Process creation](/explain/proc)
16
+for more info. However, it turns out the dynamic linker is just a regular,
17
+statically linked ELF executable, and you can simply give it the program to run
18
+as an argument. When using a shell dropper, this technique saves a few
19
+bytes.
20
+
21
+When the kernel loads our program and the dynamic linker (`ld.so`), it makes
22
+the process jump to the dynamic linker. On glibc, the linker's entrypoint is
23
+`_start`, which calls `_dl_start` and then continues into `_dl_start_user` ([i386
24
+](https://code.woboq.org/userspace/glibc/sysdeps/i386/dl-machine.h.html#153),
25
+[x86_64
26
+](https://code.woboq.org/userspace/glibc/sysdeps/x86_64/dl-machine.h.html#141)).
27
+The linker then reads the main program's `PT_DYNAMIC` phdr, which
28
+contains a key-value table which describes which libraries we depend on
29
+(`DT_NEEDED`), where the symbol and string tables are (`DT_SYMTAB`,
30
+`DT_STRTAB`), etc. The dynamic linker then loads these libraries, resolves the needed
31
+symbols, and performs the required relocations to glue all code together.
32
+
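The key-value table is easy to poke at from inside a running program. What follows is a minimal sketch, assuming glibc on x86, where `link.h` exposes the main program's table as `_DYNAMIC` and the loader has already rewritten `DT_STRTAB` into an absolute address. `count_needed` is a hypothetical helper name, not something from the tools discussed here.

```c
#include <link.h>    /* ElfW(), DT_* constants */
#include <stddef.h>
#include <stdio.h>

/* glibc's link.h already declares this; repeated so the sketch stands alone. */
extern ElfW(Dyn) _DYNAMIC[];

/* Print every DT_NEEDED entry of the main program and return how many there
 * are. Assumption: DT_STRTAB holds an absolute address (glibc on x86). */
static size_t count_needed(FILE *out) {
    const char *strtab = NULL;
    size_t count = 0;

    /* First pass: locate the string table. */
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_STRTAB)
            strtab = (const char *)d->d_un.d_ptr;
    if (!strtab)
        return 0;

    /* Second pass: each DT_NEEDED value is an offset into that string table. */
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_NEEDED) {
            fprintf(out, "needs %s\n", strtab + d->d_un.d_val);
            ++count;
        }
    return count;
}
```

Calling `count_needed(stdout)` from `main` should list roughly the same libraries that `readelf -d` reports as `NEEDED` for the binary.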
33
+However, there's one entry, `DT_DEBUG`, which is barely documented (the docs say
34
+"for debugging purposes"). In practice, the dynamic linker
35
+places a pointer to its `r_debug` struct in the value field. This behavior
36
+is mostly portable (as in, it works on at least glibc, musl and FreeBSD). If
37
+you look at your system's `link.h` file (e.g. in `/usr/include`), you can see
38
+the contents of this struct. The second field is a pointer to the root
39
+`link_map`. More about this one later.
40
+
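To make the `DT_DEBUG` trick concrete, here is a small sketch, assuming glibc (whose `link.h` declares `_DYNAMIC`, `struct r_debug` and `struct link_map`). The helper names `find_r_debug` and `root_link_map` are made up for illustration.

```c
#include <link.h>    /* struct r_debug, struct link_map, ElfW() */
#include <stddef.h>

/* glibc's link.h already declares this; repeated so the sketch stands alone. */
extern ElfW(Dyn) _DYNAMIC[];

/* Return the r_debug struct that the dynamic linker published through the
 * DT_DEBUG entry of our own dynamic table, or NULL if there is none. */
static struct r_debug *find_r_debug(void) {
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_DEBUG)
            return (struct r_debug *)d->d_un.d_ptr;
    return NULL;
}

/* r_map, the second field of r_debug, is the root of the link_map list. */
static struct link_map *root_link_map(void) {
    struct r_debug *dbg = find_r_debug();
    return dbg ? dbg->r_map : NULL;
}
```

In a size-coded binary the same walk would be a handful of instructions in assembly; the C version is only meant to show which fields are involved.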
41
+Once all the dependencies are loaded and relocated, the runtime linker jumps to our
42
+entrypoint (`_start`), without clearing the registers or cleaning up the stack.
43
+
44
+### The attack plan
45
+
46
+Normal dynamic linking needs a lot of (large) ELF header stuff. The trick is to
47
+bypass all this, and do the necessary minimal amount of linking ourselves.
48
+
49
+Now let's have a look at the `link_map`. (It's in the same `link.h` header.)
50
+It's a linked list containing information about every ELF file loaded by the rtld.
51
+(The first entry is our binary, the second is the [vDSO
52
+](https://lwn.net/Articles/446528/). Usually, the third one is `libc`.) The
53
+`link_map` entry for a file contains its path, its base address and a pointer
54
+to its dynamic section (the table `PT_DYNAMIC` points at). We can use the latter
55
+to walk the symbol tables and find the address of every symbol. This is what
56
+**bold** and **dnload** do, except they store a hash of every symbol name in
57
+the binary instead of the symbol names themselves.
58
+
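Walking the list takes only a few lines. The sketch below sticks to the public fields from `link.h` (`l_addr`, `l_name`, `l_ld`, `l_next`) and assumes glibc. `count_objects` and `find_object` are hypothetical helper names, and the list head is fetched through `DT_DEBUG` as described earlier.

```c
#include <link.h>    /* struct link_map, struct r_debug, ElfW() */
#include <stddef.h>
#include <string.h>

/* glibc's link.h already declares this; repeated so the sketch stands alone. */
extern ElfW(Dyn) _DYNAMIC[];

/* Head of the list, fetched through DT_DEBUG -> r_debug -> r_map. */
static struct link_map *root_link_map(void) {
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_DEBUG && d->d_un.d_ptr)
            return ((struct r_debug *)d->d_un.d_ptr)->r_map;
    return NULL;
}

/* Number of ELF objects the rtld has loaded, our own binary included. */
static size_t count_objects(const struct link_map *root) {
    size_t n = 0;
    for (const struct link_map *m = root; m; m = m->l_next)
        ++n;
    return n;
}

/* First loaded object whose path contains `needle`, or NULL. */
static struct link_map *find_object(struct link_map *root, const char *needle) {
    for (struct link_map *m = root; m; m = m->l_next)
        if (m->l_name && strstr(m->l_name, needle))
            return m;
    return NULL;
}
```

On a typical glibc system, `find_object(root_link_map(), "libc.so")` would return the entry for libc, whose `l_ld` then leads to its symbol and string tables.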
59
+However, there are a few ways to save even more bytes:
60
+
61
+First of all, instead of using the `DT_DEBUG` trick, we can read some of the
62
+internal linker state that's leaked to our entry point. On `i386`, the
63
+`link_map` ends up in `eax` because `_dl_start_user` loaded it in there, and it
64
+is retained after a function call because of the calling convention.
65
+
66
+On `x86_64`, it's a bit more convoluted: the register the `link_map` is stored
67
+in doesn't get preserved, but the return address to some place in
68
+`_dl_start_user` is placed on the stack when calling some internal rtld function.
69
+We can read this address and add an offset to reach the instruction that
70
+fetches the `link_map` (which is kept in a global variable). Decoding that
71
+instruction's operand gives us the location of the `link_map`, and voilà.
72
+
73
+Secondly, instead of iterating over the symbol tables (and having to compute
74
+the hashes of every symbol), we can use the [internal `link_map` data in glibc
75
+](https://code.woboq.org/userspace/glibc/include/link.h.html#link_map) to access
76
+the calculated symbol hash tables, so we only have to do a hashtable lookup to
77
+get the address of a symbol.
78
+
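For reference, this is roughly what such a hashtable lookup looks like. The sketch does not touch glibc's private `link_map` fields. It reads the same GNU hash table through the documented `DT_GNU_HASH` entry of each object's dynamic section, so it shows the lookup itself rather than the way Smol reaches the table. All helper names are hypothetical. Adding the load base to small pointer values is a heuristic for loaders (or objects such as the vDSO) that leave the entries unrelocated.

```c
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* glibc's link.h already declares this; repeated so the sketch stands alone. */
extern ElfW(Dyn) _DYNAMIC[];

/* Head of the link_map list, found through DT_DEBUG. */
static struct link_map *root_link_map(void) {
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_DEBUG && d->d_un.d_ptr)
            return ((struct r_debug *)d->d_un.d_ptr)->r_map;
    return NULL;
}

/* Pointer-valued dynamic entry of one object, or NULL. Heuristic (an
 * assumption): a value below the load base was not relocated, so add the base. */
static const void *dyn_ptr(const struct link_map *m, long tag) {
    for (const ElfW(Dyn) *d = m->l_ld; d && d->d_tag != DT_NULL; ++d)
        if (d->d_tag == tag) {
            ElfW(Addr) p = d->d_un.d_ptr;
            return (const void *)(p < m->l_addr ? p + m->l_addr : p);
        }
    return NULL;
}

/* The standard GNU symbol hash: h = h * 33 + c, starting from 5381. */
static uint32_t gnu_hash(const char *s) {
    uint32_t h = 5381;
    for (; *s; ++s)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Look `name` up in one object's DT_GNU_HASH table (bloom filter skipped).
 * Ignores symbol versions; for IFUNC symbols this yields the resolver. */
static void *lookup_in(const struct link_map *m, const char *name) {
    const uint32_t *ht = dyn_ptr(m, DT_GNU_HASH);
    const ElfW(Sym) *symtab = dyn_ptr(m, DT_SYMTAB);
    const char *strtab = dyn_ptr(m, DT_STRTAB);
    if (!ht || !symtab || !strtab || ht[0] == 0)
        return NULL;

    /* Header: nbuckets, symoffset, bloom_size, bloom_shift. */
    uint32_t nbuckets = ht[0], symoffset = ht[1], bloom_size = ht[2];
    const ElfW(Addr) *bloom = (const ElfW(Addr) *)(ht + 4);
    const uint32_t *buckets = (const uint32_t *)(bloom + bloom_size);
    const uint32_t *chain = buckets + nbuckets;

    uint32_t h = gnu_hash(name);
    uint32_t i = buckets[h % nbuckets];
    if (i < symoffset)
        return NULL;                      /* empty bucket */
    for (;; ++i) {
        uint32_t c = chain[i - symoffset];
        const ElfW(Sym) *s = &symtab[i];
        if ((c | 1) == (h | 1) && s->st_shndx != SHN_UNDEF &&
            strcmp(strtab + s->st_name, name) == 0)
            return (void *)(m->l_addr + s->st_value);
        if (c & 1)
            return NULL;                  /* low bit marks the end of a chain */
    }
}

/* Try every loaded object in list order. */
static void *resolve(const char *name) {
    for (const struct link_map *m = root_link_map(); m; m = m->l_next) {
        void *p = lookup_in(m, name);
        if (p)
            return p;
    }
    return NULL;
}
```

A size-coded loader would precompute `gnu_hash(name)` at build time and store only the 32-bit value, which is where the byte savings come from.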
79
+However, the offset to the hashtable in the `link_map` tends to vary between
80
+glibc versions. Note, though, the presence of the `l_entry` field somewhere down
81
+there. Its value is known at (static) link time and under our control, so we can
82
+simply scan the struct for the value, and use the difference between the
83
+address of the `link_map` and the `l_entry` fields to compute the offset
84
+between the "near" and "far" fields of the `link_map`, and use that to
85
+read the hashtables.
86
+
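The scan itself might look like the sketch below. It assumes glibc, whose real `link_map` is much larger than the public struct in `link.h` and stores the entry point in it. Since a C program doesn't know its entry point at link time the way Smol does, the usage example asks the kernel for it with `getauxval(AT_ENTRY)`. `find_entry_offset` is a hypothetical name, and the 128-word scan limit is an arbitrary bound chosen for this example.

```c
#include <link.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */

/* glibc's link.h already declares this; repeated so the sketch stands alone. */
extern ElfW(Dyn) _DYNAMIC[];

/* Head of the link_map list (our own binary), found through DT_DEBUG. */
static struct link_map *root_link_map(void) {
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d)
        if (d->d_tag == DT_DEBUG && d->d_un.d_ptr)
            return ((struct r_debug *)d->d_un.d_ptr)->r_map;
    return NULL;
}

/* Scan the first `max_words` pointer-sized words of the (glibc-internal)
 * link_map for `entry`. Returns the byte offset of the match, or -1.
 * This deliberately reads past the public struct, as the trick requires. */
static ptrdiff_t find_entry_offset(const struct link_map *map,
                                   ElfW(Addr) entry, size_t max_words) {
    const ElfW(Addr) *words = (const ElfW(Addr) *)map;
    for (size_t i = 0; i < max_words; ++i)
        if (words[i] == entry)
            return (ptrdiff_t)(i * sizeof *words);
    return -1;
}
```

For example, `find_entry_offset(root_link_map(), getauxval(AT_ENTRY), 128)` gives the offset of `l_entry` on the running glibc. Comparing it with the offset expected for a reference glibc version tells us how far the later fields have shifted.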
87
+Smol uses these two tricks to achieve an even smaller binary size.