94f97f2eaccbc05975ae9f5440170cb03d92825c
explain/rtld.md
... | ... | @@ -0,0 +1,87 @@ |
1 | +## Dynamic linking |
|
2 | + |
|
3 | +Dynamic linking is the process of loading code and data from shared libraries. |
|
4 | +For an overview, see [this |
|
5 | +](https://0x00sec.org/t/linux-internals-dynamic-linking-wizardry/1082) and [ |
|
6 | +this page](https://0x00sec.org/t/linux-internals-the-art-of-symbol-resolution/1488). |
|
7 | +For a series of very in-depth articles, see [this page |

8 | +](https://www.airs.com/blog/archives/38) and the posts that follow it. |
|
9 | + |
|
10 | +### The interesting bits |
|
11 | + |
|
12 | +Or at least, to us. |
|
13 | + |
|
14 | +The kernel recognises a dynamically linked ELF executable by its `PT_INTERP` |

15 | +phdr, which holds the path of the dynamic linker. See [Process creation](/explain/proc) |

16 | +for more info. However, it turns out the dynamic linker is itself a regular, |

17 | +statically linked ELF executable, and you can simply give it the program to run |

18 | +as an argument. When using a shell dropper, this technique saves a few |

19 | +bytes. |
|
20 | + |
|
21 | +When the kernel has loaded our program and the dynamic linker (`ld.so`), it makes |

22 | +the process jump to the dynamic linker. On glibc, the linker's `_start` calls |

23 | +`_dl_start` and then falls through into `_dl_start_user` ([i386 |
|
24 | +](https://code.woboq.org/userspace/glibc/sysdeps/i386/dl-machine.h.html#153), |
|
25 | +[x86_64 |
|
26 | +](https://code.woboq.org/userspace/glibc/sysdeps/x86_64/dl-machine.h.html#141)). |
|
27 | +The dynamic linker then reads the main program's `PT_DYNAMIC` phdr, which |

28 | +contains a key-value table describing which libraries we depend on |

29 | +(`DT_NEEDED`), where the symbol and string tables are (`DT_SYMTAB`, `DT_STRTAB` |

30 | +), etc. It then loads these libraries, resolves the needed |

31 | +symbols, and performs the required relocations to glue all the code together. |
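
To make the key-value table a bit more concrete, here is a minimal Python sketch that decodes an array of 64-bit little-endian `Elf64_Dyn` entries from raw bytes. The tag numbers are the standard ELF ones. The function name `parse_dynamic` and the sample addresses are made up for illustration, and a real loader reads the table from the mapped `PT_DYNAMIC` segment rather than from a hand-packed buffer.

```python
import struct

# Standard ELF dynamic tags (same values as in <elf.h>).
DT_NULL, DT_NEEDED, DT_STRTAB, DT_SYMTAB, DT_DEBUG = 0, 1, 5, 6, 21

def parse_dynamic(blob):
    """Decode little-endian Elf64_Dyn entries, stopping at DT_NULL."""
    entries = []
    for off in range(0, len(blob) - 15, 16):
        # Each entry is a signed 64-bit tag followed by a 64-bit value/pointer.
        tag, value = struct.unpack_from("<qQ", blob, off)
        if tag == DT_NULL:
            break
        entries.append((tag, value))
    return entries

# Hypothetical table: one needed library (string-table offset 1), then the
# string and symbol table addresses, then the terminator.
sample = struct.pack("<qQqQqQqQ",
                     DT_NEEDED, 1,
                     DT_STRTAB, 0x400200,
                     DT_SYMTAB, 0x400100,
                     DT_NULL, 0)
print(parse_dynamic(sample))
```

Note that the value of a `DT_NEEDED` entry is an offset into the string table, not a pointer to the library name itself.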
|
32 | + |
|
33 | +However, there's one entry, `DT_DEBUG`, which is barely documented (the docs only say |

34 | +"for debugging purposes"). What actually happens is that the dynamic linker |

35 | +stores a pointer to its `r_debug` struct in that entry's value field. This behavior |

36 | +is mostly portable (as in, it works on at least glibc, musl and FreeBSD). If |

37 | +you look at your system's `link.h` file (e.g. in `/usr/include`), you can see |

38 | +the contents of this struct. The second field is a pointer to the root |

39 | +`link_map`. More about this one later. |
|
40 | + |
|
41 | +Once all dependencies are loaded and relocated, the runtime linker jumps to our |

42 | +entrypoint (`_start`) without clearing the registers or cleaning up the stack. |
|
43 | + |
|
44 | +### The attack plan |
|
45 | + |
|
46 | +Normal dynamic linking needs a lot of (large) ELF header stuff. The trick is to |
|
47 | +bypass all this, and do the minimal necessary amount of linking ourselves. |
|
48 | + |
|
49 | +Now let's have a look at the `link_map`. (It's in the same `link.h` header.) |

50 | +It's a linked list with one entry for every ELF file loaded by the rtld. |

51 | +(The first entry is our binary, the second is the [vDSO |

52 | +](https://lwn.net/Articles/446528/). Usually, the third one is `libc`.) Each |

53 | +entry contains the file's path, its base address and a pointer to its `PT_DYNAMIC` |

54 | +phdr. We can use the latter to traverse all the symbol tables and figure out |

55 | +the address of every symbol. This is what **bold** and **dnload** do, |

56 | +except they store a hash of every symbol name instead of the symbol names |

57 | +themselves. |
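
The walk from `r_debug` to every `link_map` entry can be modelled in a few lines of Python. This is only a sketch: it assumes the x86_64 layout of the public fields in `link.h`, and it runs over a toy byte buffer in which offsets play the role of pointers. All addresses and the helper names (`read_cstr`, `walk_link_map`) are invented for the example. The empty name for the first entry mirrors what glibc does for the main program.

```python
import struct

# x86_64 layouts of the public parts of the structs in <link.h>.
R_DEBUG = struct.Struct("<i4xQ")     # r_version, padding, r_map
LINK_MAP = struct.Struct("<QQQQQ")   # l_addr, l_name, l_ld, l_next, l_prev

def read_cstr(mem, addr):
    """Read a NUL-terminated string starting at addr."""
    return bytes(mem[addr:mem.index(b"\0", addr)]).decode()

def walk_link_map(mem, r_debug_addr):
    """Yield (name, base address, dynamic table address) for each entry."""
    _version, node = R_DEBUG.unpack_from(mem, r_debug_addr)
    while node:  # a NULL l_next terminates the list
        l_addr, l_name, l_ld, l_next, _l_prev = LINK_MAP.unpack_from(mem, node)
        yield read_cstr(mem, l_name), l_addr, l_ld
        node = l_next

# Toy address space: offsets into this buffer stand in for pointers.
mem = bytearray(256)
names = {208: b"", 209: b"linux-vdso.so.1", 225: b"libc.so.6"}
for addr, name in names.items():
    mem[addr:addr + len(name) + 1] = name + b"\0"
R_DEBUG.pack_into(mem, 16, 1, 64)  # r_debug lives at 16, r_map points to 64
LINK_MAP.pack_into(mem, 64, 0x400000, 208, 0x403e00, 112, 0)
LINK_MAP.pack_into(mem, 112, 0x7fff0000, 209, 0x7fff0400, 160, 64)
LINK_MAP.pack_into(mem, 160, 0x7f000000, 225, 0x7f1c0000, 0, 112)

for name, base, dyn in walk_link_map(mem, 16):
    print(repr(name), hex(base), hex(dyn))
```

In a real process the third value of each tuple is where the symbol-table traversal would start, by decoding that file's dynamic table.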
|
58 | + |
|
59 | +However, there are a few ways to save even more bytes: |
|
60 | + |
|
61 | +First of all, instead of using the `DT_DEBUG` trick, we can read some of the |
|
62 | +internal linker state that's leaked to our entry point. On `i386`, the |
|
63 | +`link_map` ends up in `eax` because `_dl_start_user` loads it there, and it |

64 | +is retained after a function call because of the calling convention. |
|
65 | + |
|
66 | +On `x86_64`, it's a bit more convoluted: the register the `link_map` is stored |

67 | +in doesn't get preserved, but a return address pointing into |

68 | +`_dl_start_user` is left on the stack by a call to some internal rtld function. |

69 | +We can read this address, add an offset to reach the instruction that |

70 | +fetches the `link_map` (which lives in a global variable), and decode that |

71 | +instruction's operand to find the `link_map`, and voilà. |
|
72 | + |
|
73 | +Secondly, instead of iterating over the symbol tables (and having to compute |
|
74 | +the hashes of every symbol), we can use the [internal `link_map` data in glibc |
|
75 | +](https://code.woboq.org/userspace/glibc/include/link.h.html#link_map) to access |
|
76 | +the calculated symbol hash tables, so we only have to do a hashtable lookup to |
|
77 | +get the address of a symbol. |
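
For a feel of what such a lookup involves, here is a rough Python model. It assumes the tables are in `DT_GNU_HASH` form, which is what current glibc systems normally use (the older `DT_HASH` layout differs). The Bloom filter is left out, and `build_table` is a hypothetical toy builder standing in for what the static linker emits. None of this is smol's actual code.

```python
def gnu_hash(name):
    """The hash function used by DT_GNU_HASH: h = h * 33 + c, seeded with 5381."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def build_table(names, nbuckets, symoffset=1):
    """Toy builder: symbols are ordered by bucket, index 0 is the null symbol."""
    ordered = sorted(names, key=lambda n: gnu_hash(n) % nbuckets)
    buckets = [0] * nbuckets
    chain = []
    for i, name in enumerate(ordered):
        h = gnu_hash(name)
        b = h % nbuckets
        if buckets[b] == 0:
            buckets[b] = symoffset + i  # first symbol index in this bucket
        last = i + 1 == len(ordered) or gnu_hash(ordered[i + 1]) % nbuckets != b
        chain.append((h & ~1) | last)   # low bit marks the end of the chain
    symtab = [None] * symoffset + ordered
    return symtab, buckets, chain

def lookup(name, symtab, buckets, chain, symoffset=1):
    """Return the symbol table index of name, or None if it is absent."""
    h = gnu_hash(name)
    idx = buckets[h % len(buckets)]
    if idx == 0:
        return None  # empty bucket
    while True:
        ch = chain[idx - symoffset]
        # Compare hashes first (ignoring the end bit), then the name itself.
        if (ch | 1) == (h | 1) and symtab[idx] == name:
            return idx
        if ch & 1:
            return None  # reached the end of this bucket's chain
        idx += 1

symtab, buckets, chain = build_table(["printf", "exit", "puts", "malloc"], nbuckets=2)
print(hex(gnu_hash("exit")), lookup("puts", symtab, buckets, chain))
```

As a sanity check on the hash itself, `gnu_hash("printf")` is `0x156b2bb8` and `gnu_hash("exit")` is `0x7c967e3f`.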
|
78 | + |
|
79 | +However, the offset of the hashtable within the `link_map` tends to vary between |

80 | +glibc versions. Note, though, the presence of the `l_entry` field somewhere down |

81 | +there. Its value is known at (static) link time and we control it, so we can |

82 | +simply scan the struct for that value. The distance between the |

83 | +start of the `link_map` and the `l_entry` field gives us the offset |

84 | +between the "near" and "far" fields of the `link_map`, which we then use to |

85 | +read the hashtables. |
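
The scan itself is tiny. The sketch below models it in Python on a fake `link_map` whose middle section has a different size per pretend glibc version. `ENTRY`, `ENTRY_TO_HASH` and `fake_link_map` are all invented for the example. In particular, the distance from `l_entry` to the hash fields is a made-up number, and the sketch simply assumes that distance stays fixed.

```python
import struct

def find_field_offset(struct_bytes, value):
    """Return the offset of the first aligned 64-bit word equal to value."""
    for off in range(0, len(struct_bytes) - 7, 8):
        if struct.unpack_from("<Q", struct_bytes, off)[0] == value:
            return off
    return None

ENTRY = 0x400078       # hypothetical entrypoint we chose at link time
ENTRY_TO_HASH = 0x50   # hypothetical distance from l_entry to the hash fields

def fake_link_map(padding_words):
    """Stand-in for a link_map whose middle part differs per glibc version."""
    head = struct.pack("<5Q", 0, 0, 0, 0, 0)  # the public "near" fields
    middle = struct.pack("<%dQ" % padding_words, *range(1, padding_words + 1))
    return head + middle + struct.pack("<Q", ENTRY) + bytes(0x80)

for words in (80, 95):  # two pretend glibc versions
    off = find_field_offset(fake_link_map(words), ENTRY)
    print(words, hex(off), hex(off + ENTRY_TO_HASH))
```

The point is that only the scan result changes between the two fake versions. Everything after `l_entry` is then addressed relative to the offset that was found.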
|
86 | + |
|
87 | +Smol uses these two tricks to achieve an even smaller binary size. |