Changes in 94f97f2: Created rtld (markdown)

				explain/rtld.md
			
          @@ -0,0 +1,87 @@

          +## Dynamic linking

          +

          +Dynamic linking is the process of loading code and data from shared libraries.

          +For an overview, see [this

          +](https://0x00sec.org/t/linux-internals-dynamic-linking-wizardry/1082) and [

          +this page](https://0x00sec.org/t/linux-internals-the-art-of-symbol-resolution/1488).

          +For a series of very in-depth articles, see [this page

          +](https://www.airs.com/blog/archives/38) and the posts that follow it.

          +

          +### The interesting bits

          +

          +Or at least, to us.

          +

          +The kernel recognises a dynamic ELF executable by the fact it has a `PT_INTERP`

          +phdr which points to the dynamic linker. See [Process creation](/explain/proc)

          +for more info. However, it turns out the dynamic linker is just a regular,

          +statically linked ELF executable, and you can simply give it the program to run

          +as an argument. When using a shell dropper, this technique saves a few

          +bytes.

          +

          +When the kernel loads our program and the dynamic linker (`ld.so`), it makes

          +the process jump to the dynamic linker. On glibc, the linker's entry point is

          +`_start`, which calls `_dl_start` and then falls through into

          +`_dl_start_user` ([i386

          +](https://code.woboq.org/userspace/glibc/sysdeps/i386/dl-machine.h.html#153),

          +[x86_64

          +](https://code.woboq.org/userspace/glibc/sysdeps/x86_64/dl-machine.h.html#141)).

          +The linker then reads the main program's `PT_DYNAMIC` segment, which contains

          +a key-value table that describes which libraries we depend on (`DT_NEEDED`),

          +where the symbol and string tables are (`DT_SYMTAB`, `DT_STRTAB`), etc. It

          +then loads these libraries, resolves the needed symbols, and performs the

          +required relocations to glue all the code together.

          +

          +However, there's one entry, `DT_DEBUG`, which is barely documented (the docs

          +only say "for debugging purposes"). In practice, the dynamic linker stores a

          +pointer to its `r_debug` struct in that entry's value field. This behavior

          +is mostly portable (as in, it works on at least glibc, musl and FreeBSD). If

          +you look at your system's `link.h` file (e.g. in `/usr/include`), you can see

          +the contents of this struct. The second field is a pointer to the root

          +`link_map`. More about this one later.
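
A minimal sketch of that lookup in C, assuming a glibc-style `<link.h>` that provides `ElfW`, `struct r_debug` and `struct link_map`. The helper name `debug_from_dynamic` is made up for illustration; it walks a dynamic table until `DT_NULL` and returns the pointer stored in the `DT_DEBUG` entry, if there is one.

```c
#include <assert.h>
#include <elf.h>     /* DT_NULL, DT_DEBUG */
#include <link.h>    /* ElfW(), struct r_debug, struct link_map */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan a DT_NULL-terminated dynamic table for DT_DEBUG
 * and return the r_debug pointer the dynamic linker stored there, or NULL
 * if the table has no such entry. */
static struct r_debug *debug_from_dynamic(ElfW(Dyn) *dyn)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_DEBUG)
            return (struct r_debug *)(uintptr_t)dyn->d_un.d_ptr;
    }
    return NULL;
}
```

In a dynamically linked program the table to pass would be the linker-defined `_DYNAMIC` array, and the `r_map` member of the result is then the root `link_map`.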

          +

          +After all the dependencies are loaded etc., the runtime linker will jump to our

          +entrypoint (`_start`), without clearing the registers or cleaning up the stack.

          +

          +### The attack plan

          +

          +Normal dynamic linking needs a lot of (large) ELF header stuff. The trick is to

          +bypass all this, and do the necessary minimal amount of linking ourselves.

          +

          +Now let's have a look at the `link_map`. (It's in the same `link.h` header.)

          +It's a linked list containing information about every ELF file loaded by the rtld.

          +(The first entry is our binary, the second is the [vDSO

          +](https://lwn.net/Articles/446528/). Usually, the third one is `libc`.) The

          +`link_map` contains the path, base address and a pointer to its `PT_DYNAMIC`

          +phdr. We can use the latter to traverse all the symbol tables and figure out

          +what the address of every symbol is. This is what **bold** and **dnload** do,

          +except they save a hash of every symbol name, instead of the symbol names

          +themselves.
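
The list itself can be walked with only the public fields from `<link.h>` (`l_addr`, `l_name`, `l_ld`, `l_next`). A minimal sketch, with a made-up helper name, that finds the first loaded object whose path contains a given substring:

```c
#include <assert.h>
#include <link.h>    /* struct link_map */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: follow l_next from `head` and return the first entry
 * whose path (l_name) contains `needle`, or NULL if there is none. */
static struct link_map *find_object(struct link_map *head, const char *needle)
{
    assert(needle != NULL);
    for (struct link_map *m = head; m != NULL; m = m->l_next) {
        if (m->l_name != NULL && strstr(m->l_name, needle) != NULL)
            return m;
    }
    return NULL;
}
```

Searching by name, for example `find_object(root, "libc.so")`, avoids relying on `libc` being the third entry. In the returned entry, `l_addr` is the load base and `l_ld` points at the object's dynamic table, which is where `DT_SYMTAB` and `DT_STRTAB` are found.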

          +

          +However, there are a few ways to save even more bytes:

          +

          +First of all, instead of using the `DT_DEBUG` trick, we can read some of the

          +internal linker state that's leaked to our entry point. On `i386`, the

          +`link_map` ends up in `eax` because `_dl_start_user` loads it there, and it

          +is retained after a function call because of the calling convention.

          +

          +On `x86_64`, it's a bit more convoluted: the register that holds the `link_map`

          +isn't preserved, but when `_dl_start_user` calls an internal rtld function, a

          +return address pointing back into `_dl_start_user` is left on the stack. We

          +can read this address and add a fixed offset to reach the instruction that

          +fetches the `link_map` from a global variable. Decoding that instruction's

          +operand gives us the address of the global, and thus the `link_map`, and voilà.
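
The decoding step can be sketched in C as below. This assumes the fetch is a RIP-relative load with a 32-bit displacement, such as `mov rdi, [rip+disp32]` (bytes `48 8b 3d` followed by the displacement), and that the host is little-endian. The exact instruction and its distance from the return address depend on the glibc build, so both are parameters here rather than constants. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: given the address of a RIP-relative instruction, its
 * total length and the position of its disp32 field, return the address the
 * instruction refers to. RIP-relative operands are relative to the address
 * of the *next* instruction, hence insn + insn_len. */
static const uint8_t *rip_relative_target(const uint8_t *insn,
                                          size_t insn_len, size_t disp_off)
{
    int32_t disp;
    assert(disp_off + sizeof disp <= insn_len);
    memcpy(&disp, insn + disp_off, sizeof disp);  /* unaligned-safe read */
    return insn + insn_len + disp;
}
```

With a return address `ret` read from the stack and a build-specific distance `delta`, the global holding the `link_map` would then be at `rip_relative_target(ret + delta, 7, 3)` under the assumptions above.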

          +

          +Secondly, instead of iterating over the symbol tables (and having to compute

          +the hashes of every symbol), we can use the [internal `link_map` data in glibc

          +](https://code.woboq.org/userspace/glibc/include/link.h.html#link_map) to access

          +the calculated symbol hash tables, so we only have to do a hashtable lookup to

          +get the address of a symbol.
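
For objects that carry a `DT_GNU_HASH` table, the hash in question is the GNU symbol hash. A minimal sketch of the hash and of bucket selection follows; the function names are made up, and the Bloom filter check and chain walk that a full lookup also needs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GNU-style symbol hash: start at 5381, then h = h * 33 + c for every byte,
 * with the result truncated to 32 bits. */
static uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;
    assert(name != NULL);
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        h = h * 33 + *p;
    return h;
}

/* The bucket to start from is the hash modulo the bucket count. */
static uint32_t gnu_bucket_index(uint32_t hash, uint32_t nbuckets)
{
    assert(nbuckets != 0);
    return hash % nbuckets;
}
```

The hash of a wanted symbol can be computed ahead of time, so a program that only performs lookups does not need to hash any strings at run time.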

          +

          +The offset of the hash table within the `link_map` tends to vary between glibc

          +versions, however. Fortunately, the struct also has an `l_entry` field somewhere

          +further down. Its value is our entry point, which is known at (static) link time

          +and which we control, so we can simply scan the struct for that value. The

          +distance between the start of the `link_map` and the `l_entry` field then tells

          +us how far the "far" fields have shifted relative to the "near" ones, and we use

          +that to read the hash tables.
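
A minimal sketch of the scan in C, treating the `link_map` as an array of pointer-sized words. The helper name and the `max_words` bound are assumptions for illustration; a real implementation would pick a bound large enough for every glibc version it cares about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: look for the known entry-point address inside a
 * struct, word by word, and return its byte offset from the start, or -1 if
 * it does not occur within the first max_words words. */
static ptrdiff_t find_entry_offset(const void *map, uintptr_t entry,
                                   size_t max_words)
{
    const uintptr_t *words = map;
    assert(map != NULL);
    for (size_t i = 0; i < max_words; i++) {
        if (words[i] == entry)
            return (ptrdiff_t)(i * sizeof *words);
    }
    return -1;
}
```

One way to use the result is to compare it with the offset of `l_entry` in a reference glibc version; the difference is the amount by which the later fields have moved.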

          +

          +Smol uses these two tricks to achieve an even smaller binary size.

	87	+Smol uses these two tricks to achieve an even smaller binary size.