ELF: dynamic struggles
Intro
Every ELF64 binary starts with this header:
typedef struct elf64_hdr {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
We are only going to concern ourselves with dynamically linked binaries whose Elf64_Ehdr.e_type is ET_EXEC (executable files) or ET_DYN (dynamic shared objects, basically shared libraries).
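As a rough illustration of that filter, the sketch below checks the ELF magic, the 64-bit class and e_type on a buffer holding the start of a file. The function name is_candidate() is made up for this example, the types come from the Linux <elf.h> header, and the file is assumed to use the byte order of the host. Note that e_type alone does not prove dynamic linking; that is decided by the program headers discussed next.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if buf starts with an ELF64 header whose e_type is ET_EXEC
 * or ET_DYN. is_candidate() is a hypothetical helper, not part of any
 * library. Assumes the file and the host share the same byte order. */
bool is_candidate(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr ehdr;

    assert(buf != NULL);
    if (len < sizeof(ehdr))
        return false;
    memcpy(&ehdr, buf, sizeof(ehdr));   /* copy to avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return false;                   /* not "\x7fELF" */
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return false;                   /* 32-bit or invalid class */
    return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_DYN;
}
```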
Note: If you don’t know what dynamic linking means, I suggest reading this article. I will not mention ELF sections on purpose. They are not relevant for loading executables and shared libraries. They don’t have to be there and should be treated as a nice bonus when they actually are. See sstrip. This “technique” is used by malware fairly often, and you don’t need sstrip to do the job.
e_phoff specifies the start of the program header table (PHT) in the file. The PHT is made of Elf64_Phdr entries (segments):
typedef struct elf64_phdr {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment, file & memory */
} Elf64_Phdr;
p_type can have values such as PT_LOAD, PT_DYNAMIC, PT_INTERP etc.
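To make the p_type values a bit more concrete, here is a small sketch that names a few common types and looks up the first segment of a given type in a program header table already read into memory. Both function names are invented for this example; the constants and structs are the ones from the Linux <elf.h> header.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Human-readable name for a few common p_type values (not exhaustive). */
const char *ptype_name(Elf64_Word p_type)
{
    switch (p_type) {
    case PT_NULL:    return "PT_NULL";
    case PT_LOAD:    return "PT_LOAD";
    case PT_DYNAMIC: return "PT_DYNAMIC";
    case PT_INTERP:  return "PT_INTERP";
    case PT_NOTE:    return "PT_NOTE";
    case PT_PHDR:    return "PT_PHDR";
    default:         return "other";
    }
}

/* Return the first program header of the requested type, or NULL.
 * find_segment() is a hypothetical helper name. */
const Elf64_Phdr *find_segment(const Elf64_Phdr *phdr, size_t phnum,
                               Elf64_Word type)
{
    assert(phdr != NULL || phnum == 0);
    for (size_t i = 0; i < phnum; i++)
        if (phdr[i].p_type == type)
            return &phdr[i];
    return NULL;
}
```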
When loading an ELF binary, the Linux kernel looks for PT_LOAD segments and maps them into memory (among other things). When doing so, it uses both p_offset (the segment’s file offset) and p_vaddr (the address at which to map the segment in memory). ELF segments can overlap in the file. Usually, there are 2 PT_LOAD segments: 1 for code (R-X) and 1 for data (RW-). There can also be just 1 or more than 2.
Whenever a virtual address needs to be converted to a file offset, it can be done like this:
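/* seg is the program header table, va the virtual address to convert */
for (int i = 0; i < ehdr->e_phnum; i++) {
    if (seg[i].p_type != PT_LOAD)
        continue;
    /* only the first p_filesz bytes of a segment are backed by the file */
    if (va >= seg[i].p_vaddr && va < seg[i].p_vaddr + seg[i].p_filesz) {
        offset = seg[i].p_offset + (va - seg[i].p_vaddr);
        break; /* stop at the first matching segment */
    }
}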
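The loop above can be wrapped into a self-contained function. This is only a sketch: the name va_to_offset() is made up, and it bounds the match by p_filesz rather than p_memsz on the assumption that addresses past p_filesz (zero-filled data such as .bss) have no bytes in the file.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Translate a virtual address to a file offset using the PT_LOAD segments.
 * Returns false when no segment backs the address in the file.
 * va_to_offset() is a hypothetical helper name. */
bool va_to_offset(const Elf64_Phdr *seg, size_t phnum,
                  Elf64_Addr va, Elf64_Off *offset)
{
    assert(seg != NULL && offset != NULL);
    for (size_t i = 0; i < phnum; i++) {
        if (seg[i].p_type != PT_LOAD)
            continue;
        /* Compare the distance instead of p_vaddr + p_filesz so that a
         * hostile header cannot make the sum wrap around. */
        if (va >= seg[i].p_vaddr && va - seg[i].p_vaddr < seg[i].p_filesz) {
            *offset = seg[i].p_offset + (va - seg[i].p_vaddr);
            return true;
        }
    }
    return false;
}
```

With the two PT_LOAD segments of the sample binary shown later in this post (0x400000 at offset 0x0 and 0x6007a0 at offset 0x7a0), the address 0x6007b8 resolves to offset 0x7b8.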
When you dynamically link an ELF, PT_DYNAMIC can be found in the program header table of the resulting binary. It usually lies within the second PT_LOAD segment, therefore it is loaded into memory. PT_INTERP specifies the dynamic interpreter and the kernel is very sensitive about it.
PT_DYNAMIC describes an array of dynamic entries:
typedef struct {
Elf64_Sxword d_tag; /* entry tag value */
union {
Elf64_Xword d_val;
Elf64_Addr d_ptr;
} d_un;
} Elf64_Dyn;
d_tag is the type of the dynamic entry. Dynamic entries contain vital information for the dynamic linker, such as the symbol relocations used to figure out which API you are trying to call (simplified).
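A minimal sketch of how a tool might look up one entry in that array: walk it until the terminating DT_NULL tag, with an upper bound so a missing terminator cannot run off the end. The name find_dyn() and the max_entries parameter are assumptions of this example; a caller would typically derive the bound from PT_DYNAMIC’s p_filesz divided by sizeof(Elf64_Dyn).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Return the first dynamic entry with the given tag, or NULL.
 * The walk stops at DT_NULL or after max_entries, whichever comes first.
 * find_dyn() is a hypothetical helper name. */
const Elf64_Dyn *find_dyn(const Elf64_Dyn *dyn, size_t max_entries,
                          Elf64_Sxword tag)
{
    assert(dyn != NULL || max_entries == 0);
    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++)
        if (dyn[i].d_tag == tag)
            return &dyn[i];
    return NULL;
}
```

For example, find_dyn(table, n, DT_JMPREL) yields the entry whose d_un.d_ptr is the address of the PLT relocations.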
Case: executable binaries
Let’s compile a program and look at it with radare2 (always use the git version)! I am using radare on OS X:
$ r2 -v
radare2 0.10.2-git 10555 @ darwin-little-x86-64 git.0.10.1-99-g747699f
commit: 747699f712d7cc0402b20c9313a16634e68d7764 build: 2016-03-11
#include <fcntl.h>
#include <unistd.h>
int main()
{
int fd = open("hello", O_CREAT | O_TRUNC | O_WRONLY);
if(fd > 0) {
write(fd, "world", 5);
close(fd);
}
return 0;
}
I am using gcc (Debian 4.9.2-10) 4.9.2 and ldd (Debian GLIBC 2.19-18+deb8u3) 2.19 on Debian 8.3 x64.
Compile it (it should be dynamically linked unless given -static) and check its size:
$ gcc myprogram.c -o myprogram
$ wc --bytes myprogram
6992
Use sstrip and check its size:
$ sstrip myprogram
$ wc --bytes myprogram
2528
$ r2 -A myprogram
[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x004003a0 paddr=0x000003a0 sz=120 vsz=120 perm=----- name=.rela.plt
idx=01 vaddr=0x00400388 paddr=0x00000388 sz=24 vsz=24 perm=----- name=.rel.plt
idx=02 vaddr=0x00600990 paddr=0x00000990 sz=120 vsz=120 perm=----- name=.got.plt
idx=03 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=04 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=05 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=06 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=07 vaddr=0x006007b8 paddr=0x000007b8 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=08 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=09 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=10 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=11 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr
12 sections
[0x004004a0]> ir
[Relocations]
vaddr=0x006009a8 paddr=0x000009a8 type=SET_64 write
vaddr=0x006009b0 paddr=0x000009b0 type=SET_64 close
vaddr=0x006009b8 paddr=0x000009b8 type=SET_64 __libc_start_main
vaddr=0x006009c0 paddr=0x000009c0 type=SET_64 __gmon_start__
vaddr=0x006009c8 paddr=0x000009c8 type=SET_64 open
vaddr=0x00600988 paddr=0x00000988 type=SET_64 __gmon_start__
6 relocations
[0x004004a0]> pdf @ main
╒ (fcn) main 74
│ ; var int local_0h @ rbp-0x0
│ ; var int local_4h @ rbp-0x4
│ ; DATA XREF from 0x004004bd (main)
│ 0x00400596 55 push rbp
│ 0x00400597 4889e5 mov rbp, rsp
│ 0x0040059a 4883ec10 sub rsp, 0x10
│ 0x0040059e be41020000 mov esi, 0x241
│ 0x004005a3 bf64064000 mov edi, 0x400664
│ 0x004005a8 b800000000 mov eax, 0
│ 0x004005ad e8defeffff call sym.imp.open
│ 0x004005b2 8945fc mov dword [rbp - local_4h], eax
│ 0x004005b5 837dfc00 cmp dword [rbp - local_4h], 0
│ ┌─< 0x004005b9 7e1e jle 0x4005d9
│ │ 0x004005bb 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005be ba05000000 mov edx, 5
│ │ 0x004005c3 be6a064000 mov esi, 0x40066a
│ │ 0x004005c8 89c7 mov edi, eax
│ │ 0x004005ca e881feffff call sym.imp.write
│ │ 0x004005cf 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005d2 89c7 mov edi, eax
│ │ 0x004005d4 e887feffff call sym.imp.close
Alright. Radare can clearly see which APIs we are trying to call. In short, radare used the information contained within the dynamic entries (relocations, dynamic symbol table, dynamic string table etc.) to figure it out.
Let’s try the readelf utility.
$ readelf -r myprogram
There are no relocations in this file.
Oops. By default, readelf relies on the section header table (which we took away). That is the first mistake, but this behavior has been known for a while. You have to force it to use PT_DYNAMIC instead of the .dynamic section:
$ readelf -D -r myprogram
'RELA' relocation section at offset 0x400388 contains 24 bytes:
Offset Info Type Sym. Value Sym. Name + Addend
000000600988 000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
'PLT' relocation section at offset 0x4003a0 contains 120 bytes:
Offset Info Type Sym. Value Sym. Name + Addend
0000006009a8 000100000007 R_X86_64_JUMP_SLO 0000000000000000 write + 0
0000006009b0 000200000007 R_X86_64_JUMP_SLO 0000000000000000 close + 0
0000006009b8 000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000006009c0 000400000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
0000006009c8 000500000007 R_X86_64_JUMP_SLO 0000000000000000 open + 0
But how do they load the dynamic entries from the file?
They use PT_DYNAMIC’s p_offset, the file offset.
Is this correct? Well..
Let’s jump into 010 Editor, load our program and use Tim Strazzere’s ELF template (the one on their website is fairly outdated) to change the p_offset field of PT_DYNAMIC to 0x0.
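If you don’t have a hex editor with a template at hand, the same patch can be sketched in a few lines of C. This is not what I used for the post, just an illustration under some assumptions: the whole file is already in a writable buffer, it is a host-endian ELF64, and the function name clobber_dynamic_offset() is invented. It does not validate the ELF magic.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Overwrite p_offset of the first PT_DYNAMIC program header of an ELF64
 * image held in memory. Returns false if the image is too small or has no
 * PT_DYNAMIC. clobber_dynamic_offset() is a hypothetical helper name. */
bool clobber_dynamic_offset(unsigned char *img, size_t len, Elf64_Off new_off)
{
    Elf64_Ehdr eh;

    assert(img != NULL);
    if (len < sizeof(eh))
        return false;
    memcpy(&eh, img, sizeof(eh));
    if (eh.e_phentsize != sizeof(Elf64_Phdr))
        return false;
    /* The whole program header table must lie inside the image. */
    if (eh.e_phoff > len ||
        (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return false;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        unsigned char *p = img + eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;

        memcpy(&ph, p, sizeof(ph));     /* copy out: p may be unaligned */
        if (ph.p_type != PT_DYNAMIC)
            continue;
        ph.p_offset = new_off;          /* p_vaddr is left untouched */
        memcpy(p, &ph, sizeof(ph));
        return true;
    }
    return false;
}
```

Calling it with new_off set to 0 and writing the buffer back to disk should produce the same modification as the manual edit described above.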
Run it.
$ strace ./myprogram
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777714751741270) = 3
write(3, "world", 5) = 5
close(3) = 0
That works. Let’s check radare.
$ r2 -A myprogram
[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=01 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=02 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=03 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=04 vaddr=0x006007b8 paddr=0x00000000 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=05 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=06 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=07 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=08 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr
9 sections
[0x004004a0]> ir
[Relocations]
0 relocations
[0x004004a0]> pdf @ main
╒ (fcn) main 74
│ ; var int local_0h @ rbp-0x0
│ ; var int local_4h @ rbp-0x4
│ ; DATA XREF from 0x004004bd (main)
│ 0x00400596 55 push rbp
│ 0x00400597 4889e5 mov rbp, rsp
│ 0x0040059a 4883ec10 sub rsp, 0x10
│ 0x0040059e be41020000 mov esi, 0x241
│ 0x004005a3 bf64064000 mov edi, 0x400664
│ 0x004005a8 b800000000 mov eax, 0
│ 0x004005ad e8defeffff call fcn.00400490
│ 0x004005b2 8945fc mov dword [rbp - local_4h], eax
│ 0x004005b5 837dfc00 cmp dword [rbp - local_4h], 0
│ ┌─< 0x004005b9 7e1e jle 0x4005d9
│ │ 0x004005bb 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005be ba05000000 mov edx, 5
│ │ 0x004005c3 be6a064000 mov esi, 0x40066a
│ │ 0x004005c8 89c7 mov edi, eax
│ │ 0x004005ca e881feffff call fcn.00400450
│ │ 0x004005cf 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005d2 89c7 mov edi, eax
│ │ 0x004005d4 e887feffff call fcn.00400460
With PT_DYNAMIC’s p_offset set to 0x0, radare shows no relocations.
To demonstrate with readelf:
$ readelf -D -r myprogram
'RELA' relocation section at offset 0x0 contains 17179869188 bytes:
readelf: Warning: Virtual address 0x0 not located in any PT_LOAD segment.
readelf: Error: Reading 0x400000004 bytes extends past end of file for 64-bit relocation data
Radare2 uses p_offset.
LLVM uses p_offset.
The Linux kernel does not really care about the dynamic segment, but if you look for the PT_DYNAMIC identifier on LXR, you can find this, for example.
FreeBSD is doing it too. Glibc and IDA as well.
The general consensus seems to be that the offset of the dynamic table should be calculated from its virtual address, just like you would when converting any virtual address to an offset:
offset = 0x7A0 + (0x6007B8 - 0x6007A0) = 0x7B8
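The arithmetic above can be sketched as a function that ignores PT_DYNAMIC’s p_offset entirely and derives the file offset from p_vaddr and the enclosing PT_LOAD segment. The name dynamic_offset_from_vaddr() is an assumption of this example, as is the choice to bound the match by p_filesz.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Compute the file offset of the dynamic table from PT_DYNAMIC's p_vaddr,
 * ignoring its p_offset. Returns false if there is no PT_DYNAMIC or no
 * PT_LOAD segment contains its address.
 * dynamic_offset_from_vaddr() is a hypothetical helper name. */
bool dynamic_offset_from_vaddr(const Elf64_Phdr *ph, size_t phnum,
                               Elf64_Off *out)
{
    const Elf64_Phdr *dyn = NULL;

    assert(ph != NULL && out != NULL);
    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type == PT_DYNAMIC) {
            dyn = &ph[i];
            break;
        }
    }
    if (dyn == NULL)
        return false;

    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type != PT_LOAD || dyn->p_vaddr < ph[i].p_vaddr)
            continue;
        Elf64_Addr delta = dyn->p_vaddr - ph[i].p_vaddr;
        if (delta < ph[i].p_filesz) {
            *out = ph[i].p_offset + delta;  /* e.g. 0x7a0 + 0x18 = 0x7b8 */
            return true;
        }
    }
    return false;
}
```

For the patched myprogram (p_offset = 0x0, p_vaddr = 0x6007b8) this still yields 0x7b8, which is where the real dynamic table lives in the file.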
Is this it?
Yes. Well, except maybe adding another dynamic table with a personal touch at the end of the file..
$ readelf -D --dynamic myprogram
Dynamic section at offset 0x9e0 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x400418
0x000000000000000d (FINI) 0x400654
0x0000000000000019 (INIT_ARRAY) 0x6007a0
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x6007a8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400260
0x0000000000000005 (STRTAB) 0x400310
0x0000000000000006 (SYMTAB) 0x400280
0x000000000000000a (STRSZ) 73 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x600990
0x0000000000000002 (PLTRELSZ) 120 (bytes)
0x0000000000000014 (PLTREL) RELA
==> 0x0000000000000017 (JMPREL) 0xffffffff
0x0000000000000007 (RELA) 0x400388
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x400368
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x40035a
0x0000000000000000 (NULL) 0x0
$ readelf -D -r myprogram
'RELA' relocation section at offset 0x400388 contains 24 bytes:
Offset Info Type Sym. Value Sym. Name + Addend
000000600988 000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
'PLT' relocation section at offset 0xffffffff contains 120 bytes:
readelf: Warning: Virtual address 0xffffffff not located in any PT_LOAD segment.
readelf: Error: Reading 0x78 bytes extends past end of file for 64-bit relocation data
$ strace ./myprogram
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777722565300410) = 3
write(3, "world", 5) = 5
close(3) = 0
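A parser can at least notice values like the JMPREL entry above. The sketch below counts address-valued dynamic entries that do not fall inside any PT_LOAD segment. It is an illustration only: the function names are invented, and the set of tags checked (DT_STRTAB, DT_SYMTAB, DT_RELA, DT_JMPREL) is a small, non-exhaustive selection.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* True if va falls inside the memory image of some PT_LOAD segment. */
bool va_is_mapped(const Elf64_Phdr *ph, size_t phnum, Elf64_Addr va)
{
    for (size_t i = 0; i < phnum; i++)
        if (ph[i].p_type == PT_LOAD && va >= ph[i].p_vaddr &&
            va - ph[i].p_vaddr < ph[i].p_memsz)
            return true;
    return false;
}

/* Count address-valued entries that point outside every PT_LOAD segment.
 * Only a handful of tags are checked. count_unmapped_ptrs() is a
 * hypothetical helper name. */
size_t count_unmapped_ptrs(const Elf64_Dyn *dyn, size_t max_entries,
                           const Elf64_Phdr *ph, size_t phnum)
{
    size_t bad = 0;

    assert(dyn != NULL && ph != NULL);
    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
        switch (dyn[i].d_tag) {
        case DT_STRTAB:
        case DT_SYMTAB:
        case DT_RELA:
        case DT_JMPREL:
            if (!va_is_mapped(ph, phnum, dyn[i].d_un.d_ptr))
                bad++;
            break;
        default:
            break;
        }
    }
    return bad;
}
```

A non-zero result for the table a tool is about to trust is a hint that it may be looking at a decoy rather than the table the dynamic linker will actually use.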
Shared libraries
Here is a sample shared library, let’s call it libtest.c:
#include <stdio.h>
void __attribute__((constructor)) foo()
{
puts("bar");
}
$ gcc -fPIC -shared libtest.c -o libtest.so
$ sstrip libtest.so
Try to run it with our previous myprogram:
$ strace -E LD_PRELOAD=./libtest.so ./myprogram
write(1, "bar\n", 4bar
) = 4
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777601547305710) = 3
write(3, "world", 5) = 5
close(3) = 0
Check the disassembly of the foo symbol in radare:
[0x000005b0]> pdf @ sym.foo
╒ (fcn) sym.foo 18
│ 0x000006b0 55 push rbp
│ 0x000006b1 4889e5 mov rbp, rsp
│ 0x000006b4 488d3d120000. lea rdi, [rip + 0x12] ; 0x6cd
│ 0x000006bb e8c0feffff call sym.imp.puts
│ 0x000006c0 5d pop rbp
╘ 0x000006c1 c3 ret
Change PT_DYNAMIC’s p_offset to 0x0.
Run again..
$ strace -E LD_PRELOAD=./libtest.so ./myprogram
write(1, "bar\n", 4bar
) = 4
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777751564430010) = 3
write(3, "world", 5) = 5
close(3) = 0
Now take a look again with radare:
[0x000005b0]> iS
[Sections]
idx=00 vaddr=0x00000000 paddr=0x00000000 sz=1876 vsz=1876 perm=m-r-x name=LOAD0
idx=01 vaddr=0x00200758 paddr=0x00000758 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=02 vaddr=0x00200778 paddr=0x00000000 sz=448 vsz=448 perm=m-rw- name=DYNAMIC
idx=03 vaddr=0x00000190 paddr=0x00000190 sz=36 vsz=36 perm=m-r-- name=NOTE
idx=04 vaddr=0x000006d4 paddr=0x000006d4 sz=28 vsz=28 perm=m-r-- name=GNU_EH_FRAME
idx=05 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=06 vaddr=0x00000000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr
7 sections
[0x000005b0]> is
[Symbols]
0 symbols
[0x000005b0]> ir
[Relocations]
0 relocations
[0x000005b0]> pd 6 @ 0x6b0
0x000006b0 55 push rbp
0x000006b1 4889e5 mov rbp, rsp
0x000006b4 488d3d120000. lea rdi, [rip + 0x12] ; 0x6cd
0x000006bb e8c0feffff call fcn.00000580
0x000006c0 5d pop rbp
0x000006c1 c3 ret
You can probably tell where I am going with this.
Implications
I think it’s a nice trick to fool some popular tools and newbie reversers. :-) It is also worth knowing about if you are parsing ELF in your own tool.
There are more things that can be done, but this post is way longer than I expected. Maybe next time.
Note: if I remember correctly, there was a CTF that used 2 dynamic string tables. One was referenced from the dynamic table that PT_DYNAMIC pointed to and the other from the .dynamic section. This caused some tools to show wrong APIs. If someone finds a link, let me know and I will update the post.
Thanks for reading!
UPDATE 2016/03/13:
Radare2 addressed this issue!
$ r2 -v
radare2 0.10.2-git 10577 @ darwin-little-x86-64 git.0.10.1-121-g1c443ca
commit: 1c443caccfcfbad0b25dd2c28acb6d3d70d8dd10 build: 2016-03-13
$ r2 -A myprogram
[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x004003a0 paddr=0x000003a0 sz=120 vsz=120 perm=----- name=.rela.plt
idx=01 vaddr=0x00400388 paddr=0x00000388 sz=24 vsz=24 perm=----- name=.rel.plt
idx=02 vaddr=0x00600990 paddr=0x00000990 sz=120 vsz=120 perm=----- name=.got.plt
idx=03 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=04 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=05 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=06 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=07 vaddr=0x006007b8 paddr=0x00000000 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=08 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=09 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=10 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=11 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr
12 sections
[0x004004a0]> ir
[Relocations]
vaddr=0x006009a8 paddr=0x000009a8 type=SET_64 write
vaddr=0x006009b0 paddr=0x000009b0 type=SET_64 close
vaddr=0x006009b8 paddr=0x000009b8 type=SET_64 __libc_start_main
vaddr=0x006009c0 paddr=0x000009c0 type=SET_64 __gmon_start__
vaddr=0x006009c8 paddr=0x000009c8 type=SET_64 open
vaddr=0x00600988 paddr=0x00000988 type=SET_64 __gmon_start__
6 relocations
[0x004004a0]> pdf @ main
╒ (fcn) main 74
│ ; var int local_0h @ rbp-0x0
│ ; var int local_4h @ rbp-0x4
│ ; DATA XREF from 0x004004bd (main)
│ 0x00400596 55 push rbp
│ 0x00400597 4889e5 mov rbp, rsp
│ 0x0040059a 4883ec10 sub rsp, 0x10
│ 0x0040059e be41020000 mov esi, 0x241
│ 0x004005a3 bf64064000 mov edi, 0x400664
│ 0x004005a8 b800000000 mov eax, 0
│ 0x004005ad e8defeffff call sym.imp.open
│ 0x004005b2 8945fc mov dword [rbp - local_4h], eax
│ 0x004005b5 837dfc00 cmp dword [rbp - local_4h], 0
│ ┌─< 0x004005b9 7e1e jle 0x4005d9
│ │ 0x004005bb 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005be ba05000000 mov edx, 5
│ │ 0x004005c3 be6a064000 mov esi, 0x40066a
│ │ 0x004005c8 89c7 mov edi, eax
│ │ 0x004005ca e881feffff call sym.imp.write
│ │ 0x004005cf 8b45fc mov eax, dword [rbp - local_4h]
│ │ 0x004005d2 89c7 mov edi, eax
│ │ 0x004005d4 e887feffff call sym.imp.close
│ │ ; JMP XREF from 0x004005b9 (main)
│ └─> 0x004005d9 b800000000 mov eax, 0
│ 0x004005de c9 leave
╘ 0x004005df c3 ret