ELF: dynamic struggles

Intro

Every ELF64 binary starts with this header:

typedef struct elf64_hdr {
  unsigned char	e_ident[EI_NIDENT];
  Elf64_Half e_type;
  Elf64_Half e_machine;
  Elf64_Word e_version;
  Elf64_Addr e_entry;
  Elf64_Off e_phoff;
  Elf64_Off e_shoff;
  Elf64_Word e_flags;
  Elf64_Half e_ehsize;
  Elf64_Half e_phentsize;
  Elf64_Half e_phnum;
  Elf64_Half e_shentsize;
  Elf64_Half e_shnum;
  Elf64_Half e_shstrndx;
} Elf64_Ehdr;

We are only going to concern ourselves with dynamically linked binaries whose Elf64_Ehdr.e_type is ET_EXEC (executable files) or ET_DYN (dynamic shared objects, basically shared libraries).

Note: If you don’t know what dynamic linking means, I suggest reading this article. I will deliberately not mention ELF sections. The loader does not need them in executables and shared libraries: they don’t have to be there and should be treated as a nice bonus when they are. See sstrip. This “technique” is used by malware fairly often, and you don’t need sstrip to do the job.
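A minimal sketch of that restriction, checking an in-memory header before going any further. It assumes glibc's <elf.h> (which provides Elf64_Ehdr, ELFMAG, ET_EXEC, ET_DYN and friends) and a file in the host's byte order; the name elf64_exec_or_dyn is made up for illustration. Note that it only checks e_type: a statically linked ET_EXEC would pass too, since dynamic linking shows up later as a PT_DYNAMIC segment.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns ET_EXEC or ET_DYN if buf starts with an
 * ELF64 header of one of those types, or -1 otherwise. Assumes the file
 * uses the host's byte order (EI_DATA is not checked). */
int elf64_exec_or_dyn(const unsigned char *buf, size_t len)
{
        Elf64_Ehdr ehdr;

        assert(buf != NULL || len == 0);
        if (len < sizeof(ehdr))
                return -1;                      /* too short for a header */
        memcpy(&ehdr, buf, sizeof(ehdr));       /* buf may be unaligned */

        if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
                return -1;                      /* no "\177ELF" magic */
        if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
                return -1;                      /* not a 64-bit ELF */
        if (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN)
                return -1;                      /* e.g. ET_REL or ET_CORE */
        return ehdr.e_type;
}
```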

e_phoff specifies the file offset of the program header table (PHT). The PHT is an array of e_phnum Elf64_Phdr entries (segments):

typedef struct elf64_phdr {
  Elf64_Word p_type;
  Elf64_Word p_flags;
  Elf64_Off p_offset;		/* Segment file offset */
  Elf64_Addr p_vaddr;		/* Segment virtual address */
  Elf64_Addr p_paddr;		/* Segment physical address */
  Elf64_Xword p_filesz;		/* Segment size in file */
  Elf64_Xword p_memsz;		/* Segment size in memory */
  Elf64_Xword p_align;		/* Segment alignment, file & memory */
} Elf64_Phdr;

p_type can have values such as PT_LOAD, PT_DYNAMIC, PT_INTERP etc.
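Finding a particular segment is a short walk over the PHT. A minimal sketch, with the hypothetical name find_phdr, assuming glibc's <elf.h>, a file already read into memory in host byte order, and an ELF header that has already been sanity-checked. Headers are copied out with memcpy so the buffer does not need to be aligned.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first program header whose p_type equals
 * `type` into *out. Returns 0 on success, or -1 if there is no such segment
 * or the PHT does not fit inside buf (*out is then unspecified). */
int find_phdr(const unsigned char *buf, size_t len, Elf64_Word type, Elf64_Phdr *out)
{
        Elf64_Ehdr ehdr;

        assert(buf != NULL && out != NULL);
        if (len < sizeof(ehdr))
                return -1;
        memcpy(&ehdr, buf, sizeof(ehdr));

        /* e_phoff + e_phnum * e_phentsize must stay inside the buffer */
        if (ehdr.e_phentsize != sizeof(Elf64_Phdr) || ehdr.e_phoff > len ||
            (len - ehdr.e_phoff) / sizeof(Elf64_Phdr) < ehdr.e_phnum)
                return -1;

        for (size_t i = 0; i < ehdr.e_phnum; i++) {
                memcpy(out, buf + ehdr.e_phoff + i * sizeof(Elf64_Phdr), sizeof(Elf64_Phdr));
                if (out->p_type == type)
                        return 0;
        }
        return -1;
}
```

For example, find_phdr(buf, len, PT_INTERP, &ph) would give you the segment holding the interpreter path.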

When loading an ELF binary, the Linux kernel looks for PT_LOAD segments and maps them into memory (among other things). When doing so, it uses both p_offset (the segment’s file offset) and p_vaddr (the address at which to map the segment). ELF segments can overlap in the file. Usually, there are two PT_LOAD segments: one for code (R-X) and one for data (RW-). There can also be just one, or more than two. Whenever a virtual address needs to be converted to a file offset, it can be done like this:

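for(int i = 0; i < ehdr->e_phnum; i++) {
        if(seg[i].p_type != PT_LOAD)
                continue;

        /* p_filesz, not p_memsz: bytes past p_filesz (e.g. .bss) are not in the file */
        if(va >= seg[i].p_vaddr && va < seg[i].p_vaddr + seg[i].p_filesz) {
                offset = seg[i].p_offset + (va - seg[i].p_vaddr);
                break;  /* stop at the first PT_LOAD that contains va */
        }
}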

When you dynamically link an ELF, PT_DYNAMIC can be found in the program header table of the resulting binary. It usually lies within the second PT_LOAD segment, so it is mapped into memory. PT_INTERP specifies the dynamic interpreter, and the kernel is very sensitive about it.

PT_DYNAMIC is an array of dynamic entries:

typedef struct {
  Elf64_Sxword d_tag;		/* entry tag value */
  union {
    Elf64_Xword d_val;
    Elf64_Addr d_ptr;
  } d_un;
} Elf64_Dyn;

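d_tag is the type of the dynamic entry, and d_un holds its value (d_val) or address (d_ptr). Dynamic entries contain vital information for the dynamic linker, such as where the relocations and dynamic symbols are, which it needs to figure out which APIs you are trying to call (simplified). The table is terminated by a DT_NULL entry.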
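To make the layout concrete, here is a sketch of looking up a single tag (DT_JMPREL, say) in a dynamic table that is already in memory. The name dyn_lookup is hypothetical; count would typically be PT_DYNAMIC's p_filesz / sizeof(Elf64_Dyn), and the walk also stops at DT_NULL.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: finds the first entry with d_tag == tag among at most
 * `count` entries, stopping early at DT_NULL. Returns 0 and stores the value
 * in *out on success, or -1 if the tag is absent. */
int dyn_lookup(const Elf64_Dyn *dyn, size_t count, Elf64_Sxword tag, Elf64_Xword *out)
{
        assert(out != NULL && (dyn != NULL || count == 0));
        for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
                if (dyn[i].d_tag == tag) {
                        *out = dyn[i].d_un.d_val;   /* d_ptr shares the same storage */
                        return 0;
                }
        }
        return -1;
}
```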

Case: executable binaries

Let’s compile a program and look at it with radare2 (always use the git version)! I am using radare on OS X:

$ r2 -v
radare2 0.10.2-git 10555 @ darwin-little-x86-64 git.0.10.1-99-g747699f
commit: 747699f712d7cc0402b20c9313a16634e68d7764 build: 2016-03-11

#include <fcntl.h>
#include <unistd.h>

int main()
{
        int fd = open("hello", O_CREAT | O_TRUNC | O_WRONLY); /* no mode argument: the mode is garbage (see strace) */
        
        if(fd > 0) {
                write(fd, "world", 5);
                close(fd);
        }
        
        return 0;
}

I am using gcc (Debian 4.9.2-10) 4.9.2 and ldd (Debian GLIBC 2.19-18+deb8u3) 2.19 on Debian 8.3 x64.

Compile it (it will be dynamically linked unless you pass -static) and check its size:

$ gcc myprogram.c -o myprogram
$ wc --bytes myprogram
6992

Use sstrip and check its size:

$ sstrip myprogram
$ wc --bytes myprogram
2528

$ r2 -A myprogram

[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x004003a0 paddr=0x000003a0 sz=120 vsz=120 perm=----- name=.rela.plt
idx=01 vaddr=0x00400388 paddr=0x00000388 sz=24 vsz=24 perm=----- name=.rel.plt
idx=02 vaddr=0x00600990 paddr=0x00000990 sz=120 vsz=120 perm=----- name=.got.plt
idx=03 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=04 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=05 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=06 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=07 vaddr=0x006007b8 paddr=0x000007b8 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=08 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=09 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=10 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=11 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr

12 sections

[0x004004a0]> ir
[Relocations]
vaddr=0x006009a8 paddr=0x000009a8 type=SET_64 write
vaddr=0x006009b0 paddr=0x000009b0 type=SET_64 close
vaddr=0x006009b8 paddr=0x000009b8 type=SET_64 __libc_start_main
vaddr=0x006009c0 paddr=0x000009c0 type=SET_64 __gmon_start__
vaddr=0x006009c8 paddr=0x000009c8 type=SET_64 open
vaddr=0x00600988 paddr=0x00000988 type=SET_64 __gmon_start__

6 relocations

[0x004004a0]> pdf @ main
╒ (fcn) main 74
│           ; var int local_0h     @ rbp-0x0
│           ; var int local_4h     @ rbp-0x4
│           ; DATA XREF from 0x004004bd (main)
│           0x00400596      55             push rbp
│           0x00400597      4889e5         mov rbp, rsp
│           0x0040059a      4883ec10       sub rsp, 0x10
│           0x0040059e      be41020000     mov esi, 0x241
│           0x004005a3      bf64064000     mov edi, 0x400664
│           0x004005a8      b800000000     mov eax, 0
│           0x004005ad      e8defeffff     call sym.imp.open
│           0x004005b2      8945fc         mov dword [rbp - local_4h], eax
│           0x004005b5      837dfc00       cmp dword [rbp - local_4h], 0
│       ┌─< 0x004005b9      7e1e           jle 0x4005d9
│       │   0x004005bb      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005be      ba05000000     mov edx, 5
│       │   0x004005c3      be6a064000     mov esi, 0x40066a
│       │   0x004005c8      89c7           mov edi, eax
│       │   0x004005ca      e881feffff     call sym.imp.write
│       │   0x004005cf      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005d2      89c7           mov edi, eax
│       │   0x004005d4      e887feffff     call sym.imp.close

Alright. Radare can clearly see which APIs we are trying to call. In short, radare used the information contained in the dynamic entries (relocations, dynamic symbol table, dynamic string table etc.) to figure it out.

Let’s try the readelf utility.

$ readelf -r myprogram

There are no relocations in this file.

Oops. By default, readelf relies on the section header table (which we took away). That is the first mistake, but this behavior has been known for a while. You have to force it to use PT_DYNAMIC instead of the .dynamic section with -D (--use-dynamic):

$ readelf -D -r myprogram

'RELA' relocation section at offset 0x400388 contains 24 bytes:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600988  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

'PLT' relocation section at offset 0x4003a0 contains 120 bytes:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006009a8  000100000007 R_X86_64_JUMP_SLO 0000000000000000 write + 0
0000006009b0  000200000007 R_X86_64_JUMP_SLO 0000000000000000 close + 0
0000006009b8  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000006009c0  000400000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
0000006009c8  000500000007 R_X86_64_JUMP_SLO 0000000000000000 open + 0

But how do they load the dynamic entries from the file?

They use PT_DYNAMIC’s p_offset - the file offset.

Is this correct? Well...

Let’s jump into 010 Editor, load our program and use Tim Strazzere’s ELF template (the one on their website is fairly outdated) to change the p_offset field of PT_DYNAMIC to 0x0.

[Screenshot: PT_DYNAMIC p_offset set to 0 in 010 Editor]
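The same edit can be made without a hex editor. A minimal sketch, assuming the whole file has been read into a buffer in host byte order; set_dynamic_offset is a hypothetical name. It finds PT_DYNAMIC and overwrites only p_offset, leaving p_vaddr alone.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrites p_offset of the first PT_DYNAMIC program
 * header in buf. Returns 0 on success, or -1 if there is no PT_DYNAMIC or
 * the PHT does not fit inside buf. */
int set_dynamic_offset(unsigned char *buf, size_t len, Elf64_Off new_off)
{
        Elf64_Ehdr ehdr;
        Elf64_Phdr ph;

        assert(buf != NULL);
        if (len < sizeof(ehdr))
                return -1;
        memcpy(&ehdr, buf, sizeof(ehdr));
        if (ehdr.e_phentsize != sizeof(ph) || ehdr.e_phoff > len ||
            (len - ehdr.e_phoff) / sizeof(ph) < ehdr.e_phnum)
                return -1;

        for (size_t i = 0; i < ehdr.e_phnum; i++) {
                unsigned char *p = buf + ehdr.e_phoff + i * sizeof(ph);

                memcpy(&ph, p, sizeof(ph));
                if (ph.p_type != PT_DYNAMIC)
                        continue;
                ph.p_offset = new_off;  /* p_vaddr stays untouched */
                memcpy(p, &ph, sizeof(ph));
                return 0;
        }
        return -1;
}
```

Calling set_dynamic_offset(buf, len, 0) and writing the buffer back to disk should correspond to the 010 Editor change described above.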

Run it.

$ strace ./myprogram
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777714751741270) = 3
write(3, "world", 5)                    = 5
close(3)                                = 0

That works. Let’s check radare.

$ r2 -A myprogram

[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=01 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=02 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=03 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=04 vaddr=0x006007b8 paddr=0x00000000 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=05 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=06 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=07 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=08 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr

9 sections

[0x004004a0]> ir
[Relocations]

0 relocations

[0x004004a0]> pdf @ main
╒ (fcn) main 74
│           ; var int local_0h     @ rbp-0x0
│           ; var int local_4h     @ rbp-0x4
│           ; DATA XREF from 0x004004bd (main)
│           0x00400596      55             push rbp
│           0x00400597      4889e5         mov rbp, rsp
│           0x0040059a      4883ec10       sub rsp, 0x10
│           0x0040059e      be41020000     mov esi, 0x241
│           0x004005a3      bf64064000     mov edi, 0x400664
│           0x004005a8      b800000000     mov eax, 0
│           0x004005ad      e8defeffff     call fcn.00400490
│           0x004005b2      8945fc         mov dword [rbp - local_4h], eax
│           0x004005b5      837dfc00       cmp dword [rbp - local_4h], 0
│       ┌─< 0x004005b9      7e1e           jle 0x4005d9
│       │   0x004005bb      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005be      ba05000000     mov edx, 5
│       │   0x004005c3      be6a064000     mov esi, 0x40066a
│       │   0x004005c8      89c7           mov edi, eax
│       │   0x004005ca      e881feffff     call fcn.00400450
│       │   0x004005cf      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005d2      89c7           mov edi, eax
│       │   0x004005d4      e887feffff     call fcn.00400460

With PT_DYNAMIC’s p_offset set to 0x0, radare shows no relocations, and the imported functions show up as anonymous fcn.* calls instead of sym.imp.*.

To demonstrate with readelf:

$ readelf -D -r myprogram

'RELA' relocation section at offset 0x0 contains 17179869188 bytes:
readelf: Warning: Virtual address 0x0 not located in any PT_LOAD segment.
readelf: Error: Reading 0x400000004 bytes extends past end of file for 64-bit relocation data

Radare2 uses p_offset. LLVM uses p_offset.

The Linux kernel does not really care about the dynamic segment, but if you search LXR for the PT_DYNAMIC identifier, you can find this, for example. FreeBSD does it too, as do glibc and IDA.

The general consensus seems to be that calculating the offset of the dynamic table should be done through its virtual address just like you would when converting a virtual address to an offset:

offset = 0x7A0 + (0x6007B8 - 0x6007A0) = 0x7B8
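The same computation as a sketch: find PT_DYNAMIC, ignore its p_offset, and translate its p_vaddr through the PT_LOAD segments. The function names are hypothetical, and only the file-backed part of each segment (up to p_filesz) is considered.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Converts a virtual address to a file offset through the PT_LOAD segments,
 * considering only the file-backed range [p_vaddr, p_vaddr + p_filesz). */
static int va_to_offset(const Elf64_Phdr *ph, size_t phnum, Elf64_Addr va, Elf64_Off *out)
{
        for (size_t i = 0; i < phnum; i++) {
                if (ph[i].p_type != PT_LOAD)
                        continue;
                if (va >= ph[i].p_vaddr && va - ph[i].p_vaddr < ph[i].p_filesz) {
                        *out = ph[i].p_offset + (va - ph[i].p_vaddr);
                        return 0;
                }
        }
        return -1;
}

/* Hypothetical helper: file offset of the dynamic table derived from
 * PT_DYNAMIC's p_vaddr; its p_offset is deliberately ignored.
 * Returns 0 on success, -1 if there is no PT_DYNAMIC or it is not file-backed. */
int dynamic_offset_from_vaddr(const Elf64_Phdr *ph, size_t phnum, Elf64_Off *out)
{
        assert(out != NULL && (ph != NULL || phnum == 0));
        for (size_t i = 0; i < phnum; i++) {
                if (ph[i].p_type == PT_DYNAMIC)
                        return va_to_offset(ph, phnum, ph[i].p_vaddr, out);
        }
        return -1;
}
```

With the segment values from the radare output above (LOAD1 at vaddr 0x6007a0 and offset 0x7a0, PT_DYNAMIC at vaddr 0x6007b8), this yields 0x7b8 even after p_offset has been zeroed. A parser can also compare the result with PT_DYNAMIC's own p_offset and warn when the two disagree.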

Is this it?

Yes. Well, except that we could also append another dynamic table, with a personal touch, to the end of the file and point PT_DYNAMIC’s p_offset at it...

[Screenshots: changing the dynamic table offset; changing JMPREL]

$ readelf -D --dynamic myprogram

Dynamic section at offset 0x9e0 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x400418
 0x000000000000000d (FINI)               0x400654
 0x0000000000000019 (INIT_ARRAY)         0x6007a0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x6007a8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400260
 0x0000000000000005 (STRTAB)             0x400310
 0x0000000000000006 (SYMTAB)             0x400280
 0x000000000000000a (STRSZ)              73 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x600990
 0x0000000000000002 (PLTRELSZ)           120 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 ==> 0x0000000000000017 (JMPREL)             0xffffffff
 0x0000000000000007 (RELA)               0x400388
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400368
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x40035a
 0x0000000000000000 (NULL)               0x0

$ readelf -D -r myprogram

'RELA' relocation section at offset 0x400388 contains 24 bytes:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600988  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

'PLT' relocation section at offset 0xffffffff contains 120 bytes:
readelf: Warning: Virtual address 0xffffffff not located in any PT_LOAD segment.
readelf: Error: Reading 0x78 bytes extends past end of file for 64-bit relocation data

$ strace ./myprogram
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777722565300410) = 3
write(3, "world", 5)                    = 5
close(3)                                = 0
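The decoy can also be built programmatically. This is a sketch under assumptions, not necessarily the exact procedure behind the screenshots: append_decoy_dynamic is a hypothetical helper that copies the real dynamic table to the end of the file, points PT_DYNAMIC's p_offset at the copy, and rewrites DT_JMPREL only in the copy. It assumes the input's PT_DYNAMIC p_offset is still correct and the file is in host byte order, and it does not pad the copy to any alignment.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the image with a decoy
 * dynamic table appended (caller frees), or NULL on failure. The decoy is a
 * copy of the real table with DT_JMPREL set to fake_jmprel; PT_DYNAMIC's
 * p_offset is redirected to it, while p_vaddr and the real table stay intact. */
unsigned char *append_decoy_dynamic(const unsigned char *buf, size_t len,
                                    Elf64_Addr fake_jmprel, size_t *new_len)
{
        Elf64_Ehdr ehdr;
        Elf64_Phdr ph = {0};
        size_t ph_off = 0;
        int found = 0;
        unsigned char *out;

        assert(buf != NULL && new_len != NULL);
        if (len < sizeof(ehdr))
                return NULL;
        memcpy(&ehdr, buf, sizeof(ehdr));
        if (ehdr.e_phentsize != sizeof(ph) || ehdr.e_phoff > len ||
            (len - ehdr.e_phoff) / sizeof(ph) < ehdr.e_phnum)
                return NULL;

        for (size_t i = 0; i < ehdr.e_phnum && !found; i++) {
                ph_off = ehdr.e_phoff + i * sizeof(ph);
                memcpy(&ph, buf + ph_off, sizeof(ph));
                found = (ph.p_type == PT_DYNAMIC);
        }
        /* the real table must lie entirely inside the file */
        if (!found || ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
                return NULL;

        out = malloc(len + ph.p_filesz);
        if (out == NULL)
                return NULL;
        memcpy(out, buf, len);
        memcpy(out + len, buf + ph.p_offset, ph.p_filesz);      /* the decoy */

        /* rewrite DT_JMPREL in the decoy only */
        for (size_t off = 0; off + sizeof(Elf64_Dyn) <= ph.p_filesz; off += sizeof(Elf64_Dyn)) {
                Elf64_Dyn d;

                memcpy(&d, out + len + off, sizeof(d));
                if (d.d_tag == DT_NULL)
                        break;
                if (d.d_tag == DT_JMPREL) {
                        d.d_un.d_ptr = fake_jmprel;
                        memcpy(out + len + off, &d, sizeof(d));
                }
        }

        ph.p_offset = len;      /* readers trusting p_offset now land on the decoy */
        memcpy(out + ph_off, &ph, sizeof(ph));
        *new_len = len + ph.p_filesz;
        return out;
}
```

Tools that trust p_offset then read the decoy, while the dynamic linker, which goes through p_vaddr, keeps using the untouched original. That matches the readelf and strace output above.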

Shared libraries

Sample shared library, let’s call it libtest.c:

#include <stdio.h>

void __attribute__((constructor)) foo()
{
        puts("bar");
}

$ gcc -fPIC -shared libtest.c -o libtest.so
$ sstrip libtest.so

Try to run it with our previous myprogram:

$ strace -E LD_PRELOAD=./libtest.so ./myprogram
write(1, "bar\n", 4bar
)                    = 4
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777601547305710) = 3
write(3, "world", 5)                    = 5
close(3)                                = 0

Check the disassembly of the foo symbol in radare:

[0x000005b0]> pdf @ sym.foo
╒ (fcn) sym.foo 18
│           0x000006b0      55             push rbp
│           0x000006b1      4889e5         mov rbp, rsp
│           0x000006b4      488d3d120000.  lea rdi, [rip + 0x12]       ; 0x6cd
│           0x000006bb      e8c0feffff     call sym.imp.puts
│           0x000006c0      5d             pop rbp
╘           0x000006c1      c3             ret

Change PT_DYNAMIC p_offset to 0x0:

[Screenshot: changing PT_DYNAMIC p_offset in libtest.so]

Run it again:

$ strace -E LD_PRELOAD=./libtest.so ./myprogram
write(1, "bar\n", 4bar
)                    = 4
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 03777751564430010) = 3
write(3, "world", 5)                    = 5
close(3)                                = 0

Now take a look again with radare:

[0x000005b0]> iS
[Sections]
idx=00 vaddr=0x00000000 paddr=0x00000000 sz=1876 vsz=1876 perm=m-r-x name=LOAD0
idx=01 vaddr=0x00200758 paddr=0x00000758 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=02 vaddr=0x00200778 paddr=0x00000000 sz=448 vsz=448 perm=m-rw- name=DYNAMIC
idx=03 vaddr=0x00000190 paddr=0x00000190 sz=36 vsz=36 perm=m-r-- name=NOTE
idx=04 vaddr=0x000006d4 paddr=0x000006d4 sz=28 vsz=28 perm=m-r-- name=GNU_EH_FRAME
idx=05 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=06 vaddr=0x00000000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr

7 sections

[0x000005b0]> is
[Symbols]

0 symbols

[0x000005b0]> ir
[Relocations]

0 relocations

[0x000005b0]> pd 6 @ 0x6b0
            0x000006b0      55             push rbp
            0x000006b1      4889e5         mov rbp, rsp
            0x000006b4      488d3d120000.  lea rdi, [rip + 0x12]       ; 0x6cd
            0x000006bb      e8c0feffff     call fcn.00000580
            0x000006c0      5d             pop rbp
            0x000006c1      c3             ret

You can probably tell where I am going with this.

Implications

I think it’s a nice trick to fool some popular tools and newbie reversers. :-) It is also worth keeping in mind if you parse ELF files in your own tool.

There are more things that can be done, but this post is way longer than I expected. Maybe next time.

Note: if I remember correctly, there was a CTF that used two dynamic string tables. One was referenced from the dynamic table that PT_DYNAMIC pointed to, the other from the .dynamic section. This caused some tools to show the wrong APIs. If someone finds a link, let me know and I will update the post.

Thanks for reading!

UPDATE 2016/03/13:

Radare2 addressed this issue!

$ r2 -v
radare2 0.10.2-git 10577 @ darwin-little-x86-64 git.0.10.1-121-g1c443ca
commit: 1c443caccfcfbad0b25dd2c28acb6d3d70d8dd10 build: 2016-03-13

[Screenshot: PT_DYNAMIC p_offset is still 0]

$ r2 -A myprogram

[0x004004a0]> iS
[Sections]
idx=00 vaddr=0x004003a0 paddr=0x000003a0 sz=120 vsz=120 perm=----- name=.rela.plt
idx=01 vaddr=0x00400388 paddr=0x00000388 sz=24 vsz=24 perm=----- name=.rel.plt
idx=02 vaddr=0x00600990 paddr=0x00000990 sz=120 vsz=120 perm=----- name=.got.plt
idx=03 vaddr=0x00400040 paddr=0x00000040 sz=448 vsz=448 perm=m-r-x name=PHDR
idx=04 vaddr=0x00400200 paddr=0x00000200 sz=28 vsz=28 perm=m-r-- name=INTERP
idx=05 vaddr=0x00400000 paddr=0x00000000 sz=1948 vsz=1948 perm=m-r-x name=LOAD0
idx=06 vaddr=0x006007a0 paddr=0x000007a0 sz=576 vsz=584 perm=m-rw- name=LOAD1
idx=07 vaddr=0x006007b8 paddr=0x00000000 sz=464 vsz=464 perm=m-rw- name=DYNAMIC
idx=08 vaddr=0x0040021c paddr=0x0000021c sz=68 vsz=68 perm=m-r-- name=NOTE
idx=09 vaddr=0x00400670 paddr=0x00000670 sz=52 vsz=52 perm=m-r-- name=GNU_EH_FRAME
idx=10 vaddr=0x00000000 paddr=0x00000000 sz=0 vsz=0 perm=m-rw- name=GNU_STACK
idx=11 vaddr=0x00400000 paddr=0x00000000 sz=64 vsz=64 perm=m-rw- name=ehdr

12 sections

[0x004004a0]> ir
[Relocations]
vaddr=0x006009a8 paddr=0x000009a8 type=SET_64 write
vaddr=0x006009b0 paddr=0x000009b0 type=SET_64 close
vaddr=0x006009b8 paddr=0x000009b8 type=SET_64 __libc_start_main
vaddr=0x006009c0 paddr=0x000009c0 type=SET_64 __gmon_start__
vaddr=0x006009c8 paddr=0x000009c8 type=SET_64 open
vaddr=0x00600988 paddr=0x00000988 type=SET_64 __gmon_start__

6 relocations

[0x004004a0]> pdf @ main
╒ (fcn) main 74
│           ; var int local_0h     @ rbp-0x0
│           ; var int local_4h     @ rbp-0x4
│           ; DATA XREF from 0x004004bd (main)
│           0x00400596      55             push rbp
│           0x00400597      4889e5         mov rbp, rsp
│           0x0040059a      4883ec10       sub rsp, 0x10
│           0x0040059e      be41020000     mov esi, 0x241
│           0x004005a3      bf64064000     mov edi, 0x400664
│           0x004005a8      b800000000     mov eax, 0
│           0x004005ad      e8defeffff     call sym.imp.open
│           0x004005b2      8945fc         mov dword [rbp - local_4h], eax
│           0x004005b5      837dfc00       cmp dword [rbp - local_4h], 0
│       ┌─< 0x004005b9      7e1e           jle 0x4005d9
│       │   0x004005bb      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005be      ba05000000     mov edx, 5
│       │   0x004005c3      be6a064000     mov esi, 0x40066a
│       │   0x004005c8      89c7           mov edi, eax
│       │   0x004005ca      e881feffff     call sym.imp.write
│       │   0x004005cf      8b45fc         mov eax, dword [rbp - local_4h]
│       │   0x004005d2      89c7           mov edi, eax
│       │   0x004005d4      e887feffff     call sym.imp.close
│       │   ; JMP XREF from 0x004005b9 (main)
│       └─> 0x004005d9      b800000000     mov eax, 0
│           0x004005de      c9             leave
╘           0x004005df      c3             ret
