hacktricks/binary-exploitation/heap/README.md

201 lines
12 KiB
Markdown
Raw Normal View History

2024-04-10 15:24:02 +00:00
# Heap
2024-05-09 18:03:32 +00:00
## Heap Basics
The heap is basically the place where a program is going to be able to store data when it requests data calling functions like **`malloc`**, `calloc`... Moreover, when this memory is no longer needed it's made available calling the function **`free`**.
As it's shown, its just after where the binary is being loaded in memory (check the `[heap]` section):
<figure><img src="../../.gitbook/assets/image (1241).png" alt=""><figcaption></figcaption></figure>
### Basic Chunk Allocation
When some data is requested to be stored in the heap, some space of the heap is allocated to it. This space will belong to a bin and only the requested data + the space of the bin headers + minimum bin size offset will be reserved for the chunk. The goal is to just reserve as minimum memory as possible without making it complicated to find where each chunk is. For this, the metadata chunk information is used to know where each used/free chunk is.
There are different ways to reserver the space mainly depending on the used bin, but a general methodology is the following:
* The program starts by requesting certain amount of memory.
* If in the list of chunks there someone available big enough to fulfil the request, it'll be used
* This might even mean that part of the available chunk will be used for this request and the rest will be added to the chunks list
* If there isn't any available chunk in the list but there is still space in allocated heap memory, the heap manager creates a new chunk
* If there is not enough heap space to allocate the new chunk, the heap manager asks the kernel to expand the memory allocated to the heap and then use this memory to generate the new chunk
* If everything fails, `malloc` returns null.
Note that if the requested **memory passes a threshold**, **`mmap`** will be used to map the requested memory.
### Arenas
In **multithreaded** applications, the heap manager must prevent **race conditions** that could lead to crashes. Initially, this was done using a **global mutex** to ensure that only one thread could access the heap at a time, but this caused **performance issues** due to the mutex-induced bottleneck.
To address this, the ptmalloc2 heap allocator introduced "arenas," where **each arena** acts as a **separate heap** with its **own** data **structures** and **mutex**, allowing multiple threads to perform heap operations without interfering with each other, as long as they use different arenas.
The default "main" arena handles heap operations for single-threaded applications. When **new threads** are added, the heap manager assigns them **secondary arenas** to reduce contention. It first attempts to attach each new thread to an unused arena, creating new ones if needed, up to a limit of 2 times the CPU cores for 32-bit systems and 8 times for 64-bit systems. Once the limit is reached, **threads must share arenas**, leading to potential contention.
Unlike the main arena, which expands using the `brk` system call, secondary arenas create "subheaps" using `mmap` and `mprotect` to simulate the heap behavior, allowing flexibility in managing memory for multithreaded operations.
### Subheaps
Subheaps serve as memory reserves for secondary arenas in multithreaded applications, allowing them to grow and manage their own heap regions separately from the main heap. Here's how subheaps differ from the initial heap and how they operate:
1. **Initial Heap vs. Subheaps**:
* The initial heap is located directly after the program's binary in memory, and it expands using the `sbrk` system call.
* Subheaps, used by secondary arenas, are created through `mmap`, a system call that maps a specified memory region.
2. **Memory Reservation with `mmap`**:
* When the heap manager creates a subheap, it reserves a large block of memory through `mmap`. This reservation doesn't allocate memory immediately; it simply designates a region that other system processes or allocations shouldn't use.
* By default, the reserved size for a subheap is 1 MB for 32-bit processes and 64 MB for 64-bit processes.
3. **Gradual Expansion with `mprotect`**:
* The reserved memory region is initially marked as `PROT_NONE`, indicating that the kernel doesn't need to allocate physical memory to this space yet.
* To "grow" the subheap, the heap manager uses `mprotect` to change page permissions from `PROT_NONE` to `PROT_READ | PROT_WRITE`, prompting the kernel to allocate physical memory to the previously reserved addresses. This step-by-step approach allows the subheap to expand as needed.
* Once the entire subheap is exhausted, the heap manager creates a new subheap to continue allocation.
2024-05-10 17:33:38 +00:00
### malloc\_state
**Each heap** (main arena or other threads arenas) has a **`malloc_state` structure.**\
Its important to notice that the **main arena `malloc_stat`**`e` structure is a **global variable in the libc** (therefore located in the libc memory space).\
In the case of **`malloc_state`** structures of the heaps of threads, they are located **inside own thread "heap"**.
There some interesting things to note from this structure (see C code below):
* The `mchunkptr bins[NBINS * 2 - 2];` contains **pointers** to the **first and last chunks** of the small, large and unsorted **bins** (the -2 is because the index 0 is not used)
* Therefore, the **first chunk** of these bins will have a **backwards pointer to this structure** and the **last chunk** of these bins will have a **forward pointer** to this structure. Which basically means that if you can l**eak these addresses in the main arena** you will have a pointer to the structure in the **libc**.
* The structs `struct malloc_state *next;` and `struct malloc_state *next_free;` are linked lists os arenas
* The `top` chunk is the last "chunk", which is basically **all the heap reminding space**. Once the top chunk is "empty", the heap is completely used and it needs to request more space.
* The `last reminder` chunk comes from cases where an exact size chunk is not available and therefore a bigger chunk is splitter, a pointer remaining part is placed here.
```c
// From https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/malloc_state
struct malloc_state
{
/* Serialize access. */
__libc_lock_define (, mutex);
/* Flags (formerly in max_fast). */
int flags;
/* Fastbins */
mfastbinptr fastbinsY[NFASTBINS];
/* Base of the topmost chunk -- not otherwise kept in a bin */
mchunkptr top;
/* The remainder from the most recent split of a small request */
mchunkptr last_remainder;
/* Normal bins packed as described above */
mchunkptr bins[NBINS * 2 - 2];
/* Bitmap of bins */
unsigned int binmap[BINMAPSIZE];
/* Linked list */
struct malloc_state *next;
/* Linked list for free arenas. Access to this field is serialized
by free_list_lock in arena.c. */
struct malloc_state *next_free;
/* Number of threads attached to this arena. 0 if the arena is on
the free list. Access to this field is serialized by
free_list_lock in arena.c. */
INTERNAL_SIZE_T attached_threads;
/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
typedef struct malloc_state *mstate;
```
### malloc\_chunk
This structure represents a particular chunk of memory. The various fields have different meaning for allocated and unallocated chunks.
```c
// From https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/malloc_chunk
struct malloc_chunk {
INTERNAL_SIZE_T mchunk_prev_size; /* Size of previous chunk, if it is free. */
INTERNAL_SIZE_T mchunk_size; /* Size in bytes, including overhead. */
struct malloc_chunk* fd; /* double links -- used only if this chunk is free. */
struct malloc_chunk* bk;
/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if this chunk is free. */
struct malloc_chunk* bk_nextsize;
};
typedef struct malloc_chunk* mchunkptr;
```
2024-05-09 18:03:32 +00:00
As commented previously, these chunks also have some metadata, very good represented in this image:
<figure><img src="../../.gitbook/assets/image (1242).png" alt=""><figcaption><p><a href="https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png">https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png</a></p></figcaption></figure>
The metadata is usually 0x08B indicating the current chunk size using the last 3 bits to indicate:
* `A`: If 1 it comes from a subheap, if 0 it's in the main arena
* `M`: If 1, this chunk is part of a space allocated with mmap and not part of a heap
* `P`: If 1, the previous chunk is in use
Then, the space for the user data, and finally 0x08B to indicate the previous chunk size when the chunk is available (or to store user data when it's allocated).
Moreover, when available, the user data is used to contain also some data:
* Pointer to the next chunk
* Pointer to the previous chunk
* Size of the next chunk in the list
* Size of the previous chunk in the list
<figure><img src="../../.gitbook/assets/image (1243).png" alt=""><figcaption><p><a href="https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png">https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png</a></p></figcaption></figure>
2024-05-10 17:33:38 +00:00
{% hint style="info" %}
2024-05-09 18:03:32 +00:00
Note how liking the list this way prevents the need to having an array where every single chunk is being registered.
{% endhint %}
2024-05-10 17:33:38 +00:00
### Quick Heap Example
2024-05-09 18:03:32 +00:00
Quick heap example from [https://guyinatuxedo.github.io/25-heap/index.html](https://guyinatuxedo.github.io/25-heap/index.html) but in arm64:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main(void)
{
char *ptr;
ptr = malloc(0x10);
strcpy(ptr, "panda");
}
```
Set a breakpoint at the end of the main function and lets find out where the information was stored:
<figure><img src="../../.gitbook/assets/image (1239).png" alt=""><figcaption></figcaption></figure>
It's possible to see that the string panda was stored at `0xaaaaaaac12a0` (which was the address given as response by malloc inside `x0`). Checking 0x10 bytes before it's possible to see that the `0x0` represents that the **previous chunk is not used** (length 0) and that the length of this chunk is `0x21`.
The extra spaces reserved (0x21-0x10=0x11) comes from the **added headers** (0x10) and 0x1 doesn't mean that it was reserved 0x21B but the last 3 bits of the length of the current headed have the some special meanings. As the length is always 16-byte aligned (in 64bits machines), these bits are actually never going to be used by the length number.
```
0x1: Previous in Use - Specifies that the chunk before it in memory is in use
0x2: Is MMAPPED - Specifies that the chunk was obtained with mmap()
0x4: Non Main Arena - Specifies that the chunk was obtained from outside of the main arena
```
2024-05-10 17:33:38 +00:00
## Bins & Memory Allocations/Frees
Check what are the bins and how are they organized and how memory is allocated and freed in:
{% content-ref url="bins-and-memory-allocations.md" %}
[bins-and-memory-allocations.md](bins-and-memory-allocations.md)
{% endcontent-ref %}
## Heap Functions Security Checks
Functions involved in heap will perform certain check before performing its actions to try to make sure the heap wasn't corrupted:
{% content-ref url="heap-functions-security-checks.md" %}
[heap-functions-security-checks.md](heap-functions-security-checks.md)
{% endcontent-ref %}
2024-05-09 18:03:32 +00:00
## References
* [https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/](https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/)
* [https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/](https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/)