GITBOOK-4333: No subject

This commit is contained in:
CPol 2024-05-09 18:03:32 +00:00 committed by gitbook-bot
parent babb4b1e22
commit 81b7621325
No known key found for this signature in database
GPG key ID: 07D2180C7B12D0FF
7 changed files with 223 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 224 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 224 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 296 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 83 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 74 KiB

View file

@ -1,2 +1,225 @@
# Heap
## Heap Basics
The heap is basically the place where a program is going to be able to store data when it requests data calling functions like **`malloc`**, `calloc`... Moreover, when this memory is no longer needed it's made available calling the function **`free`**.
As it's shown, its just after where the binary is being loaded in memory (check the `[heap]` section):
<figure><img src="../../.gitbook/assets/image (1241).png" alt=""><figcaption></figcaption></figure>
### Basic Chunk Allocation
When some data is requested to be stored in the heap, some space of the heap is allocated to it. This space will belong to a bin and only the requested data + the space of the bin headers + minimum bin size offset will be reserved for the chunk. The goal is to just reserve as minimum memory as possible without making it complicated to find where each chunk is. For this, the metadata chunk information is used to know where each used/free chunk is.
There are different ways to reserver the space mainly depending on the used bin, but a general methodology is the following:
* The program starts by requesting certain amount of memory.
* If in the list of chunks there someone available big enough to fulfil the request, it'll be used
* This might even mean that part of the available chunk will be used for this request and the rest will be added to the chunks list
* If there isn't any available chunk in the list but there is still space in allocated heap memory, the heap manager creates a new chunk
* If there is not enough heap space to allocate the new chunk, the heap manager asks the kernel to expand the memory allocated to the heap and then use this memory to generate the new chunk
* If everything fails, `malloc` returns null.
Note that if the requested **memory passes a threshold**, **`mmap`** will be used to map the requested memory.
### Arenas
In **multithreaded** applications, the heap manager must prevent **race conditions** that could lead to crashes. Initially, this was done using a **global mutex** to ensure that only one thread could access the heap at a time, but this caused **performance issues** due to the mutex-induced bottleneck.
To address this, the ptmalloc2 heap allocator introduced "arenas," where **each arena** acts as a **separate heap** with its **own** data **structures** and **mutex**, allowing multiple threads to perform heap operations without interfering with each other, as long as they use different arenas.
The default "main" arena handles heap operations for single-threaded applications. When **new threads** are added, the heap manager assigns them **secondary arenas** to reduce contention. It first attempts to attach each new thread to an unused arena, creating new ones if needed, up to a limit of 2 times the CPU cores for 32-bit systems and 8 times for 64-bit systems. Once the limit is reached, **threads must share arenas**, leading to potential contention.
Unlike the main arena, which expands using the `brk` system call, secondary arenas create "subheaps" using `mmap` and `mprotect` to simulate the heap behavior, allowing flexibility in managing memory for multithreaded operations.
### Subheaps
Subheaps serve as memory reserves for secondary arenas in multithreaded applications, allowing them to grow and manage their own heap regions separately from the main heap. Here's how subheaps differ from the initial heap and how they operate:
1. **Initial Heap vs. Subheaps**:
* The initial heap is located directly after the program's binary in memory, and it expands using the `sbrk` system call.
* Subheaps, used by secondary arenas, are created through `mmap`, a system call that maps a specified memory region.
2. **Memory Reservation with `mmap`**:
* When the heap manager creates a subheap, it reserves a large block of memory through `mmap`. This reservation doesn't allocate memory immediately; it simply designates a region that other system processes or allocations shouldn't use.
* By default, the reserved size for a subheap is 1 MB for 32-bit processes and 64 MB for 64-bit processes.
3. **Gradual Expansion with `mprotect`**:
* The reserved memory region is initially marked as `PROT_NONE`, indicating that the kernel doesn't need to allocate physical memory to this space yet.
* To "grow" the subheap, the heap manager uses `mprotect` to change page permissions from `PROT_NONE` to `PROT_READ | PROT_WRITE`, prompting the kernel to allocate physical memory to the previously reserved addresses. This step-by-step approach allows the subheap to expand as needed.
* Once the entire subheap is exhausted, the heap manager creates a new subheap to continue allocation.
### Metadata
As commented previously, these chunks also have some metadata, very good represented in this image:
<figure><img src="../../.gitbook/assets/image (1242).png" alt=""><figcaption><p><a href="https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png">https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png</a></p></figcaption></figure>
The metadata is usually 0x08B indicating the current chunk size using the last 3 bits to indicate:
* `A`: If 1 it comes from a subheap, if 0 it's in the main arena
* `M`: If 1, this chunk is part of a space allocated with mmap and not part of a heap
* `P`: If 1, the previous chunk is in use
Then, the space for the user data, and finally 0x08B to indicate the previous chunk size when the chunk is available (or to store user data when it's allocated).
Moreover, when available, the user data is used to contain also some data:
* Pointer to the next chunk
* Pointer to the previous chunk
* Size of the next chunk in the list
* Size of the previous chunk in the list
<figure><img src="../../.gitbook/assets/image (1243).png" alt=""><figcaption><p><a href="https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png">https://azeria-labs.com/wp-content/uploads/2019/03/chunk-allocated-CS.png</a></p></figcaption></figure>
Note how liking the list this way prevents the need to having an array where every single chunk is being registered.
## Free Protections
In order to protect from the accidental or intended abuse of the free function, before executing it's actions it perform some checks:
* It checks that the address [is aligned](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l4182) on an 8-byte or 16-byte on 64-bit boundary (`(address % 16) == 0`), since _malloc_ ensures all allocations are aligned.
* It checks that the chunks size field isnt impossibleeither because it is [too small](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l4318), too large, not an aligned size, or [would overlap the end of the process address space](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l4175).
* It checks that the chunk lies [within the boundaries of the arena](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l4318).
* It checks that the chunk is [not already marked as free](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l4182) by checking the corresponding “P” bit that lies in the metadata at the start of the next chunk.
## Bins
In order to improve the efficiency on how chunks are stored every chunk is not just in one linked list, but there are several types. These are the bins and there are 5 type of bins: [62](https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=6e766d11bc85b6480fa5c9f2a76559f8acf9deb5;hb=HEAD#l1407) small bins, 63 large bins, 1 unsorted bin, 10 fast bins and 64 tcache bins per thread.
The initial address to each unsorted, small and large bins is inside the same array. The index 0 is unused, 1 is the unsorted bin, bins 2-64 are small bins and bins 65-127 are large bins.
### Small Bins
Small bins are faster than large bins but slower than fast bins.
Each bin of the 62 will have **chunks of the same size**: 16, 24, ... (with a max size of 504 bytes in 32bits and 1024 in 64bits). This helps in the speed on finding the bin where a space should be allocated and inserting and removing of entries on these lists.
### Large bins
Unlike small bins, which manage chunks of fixed sizes, each **large bin handle a range of chunk sizes**. This is more flexible, allowing the system to accommodate **various sizes** without needing a separate bin for each size.
In a memory allocator, large bins start where small bins end. The ranges for large bins grow progressively larger, meaning the first bin might cover chunks from 512 to 576 bytes, while the next covers 576 to 640 bytes. This pattern continues, with the largest bin containing all chunks above 1MB.
Large bins are slower to operate compared to small bins because they must **sort and search through a list of varying chunk sizes to find the best fit** for an allocation. When a chunk is inserted into a large bin, it has to be sorted, and when memory is allocated, the system must find the right chunk. This extra work makes them **slower**, but since large allocations are less common than small ones, it's an acceptable trade-off.
There are:
* 32 bins of 64B range
* 16 bins of 512B range
* 8bins of 4096B range
* 4bins of 32768B range
* 2bins of 262144B range
* 1bins of for reminding sizes
### Unsorted bin
The unsorted bin is a **fast cache** used by the heap manager to make memory allocation quicker. Here's how it works: When a program frees memory, the heap manager doesn't immediately put it in a specific bin. Instead, it first tries to **merge it with any neighbouring free chunks** to create a larger block of free memory. Then, it places this new chunk in a general bin called the "unsorted bin."
When a program **asks for memory**, the heap manager **first checks the unsorted bin** to see if there's a chunk of the right size. If it finds one, it uses it right away, which is faster than searching through other bins. If it doesn't find a suitable chunk, it moves the freed chunks to their correct bins, either small or large, based on their size.
So, the unsorted bin is a way to speed up memory allocation by quickly reusing recently freed memory and reducing the need for time-consuming searches and merges.
{% hint style="danger" %}
Note that even in chunks are of different categories, from time to time, if an available chunk is colliding with another available chunk (even if they are of different categories), they will be merged.
{% endhint %}
### Fast bins
Fast bins are designed to **speed up memory allocation for small chunks** by keeping recently freed chunks in a quick-access structure. These bins use a Last-In, First-Out (LIFO) approach, which means that the **most recently freed chunk is the first** to be reused when there's a new allocation request. This behavior is advantageous for speed, as it's faster to insert and remove from the top of a stack (LIFO) compared to a queue (FIFO).
Additionally, **fast bins use singly linked lists**, not double linked, which further improves speed. Since chunks in fast bins aren't merged with neighbours, there's no need for a complex structure that allows removal from the middle. A singly linked list is simpler and quicker for these operations.
Basically, what happens here is that the header (the pointer to the first chunk to check) is always pointing to the latest freed chunk of that size. So:
* When a new chunk is allocated of that size, the header is pointing to a free chunk to use. As this free chunk is pointing to the next one to use, this address is stored in the header so the next allocation knows where to get ana available chunk
* When a chunk is freed, the free chunk will save the address to the current available chunk and the address to this newly freed chunk will be pu in the header
{% hint style="danger" %}
Chunks in fast bins aren't automatically set as available so they keep as fast bin chunks for some time instead of being able to merge with other chunks.
{% endhint %}
### Tcache (Per-Thread Cache) Bins
Even though threads try to have their own heap (see [Arenas](./#arenas) and [Subheaps](./#subheaps)), there is the possibility that a process with a lot of threads (like a web server) **will end sharing the heap with another threads**. In this case, the main solution is the use of **lockers**, which might **slow down significantly the threads**.
Therefore, a tcache is similar to a fast bin per thread in the way that it's a **single linked list** that doesn't merge chunks. Each thread has **64 singly-linked tcache bins**. Each bin can have a maximum of [7 same-size chunks](https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=2527e2504761744df2bdb1abdc02d936ff907ad2;hb=d5c3fafc4307c9b7a4c7d5cb381fcdbfad340bcc#l323) ranging from [24 to 1032B on 64-bit systems and 12 to 516B on 32-bit systems](https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=2527e2504761744df2bdb1abdc02d936ff907ad2;hb=d5c3fafc4307c9b7a4c7d5cb381fcdbfad340bcc#l315).
**When a thread frees** a chunk, **if it isn't too big** to be allocated in the tcache and the respective tcache bin **isn't full** (already 7 chunks), **it'll be allocated in there**. If it cannot go to the tcache, it'll need to wait for the heap lock to be able to perform the free operation globally.
When a **chunk is allocated**, if there is a free chunk of the needed size in the **Tcache it'll use it**, if not, it'll need to wait for the heap lock to be able to find one in the global bins or create a new one.\
There also an optimization, in this case, while having the heap lock, the thread **will fill his Tcache with heap chunks (7) of the requested size**, so if case it needs more, it'll find them in Tcache.
### Bins order
#### For allocating:
1. If available chunk in Tcache of that size, use Tcache
2. It super big, use mmap
3. Obtain the arena heap lock and:
1. If enough small size, fast bin chunk available of the requested size, use it and prefill the tcache from the fast bin
2. Check each entry in the unsorted list searching for one chunk big enough, and prefill the tcache if possible
3. Check the small bins or large bins (according to the requested size) and prefill the tcache if possible
4. Create a new chunk from available memory
1. If there isn't available memory, get more using `sbrk`
2. If the main heap memory can't grow more, create a new space using mmap
5. If nothing worked, return null
**For freeing:**
1. If the pointer is Null, finish
2. Perform `free` sanity checks in the chunk to try to verify it's a legit chunk
1. If small enough and tcache not full, put it there
2. If the bit M is set (not heap), use `munmap`
3. Get arena heap lock:
1. If it fits in a fastbin, put it there
2. If the chunk is > 64KB, consolidate the fastbins immediately and put the resulting merged chunks on the unsorted bin.
3. Merge the chunk backwards and forwards with neighboring freed chunks in the small, large, and unsorted bins if any.
4. If in the top of the head, merge it into the unused memory
5. If not the previous ones, store it in the unsorted list
\
Quick heap example from [https://guyinatuxedo.github.io/25-heap/index.html](https://guyinatuxedo.github.io/25-heap/index.html) but in arm64:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main(void)
{
char *ptr;
ptr = malloc(0x10);
strcpy(ptr, "panda");
}
```
Set a breakpoint at the end of the main function and lets find out where the information was stored:
<figure><img src="../../.gitbook/assets/image (1239).png" alt=""><figcaption></figcaption></figure>
It's possible to see that the string panda was stored at `0xaaaaaaac12a0` (which was the address given as response by malloc inside `x0`). Checking 0x10 bytes before it's possible to see that the `0x0` represents that the **previous chunk is not used** (length 0) and that the length of this chunk is `0x21`.
The extra spaces reserved (0x21-0x10=0x11) comes from the **added headers** (0x10) and 0x1 doesn't mean that it was reserved 0x21B but the last 3 bits of the length of the current headed have the some special meanings. As the length is always 16-byte aligned (in 64bits machines), these bits are actually never going to be used by the length number.
```
0x1: Previous in Use - Specifies that the chunk before it in memory is in use
0x2: Is MMAPPED - Specifies that the chunk was obtained with mmap()
0x4: Non Main Arena - Specifies that the chunk was obtained from outside of the main arena
```
##
## References
* [https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/](https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/)
* [https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/](https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/)