mirror of https://github.com/carlospolop/hacktricks synced 2024-12-19 17:44:47 +00:00

Carlos Polop 6be43c15e9 Recreating repository history

2024-12-12 11:39:29 +01:00

36 KiB

Raw Blame History

Bins & Memory Allocations

{% hint style="success" %} Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)

Support HackTricks

Check the subscription plans!
Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.

{% endhint %}

Basic Information

In order to improve the efficiency on how chunks are stored every chunk is not just in one linked list, but there are several types. These are the bins and there are 5 type of bins: 62 small bins, 63 large bins, 1 unsorted bin, 10 fast bins and 64 tcache bins per thread.

The initial address to each unsorted, small and large bins is inside the same array. The index 0 is unused, 1 is the unsorted bin, bins 2-64 are small bins and bins 65-127 are large bins.

Tcache (Per-Thread Cache) Bins

Even though threads try to have their own heap (see Arenas and Subheaps), there is the possibility that a process with a lot of threads (like a web server) will end sharing the heap with another threads. In this case, the main solution is the use of lockers, which might slow down significantly the threads.

Therefore, a tcache is similar to a fast bin per thread in the way that it's a single linked list that doesn't merge chunks. Each thread has 64 singly-linked tcache bins. Each bin can have a maximum of 7 same-size chunks ranging from 24 to 1032B on 64-bit systems and 12 to 516B on 32-bit systems.

When a thread frees a chunk, if it isn't too big to be allocated in the tcache and the respective tcache bin isn't full (already 7 chunks), it'll be allocated in there. If it cannot go to the tcache, it'll need to wait for the heap lock to be able to perform the free operation globally.

When a chunk is allocated, if there is a free chunk of the needed size in the Tcache it'll use it, if not, it'll need to wait for the heap lock to be able to find one in the global bins or create a new one.
There's also an optimization, in this case, while having the heap lock, the thread will fill his Tcache with heap chunks (7) of the requested size, so in case it needs more, it'll find them in Tcache.

Add a tcache chunk example

#include <stdlib.h>
#include <stdio.h> 

int main(void) 
{
  char *chunk;
  chunk = malloc(24);
  printf("Address of the chunk: %p\n", (void *)chunk);
  gets(chunk);
  free(chunk);
  return 0;
}

Compile it and debug it with a breakpoint in the ret opcode from main function. then with gef you can see the tcache bin in use:

gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=0, size=0x20, count=1] ←  Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

Tcache Structs & Functions

In the following code it's possible to see the max bins and chunks per index, the tcache_entry struct created to avoid double frees and tcache_perthread_struct, a struct that each thread uses to store the addresses to each index of the bin.

tcache_entry and tcache_perthread_struct

// From https://github.com/bminor/glibc/blob/f942a732d37a96217ef828116ebe64a644db18d7/malloc/malloc.c

/* We want 64 entries.  This is an arbitrary limit, which tunables can reduce.  */
# define TCACHE_MAX_BINS		64
# define MAX_TCACHE_SIZE	tidx2usize (TCACHE_MAX_BINS-1)

/* Only used to pre-fill the tunables.  */
# define tidx2usize(idx)	(((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

/* When "x" is from chunksize().  */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size.  */
# define usize2tidx(x) csize2tidx (request2size (x))

/* With rounding and alignment, the bins are...
   idx 0   bytes 0..24 (64-bit) or 0..12 (32-bit)
   idx 1   bytes 25..40 or 13..20
   idx 2   bytes 41..56 or 21..28
   etc.  */

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7

/* Maximum chunks in tcache bins for tunables.  This value must fit the range
   of tcache->counts[] entries, else they may overflow.  */
# define MAX_TCACHE_COUNT UINT16_MAX

[...]

typedef struct tcache_entry
{
  struct tcache_entry *next;
  /* This field exists to detect double frees.  */
  uintptr_t key;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

The function __tcache_init is the function that creates and allocates the space for the tcache_perthread_struct obj

tcache_init code

// From https://github.com/bminor/glibc/blob/f942a732d37a96217ef828116ebe64a644db18d7/malloc/malloc.c#L3241C1-L3274C2

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }

}

Tcache Indexes

The tcache have several bins depending on the size an the initial pointers to the first chunk of each index and the amount of chunks per index are located inside a chunk. This means that locating the chunk with this information (usually the first), it's possible to find all the tcache initial points and the amount of Tcache chunks.

Fast bins

Fast bins are designed to speed up memory allocation for small chunks by keeping recently freed chunks in a quick-access structure. These bins use a Last-In, First-Out (LIFO) approach, which means that the most recently freed chunk is the first to be reused when there's a new allocation request. This behaviour is advantageous for speed, as it's faster to insert and remove from the top of a stack (LIFO) compared to a queue (FIFO).

Additionally, fast bins use singly linked lists, not double linked, which further improves speed. Since chunks in fast bins aren't merged with neighbours, there's no need for a complex structure that allows removal from the middle. A singly linked list is simpler and quicker for these operations.

Basically, what happens here is that the header (the pointer to the first chunk to check) is always pointing to the latest freed chunk of that size. So:

When a new chunk is allocated of that size, the header is pointing to a free chunk to use. As this free chunk is pointing to the next one to use, this address is stored in the header so the next allocation knows where to get an available chunk
When a chunk is freed, the free chunk will save the address to the current available chunk and the address to this newly freed chunk will be put in the header

The maximum size of a linked list is 0x80 and they are organized so a chunk of size 0x20 will be in index 0, a chunk of size 0x30 would be in index 1...

{% hint style="danger" %} Chunks in fast bins aren't set as available so they are keep as fast bin chunks for some time instead of being able to merge with other free chunks surrounding them. {% endhint %}

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

/*
   Fastbins

    An array of lists holding recently freed small chunks.  Fastbins
    are not doubly linked.  It is faster to single-link them, and
    since chunks are never removed from the middles of these lists,
    double linking is not necessary. Also, unlike regular bins, they
    are not even processed in FIFO order (they use faster LIFO) since
    ordering doesn't much matter in the transient contexts in which
    fastbins are normally used.

    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
 */

typedef struct malloc_chunk *mfastbinptr;
#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)


/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)

Add a fastbin chunk example

#include <stdlib.h>
#include <stdio.h>

int main(void) 
{
  char *chunks[8]; 
  int i;

  // Loop to allocate memory 8 times
  for (i = 0; i < 8; i++) {
    chunks[i] = malloc(24);
    if (chunks[i] == NULL) { // Check if malloc failed
      fprintf(stderr, "Memory allocation failed at iteration %d\n", i);
      return 1;
    }
    printf("Address of chunk %d: %p\n", i, (void *)chunks[i]);
  }

  // Loop to free the allocated memory
  for (i = 0; i < 8; i++) {
    free(chunks[i]);
  }

  return 0;
}

Note how we allocate and free 8 chunks of the same size so they fill the tcache and the eight one is stored in the fast chunk.

Compile it and debug it with a breakpoint in the ret opcode from main function. then with gef you can see that the tcache bin is full and one chunk is in the fast bin:

gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=0, size=0x20, count=7] ←  Chunk(addr=0xaaaaaaac1770, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1750, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1730, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1710, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac16f0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac16d0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA) 
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20]  ←  Chunk(addr=0xaaaaaaac1790, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA) 
Fastbins[idx=1, size=0x30] 0x00

Unsorted bin

The unsorted bin is a cache used by the heap manager to make memory allocation quicker. Here's how it works: When a program frees a chunk, and if this chunk cannot be allocated in a tcache or fast bin and is not colliding with the top chunk, the heap manager doesn't immediately put it in a specific small or large bin. Instead, it first tries to merge it with any neighbouring free chunks to create a larger block of free memory. Then, it places this new chunk in a general bin called the "unsorted bin."

When a program asks for memory, the heap manager checks the unsorted bin to see if there's a chunk of enough size. If it finds one, it uses it right away. If it doesn't find a suitable chunk in the unsorted bin, it moves all the chunks in this list to their corresponding bins, either small or large, based on their size.

Note that if a larger chunk is split in 2 halves and the rest is larger than MINSIZE, it'll be paced back into the unsorted bin.

So, the unsorted bin is a way to speed up memory allocation by quickly reusing recently freed memory and reducing the need for time-consuming searches and merges.

{% hint style="danger" %} Note that even if chunks are of different categories, if an available chunk is colliding with another available chunk (even if they belong originally to different bins), they will be merged. {% endhint %}

Add a unsorted chunk example

#include <stdlib.h>
#include <stdio.h>

int main(void) 
{
  char *chunks[9]; 
  int i;

  // Loop to allocate memory 8 times
  for (i = 0; i < 9; i++) {
    chunks[i] = malloc(0x100);
    if (chunks[i] == NULL) { // Check if malloc failed
      fprintf(stderr, "Memory allocation failed at iteration %d\n", i);
      return 1;
    }
    printf("Address of chunk %d: %p\n", i, (void *)chunks[i]);
  }

  // Loop to free the allocated memory
  for (i = 0; i < 8; i++) {
    free(chunks[i]);
  }

  return 0;
}

Note how we allocate and free 9 chunks of the same size so they fill the tcache and the eight one is stored in the unsorted bin because it's too big for the fastbin and the nineth one isn't freed so the nineth and the eighth don't get merged with the top chunk.

Compile it and debug it with a breakpoint in the ret opcode from main function. Then with gef you can see that the tcache bin is full and one chunk is in the unsorted bin:

gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=15, size=0x110, count=7] ←  Chunk(addr=0xaaaaaaac1d10, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1c00, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1af0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac19e0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac18d0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac17c0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA) 
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] unsorted_bins[0]: fw=0xaaaaaaac1e10, bk=0xaaaaaaac1e10
 →   Chunk(addr=0xaaaaaaac1e20, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in unsorted bin.

Small Bins

Small bins are faster than large bins but slower than fast bins.

Each bin of the 62 will have chunks of the same size: 16, 24, ... (with a max size of 504 bytes in 32bits and 1024 in 64bits). This helps in the speed on finding the bin where a space should be allocated and inserting and removing of entries on these lists.

This is how the size of the small bin is calculated according to the index of the bin:

Smallest size: 2*4*index (e.g. index 5 -> 40)
Biggest size: 2*8*index (e.g. index 5 -> 80)

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711
#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > CHUNK_HDR_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)

#define smallbin_index(sz) \
  ((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))\
   + SMALLBIN_CORRECTION)

Function to choose between small and large bins:

#define bin_index(sz) \
  ((in_smallbin_range (sz)) ? smallbin_index (sz) : largebin_index (sz))

Add a small chunk example

#include <stdlib.h>
#include <stdio.h>

int main(void) 
{
  char *chunks[10]; 
  int i;

  // Loop to allocate memory 8 times
  for (i = 0; i < 9; i++) {
    chunks[i] = malloc(0x100);
    if (chunks[i] == NULL) { // Check if malloc failed
      fprintf(stderr, "Memory allocation failed at iteration %d\n", i);
      return 1;
    }
    printf("Address of chunk %d: %p\n", i, (void *)chunks[i]);
  }

  // Loop to free the allocated memory
  for (i = 0; i < 8; i++) {
    free(chunks[i]);
  }
  
  chunks[9] = malloc(0x110);

  return 0;
}

Note how we allocate and free 9 chunks of the same size so they fill the tcache and the eight one is stored in the unsorted bin because it's too big for the fastbin and the ninth one isn't freed so the ninth and the eights don't get merged with the top chunk. Then we allocate a bigger chunk of 0x110 which makes the chunk in the unsorted bin goes to the small bin.

Compile it and debug it with a breakpoint in the ret opcode from main function. then with gef you can see that the tcache bin is full and one chunk is in the small bin:

gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=15, size=0x110, count=7] ←  Chunk(addr=0xaaaaaaac1d10, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1c00, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1af0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac19e0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac18d0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac17c0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA) 
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in unsorted bin.
──────────────────────────────────────────────────────────────────────── Small Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] small_bins[16]: fw=0xaaaaaaac1e10, bk=0xaaaaaaac1e10
 →   Chunk(addr=0xaaaaaaac1e20, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in 1 small non-empty bins.

Large bins

Unlike small bins, which manage chunks of fixed sizes, each large bin handle a range of chunk sizes. This is more flexible, allowing the system to accommodate various sizes without needing a separate bin for each size.

In a memory allocator, large bins start where small bins end. The ranges for large bins grow progressively larger, meaning the first bin might cover chunks from 512 to 576 bytes, while the next covers 576 to 640 bytes. This pattern continues, with the largest bin containing all chunks above 1MB.

Large bins are slower to operate compared to small bins because they must sort and search through a list of varying chunk sizes to find the best fit for an allocation. When a chunk is inserted into a large bin, it has to be sorted, and when memory is allocated, the system must find the right chunk. This extra work makes them slower, but since large allocations are less common than small ones, it's an acceptable trade-off.

There are:

32 bins of 64B range (collide with small bins)
16 bins of 512B range (collide with small bins)
8bins of 4096B range (part collide with small bins)
4bins of 32768B range
2bins of 262144B range
1bin for remaining sizes

Large bin sizes code

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

#define largebin_index_32(sz)                                                \
  (((((unsigned long) (sz)) >> 6) <= 38) ?  56 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

#define largebin_index_32_big(sz)                                            \
  (((((unsigned long) (sz)) >> 6) <= 45) ?  49 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

// XXX It remains to be seen whether it is good to keep the widths of
// XXX the buckets the same or whether it should be scaled by a factor
// XXX of two as well.
#define largebin_index_64(sz)                                                \
  (((((unsigned long) (sz)) >> 6) <= 48) ?  48 + (((unsigned long) (sz)) >> 6) :\
   ((((unsigned long) (sz)) >> 9) <= 20) ?  91 + (((unsigned long) (sz)) >> 9) :\
   ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
   ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
   ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
   126)

#define largebin_index(sz) \
  (SIZE_SZ == 8 ? largebin_index_64 (sz)                                     \
   : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz)                     \
   : largebin_index_32 (sz))

Add a large chunk example

#include <stdlib.h>
#include <stdio.h>

int main(void) 
{
  char *chunks[2]; 

  chunks[0] = malloc(0x1500);
  chunks[1] = malloc(0x1500);
  free(chunks[0]);
  chunks[0] = malloc(0x2000);

  return 0;
}

2 large allocations are performed, then on is freed (putting it in the unsorted bin) and a bigger allocation in made (moving the free one from the usorted bin ro the large bin).

Compile it and debug it with a breakpoint in the ret opcode from main function. then with gef you can see that the tcache bin is full and one chunk is in the large bin:

gef➤  heap bin
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
All tcachebins are empty
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in unsorted bin.
──────────────────────────────────────────────────────────────────────── Small Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in 0 small non-empty bins.
──────────────────────────────────────────────────────────────────────── Large Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] large_bins[100]: fw=0xaaaaaaac1290, bk=0xaaaaaaac1290
 →   Chunk(addr=0xaaaaaaac12a0, size=0x1510, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in 1 large non-empty bins.

Top Chunk

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

/*
   Top

    The top-most available chunk (i.e., the one bordering the end of
    available memory) is treated specially. It is never included in
    any bin, is used only if no other chunk is available, and is
    released back to the system if it is very large (see
    M_TRIM_THRESHOLD).  Because top initially
    points to its own bin with initial zero size, thus forcing
    extension on the first malloc request, we avoid having any special
    code in malloc to check whether it even exists yet. But we still
    need to do so when getting memory from system, so we make
    initial_top treat the bin as a legal but unusable chunk during the
    interval between initialization and the first call to
    sysmalloc. (This is somewhat delicate, since it relies on
    the 2 preceding words to be zero during this interval as well.)
 */

/* Conveniently, the unsorted bin can be used as dummy top on first call */
#define initial_top(M)              (unsorted_chunks (M))

Basically, this is a chunk containing all the currently available heap. When a malloc is performed, if there isn't any available free chunk to use, this top chunk will be reducing its size giving the necessary space.
The pointer to the Top Chunk is stored in the malloc_state struct.

Moreover, at the beginning, it's possible to use the unsorted chunk as the top chunk.

Observe the Top Chunk example

#include <stdlib.h>
#include <stdio.h> 

int main(void) 
{
  char *chunk;
  chunk = malloc(24);
  printf("Address of the chunk: %p\n", (void *)chunk);
  gets(chunk);
  return 0;
}

After compiling and debugging it with a break point in the ret opcode of main I saw that the malloc returned the address 0xaaaaaaac12a0 and these are the chunks:

gef➤  heap chunks
Chunk(addr=0xaaaaaaac1010, size=0x290, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
    [0x0000aaaaaaac1010     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................]
Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
    [0x0000aaaaaaac12a0     41 41 41 41 41 41 41 00 00 00 00 00 00 00 00 00    AAAAAAA.........]
Chunk(addr=0xaaaaaaac12c0, size=0x410, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
    [0x0000aaaaaaac12c0     41 64 64 72 65 73 73 20 6f 66 20 74 68 65 20 63    Address of the c]
Chunk(addr=0xaaaaaaac16d0, size=0x410, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
    [0x0000aaaaaaac16d0     41 41 41 41 41 41 41 0a 00 00 00 00 00 00 00 00    AAAAAAA.........]
Chunk(addr=0xaaaaaaac1ae0, size=0x20530, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  top chunk

Where it can be seen that the top chunk is at address 0xaaaaaaac1ae0. This is no surprise because the last allocated chunk was in 0xaaaaaaac12a0 with a size of 0x410 and 0xaaaaaaac12a0 + 0x410 = 0xaaaaaaac1ae0 .
It's also possible to see the length of the Top chunk on its chunk header:

gef➤  x/8wx 0xaaaaaaac1ae0 - 16
0xaaaaaaac1ad0:	0x00000000	0x00000000	0x00020531	0x00000000
0xaaaaaaac1ae0:	0x00000000	0x00000000	0x00000000	0x00000000

Last Remainder

When malloc is used and a chunk is divided (from the unsorted bin or from the top chunk for example), the chunk created from the rest of the divided chunk is called Last Remainder and it's pointer is stored in the malloc_state struct.

Allocation Flow

Check out:

{% content-ref url="heap-memory-functions/malloc-and-sysmalloc.md" %} malloc-and-sysmalloc.md {% endcontent-ref %}

Free Flow