
Understanding the Heap

Exploring the fundamentals of the heap


Introduction


In previous articles, we explored memory corruption vulnerabilities that rely on a solid understanding of the stack, particularly how it works and how it can be exploited to control a program's execution flow.


However, the stack is not the only memory space vulnerable to exploitation. In fact, heap corruption vulnerabilities are more common.


Today, we'll dive into the fundamentals of the heap. I'll simplify this complex topic as much as possible to make it accessible to beginners. In future articles, we’ll explore it in greater depth.



What is the heap?


The heap is a region of memory used for dynamic allocation. Unlike the stack, which follows a strict Last In, First Out (LIFO) order, the heap offers more flexibility. Programmers can request and release memory at any time during execution. This makes it ideal for situations where the required data size is not known beforehand.


A program allocates memory on the heap using functions like malloc in C or new in C++. This memory stays allocated until it is explicitly freed, which can cause problems like memory leaks if not handled correctly. Modern languages, such as Java, manage this automatically with a Garbage Collector system.


The heap can grow as needed, but managing it is more complex: the C library's memory allocator keeps track of free and used memory blocks, and asks the operating system for more memory when the heap must grow.



Heap vs. Stack


The heap and the stack are two distinct memory regions used by programs, each serving a different purpose and exhibiting different characteristics.


The stack is a memory region that operates on a Last In, First Out (LIFO) principle. It is primarily used for managing function calls and local variables. When a function is called, a block of memory is allocated on the stack for its parameters and local variables (known as the stack frame), which are automatically deallocated when the function exits. This automatic management makes stack memory access very fast and efficient, but it is limited in size and scope.


In contrast, the heap is used for dynamic memory allocation. Memory in the heap can be allocated and freed at any time during program execution, allowing for more flexible data structures like linked lists and trees. However, this flexibility comes with increased complexity in memory management. Programmers are responsible for manually allocating and deallocating memory, which can lead to issues like memory leaks and fragmentation if not handled carefully.



Note


The heap manager mechanism depends on the operating system, as its implementation can vary between platforms. While the core concepts are generally similar, certain specifics may differ. In this article, we’ll focus on the Linux implementation of the heap.


We’ll also explore an older heap implementation called dlmalloc, which provides an excellent foundation for understanding this topic. Although dlmalloc is no longer the standard, modern heap implementations build upon its core concepts. In upcoming articles, we’ll introduce modern mechanisms, highlighting how they expand upon the principles established by dlmalloc.



Understanding Chunks in the Heap


Chunks are the individual blocks of memory allocated and managed by the memory allocator. When a program requests heap memory, the allocator divides it into chunks. Each chunk can store different amounts of data depending on the allocation request.


Each chunk typically contains a header that stores metadata about the allocation, such as its size and the status of the previous chunk (allocated or free). This metadata is crucial for the memory allocator to manage memory effectively, allowing it to track which chunks are in use and which are available for future allocations.


Chunks can vary in size and can be split or merged depending on the allocation and deallocation patterns. When a chunk is freed, the memory allocator can combine it with adjacent free chunks to create larger blocks of free memory. This process is known as coalescing.



Allocated chunks structure


A chunk consists of two parts: the header and the user-accessible data. The header stores metadata, such as the chunk's size and other details. In this article, these parts will be referred to as metadata and user data.




Metadata


In a chunk, the metadata stores crucial information about the chunk.


prev_size: The prev_size field indicates the size of the previous chunk in memory. It is only set if the previous chunk is free; otherwise, it is part of the user data for that chunk.


Size: The size field represents the size of the current chunk, including the metadata itself. It must be a multiple of 8 (or 16 on 64-bit platforms), which means the last 3 bits of the size are 0.


As you may have noticed in the picture above, there are 3 letters within the size field. These 3 pieces of information occupy the last 3 bits and describe important aspects for the memory allocator:


P is the PREV_INUSE flag. The least significant bit (LSB) of the size field indicates whether the previous chunk (the one immediately before this chunk in memory) is in use, i.e., whether it is still allocated or has been freed.


M is the IS_MMAPPED flag, which is set when the chunk is allocated via mmap() rather than through a heap mechanism such as malloc(). Details about this flag will be covered in the next article.


A is the NON_MAIN_ARENA flag, which indicates whether a chunk belongs to the main arena or a non-main arena in multi-threaded applications. As this article focuses on single-threaded heaps, we will explore this topic in future discussions.



User data


This section of the chunk holds the actual data and is the part the programmer can directly access and manipulate.



How malloc() Works


The malloc() function dynamically allocates memory on the heap and returns a void* pointer to the user data section of the allocated block. If the allocation fails due to insufficient memory, it returns NULL. To use the returned pointer with a specific data type, it must be cast accordingly.


Memory allocated by malloc() is properly aligned for any object requiring alignment up to the fundamental alignment, typically 8 bytes on most platforms. On 64-bit systems, the alignment requirement increases to 16 bytes. If the requested memory size is not a multiple of the alignment, malloc() pads the allocation to ensure proper alignment. For example:


  • A request for 12 bytes results in a 16-byte allocation (plus 16 bytes for metadata, totaling 32 bytes).
  • A request for 24 bytes also results in 32 bytes being allocated.

Even when a size of 0 is requested, malloc() returns a valid pointer, but not to a zero-length block. Instead, it allocates the minimum size for a chunk, which is 32 bytes (16 bytes for user data and 16 bytes for metadata) on 64-bit systems. Regardless of the requested size, it's important to always check the return value of malloc() to verify that memory allocation was successful.


The malloc() function may allocate more memory than requested due to alignment requirements and metadata storage. The allocation process (simplified) involves the following steps:


  1. Reusing Freed Chunks: When memory is freed, it becomes a free chunk. These chunks are tracked by the heap manager for future reuse. If a previously freed chunk is large enough to satisfy a new allocation request, it is selected and reused. This avoids allocating new memory from the heap unnecessarily. (We’ll explore free chunks in more detail later.)


  2. Allocating from the Top Chunk: If no suitable free chunk is available, the allocation occurs in the top chunk, which is the unallocated memory region at the end of the heap. The requested size is carved out of the top chunk, reducing its size.


  3. Requesting More Memory: If the top chunk is too small to fulfill the request, the heap manager asks the kernel for additional memory to extend the heap. This newly allocated space is appended to the top chunk, making it available for allocation.


  4. Handling Allocation Failure: If none of these steps succeed—either because the heap cannot grow further or no suitable chunks are available—malloc() returns NULL, indicating the allocation has failed.



Practice example


In order to understand how the heap works, I'll use this program from Exploit Education. Today, we won't be exploiting this program, but it's a perfect starting point for learning about the heap. We'll focus on the 64-bit platform implementation of the heap.


Here's the source code:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>


struct heapStructure {
  int priority;
  char *name;
};

int main(int argc, char **argv) {
  struct heapStructure *i1, *i2;

  i1 = malloc(sizeof(struct heapStructure));
  i1->priority = 1;
  i1->name = malloc(8);

  i2 = malloc(sizeof(struct heapStructure));
  i2->priority = 2;
  i2->name = malloc(8);

  strcpy(i1->name, argv[1]);
  strcpy(i2->name, argv[2]);

  printf("and that's a wrap folks!\n");
}

void winner() {
  printf(
      "Congratulations, you've completed this level @ %ld seconds past the "
      "Epoch\n",
      time(NULL));
}

Let’s break down this code.


Firstly, the code defines a structure called heapStructure, which contains two variables: priority (a 4-byte integer) and name (a character pointer, which occupies 8 bytes). Then, it creates two pointers, i1 and i2, that will point to structures of type heapStructure.


Think of the heap as shown in the image below:



Let’s explain this image so you can understand how to populate the heap. On the left are the addresses of each 'row' (of course, the heap doesn’t look like this, but it’s just for illustration). As you can see, above the HEAP box, there are the values you need to add to the addresses on the left to reach each field of the heap (0, 8, 16, 24). This heap represents a 64-bit architecture, so each 'field' is 8 bytes. That’s why we add 8 bytes for each field.


For example, imagine you want to access the blue-colored field below:



This means you need to take the address on the left of the row, 0x5f11cd9372b0, and add 16 bytes (0x10 in hex). The resulting address will be 0x5f11cd9372c0, which is the address of the beginning of that field.


That said, the heap is currently empty.


To allocate the size of the structure on the heap, the code uses malloc and provides the structure size as a parameter. In this case, the total size is 12 bytes, since the priority variable is an int (which occupies 4 bytes in memory) and the name pointer occupies 8 bytes in a 64-bit architecture. However, the compiler will add 4 padding bytes after the int to correctly align the 8 bytes, so when you execute sizeof(struct heapStructure), the program actually returns 16 bytes. Therefore, a 16-byte chunk will be requested.


However, remember that the chunk contains its own metadata, so it will occupy an additional 16 bytes for the size (an 8-byte number) and prev_size (another 8-byte number).


Once the first malloc is executed, the heap will look like this (I'll use a different color to represent each allocated chunk to make it easier to understand):



As observed, a 32-byte chunk has been allocated. The prev_size field is empty (we'll ignore it for now), while the size field indicates the total size of the chunk, which is 0x20 in hexadecimal (32 in decimal). You might have noticed that it shows 0x21 instead of 0x20. This discrepancy occurs because the PREV_INUSE bit is set. Since this is the first chunk available in the heap, the memory preceding it is allocated for other purposes, meaning it is in use.


Following the size field, the user data section begins, which contains the priority integer and the char pointer defined in the heapStructure structure. These fields are empty since no data has been assigned to them yet.


Following the first malloc call, the program sets the priority field of the structure pointed to by i1 to the value 1.


i1->priority = 1;

Structures in C are organized in contiguous memory blocks, where each member has a specific offset from the beginning of the structure. For example, as the priority member is defined first within the structure, its offset is zero. If it were defined after one or more other members, its offset would be the sum of the sizes of the preceding members. The compiler uses this information to determine where each member is located in memory relative to the base address of the structure.


When the code uses the -> operator, it dereferences the pointer i1 to access the structure it points to. This operator is effectively a combination of two actions: it retrieves the memory address from i1 and calculates the address of the priority member by adding its offset to that base address. Once the correct memory location for priority is determined, the assignment operation takes place, where the value 1 is written to that address. This involves storing the binary representation of the integer 1 in memory, effectively updating the priority field of the structure.


So, if the memory address pointed to by i1 (the start of the user data section) were 0x5f11cd9372a0, and the priority member were second in the structure instead of first, the program would perform a mathematical operation by adding the size of the char pointer member (8 bytes) to the address pointed to by i1. The operation would be as follows:


0x5f11cd9372a0 + 0x8 = 0x5f11cd9372a8 (priority address)

In this case, since the priority member is first in the structure, the operation will actually be as follows:


0x5f11cd9372a0 + 0x0 = 0x5f11cd9372a0 (priority address)

After performing that operation, the program assigns the value 1 to the resulting address. Then, the heap will look like this:



As observed, the first user data field, corresponding to the priority member, has been populated with the value 1.


On the next line, the program calls malloc again and assigns the return value to the name member of i1:


i1->name = malloc(8);

Essentially, the program allocates a new 8-byte chunk and stores the returned address—corresponding to the beginning of the user data section of the newly allocated chunk—in the name member of the structure pointed to by i1.


So, the heap will look like this:



As observed, a new chunk was allocated (which I've highlighted in a different color), and the name member now contains the address of the beginning of the user data portion of the newly allocated chunk.


As you may have noticed, malloc() allocated a new block with the same size as the previous one, even though we requested 8 fewer bytes. This is due to the need for the chunks to be correctly aligned (16 bytes), as we explained when we introduced malloc. The PREV_INUSE bit is also set because the previous chunk is allocated.


Subsequently, the program calls malloc again, but this time stores the returned address into the i2 pointer:


i2 = malloc(sizeof(struct heapStructure));

This line performs the same action as the program did on the first call to malloc. Therefore, a new chunk is allocated:



After the last malloc call, the heap appears as follows:



Up to this point, there are four allocated chunks. This was a simple demonstration of how heap chunks are allocated and how we can interact with them.


We've seen that when we request an 8-byte chunk, the program actually allocates 32 bytes (16 bytes of user data plus 16 bytes of metadata). Similarly, if you request a 24-byte chunk (which is not a multiple of 16), it will also allocate 32 bytes in total—16 bytes for the user data and an additional 16 bytes for metadata—with the 8 bytes that don't fit in the user data section spilling into the prev_size field of the next chunk. To illustrate this behavior, let's analyze the following program:


#include <stdlib.h>
#include <string.h>


int main(){

    char *exampleString = "AAAAAAAAAAAAAAAAAAAAAAAA";
    char *myChunk = malloc(24);
    void *myChunk2 = malloc(16);
        
    strncpy(myChunk, exampleString, 24);
    
    return 0; 
}

As observed, it first defines a char pointer that points to a string in memory with a length of 24 characters. Then, it calls malloc to request a 24-byte chunk and stores the returned address in the myChunk pointer. Next, it calls malloc again to request a second chunk, which we won't use; this is just to demonstrate that the prev_size field of this chunk will be utilized as part of the previous chunk.


Finally, it calls strncpy to copy 24 bytes of the string pointed to by exampleString to the address stored in myChunk. If I were to call strcpy, it would copy 25 bytes instead of 24, as it also includes the null byte at the end of the string.


Initially, after the two calls to malloc, the heap appears as follows:



Let's see what happens when we call strncpy:



As observed, the prev_size field of the second chunk is utilized as part of the first chunk.



Free chunks


Once we finish using the allocated chunk of memory, we call free() to tell libc we are done with this piece of memory.



As noted, the structure differs from that of allocated chunks. After the chunk size field, two additional fields, fd and bk, are present. These are pointers that refer to the beginning of the next and previous chunks, respectively. These pointers are crucial for managing the chunk list, and we’ll explore their significance in more detail shortly.


To make memory management more efficient, glibc (the GNU C Library) works hard to reuse memory that has already been allocated and freed. As a result, calling free() enables the recycling of memory for future requests.


For example, suppose your program allocates 100 bytes to store a string entered by the user. Once you're done with that string, you tell glibc that you no longer need the memory. Later, if you need another 100 bytes for a new string, why not reuse the same memory? This saves time and resources.


The key to this memory reuse is the system of bins. Bins are lists of "free" memory chunks that glibc maintains in order to quickly find and allocate space. These chunks are organized into different bins based on their size, which allows for more efficient searching and allocation. When a chunk of memory is freed, glibc doesn’t physically erase it; instead, it simply updates a pointer to this chunk and places it into the appropriate bin for future reuse.


The free() function also includes several security checks to prevent memory corruption. For instance, free() verifies that the pointer passed as a parameter was previously returned by malloc. Other checks include the following:


  1. Ensuring that the allocation is aligned to an 8-byte boundary (or a 16-byte boundary on 64-bit systems), as malloc guarantees alignment for all allocations.
  2. Verifying that the chunk's size field is valid—this means it is not too small, too large, unaligned, or exceeding the process's address space.
  3. Confirming that the chunk lies within the boundaries of the memory arena.
  4. Checking that the chunk has not already been marked as free by inspecting the "P" bit in the metadata of the next chunk.


Bin Operations


When freeing a chunk, glibc places it in the corresponding bin based on its size. There are five kinds of bins: fastbins, the unsorted bin, smallbins, largebins, and tcache bins (which we will discuss later).


The basic procedure for freeing a chunk of memory is as follows:


  • If the chunk's metadata has the M bit set, the allocation was made off-heap via mmap(), so it is released with munmap().
  • If not, and the chunk preceding this one is free, the chunk is merged with it to form a larger free block.
  • Similarly, if the chunk following this one is free, it is merged with that chunk to create a larger free block.
  • If this new, potentially larger block is adjacent to the "top" of the heap, it is absorbed into the heap's end rather than being placed into a "bin."
  • Otherwise, the chunk is marked as free and stored in the appropriate bin.


Small Bins


There are 62 small bins in total, each corresponding to a specific size, ranging from 16 bytes up to 504 bytes on 32-bit systems (up to 1008 bytes on 64-bit systems). Each small bin holds chunks of the same size. Small bins are organized as doubly-linked lists, and memory allocations and deallocations follow a First-In-First-Out (FIFO) order.


The FD (forward) and BK (backward) pointers, as previously explained, link each chunk to the next and previous chunks within the bin, respectively.



Large Bins


While the strategy used for small bins works well for managing small allocations, it's not feasible to have a separate bin for every possible chunk size. For larger chunks—those exceeding 512 bytes (1024 bytes on 64-bit systems)—the heap manager employs large bins.


There are 63 large bins, and while they function similarly to small bins, they differ in that each large bin stores chunks within a specific size range rather than holding chunks of a fixed size. These size ranges are carefully chosen so that they don’t overlap with the sizes managed by small bins or other large bins. This means that for any given chunk size, there will always be exactly one corresponding small bin or large bin.


Because large bins store chunks across a range of sizes, insertions must be sorted manually, and allocations require traversing through the bin’s list. As a result, large bins tend to be slower to manage compared to small bins. However, they are used less frequently in most programs, as small allocations are much more common than large ones.


To optimize for this, the ranges of large bins are designed to cluster around smaller chunk sizes. For example, the smallest large bin covers the range from 512 to 576 bytes, while the second largest large bin spans up to 256KB. The largest bin manages chunks over 1MB.



Unsorted Bin


There is a single unsorted bin in glibc, which serves as a holding area for both small and large chunks that have been freed. This bin plays a crucial role in improving the speed of memory allocation and deallocation.


Instead of immediately placing newly freed chunks into their designated bins, the heap manager first attempts to coalesce them with neighboring chunks. These merged chunks are then placed into the unsorted bin as a final attempt to be reused. If a future allocation request is smaller than an available chunk in this bin, the chunk is split into two parts, with one part (of the requested size) used to satisfy the allocation. This process is known as the "Last Remainder Chunk." If the requested size is larger than any chunk in the unsorted bin, those chunks are moved to the appropriate small or large bins for future use.


Last Remainder


When malloc is used and a chunk is divided (e.g., from the Unsorted Bin or from the top chunk), the part that remains after the division is called the Last Remainder. A pointer to this chunk is stored in the malloc_state struct.



Fastbins


Fastbins are used to manage small memory chunks. There are 10 separate bins for different chunk sizes, ranging from 16 bytes up to 88 bytes, plus metadata. These bins are optimized for quick allocation and deallocation of small memory blocks.


Unlike chunks in the small bins, chunks in the fast bins are never merged with their neighbors. In practice, this is because the heap manager never clears the "P" (PREV_INUSE) bit at the beginning of the next chunk. You can interpret it as the heap manager not "truly" freeing the chunks in the fastbins.


Similar to their smaller bin counterparts, fastbins manage only a single chunk size, which allows them to be automatically sorted, making insertions and removals exceptionally fast. Additionally, since chunks in fastbins are never candidates for merging, they can be stored in singly linked lists, instead of needing doubly linked lists to handle potential removal when a chunk is merged.


However, the main drawback of fastbins is that the chunks stored within them are not "fully" freed or merged. Over time, this can lead to memory fragmentation and expansion. To address this, the heap manager periodically "consolidates" the heap. This process "flushes" the fastbin by actually merging the chunks with neighboring free blocks and then placing the resulting free chunks into the unsorted bin for future allocation by malloc.


The consolidation process occurs in specific situations, such as when a malloc request exceeds the capacity of a fastbin (for example, requests larger than 512 bytes or 1024 bytes on a 64-bit system), when a chunk larger than 64KB is freed (with 64KB being a heuristically chosen threshold), or when functions like malloc_trim or mallopt are invoked.



TCache (per-thread cache) bins


The TCache bins (short for Thread Cache bins) are a modern optimization in heap management designed to improve memory allocation performance in multi-threaded applications. Each thread maintains its own small, private cache of freed memory chunks, allowing for faster allocation and deallocation without requiring global locks or synchronization.


Like Fastbins, TCache bins use singly-linked lists to store chunks, which makes them highly efficient for quick reuse. However, unlike Fastbins, TCache bins are specific to individual threads, further reducing contention in multi-threaded environments.


In the next article, we’ll delve deeper into TCache bins, exploring their structure, limitations, and how they differ from other heap mechanisms.



Head and Tail


Each bin is represented by two pointers: HEAD and TAIL. As the names suggest, HEAD refers to the top of the bin, and TAIL refers to the bottom. In most cases, insertions occur at the HEAD, meaning that in Last-In-First-Out (LIFO) structures like fastbins, reallocations also happen at the HEAD. In contrast, in First-In-First-Out (FIFO) structures like small bins, reallocations take place at the TAIL. For fastbins, the TAIL pointer is set to null, as no elements are added to the bottom.


The HEAD pointer represents the address of the first chunk in the bin, while the TAIL pointer holds the address of the last chunk.



Fastbin Operations


Fastbins are implemented as singly linked lists of chunks, designed for quickly and efficiently reusing very small chunks. To support this, chunks of fastbin size do not consolidate—they are not merged with neighboring free chunks after being freed.


A fastbin operates as a Last-In-First-Out (LIFO) structure, meaning the most recently added chunk is the first one to be removed. In glibc, only the HEAD pointer is tracked, which points to the first chunk in the list (it is set to 0 if the fastbin is empty). Each chunk in the fastbin contains an fd pointer that references the next chunk in the list (or is set to 0 if the chunk is the last one).


The process of adding a new chunk to the Fastbins is straightforward. First, when a chunk is freed, its fd is overwritten with the current HEAD pointer of the fastbin. Then, the HEAD is updated to point to the address of the newly freed chunk.


For example, imagine you have a block a allocated and a block b free. If b was the last chunk of the fastbin size freed, the HEAD of the fastbin points to b. Now, if you free a and its size corresponds to the same fastbin size of b, then the fd of a is set to point to b and the HEAD of the fastbin is updated to a.


Let's use the following program as an example:


#include <stdio.h>
#include <stdlib.h>


int main(){

  void *a = malloc(16);
  void *b = malloc(16);
  void *c = malloc(16); 

  printf("Allocation:\n");
  printf("CHUNK a: %p\n", a);
  printf("CHUNK b: %p\n", b);
  printf("CHUNK c: %p\n", c);

  free(a);
  free(b);
  free(c);

  void *d = malloc(16);
  void *e = malloc(16);
  void *f = malloc(16);

  printf("Reallocation:\n"); 
  printf("CHUNK d: %p\n", d);
  printf("CHUNK e: %p\n", e);
  printf("CHUNK f: %p\n", f);

 
  return 0; 
}

Before executing the program, let's analyze it to understand its purpose. First, it allocates three chunks of 16 bytes, storing the returned addresses in the a, b, and c pointers.


Next, the program frees these chunks in the order a, b, and c. Because fastbins operate using a Last-In-First-Out (LIFO) structure, the HEAD of the fastbin will point to c after it is freed. Additionally, the fd pointer of c will point to b (since b was the previous head before c was freed), and the fd pointer of b will point to a (since a was the previous head before b was freed).



Note


When I say that a chunk "points to" another (for example, "c points to b"), I am referring to the fd pointer of the chunk, which points to the next chunk in the list (i.e., the fd pointer of c points to the chunk b).


Finally, the program calls malloc three more times, requesting chunks of 16 bytes. The returned addresses are stored in the d, e, and f pointers. Since the requested chunk sizes match those of the previously freed chunks, the heap manager will reuse the chunks from the fastbins. Given the LIFO structure of fastbins, d will receive the chunk previously freed from c, e will receive the chunk from b, and f will receive the chunk from a.


Let’s see this process in practice. You can compile the program using gcc as follows:


@elswix@ubuntu$ gcc program.c -o program

Now, let's execute it:


@elswix@ubuntu$ ./program
Allocation:
CHUNK a: 0x621784b622a0
CHUNK b: 0x621784b622c0
CHUNK c: 0x621784b622e0
Reallocation:
CHUNK d: 0x621784b622e0
CHUNK e: 0x621784b622c0
CHUNK f: 0x621784b622a0

As observed, during the reallocation process, d receives the chunk previously freed from c, e receives the chunk from b, and f receives the chunk from a.


Let’s visualize the process of updating the fastbin with the following image:



As observed, after each call to free(), the HEAD is updated to point to the newly freed chunk. If a chunk already exists in the fastbin, the fd pointer of the newly freed chunk points to the previous HEAD, and then the HEAD is updated to point to the newly freed chunk.


The reverse process occurs when calling malloc after those free() calls:



It might seem confusing at first why we add and remove chunks from the beginning of the list instead of the end, but it’s actually the most efficient approach. Remember that the goal of Fastbins is to be as fast and efficient as possible. Let’s consider the following example of a fastbin structure:



In this case, the HEAD points to a, and the fd pointer of a points to b, indicating that b is the next chunk in the list. Now, let’s say we free another chunk, c. If we wanted to add c to the end of the list, it would look like this:



To do this, we would need to update the fd pointer of b to point to c. However, glibc fastbins only track the HEAD of the list and don’t maintain a pointer to the end (there is no TAIL). This means that to add c at the end, we would need to start at the HEAD and traverse the entire list to find the last chunk. Once we reach the end, we would then update the fd pointer of the last chunk to point to c, making it the new last chunk.


On the other hand, if we add c to the HEAD of the list:



The process is much simpler:


  1. We set the fd pointer of c to point to a (since a was the previous head, and glibc already has a reference to it).
  2. We update HEAD to point to c, making it the new first chunk in the list.

This approach has much less overhead because it only involves updating the fd pointer of c and setting HEAD to c, both of which are straightforward operations.


The same principle applies when reallocating a chunk. It’s much easier to update the HEAD pointer to a by reading the fd of c than it is to traverse the entire list to find the last chunk.



Unsorted, Small, and Large Bins Operations


When a chunk that doesn’t fit into the Fastbins is freed, it is coalesced with its neighboring chunks and placed in the Unsorted Bin. On subsequent allocations, malloc first scans the Unsorted Bin for a chunk that can satisfy the request before sorting its remaining chunks into their respective bins. Let’s revisit the steps malloc takes to allocate memory now that we understand how free chunks operate:


  1. Fastbin Check: If the requested size qualifies as a Fastbin, the corresponding bin is checked. If a suitable chunk is available, it is returned immediately.
  2. Smallbin Check: If the size corresponds to a Smallbin, the appropriate bin is searched. A matching chunk is returned if found.
  3. Large Request Consolidation: For large requests, Fastbins are consolidated. This combines contiguous Fastbin chunks into a larger block, which is then moved to the Unsorted Bin.
  4. Unsorted Bin Allocation:
    • If a chunk in the Unsorted Bin is large enough to fulfill the request:
      1. The chunk is split if it exceeds the requested size, with the remainder marked as the Last Remainder Chunk.
      2. If the chunk matches the requested size, it is returned directly.
    • Any remaining smaller chunks are distributed into their respective Smallbins or Largebins.
  5. Largebin Check:
    • The corresponding Largebin for the requested size is checked. If a chunk larger than the request is found:
      1. It is split, with the requested portion returned and the remainder moved to the Unsorted Bin.
    • The search continues across larger indices in the Largebins until a suitable chunk is found.
  6. Fallback to Top Chunk: If no chunk is found in the bins, the allocation proceeds from the Top Chunk.
  7. Heap Extension: If the Top Chunk cannot satisfy an allocation request, the heap manager interacts with the operating system to extend the heap.

Heap Extension


When the top chunk, the unallocated region at the end of the heap, cannot satisfy an allocation request, the heap manager relies on system calls to obtain more memory from the kernel. Let’s explore how this process works.


Initially, the heap manager uses sbrk, a library function that internally invokes the brk system call on most Linux-based systems. The brk call raises the "program break," effectively adding memory just beyond the region where the program was originally loaded. Since the heap manager creates the initial heap in this region, this system call extends the program’s initial heap by appending new memory to its end.


To better understand this, let's examine an example of where a process's heap is located in memory. I ran a simple program and checked its memory mapping file at /proc/{PID}/maps.



As shown in the image, each section of the process—such as the heap, the stack, and shared libraries—occupies a specific range of memory addresses. In this example, the heap is mapped from 0x63dfbfd5d000 to 0x63dfbfd7e000.


You may have noticed the large unused space between the end of the heap and the starting address of the libc library. This gap allows the brk system call to expand the heap when additional memory is required. For instance, the heap's range could grow from 0x63dfbfd5d000 to an address higher than the initial 0x63dfbfd7e000, depending on the size of the requested allocation.


However, this approach has limitations. As the heap grows, it may eventually collide with other regions in the process's address space, such as memory mappings, shared libraries, or a thread’s stack. In this specific example, the limit would be the starting address of the libc library. When this happens, the heap manager can no longer use sbrk to extend the heap. Instead, it switches to mmap, which allocates non-contiguous memory blocks outside the initial heap.



Off-Heap Allocations with mmap


Extremely large allocation requests are handled differently by the heap manager. Instead of being assigned within the standard heap, these requests use mmap to allocate memory outside the heap. The metadata of these chunks includes a special flag, the IS_MMAPPED (M) flag, to indicate their off-heap origin.


When such large allocations are freed using free, the entire memory region assigned by mmap is released back to the system through munmap. By default, this off-heap allocation strategy applies to requests exceeding a threshold of 128 KB. If the heap manager detects transient use of large allocations, it can raise this threshold dynamically, up to 512 KB on 32-bit systems and 32 MB on 64-bit systems.


If mmap also fails, the process cannot allocate additional memory, and malloc returns NULL to indicate that the allocation request could not be satisfied.



Conclusion


In this article, we explored the fundamental concepts of the heap, from its basic structure and allocation mechanisms to the key differences between stack and heap memory. We also introduced older heap implementations, such as dlmalloc, and discussed how memory allocation functions like malloc and free operate at a high level.


Additionally, we examined how the heap expands dynamically when necessary, using system calls like brk, sbrk, and mmap. These mechanisms demonstrate the interaction between the heap manager and the operating system to handle memory demands efficiently, even under constraints like process address space limits.


Looking ahead, in the next article, we’ll build on these foundational concepts to explore modern heap implementations, such as TCache bins, and their optimizations for multi-threaded environments. We will also begin delving into heap exploitation techniques, leveraging the concepts learned here to understand vulnerabilities and how attackers exploit them.


I hope you’ve learned something new in this article and found it helpful. If you did, feel free to share it with others who might benefit from it.


Happy Hacking!


Joaquín (AKA elswix)



References


https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/
https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/
https://book.hacktricks.xyz/binary-exploitation/libc-heap/heap-memory-functions/malloc-and-sysmalloc
https://book.hacktricks.xyz/binary-exploitation/libc-heap/heap-memory-functions/free
https://book.hacktricks.xyz/binary-exploitation/libc-heap/bins-and-memory-allocations
https://ir0nstone.gitbook.io/notes/binexp/heap/malloc_consolidate
https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/core_functions