PLT & GOT

Introduction

As a developer, you may have come across the terms PLT and GOT, or at least seen their effects on your programs, without knowing exactly what they do. In this article, we'll delve into the Procedure Linkage Table (PLT) and the Global Offset Table (GOT).

How can you utilize libc functions without explicitly including them in your code? What does "dynamically linked" signify? Today, we'll address these questions and more.

Example program

Let's create an example program that doesn't include any headers but still calls a libc function, printf:

int main(){

    printf("Hello world!");
    return 0;

}

Let's compile it:

elswix@ubuntu$ gcc program.c -o program -no-pie -m32

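You may encounter warnings about the implicit declaration of printf, but don't worry about them for now. (GCC 14 and later reject implicit function declarations by default; if that happens, compile with -std=gnu89 or declare printf yourself.)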

Once the program is compiled, let's proceed to execute it:

elswix@ubuntu$ ./program
Hello world!
elswix@ubuntu$

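It worked! The printf function displayed our string successfully, but how? I didn't include any header, nor did I define a function called printf. The compiler simply assumed that printf exists somewhere, and the linker found it in libc, which GCC links against by default. This is where dynamically linked libraries come into play.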

A dynamically linked library refers to a collection of code and data that can be shared and reused by multiple programs simultaneously. When a program is executed, it dynamically links to the library at runtime, allowing it to access the library's functions and resources without embedding them directly into the program's executable file. This approach promotes code reuse, reduces executable size, and simplifies software maintenance.

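Listing the shared libraries this program depends on reveals that libc is dynamically linked. (The output below comes from a 64-bit build; for a binary compiled with -m32, ldd reports the 32-bit counterparts instead, such as the /lib32/libc.so.6 and /lib/ld-linux.so.2 paths that GDB shows later in this article.)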

elswix@ubuntu$ ldd program
    linux-vdso.so.1 (0x00007fff5403a000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000781ba3a00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000781ba3e26000)

This means that you can use libc functions without embedding them directly into the binary.

That's great, but how does the binary know where to call? And what happens if ASLR protection is enabled? This is where PLT and GOT come into play.

Procedure Linkage Table (PLT)

The Procedure Linkage Table (PLT) is a mechanism used in dynamically linked programs to facilitate function calls to external libraries, such as libc. When a program makes a function call to a dynamically linked function (e.g., printf), the code in the PLT is responsible for resolving the address of the function and transferring control to it. The PLT is part of the dynamic linking process, allowing programs to call functions from shared libraries without knowing their addresses at compile time.

Global Offset Table (GOT)

The Global Offset Table (GOT) is a table of pointers, located within the executable or shared library, that holds the addresses of global variables or functions. When a program is executed, the dynamic linker resolves these addresses at runtime, allowing functions and variables to be accessed across different parts of the program or even across different modules.

Think of the GOT as a lookup table: each row pairs an external symbol with the address where that symbol currently resides in memory.

GOT vs GOT.PLT

When looking at binary sections with tools like objdump, you may have noticed that there are two sections whose names contain the word got: the .got section and the .got.plt section. Both belong to the Global Offset Table. However, there are differences between them, so let's delve into them.

Firstly, the .got section contains addresses of global variables and external function calls that need to be resolved at runtime by the dynamic linker. When the program starts, the addresses in the GOT are typically populated with addresses of functions and variables by the dynamic linker/loader. This allows the program to access global variables and call functions from shared libraries without knowing their addresses at compile time.

On the other hand, the .got.plt section is specifically designed for programs that use lazy binding for dynamic symbol resolution (we'll discuss lazy binding later). It serves as the Global Offset Table (GOT) for the Procedure Linkage Table (PLT). Unlike the .got section, whose entries the dynamic linker fills in when the program is loaded, the entries of the .got.plt section are updated by the lazy binding mechanism: each one receives its final address only when the corresponding external function is called for the first time.

In this article, when I mention the GOT section, I'm actually referring to the GOT.PLT section of the Global Offset Table. It's important to keep this in mind to avoid confusion.

GOT & PLT

The Global Offset Table (GOT) and Procedure Linkage Table (PLT) are both essential components in the process of dynamic linking in executable programs.

The GOT holds addresses of global variables and functions that are referenced within the program. When the program is loaded into memory, the entries for external functions (the ones in .got.plt) do not yet contain the functions' real addresses; they initially point back into placeholder code, called stubs, within the PLT.

The PLT, on the other hand, contains the actual code necessary for dynamic linking. When a function referenced in the program is called for the first time, the PLT stub is executed. This stub is responsible for triggering the resolution of the function's address and the update of the corresponding GOT entry with the actual address. This process is known as Lazy Binding. Subsequent calls to the same function still enter through the PLT stub, but its first instruction now jumps straight to the address stored in the GOT, so the resolution step is skipped.

Practice

Let's put this into practice:

int main(){

    puts("Hello for the first time!");
    puts("Hello for the second time!");
    puts("Hello for the third time!");
    return 0;

}

Compile it with gcc, passing the -no-pie and -m32 flags as before.

elswix@ubuntu$ gcc program.c -o program -no-pie -m32

Let's analyze this binary in GDB to conduct a comprehensive examination of the low-level instructions it executes.

elswix@ubuntu$ gdb -q ./program
GEF for linux ready, type `gef' to start, `gef config' to configure
88 commands loaded and 5 functions added for GDB 12.1 in 0.00ms using Python engine 3.10
Reading symbols from program...
(No debugging symbols found in program)
gef$

To begin, let's disassemble the main function:

gef$ disass main
Dump of assembler code for function main:
   0x08049176 <+0>:  lea    ecx,[esp+0x4]
   0x0804917a <+4>:  and    esp,0xfffffff0
   0x0804917d <+7>:  push   DWORD PTR [ecx-0x4]
   0x08049180 <+10>: push   ebp
   0x08049181 <+11>: mov    ebp,esp
   0x08049183 <+13>: push   ebx
   0x08049184 <+14>: push   ecx
   0x08049185 <+15>: call   0x80490b0 <__x86.get_pc_thunk.bx>
   0x0804918a <+20>: add    ebx,0x2e76
   0x08049190 <+26>: sub    esp,0xc
   0x08049193 <+29>: lea    eax,[ebx-0x1ff8]
   0x08049199 <+35>: push   eax
   0x0804919a <+36>: call   0x8049050 <puts@plt>
   0x0804919f <+41>: add    esp,0x10
   0x080491a2 <+44>: sub    esp,0xc
   0x080491a5 <+47>: lea    eax,[ebx-0x1fde]
   0x080491ab <+53>: push   eax
   0x080491ac <+54>: call   0x8049050 <puts@plt>
   0x080491b1 <+59>: add    esp,0x10
   0x080491b4 <+62>: sub    esp,0xc
   0x080491b7 <+65>: lea    eax,[ebx-0x1fc3]
   0x080491bd <+71>: push   eax
   0x080491be <+72>: call   0x8049050 <puts@plt>
   0x080491c3 <+77>: add    esp,0x10
   0x080491c6 <+80>: mov    eax,0x0
   0x080491cb <+85>: lea    esp,[ebp-0x8]
   0x080491ce <+88>: pop    ecx
   0x080491cf <+89>: pop    ebx
   0x080491d0 <+90>: pop    ebp
   0x080491d1 <+91>: lea    esp,[ecx-0x4]
   0x080491d4 <+94>: ret    
End of assembler dump.
gef$

As observed, there are three calls to puts. However, it's important to note that these are not direct calls to the puts function itself; rather, they are calls to the puts@plt entry. Upon disassembling the PLT entry, the following is observed:

gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
   0x08049050 <+0>:  jmp    DWORD PTR ds:0x804c010
   0x08049056 <+6>:  push   0x8
   0x0804905b <+11>: jmp    0x8049030
End of assembler dump.
gef$

It has three instructions. Interestingly, the first instruction is a jump to an address stored in 0x804c010. The jmp DWORD PTR ds:0x804c010 instruction tells the processor to jump to the address stored in memory location 0x804c010.

Let's examine this address:

gef$ x 0x804c010
0x804c010 <puts@got.plt>:   0x08049056
gef$

As you can see, this address corresponds to the puts Global Offset Table (GOT) entry, which in turn references another address. So when executing the jump instruction, we're actually jumping to the address 0x08049056.

The GOT looks like this:

Function    Address
puts()      0x08049056

Note:

Actually, in the "Function" column, the address of the GOT entry is listed instead of the function name. Therefore, instead of puts() it should be 0x804c010. However, I believe this approach makes the concept clearer.

The address 0x08049056 (the destination of the jmp instruction) is indeed the next instruction following the jmp instruction itself:

gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
   0x08049050 <+0>:  jmp    DWORD PTR ds:0x804c010
   0x08049056 <+6>:  push   0x8
   0x0804905b <+11>: jmp    0x8049030
End of assembler dump.
gef$

So the very first jump lands on the instruction right below it. At first glance, one might question the purpose of such a jump.

The reason is that the GOT entry has not been resolved yet. Because puts() has never been called, its GOT entry does not hold the address of puts() in libc; instead, it was initialized to point back into the PLT stub, at the instruction right after the jump. The jump therefore falls through to the rest of the stub, which starts the resolution process: the dynamic linker looks up the address of puts() and writes it into the GOT entry, so that later calls jump straight to the real function.

Let's see how this works at runtime. First, let's set a breakpoint at the call to puts@plt:

gef$ disass main
Dump of assembler code for function main:
...[snip]...
   0x08049193 <+29>: lea    eax,[ebx-0x1ff8]
   0x08049199 <+35>: push   eax
   0x0804919a <+36>: call   0x8049050 <puts@plt>
   0x0804919f <+41>: add    esp,0x10
...[snip]...   
End of assembler dump.
gef$

The call instruction is at 0x0804919a, so let's set a breakpoint there:

gef$ b *0x0804919a
Breakpoint 1 at 0x804919a
gef$

Now let's execute the program:

gef$ r
Starting program: /home/elswix/Desktop/elswix/Local/PLT_and_GOT_article/program 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x0804919a in main ()
...[snip]...
gef$

We reached the breakpoint; the program stopped at the call instruction:

gef$ x/i $eip
=> 0x804919a <main+36>:  call   0x8049050 <puts@plt>
gef$

Let's step into the call with the si command:

gef$ si
0x08049050 in puts@plt ()
...[snip]...
gef$ x/3i $eip
=> 0x8049050 <puts@plt>:    jmp    DWORD PTR ds:0x804c010
   0x8049056 <puts@plt+6>:  push   0x8
   0x804905b <puts@plt+11>: jmp    0x8049030
gef$

As you can see, we are now inside the puts@plt entry. The next instruction is the jump through the GOT entry at 0x804c010. Eventually this entry will hold the memory address of the puts function in libc, but since this is the first time the function is called, it still holds its initial value:

gef$ x 0x804c010
0x804c010 <puts@got.plt>:   0x08049056
gef$

As expected, the address stored in the GOT entry is the address of the instruction right after the jump. Therefore, when we continue, execution proceeds to that instruction:

gef$ si
0x08049056 in puts@plt ()
...[snip]...
gef$ x/3i $eip
=> 0x8049056 <puts@plt+6>:    push   0x8
   0x804905b <puts@plt+11>:   jmp    0x8049030
   0x8049060 <_start>:        endbr32

The next instruction pushes the value 0x8 onto the stack. On 32-bit x86, this value is the byte offset of the function's relocation entry within the .rel.plt section, and it tells the dynamic linker which symbol to resolve (here, puts). Following this, a jump occurs to the beginning of the PLT section. You can verify the starting address of the PLT section using objdump:

elswix@ubuntu$ objdump -h program | grep "\ .plt"
 11 .plt          00000050  08049030  08049030  00001030  2**4

After the jump instruction is executed, the execution flow reaches the beginning of the PLT:

gef$ x/5i $eip
=> 0x8049030:    push   DWORD PTR ds:0x804c004
   0x8049036:    jmp    DWORD PTR ds:0x804c008
   0x804903c:    add    BYTE PTR [eax],al
   0x804903e:    add    BYTE PTR [eax],al
   0x8049040 <__libc_start_main@plt>: jmp    DWORD PTR ds:0x804c00c
gef$

The first instruction pushes the value stored at 0x804c004, the second entry in the .got.plt section (you can verify this using objdump). Remember, DWORD PTR ds:0x804c004 signifies that the processor will dereference the address 0x804c004 and push the resulting value onto the stack. By convention, the dynamic linker stores in this entry a pointer to its own descriptor of the binary (a link_map structure), so that the resolver knows which module is asking.

Subsequently, there's a jump instruction, which again dereferences a pointer (memory address), located 4 bytes after the one used by the push (0x804c008). Thus, it goes through the third entry in the .got.plt section, where the dynamic linker stores the address of its resolver routine. Let's delve into what these addresses reference.

gef$  x/2wx 0x804c004
0x804c004:   0xf7ffda40  0xf7fd8f80
gef$

Both values are themselves memory addresses. Comparing them with the shared libraries loaded into the process shows that they point into the dynamic linker, ld.so:

gef$  info dll
From        To          Syms Read   Shared Object Library
0xf7fc7090  0xf7feb2a5  Yes (*)     /lib/ld-linux.so.2
0xf7c20290  0xf7d9cb59  Yes (*)     /lib32/libc.so.6
(*): Shared library is missing debugging information.
gef$

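GDB lists ld.so as spanning from 0xf7fc7090 to 0xf7feb2a5. The jump target, 0xf7fd8f80, falls inside this range, so it is code in ld.so. The pushed value, 0xf7ffda40, lies slightly above it: the From and To columns only cover the library's code section, and this address most likely sits in the writable data that ld.so maps right after its code, which is where it keeps the structure describing our program. Both values therefore belong to the dynamic linker.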

As it says in the manual page:

The programs ld.so and ld-linux.so* find and load the shared objects (shared libraries) needed by a program, prepare the program to run, and then run it.

The dynamic linker/loader is the system component responsible for dynamically linking shared libraries during program execution. When a program starts, ld.so is invoked by the system to resolve and load the necessary shared libraries required by the program into memory. It searches for the libraries specified in the program's dependencies, resolves symbols, and maps them to their corresponding addresses in memory.

In simple terms, the process involves locating the memory address of the desired function and then updating the corresponding entry in the Global Offset Table (GOT) with this memory address. Consequently, the next time the program calls the function through its Procedure Linkage Table (PLT) entry, the initial jump goes straight to the desired function, with no further resolution needed.

Continuing with GDB, after all those instructions of the ld.so library, we see the output of the call to puts:

gef$ ni
Hello for the first time!
0x0804919f in main ()
gef$

Then, looking at the GOT entry of puts, I noticed that it was overwritten with a new address:

gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
   0x08049050 <+0>:  jmp    DWORD PTR ds:0x804c010
   0x08049056 <+6>:  push   0x8
   0x0804905b <+11>: jmp    0x8049030
End of assembler dump.
gef$ x 0x804c010
0x804c010 <puts@got.plt>:   0xf7c72880
gef$

As you can see, the puts GOT entry now stores the address 0xf7c72880. Upon examining it, we observe that it belongs to the actual puts function:

gef$ x 0xf7c72880
0xf7c72880 <puts>:    0xfb1e0ff3
gef$

The ld.so library did its job. It found the actual address of the puts function and then overwrote the puts GOT entry with it. Let's see what happens when calling puts again:

gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
=> 0x08049050 <+0>:  jmp    DWORD PTR ds:0x804c010
   0x08049056 <+6>:  push   0x8
   0x0804905b <+11>: jmp    0x8049030
End of assembler dump.
gef$

As observed, we've reached the puts@plt entry again. The instruction about to execute is the same jump we encountered at the beginning, but this time, instead of landing on the next instruction, it should redirect to the actual puts function in the libc library:

gef$  ni
0xf7c72880 in puts () from /lib32/libc.so.6
gef$

Great! We've successfully reached the puts function in the libc library without requiring the dynamic linker to search for it.

Lazy Binding - Summary

When a program starts, it does not yet know the addresses of external functions such as printf(); it relies on the dynamic linker to locate them. Instead of invoking printf directly, the program calls the Procedure Linkage Table (PLT) entry corresponding to printf, which consists of three instructions. The first instruction is a jump to the address held in the function's Global Offset Table (GOT) entry. During the first call, this entry points to the next instruction after the jump rather than to the actual printf. That instruction pushes a value identifying the function onto the stack, and the following one jumps to the beginning of the PLT section, where common code pushes a second value identifying the binary and jumps into the dynamic linker. The dynamic linker then locates the address of printf, writes it into the GOT entry for printf, and transfers control to the function.

During subsequent calls to printf, the first instruction of the PLT entry jumps directly to the address now stored in the GOT entry, which is the actual printf function. Thus, the resolving process is bypassed, optimizing performance.

Conclusion

The Procedure Linkage Table (PLT) and the Global Offset Table (GOT) play crucial roles in dynamically linked programs. Through our exploration, we've uncovered their significance in enabling the dynamic linking process, allowing programs to efficiently use external functions without the need for their direct inclusion, thus significantly reducing program size and complexity.

The PLT serves as an intermediary between the program and external functions, providing a mechanism for lazy binding and resolving function addresses only when they are first called. Meanwhile, the GOT acts as a table of pointers, serving as a repository for global data and function addresses. It facilitates efficient access to these resources across different modules, essentially acting as a map that directs the program to the right place. This dynamic linking approach not only conserves memory but also promotes code reusability and modularity.

In upcoming articles, we'll discuss how to use these concepts to carry out binary exploitation. When introducing the Format String Vulnerability, we'll learn how to execute a GOT overwrite attack. The overwriting of the GOT is particularly relevant in the context of the format string vulnerability due to its ability to manipulate the program's execution flow. By exploiting a format string vulnerability, an attacker can control the data written to the GOT, enabling them to redirect the program to malicious functions or even achieve the execution of arbitrary code. Consequently, understanding how the GOT works and how it can be manipulated is crucial for comprehending and successfully exploiting this vulnerability.
