Introduction
If you have debugged or reverse-engineered a Linux binary, you have probably come across the Procedure Linkage Table (PLT) and the Global Offset Table (GOT) and seen how they interact with your programs. In this article, we'll delve into both.
How can you utilize libc functions without explicitly including them in your code? What does "dynamically linked" signify? Today, we'll address these questions and more.
Example program
Let's create an example program that calls printf without including any header or defining the function ourselves:
int main(){
printf("Hello world!");
return 0;
}
Let's compile it:
elswix@ubuntu$ gcc program.c -o program -no-pie -m32
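You may encounter warnings about the implicit declaration of printf; they are expected here, so don't worry about them. Note that recent GCC releases (14 and later) treat an implicit function declaration as an error by default, so on such a toolchain you may need to declare printf yourself or relax that diagnostic.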
Once the program is compiled, let's proceed to execute it:
elswix@ubuntu$ ./program
Hello world!
elswix@ubuntu$
It worked! The function printf displayed our string successfully, but how? I didn't include any library, nor did I create a function called printf. This is where dynamically linked libraries come into play.
A dynamically linked library refers to a collection of code and data that can be shared and reused by multiple programs simultaneously. When a program is executed, it dynamically links to the library at runtime, allowing it to access the library's functions and resources without embedding them directly into the program's executable file. This approach promotes code reuse, reduces executable size, and simplifies software maintenance.
Examining the dynamically linked libraries of this program reveals that libc is dynamically linked:
elswix@ubuntu$ ldd program
linux-vdso.so.1 (0x00007fff5403a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000781ba3a00000)
/lib64/ld-linux-x86-64.so.2 (0x0000781ba3e26000)
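This means that you can use libc functions without embedding them directly into the binary. (The listing above shows 64-bit library paths; a binary built with -m32 resolves against the 32-bit equivalents instead, such as /lib32/libc.so.6 and /lib/ld-linux.so.2, which appear in the GDB session later on.)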
That's great, but how does the binary know where to call? And what happens if ASLR protection is enabled? This is where PLT and GOT come into play.
Procedure Linkage Table (PLT)
The Procedure Linkage Table (PLT) is a mechanism used in dynamically linked programs to facilitate function calls to external libraries, such as libc. When a program makes a function call to a dynamically linked function (e.g., printf), the code in the PLT is responsible for resolving the address of the function and transferring control to it. The PLT is part of the dynamic linking process, allowing programs to call functions from shared libraries without knowing their addresses at compile time.
Global Offset Table (GOT)
The Global Offset Table (GOT) is a table of pointers, located in the data area of the executable or shared library, that holds the addresses of global variables or functions. When a program is executed, the dynamic linker resolves these addresses, allowing functions and variables to be accessed across different parts of the program or even across different modules.
Think of the GOT as a two-column table that maps each external symbol to the address currently stored for it.
GOT vs GOT.PLT
When looking at binary sections with tools like objdump, you may have noticed that there are two sections whose names contain the word got: the .got section and the .got.plt section. Both belong to the Global Offset Table. However, there are differences between them, so let's delve into them.
Firstly, the .got section contains addresses of global variables and external function calls that need to be resolved at runtime by the dynamic linker. When the program starts, the addresses in the GOT are typically populated with addresses of functions and variables by the dynamic linker/loader. This allows the program to access global variables and call functions from shared libraries without knowing their addresses at compile time.
On the other hand, the .got.plt section is specifically designed for programs that utilize lazy binding strategies for dynamic symbol resolution (we'll discuss lazy binding later). It serves as the Global Offset Table (GOT) for the Procedure Linkage Table (PLT). Unlike the .got section, which is populated by the dynamic linker when the program is loaded, the .got.plt section is updated by the lazy binding mechanism. This means that each function's .got.plt entry receives its final address only when that function is called for the first time.
In this article, when I mention the GOT section, I'm actually referring to the GOT.PLT section of the Global Offset Table. It's important to keep this in mind to avoid confusion.
GOT & PLT
The Global Offset Table (GOT) and Procedure Linkage Table (PLT) are both essential components in the process of dynamic linking in executable programs.
The GOT holds addresses of global variables and functions that are referenced within the program. When the program is loaded into memory, the addresses in the GOT are initially set to point to placeholder code called stubs within the PLT.
The PLT, on the other hand, contains the small code stubs that make dynamic linking work. When a function referenced in the program is called for the first time, its PLT stub is executed. The stub hands control to the dynamic linker, which resolves the address of the function and updates the corresponding entry in the GOT with the actual address. This process is known as Lazy Binding. Subsequent calls still enter through the same PLT stub, but its first jump now lands directly on the function via the address stored in the GOT, so the resolution step is skipped.
Practice
Let's put this into practice:
int main(){
puts("Hello for the first time!");
puts("Hello for the second time!");
puts("Hello for the third time!");
return 0;
}
Compile it with gcc and include the -no-pie and -m32 parameters.
elswix@ubuntu$ gcc program.c -o program -no-pie -m32
Let's analyze this binary in GDB to conduct a comprehensive examination of the low-level instructions it executes.
elswix@ubuntu$ gdb -q ./program
GEF for linux ready, type `gef' to start, `gef config' to configure
88 commands loaded and 5 functions added for GDB 12.1 in 0.00ms using Python engine 3.10
Reading symbols from program...
(No debugging symbols found in program)
gef$
To begin, let's disassemble the main function:
gef$ disass main
Dump of assembler code for function main:
0x08049176 <+0>: lea ecx,[esp+0x4]
0x0804917a <+4>: and esp,0xfffffff0
0x0804917d <+7>: push DWORD PTR [ecx-0x4]
0x08049180 <+10>: push ebp
0x08049181 <+11>: mov ebp,esp
0x08049183 <+13>: push ebx
0x08049184 <+14>: push ecx
0x08049185 <+15>: call 0x80490b0 <__x86.get_pc_thunk.bx>
0x0804918a <+20>: add ebx,0x2e76
0x08049190 <+26>: sub esp,0xc
0x08049193 <+29>: lea eax,[ebx-0x1ff8]
0x08049199 <+35>: push eax
0x0804919a <+36>: call 0x8049050 <puts@plt>
0x0804919f <+41>: add esp,0x10
0x080491a2 <+44>: sub esp,0xc
0x080491a5 <+47>: lea eax,[ebx-0x1fde]
0x080491ab <+53>: push eax
0x080491ac <+54>: call 0x8049050 <puts@plt>
0x080491b1 <+59>: add esp,0x10
0x080491b4 <+62>: sub esp,0xc
0x080491b7 <+65>: lea eax,[ebx-0x1fc3]
0x080491bd <+71>: push eax
0x080491be <+72>: call 0x8049050 <puts@plt>
0x080491c3 <+77>: add esp,0x10
0x080491c6 <+80>: mov eax,0x0
0x080491cb <+85>: lea esp,[ebp-0x8]
0x080491ce <+88>: pop ecx
0x080491cf <+89>: pop ebx
0x080491d0 <+90>: pop ebp
0x080491d1 <+91>: lea esp,[ecx-0x4]
0x080491d4 <+94>: ret
End of assembler dump.
gef$
As observed, there are three calls to puts. However, it's important to note that these are not direct calls to the puts function itself; rather, they are calls to the puts@plt entry. Upon disassembling the PLT entry, the following is observed:
gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
0x08049050 <+0>: jmp DWORD PTR ds:0x804c010
0x08049056 <+6>: push 0x8
0x0804905b <+11>: jmp 0x8049030
End of assembler dump.
gef$
It has three instructions. Interestingly, the first one, jmp DWORD PTR ds:0x804c010, is an indirect jump: it tells the processor to read the address stored at memory location 0x804c010 and jump there.
Let's examine this address:
gef$ x 0x804c010
0x804c010 <puts@got.plt>: 0x08049056
gef$
As you can see, this address corresponds to the puts Global Offset Table (GOT) entry, which in turn references another address. So when executing the jump instruction, we're actually jumping to the address 0x08049056.
The GOT looks like this:
Function    Address
puts()      0x08049056
Note:
Strictly speaking, the "Function" column should list the address of the GOT entry (0x804c010) rather than the function name puts(). However, I believe showing the name makes the concept clearer.
The address 0x08049056 (the destination of the jmp instruction) is indeed the next instruction following the jmp instruction itself:
gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
0x08049050 <+0>: jmp DWORD PTR ds:0x804c010
0x08049056 <+6>: push 0x8
0x0804905b <+11>: jmp 0x8049030
End of assembler dump.
gef$
At first glance, one might question the purpose of a jump that simply lands on the following instruction.
The reason for jumping to the next instruction is that the GOT entry has not been resolved yet. Since it's the first time the function is being called, the GOT entry for puts() does not hold the address of puts(); it holds the address of the instruction right after the jump. Consequently, execution continues at that instruction to initiate the dynamic linking process. The dynamic linker then resolves the address of puts(), after which subsequent calls to puts() will use the resolved address stored in the GOT.
Let's see how this works at runtime. Firstly, let's set a breakpoint at the call to puts@plt:
gef$ disass main
Dump of assembler code for function main:
...[snip]...
0x08049193 <+29>: lea eax,[ebx-0x1ff8]
0x08049199 <+35>: push eax
0x0804919a <+36>: call 0x8049050 <puts@plt>
0x0804919f <+41>: add esp,0x10
...[snip]...
End of assembler dump.
gef$
The call instruction is at 0x0804919a, so let's create a breakpoint there:
gef$ b *0x0804919a
Breakpoint 1 at 0x804919a
gef$
Now let's execute the program:
gef$ r
Starting program: /home/elswix/Desktop/elswix/Local/PLT_and_GOT_article/program
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x0804919a in main ()
...[snip]...
gef$
We reached the breakpoint, and the program stopped at the call instruction:
gef$ x/i $eip
=> 0x804919a <main+36>: call 0x8049050 <puts@plt>
gef$
Let's step into the call with the si command:
gef$ si
0x08049050 in puts@plt ()
...[snip]...
gef$ x/3i $eip
=> 0x8049050 <puts@plt>: jmp DWORD PTR ds:0x804c010
0x8049056 <puts@plt+6>: push 0x8
0x804905b <puts@plt+11>: jmp 0x8049030
gef$
As you can see, we are in the puts@plt entry. The next instruction is the jump through the GOT entry. The GOT entry is at 0x804c010. Eventually it should hold the memory address of the puts function in libc; however, this is the first time the function gets called, so the entry is still unresolved:
gef$ x 0x804c010
0x804c010 <puts@got.plt>: 0x08049056
gef$
As evident, the address stored at the GOT entry corresponds to the address of the next instruction after the jump. Therefore, upon continuation, the execution proceeds to the subsequent instruction:
gef$ si
0x08049056 in puts@plt ()
...[snip]...
gef$ x/3i $eip
=> 0x8049056 <puts@plt+6>: push 0x8
0x804905b <puts@plt+11>: jmp 0x8049030
0x8049060 <_start>: endbr32
The subsequent instruction pushes a value onto the stack that identifies the target symbol: on i386 this is the byte offset of the symbol's relocation entry within the .rel.plt section. Following this, a jump occurs to the beginning of the PLT section. You can verify the starting address of the PLT section using objdump:
elswix@ubuntu$ objdump -h program | grep "\ .plt"
11 .plt 00000050 08049030 08049030 00001030 2**4
After the jump instruction is executed, the execution flow reaches the beginning of the PLT:
gef$ x/5i $eip
=> 0x8049030: push DWORD PTR ds:0x804c004
0x8049036: jmp DWORD PTR ds:0x804c008
0x804903c: add BYTE PTR [eax],al
0x804903e: add BYTE PTR [eax],al
0x8049040 <__libc_start_main@plt>: jmp DWORD PTR ds:0x804c00c
gef$
The first instruction pushes the value stored at 0x804c004, the second entry in the .got.plt section (you can verify this using objdump). Remember, DWORD PTR ds:0x804c004 signifies that the processor will dereference the address 0x804c004 and push the resulting value onto the stack.
Next comes a jump instruction, which again dereferences a pointer, this one located 4 bytes after the previous one (0x804c008). Thus, it uses the third entry in the .got.plt section. Let's delve into what these addresses reference.
gef$ x/2wx 0x804c004
0x804c004: 0xf7ffda40 0xf7fd8f80
gef$
They indeed reference other memory addresses. Looking at the shared libraries loaded into the process suggests that both point into the dynamic linker, ld.so:
gef$ info dll
From To Syms Read Shared Object Library
0xf7fc7090 0xf7feb2a5 Yes (*) /lib/ld-linux.so.2
0xf7c20290 0xf7d9cb59 Yes (*) /lib32/libc.so.6
(*): Shared library is missing debugging information.
gef$
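The range listed for ld.so spans from 0xf7fc7090 to 0xf7feb2a5. The second address, 0xf7fd8f80, falls inside it, so the jump lands in the loader's code. The first address, 0xf7ffda40, sits slightly above that range: GDB lists only the library's code range here, so this value most likely points into the loader's data area (you can check this with info proc mappings). This matches the usual convention in which the second .got.plt entry identifies the calling object to the loader (a pointer to its link_map structure) and the third holds the loader's resolver entry point (_dl_runtime_resolve in glibc).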
As it says in the manual page:
The programs ld.so and ld-linux.so* find and load the shared objects (shared libraries) needed by a program, prepare the program
to run, and then run it.
The dynamic linker/loader is a system library responsible for dynamically linking shared libraries during program execution. When a program starts, ld.so is invoked by the system to resolve and load the shared libraries required by the program into memory. It searches for the libraries specified in the program's dependencies, resolves symbols, and maps them to their corresponding addresses in memory.
In simple terms, the process involves locating the memory address of the desired function and then updating the corresponding entry in the Global Offset Table (GOT) with this memory address. Consequently, the next time the program requires this function and goes through its Procedure Linkage Table (PLT) entry, the initial jump will lead directly to the desired function, bypassing the need for further resolution.
Continuing with GDB, once ld.so has executed all of its resolution code, we see the output of the call to puts:
gef$ ni
Hello for the first time!
0x0804919f in main ()
gef$
Then, looking at the GOT entry of puts, I noticed that it was overwritten with a new address:
gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
0x08049050 <+0>: jmp DWORD PTR ds:0x804c010
0x08049056 <+6>: push 0x8
0x0804905b <+11>: jmp 0x8049030
End of assembler dump.
gef$ x 0x804c010
0x804c010 <puts@got.plt>: 0xf7c72880
gef$
As you can see, the puts GOT entry now stores the address 0xf7c72880. Upon examining it, we observe that it belongs to the actual puts function:
gef$ x 0xf7c72880
0xf7c72880 <puts>: 0xfb1e0ff3
gef$
The ld.so library did its job. It found the actual puts function address and then overwrote the puts GOT entry. Let's see what happens when calling puts again:
gef$ disass 0x8049050
Dump of assembler code for function puts@plt:
=> 0x08049050 <+0>: jmp DWORD PTR ds:0x804c010
0x08049056 <+6>: push 0x8
0x0804905b <+11>: jmp 0x8049030
End of assembler dump.
gef$
As observed, we've reached the puts@plt entry. The subsequent instruction is the jump we encountered at the beginning, but this time, instead of jumping to the next instruction, it should redirect to the actual puts function in the libc library:
gef$ ni
0xf7c72880 in puts () from /lib32/libc.so.6
gef$
Great! We've successfully reached the puts function in the libc library without requiring the dynamic linker to search for it.
Lazy Binding - Summary
When a program is executed, it lacks the addresses of external functions such as printf(). It relies on the dynamic linker to locate these addresses. Instead of directly invoking the printf function, the program calls the Procedure Linkage Table (PLT) entry corresponding to printf, which consists of three instructions. The first instruction is a jump to the address stored in an entry of the Global Offset Table (GOT). During the first call, this entry points to the next instruction after the jump, rather than to the actual printf address. The program therefore pushes a value identifying the function onto the stack and jumps to the beginning of the PLT section, where code prepares and invokes the dynamic linker. The dynamic linker then locates the address of the printf function and, once found, updates the GOT entry for printf with this address.
During subsequent calls to printf, the first instruction of the PLT entry jumps through the updated GOT entry straight to the actual printf function. Thus, the resolving process is bypassed, optimizing performance.
Conclusion
The Procedure Linkage Table (PLT) and the Global Offset Table (GOT) play crucial roles in dynamically linked programs. Through our exploration, we've uncovered their significance in enabling the dynamic linking process, allowing programs to efficiently utilize external functions without the need for their direct inclusion, thus significantly reducing program size and complexity.
The PLT serves as an intermediary between the program and external functions, providing a mechanism for lazy binding and resolving function addresses only when they are first called. Meanwhile, the GOT acts as a table of pointers, serving as a repository for global data and function addresses. It facilitates efficient access to these resources across different modules, essentially acting as a map that directs the program to the right place. This dynamic linking approach not only conserves memory but also promotes code reusability and modularity.
In upcoming articles, we'll discuss how to use these concepts to carry out binary exploitation. When introducing the Format String Vulnerability, we'll learn how to execute a GOT overwrite attack. The overwriting of the GOT is particularly relevant in the context of the format string vulnerability due to its ability to manipulate the program's execution flow. By exploiting a format string vulnerability, an attacker can control the data written to the GOT, enabling them to redirect the program to malicious functions or even achieve the execution of arbitrary code. Consequently, understanding how the GOT works and how it can be manipulated is crucial for comprehending and successfully exploiting this vulnerability.
References
https://ir0nstone.gitbook.io/notes/types/stack/aslr/plt_and_got
https://ctf101.org/binary-exploitation/what-is-the-got/
https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html
https://www.youtube.com/watch?v=kUk5pw4w0h4
https://www.youtube.com/watch?v=B4-wVdQo040