# Unicode Injection {% hint style="success" %} Learn & practice AWS Hacking:

[**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)

\ Learn & practice GCP Hacking:

[**HackTricks Training GCP Red Team Expert (GRTE)**

](https://training.hacktricks.xyz/courses/grte)

Support HackTricks

* Check the [**subscription plans**](https://github.com/sponsors/carlospolop)! * **Join the** 💬 [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** us on **Twitter** 🐦 [**@hacktricks\_live**](https://twitter.com/hacktricks\_live)**.** * **Share hacking tricks by submitting PRs to the** [**HackTricks**](https://github.com/carlospolop/hacktricks) and [**HackTricks Cloud**](https://github.com/carlospolop/hacktricks-cloud) github repos.

{% endhint %} ## Introduction Depending on how the back-end/front-end is behaving when it **receives weird unicode characters** an attacker might be able to **bypass protections and inject arbitrary characters** that could be used to **abused injection vulnerabilities** such as XSS or SQLi. ## Unicode Normalization Unicode normalization occurs when **unicode characters are normalized to ascii characters**. One common scenario of this type of vulnerability occurs when the system is **modifying** somehow the **input** of the user **after having checked it**. For example, in some languages a simple call to make the **input uppercase or lowercase** could normalize the given input and the **unicode will be transformed into ASCII** generating new characters.\ For more info check: {% content-ref url="unicode-normalization.md" %} [unicode-normalization.md](unicode-normalization.md) {% endcontent-ref %} ## `\u` to `%` Unicode characters are usually represented with the **`\u` prefix**. For example the char `㱋` is `\u3c4b`([check it here](https://unicode-explorer.com/c/3c4B)). If a backend **transforms** the prefix **`\u` in `%`**, the resulting string will be `%3c4b`, which URL decoded is: **`<4b`**. And, as you can see, a **`<` char is injected**.\ You could use this technique to **inject any kind of char** if the backend is vulnerable.\ Check [https://unicode-explorer.com/](https://unicode-explorer.com/) to find the chars you need. This vuln actually comes from a vulnerability a researcher found, for a more in depth explanation check [https://www.youtube.com/watch?v=aUsAHb0E7Cg](https://www.youtube.com/watch?v=aUsAHb0E7Cg) ## Emoji Injection Back-ends something behaves weirdly when they **receives emojis**. That's what happened in [**this writeup**](https://medium.com/@fpatrik/how-i-found-an-xss-vulnerability-via-using-emojis-7ad72de49209) where the researcher managed to achieve a XSS with a payload such as: `💋img src=x onerror=alert(document.domain)//💛` In this case, the error was that the server after removing the malicious characters **converted the UTF-8 string from Windows-1252 to UTF-8** (basically the input encoding and the convert from encoding mismatched). Then this does not give a proper < just a weird unicode one: `‹`\ ``So they took this output and **converted again now from UTF-8 ot ASCII**. This **normalized** the `‹` to `<` this is how the exploit could work on that system.\ This is what happened: ```php [**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)

\ Learn & practice GCP Hacking:

[**HackTricks Training GCP Red Team Expert (GRTE)**

](https://training.hacktricks.xyz/courses/grte)

Support HackTricks

{% endhint %}