# Docker --privileged {% hint style="success" %} Learn & practice AWS Hacking:[**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)\ Learn & practice GCP Hacking: [**HackTricks Training GCP Red Team Expert (GRTE)**](https://training.hacktricks.xyz/courses/grte)
Support HackTricks * Check the [**subscription plans**](https://github.com/sponsors/carlospolop)! * **Join the** 💬 [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** us on **Twitter** 🐦 [**@hacktricks\_live**](https://twitter.com/hacktricks\_live)**.** * **Share hacking tricks by submitting PRs to the** [**HackTricks**](https://github.com/carlospolop/hacktricks) and [**HackTricks Cloud**](https://github.com/carlospolop/hacktricks-cloud) github repos.
{% endhint %} ## What Affects When you run a container as privileged these are the protections you are disabling: ### Mount /dev In a privileged container, all the **devices can be accessed in `/dev/`**. Therefore you can **escape** by **mounting** the disk of the host. {% tabs %} {% tab title="Inside default container" %} ```bash # docker run --rm -it alpine sh ls /dev console fd mqueue ptmx random stderr stdout urandom core full null pts shm stdin tty zero ``` {% endtab %} {% tab title="Inside Privileged Container" %} ```bash # docker run --rm --privileged -it alpine sh ls /dev cachefiles mapper port shm tty24 tty44 tty7 console mem psaux stderr tty25 tty45 tty8 core mqueue ptmx stdin tty26 tty46 tty9 cpu nbd0 pts stdout tty27 tty47 ttyS0 [...] ``` {% endtab %} {% endtabs %} ### Read-only kernel file systems Kernel file systems provide a mechanism for a process to modify the behavior of the kernel. However, when it comes to container processes, we want to prevent them from making any changes to the kernel. Therefore, we mount kernel file systems as **read-only** within the container, ensuring that the container processes cannot modify the kernel. {% tabs %} {% tab title="Inside default container" %} ```bash # docker run --rm -it alpine sh mount | grep '(ro' sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime) cpuset on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset) cpu on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu) cpuacct on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct) ``` {% endtab %} {% tab title="Inside Privileged Container" %} ```bash # docker run --rm --privileged -it alpine sh mount | grep '(ro' ``` {% endtab %} {% endtabs %} ### Masking over kernel file systems The **/proc** file system is selectively writable but for security, certain parts are shielded from write and read access by overlaying them with **tmpfs**, ensuring container processes can't access sensitive areas. {% hint style="info" %} **tmpfs** is a file system that stores all the files in virtual memory. tmpfs doesn't create any files on your hard drive. So if you unmount a tmpfs file system, all the files residing in it are lost for ever. {% endhint %} {% tabs %} {% tab title="Inside default container" %} ```bash # docker run --rm -it alpine sh mount | grep /proc.*tmpfs tmpfs on /proc/acpi type tmpfs (ro,relatime) tmpfs on /proc/kcore type tmpfs (rw,nosuid,size=65536k,mode=755) tmpfs on /proc/keys type tmpfs (rw,nosuid,size=65536k,mode=755) ``` {% endtab %} {% tab title="Inside Privileged Container" %} ```bash # docker run --rm --privileged -it alpine sh mount | grep /proc.*tmpfs ``` {% endtab %} {% endtabs %} ### Linux capabilities Container engines launch the containers with a **limited number of capabilities** to control what goes on inside of the container by default. **Privileged** ones have **all** the **capabilities** accesible. To learn about capabilities read: {% content-ref url="../linux-capabilities.md" %} [linux-capabilities.md](../linux-capabilities.md) {% endcontent-ref %} {% tabs %} {% tab title="Inside default container" %} ```bash # docker run --rm -it alpine sh apk add -U libcap; capsh --print [...] Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=eip Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap [...] ``` {% endtab %} {% tab title="Inside Privileged Container" %} ```bash # docker run --rm --privileged -it alpine sh apk add -U libcap; capsh --print [...] Current: =eip cap_perfmon,cap_bpf,cap_checkpoint_restore-eip Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read [...] ``` {% endtab %} {% endtabs %} You can manipulate the capabilities available to a container without running in `--privileged` mode by using the `--cap-add` and `--cap-drop` flags. ### Seccomp **Seccomp** is useful to **limit** the **syscalls** a container can call. A default seccomp profile is enabled by default when running docker containers, but in privileged mode it is disabled. Learn more about Seccomp here: {% content-ref url="seccomp.md" %} [seccomp.md](seccomp.md) {% endcontent-ref %} {% tabs %} {% tab title="Inside default container" %} ```bash # docker run --rm -it alpine sh grep Seccomp /proc/1/status Seccomp: 2 Seccomp_filters: 1 ``` {% endtab %} {% tab title="Inside Privileged Container" %} ```bash # docker run --rm --privileged -it alpine sh grep Seccomp /proc/1/status Seccomp: 0 Seccomp_filters: 0 ``` {% endtab %} {% endtabs %} ```bash # You can manually disable seccomp in docker with --security-opt seccomp=unconfined ``` Also, note that when Docker (or other CRIs) are used in a **Kubernetes** cluster, the **seccomp filter is disabled by default** ### AppArmor **AppArmor** is a kernel enhancement to confine **containers** to a **limited** set of **resources** with **per-program profiles**. When you run with the `--privileged` flag, this protection is disabled. {% content-ref url="apparmor.md" %} [apparmor.md](apparmor.md) {% endcontent-ref %} ```bash # You can manually disable seccomp in docker with --security-opt apparmor=unconfined ``` ### SELinux Running a container with the `--privileged` flag disables **SELinux labels**, causing it to inherit the label of the container engine, typically `unconfined`, granting full access similar to the container engine. In rootless mode, it uses `container_runtime_t`, while in root mode, `spc_t` is applied. {% content-ref url="../selinux.md" %} [selinux.md](../selinux.md) {% endcontent-ref %} ```bash # You can manually disable selinux in docker with --security-opt label:disable ``` ## What Doesn't Affect ### Namespaces Namespaces are **NOT affected** by the `--privileged` flag. Even though they don't have the security constraints enabled, they **do not see all of the processes on the system or the host network, for example**. Users can disable individual namespaces by using the **`--pid=host`, `--net=host`, `--ipc=host`, `--uts=host`** container engines flags. {% tabs %} {% tab title="Inside default privileged container" %} ```bash # docker run --rm --privileged -it alpine sh ps -ef PID USER TIME COMMAND 1 root 0:00 sh 18 root 0:00 ps -ef ``` {% endtab %} {% tab title="Inside --pid=host Container" %} ```bash # docker run --rm --privileged --pid=host -it alpine sh ps -ef PID USER TIME COMMAND 1 root 0:03 /sbin/init 2 root 0:00 [kthreadd] 3 root 0:00 [rcu_gp]ount | grep /proc.*tmpfs [...] ``` {% endtab %} {% endtabs %} ### User namespace **By default, container engines don't utilize user namespaces, except for rootless containers**, which require them for file system mounting and using multiple UIDs. User namespaces, integral for rootless containers, cannot be disabled and significantly enhance security by restricting privileges. ## References * [https://www.redhat.com/sysadmin/privileged-flag-container-engines](https://www.redhat.com/sysadmin/privileged-flag-container-engines) {% hint style="success" %} Learn & practice AWS Hacking:[**HackTricks Training AWS Red Team Expert (ARTE)**](https://training.hacktricks.xyz/courses/arte)\ Learn & practice GCP Hacking: [**HackTricks Training GCP Red Team Expert (GRTE)**](https://training.hacktricks.xyz/courses/grte)
Support HackTricks * Check the [**subscription plans**](https://github.com/sponsors/carlospolop)! * **Join the** 💬 [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** us on **Twitter** 🐦 [**@hacktricks\_live**](https://twitter.com/hacktricks\_live)**.** * **Share hacking tricks by submitting PRs to the** [**HackTricks**](https://github.com/carlospolop/hacktricks) and [**HackTricks Cloud**](https://github.com/carlospolop/hacktricks-cloud) github repos.
{% endhint %}