Docker Container Escape Techniques: Privileged, Capabilities, Mount Abuse

Container escape primitives that still work in 2024 against Docker hosts, covering privileged mode, capability abuse, and dangerous mount configurations.

By Arjun Raghavan, Security & Systems Lead, BIPI · December 13, 2024 · 11 min read

Container escape is less about exotic kernel bugs and more about the configuration the developer accepted. Docker run flags that look harmless on a laptop become root on the host in production. This post catalogues the escape primitives we exercise during BIPI container security assessments.

Triage the container first

capsh --print to see effective capabilities inside the container
mount (or /proc/self/mountinfo) for mount visibility, and /proc/self/status for the capability bitmasks and NSpid namespace fields
ls /.dockerenv and /proc/1/cgroup to confirm container identity
deepce or amicontained as automated enumeration
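
When capsh is not installed in the image, the same information can be read from the CapEff line of /proc/self/status. The following is a minimal sketch, not a replacement for deepce or amicontained: the function name and the choice of which capabilities count as risky are our assumptions, while the bit positions come from linux/capability.h.

```python
import re

# Bit positions from linux/capability.h for the capabilities this post treats as dangerous.
RISKY_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def risky_caps_from_status(status_text):
    """Return the risky capability names set in the CapEff line of a status dump.

    status_text is the full contents of /proc/<pid>/status (hypothetical helper).
    """
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)", status_text, re.MULTILINE)
    if match is None:
        return []
    mask = int(match.group(1), 16)  # CapEff is a hexadecimal bitmask
    return [name for bit, name in sorted(RISKY_CAPS.items()) if mask & (1 << bit)]
```

Inside a container, pass it the contents of /proc/self/status. A container started with Docker's default capability set should yield an empty list; anything else deserves a closer look.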

Privileged mode

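docker run --privileged disables nearly every isolation feature: the container gets the full capability set and access to all host devices, and it runs without seccomp or AppArmor confinement. From a privileged container, mounting the host filesystem is a one-liner: mkdir /tmp/host && mount /dev/sda1 /tmp/host gives you /tmp/host as a read-write view of the host, assuming /dev/sda1 is the host's root partition (device names vary, so check with lsblk or fdisk -l first). Drop a cron job under /tmp/host and you are root on the host.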
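
From the host side, privileged and unconfined containers are easy to find in docker inspect output. The sketch below assumes the JSON layout docker inspect prints (a list of objects with a HostConfig section containing Privileged and SecurityOpt); the function name is hypothetical.

```python
import json


def privileged_findings(inspect_json):
    """List (container name, issue) pairs for privileged or unconfined containers.

    inspect_json is the text printed by `docker inspect <container>...`.
    """
    findings = []
    for container in json.loads(inspect_json):
        host_config = container.get("HostConfig") or {}
        name = container.get("Name", "<unnamed>")
        if host_config.get("Privileged"):
            findings.append((name, "privileged"))
        # SecurityOpt is null when no options were passed.
        for option in host_config.get("SecurityOpt") or []:
            # Docker accepts "seccomp=unconfined" and the older "seccomp:unconfined".
            normalized = option.replace(":", "=", 1)
            if normalized in ("seccomp=unconfined", "apparmor=unconfined"):
                findings.append((name, normalized))
    return findings
```

One way to feed it is to capture the output of docker inspect for every ID returned by docker ps -q and pass the text straight in.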

Dangerous capabilities

CAP_SYS_ADMIN is nearly equivalent to root; allows mount, unshare, and a long list of syscalls
CAP_SYS_PTRACE lets you attach to host processes if the host PID namespace is shared (--pid=host)
CAP_SYS_MODULE lets you load kernel modules, which amounts to full host compromise
CAP_DAC_READ_SEARCH lets you read any file via open_by_handle_at
CAP_NET_ADMIN combined with host network is a pivot platform
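
The list above translates directly into a review check for a container's added capabilities. This is a rough sketch: the dictionary and function names are hypothetical, the reasons are paraphrased from the list, and the input is assumed to be a CapAdd list such as the one in docker inspect output.

```python
# Reasons paraphrased from the list above; names here are illustrative only.
RISKY_CAP_REASONS = {
    "SYS_ADMIN": "mount, unshare, and many other privileged syscalls",
    "SYS_PTRACE": "attach to host processes when the PID namespace is shared",
    "SYS_MODULE": "load kernel modules",
    "DAC_READ_SEARCH": "read arbitrary files via open_by_handle_at",
    "NET_ADMIN": "reconfigure networking, a pivot risk with host networking",
}


def review_cap_add(cap_add):
    """Map each risky entry in a CapAdd list to the reason it matters."""
    flagged = {}
    for raw in cap_add or []:  # CapAdd may be null when nothing was added
        name = raw.upper()
        if name.startswith("CAP_"):
            name = name[4:]  # accept both SYS_ADMIN and CAP_SYS_ADMIN spellings
        if name == "ALL":
            flagged.update(RISKY_CAP_REASONS)
        elif name in RISKY_CAP_REASONS:
            flagged[name] = RISKY_CAP_REASONS[name]
    return flagged
```

An empty result does not mean the container is safe, only that none of the five capabilities named here were added explicitly.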

Mount abuse

-v /:/host is the obvious one and rare in production. The more interesting variant is -v /var/run/docker.sock:/var/run/docker.sock, which gives the container control of the host's Docker daemon and therefore the host. Another is a writable mount of the host's /proc: write a pipe handler to /proc/sys/kernel/core_pattern and the kernel runs it as root on the host at the next core dump.
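
These mounts can be flagged from a container's bind list. The sketch below assumes bind strings in the source:destination[:options] form used by -v and by HostConfig.Binds; the function name and the finding labels are hypothetical.

```python
import posixpath

DOCKER_SOCKETS = ("/var/run/docker.sock", "/run/docker.sock")


def risky_binds(binds):
    """Return (bind, reason) pairs for the dangerous mounts described above."""
    findings = []
    for bind in binds or []:
        parts = bind.split(":")
        source = posixpath.normpath(parts[0])
        options = parts[2].split(",") if len(parts) > 2 else []
        if source == "/":
            findings.append((bind, "host root filesystem"))
        elif source in DOCKER_SOCKETS:
            # A read-only bind does not help here: the socket still accepts connections.
            findings.append((bind, "Docker daemon socket"))
        elif (source == "/proc" or source.startswith("/proc/")) and "ro" not in options:
            findings.append((bind, "writable host /proc"))
    return findings
```

Named volumes such as data:/var/lib/data pass through untouched because their source is not an absolute host path.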

Network and PID namespace sharing

--net=host gives the container the host's network stack, including localhost services
--pid=host lets you see and signal host processes
--ipc=host shares System V IPC and can expose shared memory secrets
--userns=host disables user namespace remapping if it was even configured
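
Each of these flags leaves a trace in the HostConfig section of docker inspect output, which makes them cheap to audit. A minimal sketch, assuming the NetworkMode, PidMode, IpcMode, and UsernsMode fields as docker inspect reports them; the function name is hypothetical.

```python
# HostConfig field -> the run flag that sets it to "host".
NAMESPACE_FIELDS = {
    "NetworkMode": "--net=host",
    "PidMode": "--pid=host",
    "IpcMode": "--ipc=host",
    "UsernsMode": "--userns=host",
}


def host_namespace_flags(host_config):
    """Return the namespace-sharing flags a container was started with."""
    return [
        flag
        for field, flag in NAMESPACE_FIELDS.items()
        if host_config.get(field) == "host"
    ]
```

The input is a single container's HostConfig dictionary; values such as "bridge", "private", or an empty string are treated as not shared with the host.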

If your container does not need the host's Docker socket, do not mount the host's Docker socket. There is no nuance.

Detection

Falco rules for unexpected execve, mount, or modprobe inside containers
Audit daemon on the host catching new processes outside the container PID tree
EDR with container awareness to correlate container ID with host activity
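CI scanning that flags Compose files, Kubernetes manifests, and run scripts requesting privileged mode or unsafe mounts (--privileged is a runtime flag, so it never appears in a Dockerfile or image)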
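
A CI check for risky run flags can start as a plain text scan of the docker run lines in deployment scripts. The sketch below is deliberately crude and is our illustration rather than a finished scanner: it tokenizes one command line with shlex and does not distinguish Docker's own flags from arguments passed to the container's command, so expect occasional false positives. All names are hypothetical.

```python
import shlex

HOST_NAMESPACE_FLAGS = {"--pid", "--net", "--network", "--ipc", "--userns"}
MOUNT_FLAGS = {"-v", "--volume"}


def scan_run_command(command):
    """Return the risky flags found in a single `docker run` command line."""
    tokens = shlex.split(command)
    findings = []
    for index, token in enumerate(tokens):
        flag, has_equals, inline_value = token.partition("=")
        next_token = tokens[index + 1] if index + 1 < len(tokens) else ""
        # Docker accepts both "--flag=value" and "--flag value".
        value = inline_value if has_equals else next_token
        if flag == "--privileged" and inline_value != "false":
            findings.append("--privileged")
        elif flag in HOST_NAMESPACE_FLAGS and value == "host":
            findings.append(f"{flag}=host")
        elif flag in MOUNT_FLAGS and "docker.sock" in value:
            findings.append(f"{flag} {value}")
    return findings
```

Failing the pipeline on any non-empty result is a reasonable default, with an explicit allowlist for the few workloads that genuinely need an exception.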

Remediation

Forbid --privileged at the orchestrator level, no exceptions for general workloads
Drop all capabilities by default and add only what the workload requires
Enforce a read-only root filesystem and no-new-privileges (--security-opt no-new-privileges) on every container
Use user namespace remapping so root in the container is unprivileged on the host
Run Docker in rootless mode where possible, or move to Podman for hardened defaults
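
For teams that launch containers from scripts, the first three items can be baked into a small wrapper so the hardened flags are the default rather than an afterthought. A minimal sketch: the function name and the example image are hypothetical, while --cap-drop, --cap-add, --read-only, and --security-opt no-new-privileges are standard docker run options.

```python
def hardened_run_args(image, needed_caps=()):
    """Build a docker run argument list with conservative defaults.

    needed_caps lists only the capabilities the workload actually requires.
    """
    args = [
        "docker", "run",
        "--cap-drop=ALL",                    # start from zero capabilities
        "--read-only",                       # read-only root filesystem
        "--security-opt=no-new-privileges",  # block setuid-style privilege gain
    ]
    args += [f"--cap-add={cap}" for cap in needed_caps]
    args.append(image)
    return args
```

For example, hardened_run_args("nginx:1.27", ["NET_BIND_SERVICE"]) returns a list that can be passed to subprocess.run. Workloads that write to disk will also need explicit tmpfs or volume mounts once the root filesystem is read-only.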

30% of container fleets we audit have at least one privileged workload in production

1 line of YAML separates a hardened deployment from full host compromise

Closing

Container escape is rarely glamorous. It is usually a developer who added --privileged six months ago to work around a permissions problem and never came back. The fix is policy enforcement at the orchestrator, not vigilance at the developer level.