Docker Container Escape Techniques: Privileged, Capabilities, Mount Abuse
Container escape primitives that still work in 2024 against Docker hosts, covering privileged mode, capability abuse, and dangerous mount configurations.
By Arjun Raghavan, Security & Systems Lead, BIPI · December 13, 2024 · 11 min read
Container escape is less about exotic kernel bugs and more about the configuration the developer accepted. Docker run flags that look harmless on a laptop become root on the host in production. This post catalogues the escape primitives we exercise during BIPI container security assessments.
Triage the container first
- capsh --print to see effective capabilities inside the container
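- mount and /proc/self/mountinfo for mount visibility, /proc/self/status for capability masks and seccomp state, /proc/self/ns for namespace identity
- ls /.dockerenv and /proc/1/cgroup to confirm container identity (on cgroup v2 hosts /proc/1/cgroup may show only 0::/, so do not rely on it alone)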
- deepce or amicontained as automated enumeration
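When capsh is not installed in the image, the same information can be read from /proc/self/status. The following is a minimal sketch, not a replacement for the tools above: the helper names parse_status and decode_caps are hypothetical, and the capability table is copied from linux/capability.h as of recent kernels.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def parse_status(text):
    """Pick the security-relevant fields out of /proc/<pid>/status text."""
    wanted = {"CapEff", "CapBnd", "Seccomp", "NoNewPrivs"}
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            fields[key] = value.strip()
    return fields


def decode_caps(mask):
    """Turn a capability bitmask (as an int) into a list of names."""
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits newer than this table are reported by number.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"CAP_{bit}")
    return names
```

Inside a container, feed it the real file: decode_caps(int(parse_status(open("/proc/self/status").read())["CapEff"], 16)). A default Docker container typically reports a short list with no CAP_SYS_ADMIN in it; anything longer deserves a closer look.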
Privileged mode
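docker run --privileged disables nearly every isolation feature: the container gets the full capability set, access to all host devices, and runs without seccomp or AppArmor confinement. From a privileged container, mounting the host filesystem is a one-liner: mkdir /tmp/host && mount /dev/sda1 /tmp/host gives you /tmp/host as a read-write view of the host, assuming the host root filesystem lives on /dev/sda1 (the device name varies by host; check with lsblk or fdisk -l). Drop a cron job into the mounted tree and you are root on the host.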
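A quick way to tell whether you have landed in a privileged container is to compare the effective capability mask against the full set and check that seccomp is off. The sketch below is a heuristic only: the function names are hypothetical, and a container started with every capability added and seccomp unconfined would match as well.

```python
def full_cap_mask(cap_last_cap):
    """Mask with every capability bit set, given /proc/sys/kernel/cap_last_cap."""
    return (1 << (cap_last_cap + 1)) - 1


def looks_privileged(cap_eff, seccomp_mode, cap_last_cap):
    """Heuristic: all capabilities effective and seccomp disabled (mode 0).

    cap_eff       -- CapEff from /proc/self/status, parsed as a hex int
    seccomp_mode  -- Seccomp from /proc/self/status (0 off, 1 strict, 2 filter)
    cap_last_cap  -- highest capability number the running kernel knows
    """
    return cap_eff == full_cap_mask(cap_last_cap) and seccomp_mode == 0
```

The cap_last_cap value comes from the kernel rather than a hard-coded constant because the full mask grows as new capabilities are added.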
Dangerous capabilities
- CAP_SYS_ADMIN is nearly equivalent to root; allows mount, unshare, and a long list of syscalls
- CAP_SYS_PTRACE lets you attach to host processes if PID namespaces are shared
- CAP_SYS_MODULE lets you load kernel modules, which is full host compromise
- CAP_DAC_READ_SEARCH lets you read arbitrary host files via open_by_handle_at (the Shocker technique)
- CAP_NET_ADMIN combined with host network is a pivot platform
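The list above can be turned into a small check against an effective capability mask. This is an illustrative sketch: flag_dangerous is a hypothetical helper, the bit numbers are taken from linux/capability.h, and the notes simply restate the bullets.

```python
# Bit number and risk note for each capability called out in the list above.
DANGEROUS_CAPS = {
    "CAP_SYS_ADMIN": (21, "near-root: mount, unshare, many privileged syscalls"),
    "CAP_SYS_PTRACE": (19, "attach to host processes when the PID namespace is shared"),
    "CAP_SYS_MODULE": (16, "load kernel modules"),
    "CAP_DAC_READ_SEARCH": (2, "read arbitrary files via open_by_handle_at"),
    "CAP_NET_ADMIN": (12, "pivot platform when combined with host networking"),
}


def flag_dangerous(cap_eff):
    """Return {name: note} for each dangerous capability set in cap_eff."""
    return {
        name: note
        for name, (bit, note) in DANGEROUS_CAPS.items()
        if (cap_eff >> bit) & 1
    }
```

For example, flag_dangerous(int("00000000a80425fb", 16)) returns an empty dict, since that mask carries none of the five.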
Mount abuse
-v /:/host is the obvious one and rare in production. The more interesting variant is -v /var/run/docker.sock:/var/run/docker.sock, which gives the container control of the host's Docker daemon and therefore the host: anyone who can talk to the socket can start a new privileged container with the host root mounted. Another is the host's /proc: if it is mounted writable, you can write a pipe handler to /proc/sys/kernel/core_pattern and gain code execution on the host at the next core dump.
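Both of the quieter mounts show up in /proc/self/mountinfo. The sketch below parses that file and flags a Docker socket or a second procfs mounted outside /proc. The function names are hypothetical and the two rules are deliberately narrow; a host root bind mount, for instance, needs manual review of the mount list.

```python
def parse_mountinfo(text):
    """Parse /proc/<pid>/mountinfo text into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        # Fields before " - " are per-mount; fields after describe the filesystem.
        left, sep, right = line.partition(" - ")
        left_fields, right_fields = left.split(), right.split()
        if not sep or len(left_fields) < 6 or not right_fields:
            continue
        mounts.append({
            "root": left_fields[3],
            "mount_point": left_fields[4],
            "options": left_fields[5],
            "fstype": right_fields[0],
        })
    return mounts


def risky_mounts(mounts):
    """Flag Docker socket mounts and procfs instances mounted outside /proc."""
    findings = []
    for m in mounts:
        mount_point = m["mount_point"]
        if mount_point.endswith("docker.sock"):
            findings.append(("docker-socket", mount_point))
        elif (m["fstype"] == "proc" and mount_point != "/proc"
              and not mount_point.startswith("/proc/")):
            findings.append(("extra-procfs", mount_point))
    return findings
```

Run it as risky_mounts(parse_mountinfo(open("/proc/self/mountinfo").read())). The /proc/ prefix exclusion is there because Docker itself places read-only proc submounts such as /proc/bus inside the container.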
Network and PID namespace sharing
- --net=host gives the container the host's network stack, including localhost services
- --pid=host lets you see and signal host processes
- --ipc=host shares System V IPC and can expose shared memory secrets
- --userns=host disables user namespace remapping if it was even configured
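From the host side, the namespace flags above are visible in the HostConfig section of docker inspect output. The following sketch assumes the JSON comes from docker inspect on one or more containers; audit_inspect_output is a hypothetical helper, and it only looks at the four fields that correspond to the bullets.

```python
import json

# HostConfig field in `docker inspect` output -> the run flag it reflects.
NAMESPACE_FIELDS = (
    ("NetworkMode", "--net=host"),
    ("PidMode", "--pid=host"),
    ("IpcMode", "--ipc=host"),
    ("UsernsMode", "--userns=host"),
)


def audit_inspect_output(raw_json):
    """Map container name -> list of host-namespace flags it was started with."""
    report = {}
    for container in json.loads(raw_json):
        host_config = container.get("HostConfig", {})
        shared = [flag for field, flag in NAMESPACE_FIELDS
                  if host_config.get(field) == "host"]
        if shared:
            report[container.get("Name", "<unnamed>")] = shared
    return report
```

Containers that share nothing with the host are left out of the report, so an empty dict is the result you want to see.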
If your container does not need the host's Docker socket, do not mount the host's Docker socket. There is no nuance.
Detection
- Falco rules for unexpected execve, mount, or modprobe inside containers
- Audit daemon on the host catching new processes outside the container PID tree
- EDR with container awareness to correlate container ID with host activity
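- Configuration scanning in CI that flags docker run invocations, Compose files, and orchestrator manifests with --privileged or unsafe mounts (these are runtime settings, so they do not appear in Dockerfiles or images)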
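One way to approximate the CI check in the last bullet, assuming services are defined in a Compose file, is to walk the parsed file and flag privileged services and risky bind-mount sources. This is a sketch: scan_compose and RISKY_SOURCES are hypothetical names, the input is an already-parsed dict (loading the YAML is left to the caller), and the source list is a starting point rather than a complete policy.

```python
# Host paths that should not be bind-mounted into general workloads.
RISKY_SOURCES = {"/", "/proc", "/var/run/docker.sock", "/run/docker.sock"}


def scan_compose(compose):
    """Return (service, finding) pairs for a parsed Compose file."""
    findings = []
    for name, service in (compose.get("services") or {}).items():
        if service.get("privileged") is True:
            findings.append((name, "privileged: true"))
        for volume in service.get("volumes") or []:
            if isinstance(volume, dict):
                # Long syntax: {type: bind, source: ..., target: ...}
                source = volume.get("source", "")
            else:
                # Short syntax: "source:target[:mode]"
                source = str(volume).split(":", 1)[0]
            if source in RISKY_SOURCES:
                findings.append((name, f"mounts {source}"))
    return findings
```

In a pipeline, a non-empty result would fail the build; exceptions are better handled with an explicit allowlist than by weakening the rules.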
Remediation
- Forbid --privileged at the orchestrator level, no exceptions for general workloads
- Drop all capabilities by default and add only what the workload requires
- Enforce a read-only root filesystem and no-new-privileges on every container
- Use user namespace remapping so root in the container is unprivileged on the host
- Run Docker in rootless mode where possible, or move to Podman for hardened defaults
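For plain Docker hosts, several of the items above map directly onto docker run flags. The sketch below builds such a command line; hardened_run_args is a hypothetical wrapper, the image name in the example is a placeholder, and orchestrators expose the same controls through their own policy mechanisms.

```python
def hardened_run_args(image, needed_caps=(), command=()):
    """Build a docker run argv with restrictive defaults.

    needed_caps -- capabilities to add back after dropping all of them,
                   written without the CAP_ prefix (e.g. "NET_BIND_SERVICE")
    """
    args = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                      # start from zero capabilities
        "--read-only",                         # read-only root filesystem
        "--security-opt=no-new-privileges",    # block setuid-style escalation
    ]
    args += [f"--cap-add={cap}" for cap in needed_caps]
    args.append(image)
    args += list(command)
    return args
```

Pass the result to subprocess.run, for example hardened_run_args("example/app:1.0", needed_caps=["NET_BIND_SERVICE"]). Workloads that need scratch space under a read-only root usually get it from a tmpfs mount rather than by dropping the flag.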
Closing
Container escape is rarely glamorous. It is usually a developer who added --privileged six months ago to work around a permissions problem and never came back. The fix is policy enforcement at the orchestrator, not vigilance at the developer level.