BIPI

Container Escape Techniques in 2025: runc CVEs, Privileged Container Abuses and seccomp Bypass


Container escapes are not theoretical. runc CVEs in 2025 gave attackers host root from standard container workloads. This post maps the current container escape landscape: unpatched runc vulnerabilities, privileged container host mount abuses, and techniques that bypass seccomp profiles without triggering standard detections.

By Arjun Raghavan, Security & Systems Lead, BIPI · September 15, 2025 · 11 min read


The promise of container isolation is that a compromised process inside a container cannot affect the host or adjacent containers. In practice, this promise has been broken repeatedly. 2025 produced multiple critical runc and containerd CVEs that allowed attackers with the ability to run a container — not necessarily as root — to escape to the host. Understanding the current escape landscape is essential for anyone defending containerized workloads.

Container escapes fall into three broad categories: vulnerability-based escapes exploiting bugs in the container runtime, configuration-based escapes abusing permissive container settings, and kernel-based escapes targeting shared kernel primitives. Each category has different mitigations and detection strategies.

critical container runtime CVEs (CVSS 9.0+) affecting runc and containerd published in H1 2025

31% of Kubernetes clusters scanned in production have at least one privileged container running in a non-system namespace

6 hrs: median time between a runc CVE publication and working PoC availability in 2025

A privileged container with host process namespace access is not a container. It is a process running directly on the host with a thin wrapper that provides false confidence.

runc CVEs in 2025 — what changed

runc is the container runtime underneath Docker, containerd, and most Kubernetes deployments. CVEs in runc are particularly dangerous because they affect every container on every node, not just a specific workload. The 2024 and 2025 runc CVEs followed patterns seen in earlier bugs: /proc/self/exe replacement attacks that let a container process overwrite the runc binary on the host (first seen in CVE-2019-5736), and leaked file descriptors that give a container process a handle on the host filesystem (as in CVE-2024-21626).

The mitigation is straightforward: patch. Kubernetes node images with runc versions below the fixed release are vulnerable regardless of Pod Security Admission or seccomp policies. Establish a node patching SLA that treats critical container runtime CVEs as P0 incidents with a 48-hour patching window. Automated node pool rolling upgrades in managed Kubernetes should be configured to apply security patches automatically.
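As a rough illustration of the version check behind such an SLA, the sketch below parses `runc --version` style output and compares it against a fixed release. The function names are hypothetical, and 1.1.12 is used only as an example because it is the release that fixed CVE-2024-21626; substitute the fixed version from the advisory you are tracking.

```python
import re


def parse_runc_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from `runc --version` style output."""
    match = re.search(r"runc version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a runc version string")
    return tuple(int(part) for part in match.groups())


def needs_patch(version_output: str, fixed: tuple) -> bool:
    """True when the installed runc is older than the first fixed release.

    Assumption: a plain tuple comparison is only meaningful within one
    release branch; pre-release suffixes such as -rc.1 are ignored.
    """
    return parse_runc_version(version_output) < fixed


# Hypothetical sample output from a node; 1.1.12 fixed CVE-2024-21626.
sample = "runc version 1.1.11"
print(needs_patch(sample, fixed=(1, 1, 12)))  # prints True
```

In practice a check like this would run against every node image in the fleet, with the result feeding the patching ticket rather than a print statement.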

Privileged container escapes

Privileged containers run with all Linux capabilities, no seccomp filtering, and unrestricted access to host devices. A privileged container can mount the host filesystem and load kernel modules. If it also shares the host PID namespace it can read /proc of any process on the host, and once the host filesystem is mounted it can reach the container runtime socket. The escape from a privileged container to full host compromise is a series of well-documented commands, not a vulnerability.

Mount host filesystem: mount /dev/sda1 /mnt followed by chroot /mnt gives root on the host filesystem (the block device name varies by node).
Container runtime socket: access to /var/run/docker.sock allows launching new containers with host mounts.
Kernel module loading: insmod a malicious LKM from inside the container; the module runs in host kernel context.
Namespace entry: nsenter with --target 1 and namespace flags enters the host namespaces using the host init process as the target, which requires the container to share the host PID namespace.
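The settings that enable these paths are visible in the pod manifest, so they can be flagged before anything runs. The following is a minimal sketch, assuming the pod has already been parsed into a dict (for example from `kubectl get pods -o json`). The function name and the list of risky paths are illustrative, not an exhaustive policy.

```python
def escape_risks(pod: dict) -> list:
    """Return the risky settings found in one parsed pod manifest."""
    spec = pod.get("spec", {})
    findings = []

    # Sharing the host PID namespace makes host PID 1 a valid nsenter target.
    if spec.get("hostPID"):
        findings.append("hostPID enabled")

    # hostPath volumes exposing the host root or a runtime socket.
    risky_paths = ("/", "/var/run/docker.sock", "/run/containerd/containerd.sock")
    for volume in spec.get("volumes", []):
        path = (volume.get("hostPath") or {}).get("path")
        if path in risky_paths:
            findings.append(f"hostPath {path} mounted")

    # The privileged flag is set per container, including init containers.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        security = container.get("securityContext") or {}
        if security.get("privileged"):
            findings.append(f"container {container.get('name')} is privileged")

    return findings


# Hypothetical pod used only to exercise the function.
example_pod = {
    "spec": {
        "hostPID": True,
        "volumes": [{"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}],
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    }
}
for finding in escape_risks(example_pod):
    print(finding)
```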

seccomp bypass techniques

The container runtime default seccomp profile is a broad allowlist. Docker's default profile, for example, blocks only around 44 of the 300-plus Linux syscalls. It does not block all dangerous syscalls: ptrace, clone, and several others remain available by default because legitimate workloads use them, and mount becomes available as soon as the container holds CAP_SYS_ADMIN. Attackers can use unrestricted syscalls to perform namespace manipulation and access /proc pseudo-files to read host kernel information.

Custom seccomp profiles that follow the principle of least-privilege syscall access are more effective than the runtime default. Tools like Inspektor Gadget can profile running workloads to determine the exact syscall set they use, generating a custom seccomp profile that denies everything else.
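To make the deny-by-default idea concrete, here is a minimal sketch that turns an observed syscall set into a profile in the JSON layout that Docker and containerd load. The observed syscall list is a made-up example and the function name is hypothetical; this is not the output format of Inspektor Gadget itself, and a real profile needs the full set recorded from the workload.

```python
import json


def build_seccomp_profile(observed_syscalls: set) -> dict:
    """Build an allowlist profile: listed syscalls pass, all others fail."""
    return {
        # Any syscall not explicitly allowed returns an error to the caller.
        "defaultAction": "SCMP_ACT_ERRNO",
        # Assumption: x86_64 nodes only; add other architectures as needed.
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(observed_syscalls), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Hypothetical syscall set; far too small for a real service.
observed = {"read", "write", "openat", "close", "epoll_wait", "exit_group"}
print(json.dumps(build_seccomp_profile(observed), indent=2))
```

A profile generated this way is only as good as the profiling run: code paths that were not exercised during recording will fail at runtime, so it is worth running the profile in an audit or log mode before enforcing it.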

Detection — what to look for

A process spawning /bin/sh or bash inside a container that should not have a shell, such as a web server or database.
Mount syscall inside a container — visible via eBPF or audit daemon.
Access to /proc/1/maps, /proc/1/mem, or /proc/sysrq-trigger from a container process.
File access to /var/run/docker.sock or /run/containerd/containerd.sock.
nsenter or unshare execution inside a container.
Network connections from a container to the node metadata service at 169.254.169.254.
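The signals above can be expressed as a small rule matcher. The sketch below assumes a simplified, hypothetical event schema (a dict with `type`, `path`, `name`, and `dest_ip` keys); real Falco or Tetragon events have their own field names, so treat this as an illustration of the logic rather than a drop-in rule set.

```python
from typing import Optional

SHELLS = {"/bin/sh", "/bin/bash", "/usr/bin/bash"}
NAMESPACE_TOOLS = {"nsenter", "unshare"}
SENSITIVE_PATHS = {
    "/proc/1/maps",
    "/proc/1/mem",
    "/proc/sysrq-trigger",
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
}
METADATA_IP = "169.254.169.254"


def classify(event: dict) -> Optional[str]:
    """Map one container event to an alert label, or None if it is benign."""
    kind = event.get("type")
    if kind == "exec":
        binary = event.get("path", "")
        if binary in SHELLS:
            return "shell spawned in container"
        # Compare only the file name so any install path matches.
        if binary.rsplit("/", 1)[-1] in NAMESPACE_TOOLS:
            return "namespace tool executed"
    elif kind == "syscall" and event.get("name") == "mount":
        return "mount syscall"
    elif kind == "open" and event.get("path") in SENSITIVE_PATHS:
        return "sensitive host path accessed"
    elif kind == "connect" and event.get("dest_ip") == METADATA_IP:
        return "metadata service contacted"
    return None


# Hypothetical events used only to exercise the matcher.
print(classify({"type": "exec", "path": "/usr/bin/nsenter"}))
print(classify({"type": "connect", "dest_ip": "169.254.169.254"}))
```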

Hardening checklist

Patch runc and containerd within 48 hours of a critical CVE — use managed node groups with auto-upgrade.
Enforce Pod Security Admission restricted profile on all non-system namespaces — blocks privileged containers.
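Apply custom seccomp profiles to high-value workloads using the securityContext.seccompProfile field (type Localhost), or the SeccompProfile custom resource if you run the Security Profiles Operator.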
Deploy Falco or Tetragon with rules covering shell spawn, mount, and /proc access inside containers.
Audit running containers for the privileged flag across all namespaces.
Restrict access to container runtime sockets — never mount docker.sock in application containers.
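The Pod Security Admission item can be verified mechanically, since enforcement is driven by the `pod-security.kubernetes.io/enforce` namespace label. The following is a minimal sketch, assuming namespaces have been parsed into dicts (for example from `kubectl get namespaces -o json`). The function name and the set of system namespaces are assumptions to adapt to your cluster.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"
# Assumption: these are the namespaces treated as "system" in this cluster.
SYSTEM_NAMESPACES = {"kube-system", "kube-public", "kube-node-lease"}


def namespaces_missing_restricted(namespaces: list) -> list:
    """List non-system namespaces that do not enforce the restricted profile."""
    missing = []
    for namespace in namespaces:
        metadata = namespace.get("metadata", {})
        name = metadata.get("name", "")
        if name in SYSTEM_NAMESPACES:
            continue
        labels = metadata.get("labels") or {}
        if labels.get(ENFORCE_LABEL) != "restricted":
            missing.append(name)
    return missing


# Hypothetical namespaces used only to exercise the check.
example = [
    {"metadata": {"name": "kube-system"}},
    {"metadata": {"name": "payments", "labels": {ENFORCE_LABEL: "baseline"}}},
    {"metadata": {"name": "web", "labels": {ENFORCE_LABEL: "restricted"}}},
]
print(namespaces_missing_restricted(example))  # prints ['payments']
```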

Closing

Container security is not a solved problem in 2025. Runtime CVEs, privileged containers, and seccomp gaps continue to provide attackers with reliable escape paths. The defense is layered: patch runtimes aggressively, enforce PSA to prevent privileged containers from scheduling, deploy custom seccomp profiles for sensitive workloads, and use eBPF-based runtime detection to catch behavior that policy controls miss. No single control is sufficient; the depth of the defense stack is the measure of your protection.
