BIPI

EKS Security in 2024: IRSA, Pod Identity, and the auth-config Trap

Cloud Security

EKS clusters fail audits the same way every time. The fixes are well understood but the rollout order matters, and the new EKS Pod Identity feature finally removes one long-standing pain point.

By Arjun Raghavan, Security & Systems Lead, BIPI · February 7, 2024 · 8 min read

#eks #kubernetes #aws #cloud-security

Every EKS cluster we review has the same five problems. Public API endpoint, missing control plane logging, aws-auth ConfigMap that nobody understands, workloads using node IAM roles, and a kube-system namespace full of admin-bound service accounts. The fixes are well documented, but the order matters and the new EKS Pod Identity feature (GA in late 2023) changes the playbook for IAM.

Control plane endpoint: private, or at least locked down

EKS gives you three options: public, public-and-private, or private only. The default is public, accessible from anywhere on the internet with a valid bearer token. Even with strong IAM authentication, exposing the API server publicly is unnecessary risk.

Private-only is the right choice if your CI runners and developer machines can reach the cluster via VPN or Transit Gateway. Public-and-private with restricted CIDR blocks is the pragmatic middle ground. Public-only is acceptable only for short-lived clusters or learning environments.
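Both options are a single API call. A minimal sketch with the AWS CLI (the cluster name and CIDR are placeholders); note that EKS allows only one in-flight cluster update at a time, so wait for the cluster to return to ACTIVE between changes:

```shell
# Pragmatic middle ground: keep the public endpoint but restrict it to
# known CIDRs, and enable the private endpoint for in-VPC traffic.
aws eks update-cluster-config \
  --name prod-cluster \
  --resources-vpc-config endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs="203.0.113.0/24"

# Private-only, once VPN / Transit Gateway reachability is confirmed:
aws eks update-cluster-config \
  --name prod-cluster \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
```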

Control plane logging is not on by default

Five log types: api, audit, authenticator, controllerManager, scheduler. None are enabled by default. At minimum, enable api, audit, and authenticator. Without audit logs, incident response on an EKS cluster is guesswork. The logs go to CloudWatch and cost money, so route them through subscription filters to an S3 bucket and apply a 30-day CloudWatch retention to keep the bill reasonable.
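A minimal sketch of both steps (cluster name is a placeholder; the log group name follows the standard `/aws/eks/<cluster>/cluster` convention):

```shell
# Enable the three highest-value control plane log types.
aws eks update-cluster-config \
  --name prod-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator"],"enabled":true}]}'

# Keep the CloudWatch bill reasonable: 30-day retention on the log group.
aws logs put-retention-policy \
  --log-group-name /aws/eks/prod-cluster/cluster \
  --retention-in-days 30
```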

IRSA to Pod Identity migration

IRSA (IAM Roles for Service Accounts) has been the standard way to give pods AWS permissions since 2019. It works by creating an OIDC provider per cluster and using a service account annotation to map to an IAM role with a trust policy that scopes to the OIDC subject claim.

It works but it is operationally heavy. Each cluster needs its own OIDC provider. IAM role trust policies have to reference the cluster's OIDC issuer URL. Cross-account access requires extra hops. The trust relationship breaks the moment you recreate the cluster.
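To make the coupling concrete, here is a sketch of a typical IRSA trust policy (account ID, region, OIDC issuer ID, namespace, and service account name are all placeholders):

```shell
# The trust policy is welded to one cluster's OIDC issuer. Recreate the
# cluster and the issuer changes, silently breaking this trust relationship.
cat > irsa-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:payments:payments-sa"
      }
    }
  }]
}
EOF
```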

EKS Pod Identity (GA November 2023) replaces all of that with a Pod Identity Agent DaemonSet and an association API. No OIDC provider needed. The trust policy is simple and uses pods.eks.amazonaws.com as the principal. Cross-account access works without contortion. We are migrating greenfield workloads to Pod Identity and leaving existing IRSA setups alone until they need changes.
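The Pod Identity equivalent, as a sketch (cluster, namespace, service account, and role ARN are placeholders). The trust policy is cluster-agnostic: no OIDC issuer URL anywhere.

```shell
# Simple, reusable trust policy for any cluster in the account.
cat > pod-identity-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "pods.eks.amazonaws.com" },
    "Action": ["sts:AssumeRole", "sts:TagSession"]
  }]
}
EOF

# Install the agent as a managed add-on, then bind role to service account.
aws eks create-addon --cluster-name prod-cluster --addon-name eks-pod-identity-agent
aws eks create-pod-identity-association \
  --cluster-name prod-cluster \
  --namespace payments \
  --service-account payments-sa \
  --role-arn arn:aws:iam::111122223333:role/payments-role
```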

aws-auth ConfigMap and access entries

The aws-auth ConfigMap in kube-system is how IAM identities have historically mapped to Kubernetes RBAC. It is also the most common source of EKS outages we see. One typo locks everyone out of the cluster. There is no console to fix it. Recovery means falling back to the cluster creator IAM identity (the only one with implicit admin) or opening a case with AWS Support.

EKS Access Entries (also GA late 2023) replace aws-auth with a proper API. IAM principals get mapped to Kubernetes groups via aws eks create-access-entry. Mistakes are recoverable. We enable Access Entries on all new clusters and recommend migrating off aws-auth on existing ones.
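A sketch of granting a team read-only access this way (the principal ARN is a placeholder). The key property: a typo here is a failed API call, not a lockout.

```shell
# Register the IAM principal as an access entry...
aws eks create-access-entry \
  --cluster-name prod-cluster \
  --principal-arn arn:aws:iam::111122223333:role/platform-readonly

# ...then attach a managed access policy, scoped cluster-wide.
aws eks associate-access-policy \
  --cluster-name prod-cluster \
  --principal-arn arn:aws:iam::111122223333:role/platform-readonly \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
  --access-scope type=cluster
```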

Workload identity hygiene

The pattern we enforce on every cluster:

  1. No pods use the node IAM role for AWS API calls. Block 169.254.169.254 from pods unless explicitly needed.
  2. Every workload that needs AWS access has its own service account and its own IAM role with least-privilege.
  3. Default service accounts in workload namespaces have automountServiceAccountToken: false.
  4. kube-system service accounts are reviewed quarterly. Any new ClusterRoleBinding to cluster-admin gets flagged.
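Points 1 and 3 above can be sketched as follows (instance ID and namespace are placeholders; on managed node groups, the IMDS settings belong in the launch template rather than a per-instance call):

```shell
# 1. Keep pods away from the node role: require IMDSv2 and set the hop
#    limit to 1 so container traffic cannot reach 169.254.169.254.
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

# 3. Default service account that mounts no API token.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: payments
automountServiceAccountToken: false
EOF
```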

Network policy is non-optional

EKS supports network policy via the VPC CNI (since version 1.14 in late 2023) or Calico. Without network policy, every pod can talk to every other pod and every node IP. With basic deny-all-ingress defaults and per-namespace allow rules, a single compromised pod cannot pivot.
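A minimal sketch of that baseline (cluster and namespace names are placeholders): turn on network policy enforcement in the VPC CNI add-on, then apply a deny-all-ingress default per namespace.

```shell
# Enable the VPC CNI's built-in network policy engine (v1.14+).
aws eks update-addon --cluster-name prod-cluster --addon-name vpc-cni \
  --configuration-values '{"enableNetworkPolicy": "true"}'

# Default deny: selects every pod in the namespace, allows no ingress.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF
```

Per-namespace allow rules are then additive on top of this default.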

We use Cilium on clusters that need L7 policy or service mesh-like features. The standard VPC CNI network policy is fine for L3/L4.

ENI and SG reality

Every pod gets a routable VPC IP from a node ENI (or from a delegated prefix in prefix mode). Security groups apply at the ENI level, so by default pods inherit the node's security groups. If you need per-pod security groups, enable Security Groups for Pods, but understand that it consumes ENI quota faster and only works on a subset of instance types. For most workloads, namespace-level network policies are simpler than per-pod security groups.
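If you do opt in, the binding is a sketch like this SecurityGroupPolicy (names, labels, and the security group ID are placeholders; this assumes ENABLE_POD_ENI=true on the VPC CNI and a supported Nitro-based instance type):

```shell
# Pods matching the selector get branch ENIs with the listed security group.
kubectl apply -f - <<'EOF'
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: payments-sgp
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments
  securityGroups:
    groupIds:
    - sg-0123456789abcdef0
EOF
```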

Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.