Scenario
In a multi-tenant Kubernetes cluster, development teams deploy workloads, but the security team needs to enforce pod security without breaking existing applications. Many pods run as root, use privileged containers, or add dangerous capabilities. Security audits reveal numerous violations, and developers resist manual reviews.
Symptoms
- Security audit reports: many pods running as root, with privileged access, or with dangerous capabilities.
- Developers resist manual security reviews, and PodSecurityPolicy (PSP), deprecated in Kubernetes 1.21 and removed in 1.25, is no longer an option for automated enforcement.
Diagnosis
Use kubectl to inspect pod security contexts:
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name} privileged={.spec.containers[*].securityContext.privileged} runAsUser={.spec.containers[*].securityContext.runAsUser} capAdd={.spec.containers[*].securityContext.capabilities.add}{"\n"}{end}' | head -20
Check for privileged pods:
kubectl get pods --all-namespaces -o json | jq '.items[] | select(any(.spec.containers[]; .securityContext.privileged == true)) | {namespace: .metadata.namespace, name: .metadata.name}'
Identify namespaces that need hardening.
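To decide which namespaces to harden first, the pod list can be reduced to a per-namespace count. The sketch below is one way to do that, assuming jq is installed; summarize_risky_pods is a hypothetical helper name, and it only catches containers that set privileged: true or runAsUser: 0 explicitly. Pods that run as root because of their image default will not appear.

```shell
#!/usr/bin/env bash
# Count, per namespace, the pods that have at least one privileged container
# or a container explicitly set to run as UID 0.
# Reads a PodList as JSON on stdin. summarize_risky_pods is a hypothetical
# helper name, not a kubectl feature.
summarize_risky_pods() {
  jq -r '
    [ .items[]
      | select(any(.spec.containers[];
          .securityContext.privileged == true
          or .securityContext.runAsUser == 0))
      | .metadata.namespace ]
    | group_by(.)
    | map("\(.[0]) \(length)")
    | .[]'
}

# Usage against a live cluster (requires kubectl and jq):
#   kubectl get pods --all-namespaces -o json | summarize_risky_pods
```

Each output line is a namespace followed by the number of flagged pods, which gives a rough ordering for the rollout.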
Commands (Implementation)
- Install OPA Gatekeeper:
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
- Deploy Pod Security Standards constraint templates (Baseline and Restricted):
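apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8spspbaseline
spec:
  crd:
    spec:
      names:
        kind: K8sPSPBaseline
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8spspbaseline

        # Example check only: flag privileged containers.
        # A complete Baseline policy needs further rules
        # (host namespaces, added capabilities, hostPath volumes, and so on).
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Container %v in pod %v (namespace %v) is privileged and violates the Baseline Pod Security Standard", [container.name, input.review.object.metadata.name, input.review.object.metadata.namespace])
        }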
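- Apply the constraint in audit mode. Audit mode comes from enforcementAction: dryrun in the constraint itself (see the manifest in the next step), so apply the file normally. Do not add --dry-run=server here: that flag only validates the request and never saves the constraint, so nothing would be audited.
kubectl apply -f constraint.yaml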
- Enforce on specific namespaces:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPBaseline
metadata:
  name: pod-security-baseline
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["prod", "staging"]
Once the audit results look acceptable, change enforcementAction to warn, and later to deny.
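Promoting the constraint from dryrun to warn or deny can be done with a merge patch instead of editing the manifest by hand. The following is a minimal sketch, assuming the constraint name pod-security-baseline from this guide; enforcement_patch and set_enforcement are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch that sets spec.enforcementAction on a constraint.
# Only the three actions discussed in this guide are accepted.
# enforcement_patch and set_enforcement are hypothetical helper names.
enforcement_patch() {
  case "$1" in
    dryrun|warn|deny)
      printf '{"spec":{"enforcementAction":"%s"}}\n' "$1" ;;
    *)
      echo "unsupported enforcementAction: $1" >&2
      return 1 ;;
  esac
}

# Apply the patch to the constraint from this guide (requires kubectl).
# Custom resources need --type=merge; strategic merge is not supported.
set_enforcement() {
  local patch
  patch=$(enforcement_patch "$1") || return 1
  kubectl patch k8spspbaseline pod-security-baseline --type=merge -p "$patch"
}

# Usage: set_enforcement warn     (later: set_enforcement deny)
```

Keeping the patch builder separate from the kubectl call makes it easy to check the generated JSON before anything touches the cluster. If the manifest lives in version control, update it there as well so the next apply does not silently revert the action.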
Risk Controls
- Use dryrun mode first to collect violations.
- Exclude namespaces running critical DaemonSets or system components.
- Add the label security.opsglobal.io/exempt=true to skip checks.
- Test thoroughly in non-production environments.
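The exclusions and the exempt label only take effect if the constraint's match block honours them. The sketch below writes a cluster-wide variant of the constraint that stays in dryrun, skips two system namespaces, and ignores pods carrying the exempt label. The file name and the excluded namespace list are assumptions for illustration; adjust both to your cluster. Note that anyone allowed to label pods can use this label to bypass the policy, so access to it should be limited.

```shell
#!/usr/bin/env bash
# Write a constraint manifest that audits all namespaces except the excluded
# ones and skips pods labelled security.opsglobal.io/exempt=true.
# The file name and excluded namespaces are assumptions for this sketch.
out_file="${OUT_FILE:-constraint-with-exemptions.yaml}"

cat > "$out_file" <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPBaseline
metadata:
  name: pod-security-baseline
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
    labelSelector:
      matchExpressions:
        # NotIn also matches pods that do not carry the label at all.
        - key: security.opsglobal.io/exempt
          operator: NotIn
          values: ["true"]
EOF

echo "wrote $out_file"
# Review the file, then apply it: kubectl apply -f "$out_file"
```

The script only generates the file; applying it is left as a separate, deliberate step.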
Rollback
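Revert the constraint to audit mode (set enforcementAction back to dryrun and re-apply), or delete the constraint and then its template. Delete the constraint first, because removing the template also removes the K8sPSPBaseline resource type:
kubectl delete k8spspbaseline pod-security-baseline
kubectl delete constrainttemplate k8spspbaseline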
Verification
Check Gatekeeper violation status:
kubectl get constrainttemplates
kubectl get K8sPSPBaseline -o yaml
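The YAML output can be long. Audit results are recorded under status.violations on the constraint, so a short filter makes them easier to scan. This is a sketch assuming jq is installed; list_violations is a hypothetical helper name. Gatekeeper may cap how many violations it stores per constraint, so status.totalViolations is the better figure for the overall count.

```shell
#!/usr/bin/env bash
# Print one line per recorded audit violation:
#   <enforcementAction> <namespace>/<name> <message>
# Reads a single constraint as JSON on stdin; prints nothing if the
# constraint has no status yet. list_violations is a hypothetical helper name.
list_violations() {
  jq -r '
    (.status.violations // [])[]
    | "\(.enforcementAction) \(.namespace)/\(.name) \(.message)"'
}

# Usage (requires kubectl and jq):
#   kubectl get k8spspbaseline pod-security-baseline -o json | list_violations
```

An empty result right after applying the constraint does not necessarily mean the cluster is clean; the audit runs periodically, so check again after the next audit cycle.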
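Test with a deliberately non-compliant pod in a covered namespace. A server-side dry run sends the request through admission without creating the pod:
kubectl run privileged-test --image=nginx --restart=Never --privileged -n staging --dry-run=server
With enforcementAction set to deny, the request should be rejected; with warn, kubectl should print a warning. In dryrun mode the request is accepted, and violations appear only in the constraint status.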
When to Submit an OpsGlobal Ticket
- Need consistent enforcement across multiple clusters.
- Custom constraint template development.
- Performance tuning (Gatekeeper latency).
- Handling large-scale violations with urgent remediation.
Use cases
Useful for teams that handle security issues and need a clear troubleshooting and delivery workflow.
Problem background
Learn how to enforce Pod Security Standards (PSS) using OPA Gatekeeper to harden your Kubernetes clusters against common exploits. This guide covers scenario, diagnosis, implementation, rollback, and verification.
Troubleshooting steps
Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.
Command examples
Replace sample resource names with real values and store passwords, tokens and keys in environment variables.
Risks
Before production changes, confirm backups, access boundaries, change windows and rollback paths.
Rollback plan
Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.
Deliverables
Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.