Scenario
A production Kubernetes cluster for a financial client shows signs of unauthorized resource access. The SRE team must harden security immediately.
Symptoms
- Non-admin users can list cluster secrets
- Outbound connections to suspicious IPs
- A leaked kubeconfig lets an attacker run commands
Diagnosis
- Check RBAC bindings:
kubectl get clusterrolebinding,rolebinding -A -o wide
- Audit network policies:
kubectl get networkpolicy -A
- Inventory secrets (skipping default service-account token secrets):
kubectl get secrets -A | grep -v default-token
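The RBAC check above can be turned into a quick exposure report per identity. This is a minimal sketch, assuming it runs as an admin allowed to impersonate users, and that the subjects you pass (for example `developer` or `system:serviceaccount:<ns>:<name>`) are the identities to audit. The function name `audit_secret_access` and the `KUBECTL` override are hypothetical helpers, not part of kubectl.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which subjects can list Secrets in any namespace.
# KUBECTL may be overridden (for example with a stub) and defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

audit_secret_access() {
  local subject answer
  for subject in "$@"; do
    # `auth can-i` prints yes/no and exits non-zero on "no";
    # -A evaluates the request across all namespaces.
    answer="$("$KUBECTL" auth can-i list secrets --as="$subject" -A 2>/dev/null || true)"
    if [ "$answer" = "yes" ]; then
      echo "EXPOSED: $subject can list secrets"
    else
      echo "ok: $subject"
    fi
  done
}
```

Example: `audit_secret_access developer system:serviceaccount:payments:web` prints one line per subject; any `EXPOSED` line points to a binding to remove.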
Commands
Tighten RBAC
# Grant only `get` on the named secret. RBAC is additive, so also delete the over-broad bindings found during diagnosis.
kubectl create clusterrole restrict-secrets --verb=get --resource=secrets --resource-name=my-app-secret
kubectl create clusterrolebinding restrict-secrets-binding --clusterrole=restrict-secrets --user=developer
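A ClusterRoleBinding grants its role in every namespace, so when the secret lives in a single namespace a Role plus RoleBinding is a tighter option. The sketch below emits such a manifest. The function name, the `secret-reader` object names, and the namespace, user and secret arguments are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a namespaced Role + RoleBinding that grants only
# `get` on explicitly named Secrets.
# Usage: secret_reader_manifest <namespace> <user> <secret> [<secret>...]
secret_reader_manifest() {
  local ns="$1" user="$2" names="" n
  shift 2
  for n in "$@"; do names="${names}\"${n}\", "; done
  names="${names%, }"   # drop the trailing ", "
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: ${ns}
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: [${names}]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-reader-binding
  namespace: ${ns}
subjects:
- kind: User
  name: ${user}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}
```

Example: `secret_reader_manifest payments developer my-app-secret | kubectl apply -f -`.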
Enable Network Policy
# default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
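# NetworkPolicy is namespaced: this applies to the current namespace only (use -n to target others).
# Denying all egress also blocks DNS lookups, and policies are enforced only if the CNI plugin supports them.
kubectl apply -f default-deny.yaml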
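Once egress is denied, pods can no longer resolve names through cluster DNS. A companion policy can re-open DNS traffic only. This is a minimal sketch, assuming the DNS pods carry the label `k8s-app: kube-dns` in `kube-system` (check with `kubectl get pods -n kube-system --show-labels`) and a Kubernetes version (1.21+) that sets the `kubernetes.io/metadata.name` namespace label automatically. The function and policy names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a NetworkPolicy that allows DNS egress (UDP/TCP 53)
# from every pod in <namespace> to the cluster DNS pods only.
allow_dns_manifest() {
  local ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
}
```

Example: `allow_dns_manifest payments | kubectl apply -f -`. Putting `namespaceSelector` and `podSelector` in the same `to` entry means both must match, so only DNS pods in `kube-system` are reachable.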
Rotate Secrets
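# The secret is briefly absent between these two commands; pods that start in that window fail to start.
kubectl delete secret my-app-secret
kubectl create secret generic my-app-secret --from-literal=password="$(openssl rand -base64 32)"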
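To avoid the gap between delete and create, the new value can be applied over the existing object, followed by a restart of its consumer. This is a minimal sketch, assuming a Deployment reads the secret. The function name, the `KUBECTL` override, and the namespace, key and deployment arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rotate one key of a Secret in place, then restart the
# Deployment that consumes it.
# Usage: rotate_secret <namespace> <secret> <key> <deployment>
KUBECTL="${KUBECTL:-kubectl}"

rotate_secret() {
  local ns="$1" name="$2" key="$3" deploy="$4" value
  value="$(openssl rand -base64 32)"   # 32 random bytes -> 44 base64 chars
  # Render the Secret client-side and apply it: the object is updated, never deleted.
  "$KUBECTL" -n "$ns" create secret generic "$name" \
    --from-literal="$key=$value" --dry-run=client -o yaml \
    | "$KUBECTL" apply -f -
  # Env-var consumers only see the new value after a restart;
  # volume mounts update after a sync delay.
  "$KUBECTL" -n "$ns" rollout restart "deployment/$deploy"
}
```

Example: `rotate_secret payments my-app-secret password web`.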
Risk Controls
- Test network policies in a canary namespace first
- Snapshot RBAC configuration before making changes:
kubectl get clusterrole,clusterrolebinding,role,rolebinding -A -o yaml > rbac-backup.yaml
- Ensure apps support hot-reloading of secrets, or coordinate restarts
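The snapshot step can be widened to every object type the hardening touches and made to fail loudly on an empty dump. This is a sketch; the function name, directory layout and `KUBECTL` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dump RBAC objects and NetworkPolicies into a directory
# (timestamped by default) and refuse to report success on an empty file.
KUBECTL="${KUBECTL:-kubectl}"

backup_security_config() {
  local dir="${1:-backup-$(date +%Y%m%d-%H%M%S)}" f
  mkdir -p "$dir" || return 1
  "$KUBECTL" get clusterrole,clusterrolebinding -o yaml > "$dir/rbac-cluster.yaml"
  "$KUBECTL" get role,rolebinding -A -o yaml > "$dir/rbac-namespaced.yaml"
  "$KUBECTL" get networkpolicy -A -o yaml > "$dir/networkpolicies.yaml"
  for f in "$dir"/*.yaml; do
    # -s: the file exists and is non-empty
    [ -s "$f" ] || { echo "empty backup: $f" >&2; return 1; }
  done
  echo "$dir"
}
```

Example: `backup_dir="$(backup_security_config)"` records where the snapshot was written for the rollback step.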
Rollback
kubectl delete clusterrole restrict-secrets
kubectl delete clusterrolebinding restrict-secrets-binding
kubectl delete networkpolicy default-deny-all
kubectl apply -f rbac-backup.yaml
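The rollback commands can be made idempotent, so a partially applied change still rolls back cleanly: `--ignore-not-found` lets a delete succeed when the object is already gone. This is a sketch, assuming the NetworkPolicy was created in the namespace passed as the first argument. The function name and `KUBECTL` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: undo the hardening objects, then re-apply the snapshot.
# Usage: rollback_hardening [namespace] [backup-file]
KUBECTL="${KUBECTL:-kubectl}"

rollback_hardening() {
  local ns="${1:-default}" backup="${2:-rbac-backup.yaml}"
  # Refuse to start without a usable snapshot.
  [ -s "$backup" ] || { echo "missing backup: $backup" >&2; return 1; }
  # Remove the binding before the role it references.
  "$KUBECTL" delete clusterrolebinding restrict-secrets-binding --ignore-not-found
  "$KUBECTL" delete clusterrole restrict-secrets --ignore-not-found
  "$KUBECTL" -n "$ns" delete networkpolicy default-deny-all --ignore-not-found
  "$KUBECTL" apply -f "$backup"
}
```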
Verification
kubectl auth can-i list secrets --as=developer # should print "no"
kubectl exec test-pod -- curl -I --max-time 5 http://evil.com # should fail (DNS lookup error or connection timeout)
kubectl get secret my-app-secret -o jsonpath='{.data.password}' | base64 -d | wc -c # should print 44 (32 random bytes, base64-encoded)
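The three checks can be bundled into one pass/fail report whose exit code counts failures, which suits a CI job or change ticket. This is a sketch, assuming `test-pod` has `curl` installed and the objects live in the namespace passed as the first argument. The function name and `KUBECTL` override are hypothetical; 32 random bytes base64-encode to 44 characters, hence the length check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the RBAC, egress and secret checks and return the
# number of failed checks. Usage: verify_hardening <namespace> <pod>
KUBECTL="${KUBECTL:-kubectl}"

verify_hardening() {
  local ns="$1" pod="$2" failures=0 len
  # 1. RBAC: developer must not be able to list secrets.
  if [ "$("$KUBECTL" auth can-i list secrets --as=developer -n "$ns" 2>/dev/null || true)" = "no" ]; then
    echo "PASS rbac"
  else
    echo "FAIL rbac"; failures=$((failures + 1))
  fi
  # 2. Egress: the outbound probe must fail within 5 seconds.
  if "$KUBECTL" -n "$ns" exec "$pod" -- curl -sS -I --max-time 5 http://evil.com >/dev/null 2>&1; then
    echo "FAIL egress"; failures=$((failures + 1))
  else
    echo "PASS egress"
  fi
  # 3. Secret: decoded value length must be 44 characters.
  len="$("$KUBECTL" -n "$ns" get secret my-app-secret -o jsonpath='{.data.password}' | base64 -d | wc -c | tr -d ' ')"
  if [ "$len" -eq 44 ] 2>/dev/null; then
    echo "PASS secret"
  else
    echo "FAIL secret"; failures=$((failures + 1))
  fi
  return "$failures"
}
```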
When to Submit an OpsGlobal Ticket
- A security vulnerability (e.g., a CVE) in running components requires a hotfix
- An internal audit reveals a critical misconfiguration beyond the team's capacity
- A widespread outage after hardening cannot be rolled back
- External experts are needed for red-team/blue-team exercises or compliance reviews
Use cases
Useful for teams handling security incidents who need a clear troubleshooting and delivery workflow.
Problem background
Deep dive into securing Kubernetes environments with RBAC, network policies, and secrets management, including diagnostic and rollback steps.
Troubleshooting steps
Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.
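The "collect logs, configuration and metrics" step can be captured as read-only snapshots taken before any fix is applied, so the root-cause notes have evidence from the original state. This is a sketch; the function name, file names and `KUBECTL` override are hypothetical, and metrics collection depends on your monitoring stack, so it is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save read-only cluster state into <dir>, one file per command.
KUBECTL="${KUBECTL:-kubectl}"

collect_evidence() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  "$KUBECTL" get events -A --sort-by=.lastTimestamp > "$dir/events.txt"
  "$KUBECTL" get pods -A -o wide > "$dir/pods.txt"
  "$KUBECTL" get clusterrolebinding,rolebinding -A -o wide > "$dir/bindings.txt"
  "$KUBECTL" get networkpolicy -A > "$dir/networkpolicies.txt"
}
```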
Command examples
Replace sample resource names with real values and store passwords, tokens and keys in environment variables.
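A small guard makes the environment-variable rule enforceable: a script stops before touching the cluster if a credential variable is missing, without printing any value. This is a sketch; `require_env` and variable names such as `APP_DB_PASSWORD` are hypothetical. Note that `--from-literal` still places the value on the command line of the `kubectl` process.

```bash
#!/usr/bin/env bash
# Hypothetical guard: return non-zero if any named variable is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `require_env APP_DB_PASSWORD && kubectl create secret generic my-app-secret --from-literal=password="$APP_DB_PASSWORD"`.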
Risks
Before production changes, confirm backups, access boundaries, change windows and rollback paths.
Rollback plan
Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.
Deliverables
Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.