Troubleshooting
Terraform State Lock
Symptom: Error acquiring the state lock
# Check who holds the lock
aws dynamodb scan --table-name <lock-table>
# Force unlock (use with caution)
tofu force-unlock <lock-id>
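When the lock error is buried in CI logs, the ID can be pulled out of the captured output before running force-unlock. The following is a minimal sketch, assuming the error text contains a `Lock Info:` block with an `ID:` line. `extract_lock_id` is a hypothetical helper name and is not part of the tofu CLI.

```shell
# Hypothetical helper: print the lock ID found in captured plan/apply output.
# Assumes the error text has a line of the form "ID: <lock-id>", optionally
# preceded by a gutter character drawn by the CLI.
extract_lock_id() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "ID:") { print $(i + 1); exit }
    }
  }'
}

# Usage (not run here):
#   tofu plan 2>&1 | extract_lock_id
```

Confirm with whoever holds the lock that no run is still in progress before passing the extracted ID to force-unlock.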
EFS Mount Failures
Symptom: Pods stuck in ContainerCreating with EFS mount errors
Verify EFS CSI driver is installed:
kubectl get pods -n kube-system | grep efs
Check security groups allow NFS (port 2049) from EKS nodes
Verify EFS mount targets exist in the correct subnets
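The grep above lists the driver pods but does not flag unhealthy ones. As a rough sketch, the filter below prints only EFS-related pods whose status is not Running. It assumes the default `kubectl get pods` table layout (NAME, READY, STATUS, ...). `efs_pods_not_running` is a hypothetical helper name.

```shell
# Hypothetical helper: list EFS-related pods that are not in the Running state.
# Reads the default `kubectl get pods` table on stdin; column 1 is the pod
# name and column 3 is the status.
efs_pods_not_running() {
  awk '$1 != "NAME" && $1 ~ /efs/ && $3 != "Running" { print $1, $3 }'
}

# Usage (not run here):
#   kubectl get pods -n kube-system | efs_pods_not_running
```

Empty output means every matching pod reports Running. It does not prove the mount itself works, so the security group and mount target checks still apply.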
PVC Stuck in Pending
Symptom: PersistentVolumeClaim is not bound
kubectl describe pvc <pvc-name>
Common causes:
StorageClass not found: ensure efsStorageClass.enabled: true
EFS CSI driver not running
Wrong fileSystemId in StorageClass
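To check the first and third causes, it helps to know which StorageClass the claim actually references. This is a small sketch, assuming the `kubectl describe pvc` output contains a `StorageClass:` field. `pvc_storage_class` is a hypothetical helper name.

```shell
# Hypothetical helper: print the StorageClass name from `kubectl describe pvc`
# output read on stdin. Assumes a line of the form "StorageClass:  <name>".
pvc_storage_class() {
  awk '$1 == "StorageClass:" { print $2; exit }'
}

# Usage (not run here):
#   sc=$(kubectl describe pvc <pvc-name> | pvc_storage_class)
#   kubectl get storageclass "$sc" -o yaml   # inspect fileSystemId here
```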
Image Pull Errors
Symptom: ImagePullBackOff or ErrImagePull
For ECR:
Verify node IAM role has AmazonEC2ContainerRegistryReadOnly
Check images exist (repo names use the Terraform prefix, e.g. ubtrace-production/ub-backend):
# List all ECR repos to find the exact names
aws ecr describe-repositories --query 'repositories[].repositoryName' --output table
# Then check images in a specific repo
aws ecr describe-images --repository-name ubtrace-<env>/ub-backend
Verify region matches between ECR and EKS
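For the region check, the ECR region is embedded in the image URI, so it can be read straight from the failing pod's image reference. This is a minimal sketch, assuming the standard `<account>.dkr.ecr.<region>.amazonaws.com/<repo>` URI layout. `ecr_region_of` is a hypothetical helper name, and the account ID in the usage line is a placeholder.

```shell
# Hypothetical helper: print the AWS region encoded in an ECR image URI.
# Prints nothing if the argument does not look like an ECR URI.
ecr_region_of() {
  printf '%s\n' "$1" |
    sed -n 's/^[0-9]*\.dkr\.ecr\.\([a-z0-9-]*\)\.amazonaws\.com.*$/\1/p'
}

# Usage (placeholder account ID and region):
#   ecr_region_of 123456789012.dkr.ecr.eu-central-1.amazonaws.com/ubtrace-production/ub-backend:latest
```

Compare the printed region with the one the EKS cluster runs in.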
For private registries:
Verify imagePullSecrets is configured in Helm values
Check secret exists:
kubectl get secret <secret-name>
RDS Connection Refused
Symptom: API pods crash with database connection errors
Verify security groups allow PostgreSQL (port 5432) from EKS nodes
Check credentials in SSM:
aws ssm get-parameter --name "/ubtrace/<env>/db/app/password"
Test connectivity:
kubectl run pg-test --rm -it --image=postgres:16-alpine -- psql <connection-string>
Redis Auth / TLS Failures
Symptom: NOAUTH Authentication required or TLS handshake errors
ElastiCache with auth token requires TLS. Ensure redis.external.tls: true
Verify auth token matches SSM value
Test:
kubectl run redis-test --rm -it --image=redis:7-alpine -- redis-cli -h <host> --tls -a <token> ping
Keycloak Redirect Loops
Symptom: Browser redirects in a loop after login
Verify keycloak.hostname matches the public URL exactly
Check oidc.issuer includes /realms/ubtrace
Ensure ALB health check path is /health/ready (not /)
Verify cookie domain matches (no cross-domain issues)
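The issuer check can be scripted as a quick sanity test on the configured value. The sketch below assumes the issuer should end in `/realms/ubtrace` with no trailing slash, which is stricter than the "includes" wording above and may not match every setup. `check_issuer` is a hypothetical helper name and the URLs in the usage lines are placeholders.

```shell
# Hypothetical helper: classify an oidc.issuer value.
# Assumption: a well-formed issuer ends in /realms/ubtrace without a trailing slash.
check_issuer() {
  case "$1" in
    */realms/ubtrace)  echo "ok" ;;
    */realms/ubtrace/) echo "trailing slash" ;;
    *)                 echo "missing /realms/ubtrace" ;;
  esac
}

# Usage (placeholder URLs):
#   check_issuer "https://auth.example.com/realms/ubtrace"
#   check_issuer "https://auth.example.com"
```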