---
name: container-orchestration
description: Manage containerized workloads using Kubernetes, Docker Swarm, or similar orchestration platforms. Handle deployment, scaling, service discovery, configuration management, and cluster operations. Use when deploying microservices, managing K8s clusters, configuring Helm charts, troubleshooting container issues, or optimizing container resource allocation. Triggers on phrases like "Kubernetes", "K8s", "container orchestration", "Helm", "Docker", "pod", "deployment", "service mesh", "namespace", "cluster management", "container security", "auto-scaling", "pod disruption", "node pool", "container registry".
---

# Container Orchestration

Manage containerized workloads across Kubernetes, Docker Swarm, and similar orchestration platforms.

## Workflow

### 1. Cluster Architecture Design

```
KUBERNETES CLUSTER ARCHITECTURE
═══════════════════════════════════════

CONTROL PLANE:
  → API Server (3 instances for HA)
  → etcd (3 or 5 members; keep the count odd for quorum)
  → Controller Manager
  → Scheduler
  → Cloud Controller Manager

WORKER NODES:
  → Node pools by workload type:
     · General purpose: 4x t3.large (total: 8 vCPU, 32 GiB RAM)
     · Memory-optimized: 2x r5.xlarge (total: 8 vCPU, 64 GiB RAM)
     · GPU: 2x p3.2xlarge (for ML workloads)
     · Spot instances: 4x (for fault-tolerant workloads)

NETWORKING:
  → CNI: Calico (network policy support)
  → Ingress: NGINX Ingress Controller (or ALB Ingress)
  → Service mesh: Istio (production), Linkerd (staging)
  → DNS: CoreDNS with custom upstream

STORAGE:
  → Persistent storage: EBS CSI (AWS) / PD CSI (GCP)
  → Distributed storage: Longhorn / Rook-Ceph
  → Storage classes: gp3-ssd, io2-perf, standard-hdd
```
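
The `gp3-ssd` storage class listed above could be declared roughly as follows. This is a minimal sketch that assumes the AWS EBS CSI driver is installed; the encryption, binding mode, and reclaim policy are assumptions rather than settings taken from this cluster.

```yaml
# Sketch of the "gp3-ssd" storage class, assuming the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"                        # assumption: encrypt volumes at rest
volumeBindingMode: WaitForFirstConsumer    # create the volume in the zone where the pod is scheduled
reclaimPolicy: Delete                      # assumption: volumes are removed with their claims
allowVolumeExpansion: true
```

`WaitForFirstConsumer` delays provisioning until a pod is scheduled, which avoids creating a zonal volume in a zone where the pod cannot run.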

### 2. Application Deployment

```
DEPLOYMENT MANIFEST (rendered Helm template):
═══════════════════════════════════════

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
  labels:
    app: api-gateway
    version: v2.1.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: registry.internal/api-gateway:v2.1.0
        ports:
        - containerPort: 8080
        resources:
          requests: { cpu: 250m, memory: 256Mi }
          limits: { cpu: 1000m, memory: 1Gi }
        livenessProbe: { httpGet: { path: /health, port: 8080 }, initialDelaySeconds: 30 }
        readinessProbe: { httpGet: { path: /ready, port: 8080 }, periodSeconds: 10 }
        envFrom:
        - configMapRef: { name: api-gateway-config }
        - secretRef: { name: api-gateway-secrets }

SERVICE AND INGRESS:
═══════════════════════════════════════

Service (ClusterIP for internal):
  → api-gateway-service: Port 8080 → 8080
  → api-gateway-metrics: Port 9090 → 9090

Ingress (external access):
  → Host: api.company.com
  → TLS: cert-manager (Let's Encrypt or managed cert)
  → Path routing: /api/* → api-gateway-service

Horizontal Pod Autoscaler:
  → Min replicas: 3
  → Max replicas: 12
  → Target CPU utilization: 70%
  → Target memory utilization: 80%
```
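
The Service, Ingress, and autoscaler summarized above might be written as the manifests below. Treat this as a sketch: the Ingress name, the `letsencrypt-prod` issuer, and the `api-gateway-tls` secret are hypothetical, and the Ingress assumes the NGINX Ingress Controller and cert-manager are already installed.

```yaml
# Internal Service in front of the api-gateway pods.
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api-gateway
  ports:
  - name: http
    port: 8080
    targetPort: 8080
---
# External access with TLS issued by cert-manager.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway                                    # hypothetical name
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical ClusterIssuer
spec:
  ingressClassName: nginx
  tls:
  - hosts: [api.company.com]
    secretName: api-gateway-tls                        # hypothetical secret name
  rules:
  - host: api.company.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-gateway-service
            port:
              number: 8080
---
# Autoscaler using the CPU and memory targets listed above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target: { type: Utilization, averageUtilization: 70 }
  - type: Resource
    resource:
      name: memory
      target: { type: Utilization, averageUtilization: 80 }
```

Utilization targets are measured against each container's resource requests, not its limits. When several metrics are configured, the autoscaler uses the largest replica count any of them proposes.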

### 3. Resource Management & Scaling

```
RESOURCE ALLOCATION FRAMEWORK
═══════════════════════════════════════

WORKLOAD TIERS:
═══════════════════════════════════════

Tier 1 — Mission Critical (API, Database Proxy, Auth):
  → Pod disruption budget: maxUnavailable = 0 (blocks voluntary evictions, so node drains wait until it is relaxed)
  → Anti-affinity: Spread across zones
  → Priority class: dedicated top-tier class (leave system-cluster-critical to cluster components)
  → Node selector: production-pool
  → Resource requests: conservative (50% of limit)

Tier 2 — Business Critical (Worker services, Queues):
  → Pod disruption budget: maxUnavailable = 1
  → Anti-affinity: Prefer different nodes
  → Priority class: high-priority
  → Node selector: general-pool
  → Resource requests: 70% of limit

Tier 3 — Non-Critical (Batch jobs, Analytics):
  → Pod disruption budget: minAvailable = 0
  → No affinity constraints
  → Priority class: low-priority
  → Node selector: spot-pool
  → Resource requests: 90% of limit

CLUSTER AUTO-SCALING:
═══════════════════════════════════════

Configured thresholds:
  → Scale-up: Triggered when pods stay Pending for > 30 seconds
  → Scale-down: Cool-down period 10 minutes
  → Max nodes: 20 per availability zone
  → Min nodes: 2 per availability zone
  → Scale-up priority: General pool → GPU pool → Spot pool
  → Scale-down priority: Spot pool → GPU pool → General pool
```
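
As one illustration of the tier settings, the Tier 2 priority class and disruption budget could be declared like this. It is a sketch only: the priority value, the `worker-pdb` name, and the `app: worker` label are hypothetical.

```yaml
# Priority class referenced by Tier 2 pods via spec.priorityClassName.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000                  # assumption: any value above the low-priority class works
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Tier 2 business-critical workloads"
---
# Allow at most one Tier 2 pod to be evicted at a time during drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb             # hypothetical name
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: worker              # hypothetical label on the Tier 2 pods
```

A budget only limits voluntary disruptions such as node drains and autoscaler scale-down; it does not protect against node failures.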

### 4. Container Security

```
CONTAINER SECURITY BEST PRACTICES
═══════════════════════════════════════

IMAGE SECURITY:
═══════════════════════════════════════

  → Use minimal base images (distroless, alpine, slim)
  → Scan images in CI pipeline (Trivy, Clair, Snyk)
  → Pin image versions (never use :latest in production)
  → Sign images (Cosign, Notary)
  → Enable image admission controller (Kyverno, OPA)

RUNTIME SECURITY:
═══════════════════════════════════════

  → Run containers as non-root (securityContext.runAsNonRoot: true)
  → Drop all capabilities, add only needed
  → Read-only root filesystem (readOnlyRootFilesystem: true)
  → Set resource limits (prevent DoS)
  → Network policies: default-deny, allow only needed traffic
  → Pod Security Standards: enforce "restricted" profile
  → Runtime monitoring: Falco (system call monitoring)

SECRET MANAGEMENT:
═══════════════════════════════════════

  → Never embed secrets in images or manifests
  → Use external secret management:
     · AWS Secrets Manager / HashiCorp Vault
     · External Secrets Operator (auto-sync to K8s secrets)
     · Sealed Secrets (for GitOps workflows)
  → Rotate secrets regularly (90-day cycle)
  → Audit secret access (RBAC + audit logs)

NETWORK POLICIES:
═══════════════════════════════════════

  → Default deny all ingress/egress
  → Allow specific inter-service communication
  → Restrict egress to known endpoints
  → Monitor policy violations (Calico network policy logs)
```
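
Several of the runtime and network controls above can be expressed directly in manifests. The sketch below assumes a `production` namespace and reuses the api-gateway image from the deployment example; the pod name, the UID, and the `/tmp` mount are hypothetical.

```yaml
# Enforce the "restricted" Pod Security Standard on the namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
# Default-deny for all pods in the namespace. Egress to DNS is blocked too,
# so a separate allow rule for DNS is usually needed alongside this policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Pod that satisfies the restricted profile.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example       # hypothetical name
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001           # assumption: any non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.internal/api-gateway:v2.1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      requests: { cpu: 250m, memory: 256Mi }
      limits: { cpu: 1000m, memory: 1Gi }
    volumeMounts:
    - name: tmp
      mountPath: /tmp          # writable scratch space, since the root filesystem is read-only
  volumes:
  - name: tmp
    emptyDir: {}
```

The network policy takes effect only when the CNI plugin enforces NetworkPolicy resources, as Calico does.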

### 5. Troubleshooting & Operations

```
CONTAINER TROUBLESHOOTING CHECKLIST
═══════════════════════════════════════

ISSUE: Pod Not Starting
═══════════════════════════════════════

Symptoms: CrashLoopBackOff, ImagePullBackOff, Pending

Diagnostic Steps:
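  1. kubectl describe pod <pod> (check events and conditions)
  2. kubectl logs <pod> --previous (logs from the last crashed container; omit --previous for the running one)
  3. kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp (events are namespace-scoped)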
  4. Check resource quotas and limits
  5. Check image pull secrets
  6. Check node resources (kubectl describe node)
  7. Check node affinity/anti-affinity rules
  8. Check PV/PVC binding status

ISSUE: High Latency / Performance Degradation
═══════════════════════════════════════

Diagnostic Steps:
  1. Check pod resource usage (kubectl top pods)
  2. Check node resource usage (kubectl top nodes)
  3. Review HPA events and scaling history
  4. Check service mesh metrics (Istio telemetry)
  5. Analyze pod-to-pod network latency
  6. Check for OOMKill events
  7. Review application logs for errors/timeouts
  8. Check DNS resolution times

ISSUE: Failed Deployment Rollout
═══════════════════════════════════════

Diagnostic Steps:
  1. kubectl rollout status deployment/<name>
  2. kubectl rollout history deployment/<name>
  3. Check readiness probe failures
  4. kubectl rollout undo deployment/<name> (rollback)
  5. Review deployment strategy (RollingUpdate vs Recreate)
  6. Check maxUnavailable/maxSurge settings

CLUSTER HEALTH CHECK:
═══════════════════════════════════════

  → API server responsiveness: kubectl cluster-info
  → etcd health: etcdctl endpoint health
  → Node status: kubectl get nodes
  → Pod status: kubectl get pods --all-namespaces
  → Resource utilization: kubectl top nodes; kubectl top pods --all-namespaces (requires metrics-server)
  → Event log: kubectl get events --sort-by=.metadata.creationTimestamp
  → Persistent volume status: kubectl get pv,pvc
  → Certificate expiry: kubectl get certificates --all-namespaces (cert-manager Certificate resources)
```

## Edge Cases

- **Multi-cluster management**: Use Rancher, Fleet, or ACM for cluster federation
- **Air-gapped environments**: Local registry mirror, offline Helm charts
- **Large clusters (500+ nodes)**: Consider Kubermatic Kubernetes Platform (KKP), GKE Autopilot, or EKS
- **GPU workloads**: NVIDIA device plugin, MPS for sharing
- **Stateful workloads**: Use StatefulSets with proper storage class
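
For the stateful case, a StatefulSet with per-replica volumes might look like the sketch below. The `example-db` names, image, port, mount path, and volume size are all hypothetical, and the `gp3-ssd` storage class is assumed to exist in the cluster.

```yaml
# Headless Service that gives each replica a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: example-db
spec:
  clusterIP: None
  selector:
    app: example-db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db      # must match the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: registry.internal/example-db:1.0.0   # hypothetical image
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/data                  # hypothetical data directory
  volumeClaimTemplates:        # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-ssd
      resources:
        requests:
          storage: 20Gi        # assumption: size depends on the workload
```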

## Integration Points

- **Container registries**: ECR, GCR, Docker Hub, Harbor, ACR
- **CI/CD**: GitLab CI, GitHub Actions, ArgoCD, Flux
- **Monitoring**: Prometheus, Grafana, Datadog, New Relic
- **Service mesh**: Istio, Linkerd, Consul Connect
- **Security**: Trivy, Falco, OPA, Kyverno
- **Cloud providers**: AWS EKS, GCP GKE, Azure AKS

## Output

### Container Operations Summary

```
CLUSTER STATUS — Production
═══════════════════════════════════════

Nodes: 14 (6 general, 2 memory, 2 GPU, 4 spot)
Pods: 142 running, 0 pending, 0 failed
Namespaces: 8 (prod: 4, staging: 2, tools: 1, system: 1)

Resource utilization:
  CPU: 45% of cluster capacity
  Memory: 62% of cluster capacity

Deployments: 28 (all healthy)
Services: 35 (24 ClusterIP, 8 NodePort, 3 LoadBalancer)
Ingress: 12 (all with TLS)

Security:
  Image scans: All passing (no critical CVEs)
  Network policies: Enforced (default-deny + 28 allow rules)
  Pod security: Restricted profile enforced

Cost: $12,400/month (spot savings: $3,200/month)
```
