---
name: cloud-optimization
description: Manage cloud infrastructure optimization including cost optimization, resource rightsizing, reserved instance management, spot instance utilization, cloud architecture review, multi-cloud strategy, cloud security posture, and FinOps practices. Use when optimizing cloud costs, reviewing cloud architecture, managing reserved instances, or implementing FinOps practices. Triggers on phrases like "cloud optimization", "cloud cost", "FinOps", "reserved instances", "spot instances", "rightsizing", "cloud architecture", "multi-cloud", "cloud security posture", "CSPM", "cloud waste", "idle resources", "cloud budget", "cloud governance", "cloud tagging", "cost allocation", "cloud compliance", "AWS", "Azure", "GCP", "cloud migration".
---

# Cloud Infrastructure Optimization

Minimize cloud costs while maximizing performance, security, and reliability through data-driven optimization.

## Cloud Cost Management

### FinOps Framework

```
FINOPS FRAMEWORK:
═════════════════

CLOUD PROVIDERS:
  Primary: AWS (65% of workload)
  Secondary: Azure (30% of workload)
  Tertiary: GCP (5% — specific ML workloads)
  Total monthly spend: ~$45K (January 2025)
  Annual run rate: ~$540K

COST BREAKDOWN (Monthly — January 2025):
  AWS ($29,250):
    ┌──────────────────────────┬──────────┬──────────┐
    │ Service                  │ Cost     │ % of AWS │
    ├──────────────────────────┼──────────┼──────────┤
    │ EC2 (compute)            │ $8,500   │ 29.1%    │
    │ RDS (databases)          │ $4,200   │ 14.4%    │
    │ S3 (storage)             │ $3,800   │ 13.0%    │
    │ Lambda (serverless)      │ $2,100   │ 7.2%     │
    │ EKS (Kubernetes)         │ $1,800   │ 6.2%     │
    │ CloudFront (CDN)         │ $1,200   │ 4.1%     │
    │ ElastiCache (Redis)      │ $950     │ 3.2%     │
    │ Data transfer            │ $1,100   │ 3.8%     │
    │ Monitoring/logging       │ $1,400   │ 4.8%     │
    │ Backup/snapshot          │ $650     │ 2.2%     │
    │ Other                    │ $3,550   │ 12.1%    │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ $29,250  │ 100%     │
    └──────────────────────────┴──────────┴──────────┘

  Azure ($13,500):
    ┌──────────────────────────┬──────────┬──────────┐
    │ Service                  │ Cost     │ % Azure  │
    ├──────────────────────────┼──────────┼──────────┤
    │ Virtual Machines         │ $4,200   │ 31.1%    │
    │ Azure SQL                │ $2,100   │ 15.6%    │
    │ Blob Storage             │ $1,500   │ 11.1%    │
    │ App Services             │ $1,200   │ 8.9%     │
    │ AKS (Kubernetes)         │ $1,000   │ 7.4%     │
    │ Data transfer            │ $450     │ 3.3%     │
    │ Other                    │ $3,050   │ 22.6%    │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ $13,500  │ 100%     │
    └──────────────────────────┴──────────┴──────────┘

  GCP ($2,250):
    Compute Engine: $1,100 (48.9%)
    Cloud SQL: $550 (24.4%)
    Cloud Storage: $350 (15.6%)
    Other: $250 (11.1%)
  
  TOTAL CLOUD SPEND: $45,000/month

  Cost trend (6 months):
    Aug: $42K → Sep: $44K → Oct: $46K → Nov: $47K → Dec: $45K → Jan: $45K
    Status: Stabilized (growth rate: <5% month-over-month)

FINOPS PILLARS:
  1. INFORM (visibility):
     - Cost allocation tags: 95% coverage (target: 100%)
     - Showback: Department-level cost reports (weekly)
     - Chargeback: None (internal — showback only)
     - Budget alerts: 80%, 90%, 100% (email + Teams)
     - Anomaly detection: AWS Cost Anomaly + Azure Cost Management
  
  2. OPTIMIZE (efficiency):
     - Rightsizing: Monthly review (savings: $3,200/month)
     - Reserved instances: 65% coverage (savings: $8,500/month)
     - Spot instances: 30% of dev/test (savings: $2,100/month)
     - Idle resource cleanup: Weekly (savings: $800/month)
     - Storage tiering: Auto-archive (savings: $1,200/month)
     - Total monthly savings: $15,800 (≈35% of the $45K monthly spend)
  
  3. OPERATE (governance):
     - Budget approval: Monthly (Finance + IT leadership)
     - Spend policy: Max $50K/month (alert at $40K)
     - Provisioning approval: >$500/month (manager approval)
     - Tagging policy: Mandatory (department, project, environment)
     - Compliance: Quarterly review (SOC 2, ISO 27001)

COST ALLOCATION (Department showback):
  ┌──────────────────────────┬──────────┬──────────┐
  │ Department               │ Cost     │ % Total  │
  ├──────────────────────────┼──────────┼──────────┤
  │ Engineering              │ $18,000  │ 40.0%    │
  │ Data/ML                  │ $7,500   │ 16.7%    │
  │ Sales/CRM                │ $5,400   │ 12.0%    │
  │ Marketing                │ $3,600   │ 8.0%     │
  │ Finance/ERP              │ $3,150   │ 7.0%     │
  │ HR                       │ $1,350   │ 3.0%     │
  │ Operations               │ $2,250   │ 5.0%     │
  │ Shared/Platform          │ $3,750   │ 8.3%     │
  ├──────────────────────────┼──────────┼──────────┤
  │ TOTAL                    │ $45,000  │ 100%     │
  └──────────────────────────┴──────────┴──────────┘

  Reporting: Weekly (department leads) + Monthly (leadership)
  Trending: Tracked monthly (YoY comparison)
  Budget adherence: 98% on-target (2 departments slightly over — flagged)
```
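
The budget-alert and showback figures above are simple arithmetic, so they can be recomputed rather than maintained by hand. The following is a minimal sketch, assuming the alert percentages apply to the $50K spend cap; the function names are hypothetical and are not part of any provider's billing API.

```python
# Figures copied from the FinOps framework block above.
MONTHLY_CAP = 50_000                 # spend policy: max $50K/month
ALERT_THRESHOLDS_PCT = (80, 90, 100) # budget alerts at 80%, 90%, 100%

DEPARTMENT_COSTS = {
    "Engineering": 18_000,
    "Data/ML": 7_500,
    "Sales/CRM": 5_400,
    "Marketing": 3_600,
    "Finance/ERP": 3_150,
    "HR": 1_350,
    "Operations": 2_250,
    "Shared/Platform": 3_750,
}


def fired_alerts(spend: int, cap: int = MONTHLY_CAP) -> list[int]:
    """Return the alert thresholds (in percent) that `spend` has reached.

    Integer arithmetic avoids floating-point surprises at exact boundaries.
    """
    return [pct for pct in ALERT_THRESHOLDS_PCT if spend * 100 >= pct * cap]


def showback_shares(costs: dict[str, float]) -> dict[str, float]:
    """Return each department's share of total cost, in percent (1 decimal)."""
    total = sum(costs.values())
    return {dept: round(100 * cost / total, 1) for dept, cost in costs.items()}


if __name__ == "__main__":
    print(fired_alerts(45_000))  # → [80, 90]
    for dept, share in showback_shares(DEPARTMENT_COSTS).items():
        print(f"{dept:<16} {share:>5}%")
```

With January's $45,000 spend, the 80% ($40K) and 90% ($45K) alerts have both fired, which matches the "alert at $40K" note in the spend policy.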

## Resource Optimization

### Rightsizing & Efficiency

```
RESOURCE OPTIMIZATION:
══════════════════════

RIGHTSIZING PROGRAM:
  Cadence: Monthly review (automated recommendation + manual approval)
  Tools: AWS Compute Optimizer + Azure Advisor
  Coverage: 100% of EC2 + Azure VMs (75 instances total)
  
  January 2025 recommendations:
    ┌───────────────────────────────┬──────────┬──────────┐
    │ Recommendation                │ Count    │ Savings  │
    ├───────────────────────────────┼──────────┼──────────┤
    │ Downsize (over-provisioned)   │ 12       │ $1,800   │
    │ Upsize (under-provisioned)    │ 3        │ $0*      │
    │ Reserved instance (on-demand) │ 18       │ $4,200   │
    │ Spot instance (dev/test)      │ 8        │ $950     │
    │ Idle (terminate)              │ 5        │ $650     │
    │ Optimal (no change)           │ 29       │ —        │
    ├───────────────────────────────┼──────────┼──────────┤
    │ TOTAL                         │ 75       │ $7,600   │
    └───────────────────────────────┴──────────┴──────────┘
  
  *Upsize required for performance (cost-neutral or slight increase)
  
  Implementation:
    Rightsized: 10/12 (2 deferred — maintenance window)
    Reserved: 15/18 (3 deferred — workload uncertainty)
    Spot: 6/8 (2 deferred — workload stability concern)
    Terminated: 5/5 (all confirmed idle — terminated)
  
  Savings realized: $6,200/month (January)
  Savings committed: $4,800/month (reserved instances — 1-year term)

RESERVED INSTANCE MANAGEMENT:
  AWS:
    RI coverage: 65% (of EC2 + RDS spend)
    RI terms: 1-year (55% of RI), 3-year (15% of RI)
    RI utilization: 94% (target: >90%) ✓
    Savings vs. on-demand: 35-40% (standard), 45-55% (heavy)
    Expiring RIs (next 90 days): 5 (renewal planned)
    RI optimization: Monthly exchange (instance type, tenancy)
  
  Azure:
    Savings Plans coverage: 62% (of compute spend)
    Savings Plan terms: 1-year (70%), 3-year (30%)
    Utilization: 91% (target: >90%) ✓
    Savings vs. on-demand: 25-30% (compute), 15-20% (all compute)
    Expiring plans (next 90 days): 3 (renewal planned)

SPOT INSTANCE UTILIZATION:
  Workloads on spot:
    Development environment: 30% of dev instances
    Testing/QA: 40% of test instances
    CI/CD runners: 60% of runners
    Batch processing: 80% of batch jobs
    Data analysis: 50% of analysis instances
  
  Spot interruption rate: 3.2% (AWS), 2.8% (Azure)
  Interruption handling:
    Graceful shutdown: On provider notice (AWS: 2 minutes; Azure: 30 seconds minimum), checkpoint + save
    Auto-recovery: Launch replacement (same AZ or different AZ)
    Data persistence: EBS/Snapshot (state saved before shutdown)
    Impact: Minimal (stateless workloads + checkpoint)
  
  Savings: 60-70% vs. on-demand (spot pricing)
  Monthly savings: ~$2,100 (spot optimization)

STORAGE OPTIMIZATION:
  AWS S3:
    Total storage: 18 TB
    ┌──────────────────────────┬──────────┬──────────┐
    │ Storage Tier             │ Volume   │ Cost/Mo  │
    ├──────────────────────────┼──────────┼──────────┤
    │ S3 Standard              │ 8 TB     │ $232     │
    │ S3 Intelligent-Tiering   │ 5 TB     │ $165     │
    │ S3 Standard-IA           │ 3 TB     │ $60      │
    │ S3 Glacier               │ 1.5 TB   │ $15      │
    │ S3 Glacier Deep Archive  │ 0.5 TB   │ $3       │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ 18 TB    │ $475     │
    └──────────────────────────┴──────────┴──────────┘
  
    Lifecycle policies:
      30 days: Standard → Intelligent-Tiering
      90 days: → Standard-IA
      180 days: → Glacier
      365 days: → Glacier Deep Archive
      730 days: → Delete (if no compliance requirement)
  
    Savings: $1,200/month (vs. all-standard tier)
    Compliance: 15 TB (retention override — legal hold)
  
  Azure Blob:
    Total storage: 8 TB
    Tiering: Hot (4 TB), Cool (3 TB), Archive (1 TB)
    Lifecycle: Auto-transition (30/90/180 day policy)
    Savings: $450/month (vs. all-hot tier)

IDLE RESOURCE DETECTION:
  Weekly scan (automated):
    Unattached EBS volumes: 3 (250 GB — $25/month)
    Unassociated Elastic IPs: 2 ($8/month)
    Idle RDS instances: 1 (dev DB — $120/month)
    Idle load balancers: 1 ($16/month)
    Stopped EC2 instances: 2 ($0 compute; attached EBS volumes still billed)
    Unused NAT Gateways: 1 ($32/month)
  
  Total idle cost: $201/month
  Action: Terminate/archive (weekly cleanup — auto-approved)
  Prevention: Auto-shutdown (off-hours, dev/test)

WASTE REDUCTION SUMMARY:
  Monthly waste identified: ~$3,500
  Monthly waste eliminated: ~$2,800
  Remaining waste: ~$700 (under review)
  Waste rate: 7.8% of total spend (target: <5%)
  Trend: Improving (March: 12% → Dec: 8% → Jan: 7.8%)
```
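
The S3 lifecycle schedule above maps object age to a storage class. The sketch below expresses that mapping as a lookup, assuming each transition takes effect on the stated day and that a legal hold only blocks the final deletion step. The names `LIFECYCLE_SCHEDULE` and `storage_class_for_age` are hypothetical; this is not an S3 API call and does not validate which transitions S3 itself permits.

```python
from bisect import bisect_right

# (minimum object age in days, storage class), copied from the schedule above.
# Objects younger than 30 days remain in Standard.
LIFECYCLE_SCHEDULE = [
    (0, "Standard"),
    (30, "Intelligent-Tiering"),
    (90, "Standard-IA"),
    (180, "Glacier"),
    (365, "Glacier Deep Archive"),
]
DELETE_AFTER_DAYS = 730  # applies only when no compliance requirement exists


def storage_class_for_age(age_days: int, legal_hold: bool = False) -> str:
    """Return the storage class an object of the given age should be in.

    Returns "Delete" once the object passes 730 days, unless a legal hold
    (retention override) is in place.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= DELETE_AFTER_DAYS and not legal_hold:
        return "Delete"
    ages = [age for age, _ in LIFECYCLE_SCHEDULE]
    # bisect_right finds the first schedule entry older than age_days;
    # the entry just before it is the class currently in effect.
    index = bisect_right(ages, age_days) - 1
    return LIFECYCLE_SCHEDULE[index][1]


if __name__ == "__main__":
    for age in (10, 30, 120, 400, 800):
        print(age, storage_class_for_age(age))
```

Keeping the schedule as data makes it straightforward to compare against the rules actually deployed in each bucket.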

## Cloud Security Posture

### CSPM & Governance

```
CLOUD SECURITY POSTURE MANAGEMENT (CSPM):
═══════════════════════════════════════════

CSPM TOOLS:
  AWS: AWS Security Hub + AWS Config + AWS Trusted Advisor
  Azure: Microsoft Defender for Cloud
  GCP: Security Command Center
  Cross-cloud: Wiz (consolidated view)

SECURITY POSTURE SCORE:
  AWS: 92/100 (target: ≥90) ✓
  Azure: 89/100 (target: ≥90), 1 point short
  GCP: 95/100 (target: ≥90) ✓
  Overall: 91/100 (improving trend)

COMPLIANCE FRAMEWORKS (Cloud):
  ┌──────────────────────────┬──────────┬──────────┬──────────┐
  │ Framework                │ AWS      │ Azure    │ GCP      │
  ├──────────────────────────┼──────────┼──────────┼──────────┤
  │ CIS Foundations          │ 95%      │ 92%      │ 97%      │
  │ SOC 2                    │ 100%     │ 98%      │ 100%     │
  │ ISO 27001                │ 98%      │ 95%      │ 99%      │
  │ NIST 800-53              │ 92%      │ 90%      │ 95%      │
  │ PCI DSS (cloud scope)    │ 100%     │ N/A      │ N/A      │
  ├──────────────────────────┼──────────┼──────────┼──────────┤
  │ Average                  │ 97%      │ 94%      │ 98%      │
  └──────────────────────────┴──────────┴──────────┴──────────┘

  Gap remediation:
    Azure CIS gap: 2 findings (network config — remediation in progress)
    Azure ISO gap: 3 findings (logging config — remediation in progress)
    NIST gap: 5 findings (encryption, access — planned for February)
  
  Target: 95%+ across all frameworks (all providers)

CLOUD CONFIGURATION STANDARDS:
  Automated enforcement (AWS Config + Azure Policy):
    1. Storage encryption: 100% (AES-256, KMS managed)
    2. Network encryption: 100% (TLS 1.2+, no unencrypted endpoints)
    3. Public access: 0 S3 buckets publicly accessible (enforced)
    4. Security groups: No 0.0.0.0/0 on sensitive ports (22, 3389)
    5. MFA: Required for all root/admin accounts (enforced)
    6. Logging: Enabled for all services (CloudTrail, Azure Activity Log)
    7. Tagging: Mandatory tags (department, environment, owner)
    8. Backup: Enabled for all production databases (automated)
    9. VPC flow logs: Enabled (all VPCs)
    10. IAM: No inline policies (managed policies only)
  
  Compliance rate: 96% (automated checks)
  Non-compliant: 12 resources (remediation in progress)
  Auto-remediation: 8 of 10 rules (auto-correct)

CLOUD GOVERNANCE:
  Guardrails (prevention):
    - Max instance type: m5.2xlarge (larger requires approval)
    - Max monthly spend per project: $5K (budget alert at $4K)
    - Region restriction: us-east-1, us-west-2, eastus, westeurope
    - No production deletion: Delete protection (Terraform + cloud)
    - No public IP: Private endpoints (NAT, VPC endpoints)
    - No root usage: Root account disabled (IAM only)
  
  Monitoring (detection):
    - Cost anomaly: AWS Cost Anomaly + Azure Cost Alerts
    - Security finding: Security Hub + Defender (real-time)
    - Configuration drift: AWS Config + Azure Policy (hourly)
    - Usage spike: CloudWatch + Azure Monitor (15-minute)
  
  Review cadence:
    Weekly: Cost + security (automated report)
    Monthly: Governance review (IT + Finance + Security)
    Quarterly: Cloud strategy review (architecture, multi-cloud)
    Annually: Cloud provider evaluation (negotiation, migration)

MULTI-CLOUD STRATEGY:
  Workload distribution:
    AWS (65%): Core applications, databases, storage, CDN
    Azure (30%): Microsoft ecosystem (365, AD, SQL Server), HR/Finance
    GCP (5%): ML/AI workloads (specific models — TensorFlow)
  
  Multi-cloud benefits:
    - Best-of-breed (per workload)
    - Vendor diversification (risk reduction)
    - Cost optimization (price comparison)
    - Compliance (data residency, regulatory)
  
  Multi-cloud challenges:
    - Complexity (multiple consoles, APIs)
    - Skill requirements (AWS + Azure + GCP)
    - Cost visibility (consolidated billing)
    - Security consistency (unified policy)
  
  Mitigation:
    - Terraform (unified IaC across providers)
    - Wiz (unified security view)
    - CloudHealth (unified cost view)
    - Training (multi-cloud certification)
```
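
Configuration standard 4 (no world-open ingress on ports 22 and 3389) is a good candidate for an automated check. The sketch below uses only the Python standard library and assumes each ingress rule has already been exported as a plain dict with `cidr`, `from_port`, and `to_port` keys. That rule shape is hypothetical and is not the format returned by AWS Config or Azure Policy.

```python
from ipaddress import ip_network

SENSITIVE_PORTS = {22, 3389}  # SSH and RDP, per configuration standard 4


def is_open_to_world(cidr: str) -> bool:
    """True if the CIDR covers the whole IPv4 or IPv6 address space."""
    return ip_network(cidr, strict=False).prefixlen == 0


def violates_port_rule(rule: dict) -> bool:
    """True if a world-open ingress rule exposes any sensitive port.

    `rule` is a hypothetical dict: {"cidr": str, "from_port": int, "to_port": int}.
    """
    if not is_open_to_world(rule["cidr"]):
        return False
    return any(
        rule["from_port"] <= port <= rule["to_port"] for port in SENSITIVE_PORTS
    )


if __name__ == "__main__":
    rules = [
        {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22},    # violation
        {"cidr": "10.0.0.0/8", "from_port": 22, "to_port": 22},   # internal only
        {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},  # HTTPS is fine
        {"cidr": "::/0", "from_port": 3000, "to_port": 4000},     # range spans 3389
    ]
    for rule in rules:
        print(rule["cidr"], rule["from_port"], violates_port_rule(rule))
```

Port ranges are checked as well as single ports, since a wide range can expose 22 or 3389 without naming either.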

## Output

### Cloud Optimization Dashboard

```
CLOUD OPTIMIZATION DASHBOARD — Jan 2025
════════════════════════════════════

Cost Overview:
  Monthly spend: $45,000
  Annual run rate: ~$540K
  Cost trend: Stabilized (<5% MoM growth)
  Budget adherence: 98% on-target
  
  AWS: $29,250 (65%)
  Azure: $13,500 (30%)
  GCP: $2,250 (5%)

Optimization:
  Monthly savings: $15,800 (≈35% of monthly spend)
  RI/SP coverage: 65% (AWS), 62% (Azure)
  Spot utilization: 30-60% (dev/test/CI/CD)
  Rightsizing: $3,200/month savings
  Storage tiering: $1,650/month savings (S3 $1,200 + Azure Blob $450)
  Waste rate: 7.8% (target: <5%)

Efficiency:
  Instance utilization: 55-65% (CPU), 60-70% (memory)
  Storage optimization: Tiered lifecycle (auto; 5 S3 classes, 3 Azure Blob tiers)
  Idle resources: $201/month (weekly cleanup)
  Auto-shutdown: Dev/test (off-hours)

Security:
  CSPM score: 91/100 (AWS: 92, Azure: 89, GCP: 95)
  Compliance: 95% (CIS), 99% (SOC 2), 97% (ISO 27001)
  Config standards: 96% compliant
  Non-compliant resources: 12 (remediation in progress)

Governance:
  Guardrails: 6 preventive rules + 10 config standards (8 auto-remediated)
  Budget alerts: 80%, 90%, 100%
  Tagging: 95% coverage (target: 100%)
  Region: 4 approved regions (all workloads)
  Root access: Disabled (IAM only)

Actions:
  1. Improve tagging (95% → 100% coverage)
  2. Reduce waste rate (7.8% → <5%)
  3. Azure CSPM improvement (89 → 90+)
  4. RI renewal (5 expiring — next 90 days)
  5. Cloud strategy review (quarterly — Q1)
```
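
Several dashboard figures are derived values and can be recomputed from the per-provider numbers. The sketch below assumes the overall CSPM score is a spend-weighted average of the provider scores; the source does not state the weighting, but this assumption reproduces the reported 91/100. All function names are hypothetical.

```python
MONTHLY_SPEND = {"AWS": 29_250, "Azure": 13_500, "GCP": 2_250}
CSPM_SCORES = {"AWS": 92, "Azure": 89, "GCP": 95}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Average the scores, weighting each provider by its share of spend."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def waste_rate_pct(waste_identified: float, total_spend: float) -> float:
    """Waste identified as a percentage of total spend (1 decimal)."""
    return round(100 * waste_identified / total_spend, 1)


def annual_run_rate(monthly_spend: float) -> float:
    """Project annual spend from a single month."""
    return 12 * monthly_spend


if __name__ == "__main__":
    total = sum(MONTHLY_SPEND.values())
    print(weighted_score(CSPM_SCORES, MONTHLY_SPEND))  # → 91.25
    print(waste_rate_pct(3_500, total))                # → 7.8
    print(annual_run_rate(total))                      # → 540000
```

A plain average of the three scores gives 92, so the weighting choice is worth stating wherever the overall score is reported.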

## Integration Points

- Cloud providers (AWS, Azure, GCP): Native services, billing, security
- Cost management (CloudHealth, AWS Cost Explorer, Azure Cost Mgmt): Visibility
- CSPM tools (Wiz, Prisma Cloud, Security Hub): Security posture
- IaC tools (Terraform, CloudFormation): Infrastructure provisioning
- FinOps platforms (Cloudability, CloudHealth, Apptio): Cost optimization
- Monitoring (CloudWatch, Azure Monitor, GCP Monitoring): Resource metrics
- SIEM (Sentinel, Splunk): Cloud security logs
- ITSM (ServiceNow): Change management, cost approval
- Budgeting/Finance (NetSuite, QuickBooks): Chargeback/showback
- Configuration management (AWS Config, Azure Policy): Compliance
- Tagging/asset tools: Resource categorization, cost allocation
- Communication (Teams, Slack): Budget alerts, anomaly notifications
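
Cost allocation depends on the mandatory tags (department, project, environment), and tag coverage is tracked at 95% against a 100% target. The sketch below shows one way to measure coverage, assuming each resource's tags have been exported as a plain dict. The helper names are hypothetical, and the required set is a parameter so it can be adjusted to whichever tag list the policy settles on.

```python
# Mandatory tags from the tagging policy; adjust if the policy changes.
REQUIRED_TAGS = frozenset({"department", "project", "environment"})


def missing_tags(tags: dict[str, str], required=REQUIRED_TAGS) -> set[str]:
    """Return required tag keys that are absent or blank on a resource.

    Keys are compared case-insensitively; whitespace-only values count as missing.
    """
    present = {key.lower() for key, value in tags.items() if value.strip()}
    return set(required) - present


def tag_coverage_pct(resources: list[dict[str, str]], required=REQUIRED_TAGS) -> float:
    """Percentage of resources carrying every required tag (1 decimal)."""
    if not resources:
        return 100.0
    compliant = sum(1 for tags in resources if not missing_tags(tags, required))
    return round(100 * compliant / len(resources), 1)


if __name__ == "__main__":
    tagged = {"department": "eng", "project": "web", "environment": "prod"}
    untagged = {"Department": "eng", "project": " "}
    print(sorted(missing_tags(untagged)))            # → ['environment', 'project']
    print(tag_coverage_pct([tagged] * 19 + [untagged]))  # → 95.0
```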

## Edge Cases

- **Cost spike (unexpected)**: Budget alert; anomaly detection; root cause (resource leak, misconfiguration); immediate action
- **Reserved instance expiry**: Renewal planning; workload reassessment; term optimization; savings projection
- **Spot interruption (production-adjacent)**: Graceful shutdown; checkpoint; failover; workload redesign
- **Cloud provider outage**: Multi-region failover; DNS failover; backup activation; customer communication
- **Compliance finding (CSPM)**: Auto-remediation; manual review; exception request; policy update
- **Budget overrun (department)**: Showback report; department notification; spending freeze; approval override
- **Region restriction (new compliance)**: Workload migration; data transfer; DNS update; validation
- **Storage growth (unbounded)**: Lifecycle policy; alert threshold; cleanup; archival
- **Multi-cloud complexity**: Unified tooling; skill development; governance consistency
- **Vendor lock-in concern**: Abstraction layer; portable architecture; exit strategy; negotiation leverage
