IT AI Skill

Cloud Optimization

Manage cloud infrastructure optimization including cost optimization, resource rightsizing, reserved instance management, spot instance utilization, cloud architecture review, multi-cloud strategy, cloud security posture, and FinOps practices. Use when opti...

Cloud Infrastructure Optimization

Minimize cloud costs while maximizing performance, security, and reliability through data-driven optimization.

Cloud Cost Management

FinOps Framework

FINOPS FRAMEWORK:
═════════════════

CLOUD PROVIDERS:
  Primary: AWS (65% of workload)
  Secondary: Azure (30% of workload)
  Tertiary: GCP (5% — specific ML workloads)
  Total monthly spend: ~$45K (January 2025)
  Annual run rate: ~$540K

COST BREAKDOWN (Monthly — January 2025):
  AWS ($29,250):
    ┌──────────────────────────┬──────────┬──────────┐
    │ Service                  │ Cost     │ % of AWS │
    ├──────────────────────────┼──────────┼──────────┤
    │ EC2 (compute)            │ $8,500   │ 29.1%    │
    │ RDS (databases)          │ $4,200   │ 14.4%    │
    │ S3 (storage)             │ $3,800   │ 13.0%    │
    │ Lambda (serverless)      │ $2,100   │ 7.2%     │
    │ EKS (Kubernetes)         │ $1,800   │ 6.2%     │
    │ CloudFront (CDN)         │ $1,200   │ 4.1%     │
    │ ElastiCache (Redis)      │ $950     │ 3.2%     │
    │ Data transfer            │ $1,100   │ 3.8%     │
    │ Monitoring/logging       │ $1,400   │ 4.8%     │
    │ Backup/snapshot          │ $650     │ 2.2%     │
    │ Other                    │ $3,550   │ 12.1%    │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ $29,250  │ 100%     │
    └──────────────────────────┴──────────┴──────────┘

  Azure ($13,500):
    ┌──────────────────────────┬──────────┬──────────┐
    │ Service                  │ Cost     │ % Azure  │
    ├──────────────────────────┼──────────┼──────────┤
    │ Virtual Machines         │ $4,200   │ 31.1%    │
    │ Azure SQL                │ $2,100   │ 15.6%    │
    │ Blob Storage             │ $1,500   │ 11.1%    │
    │ App Services             │ $1,200   │ 8.9%     │
    │ AKS (Kubernetes)         │ $1,000   │ 7.4%     │
    │ Data transfer            │ $450     │ 3.3%     │
    │ Other                    │ $3,050   │ 22.6%    │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ $13,500  │ 100%     │
    └──────────────────────────┴──────────┴──────────┘

  GCP ($2,250):
    Compute Engine: $1,100 (48.9%)
    Cloud SQL: $550 (24.4%)
    Cloud Storage: $350 (15.6%)
    Other: $250 (11.1%)
  
  TOTAL CLOUD SPEND: $45,000/month

  Cost trend (6 months):
    Aug: $42K → Sep: $44K → Oct: $46K → Nov: $47K → Dec: $45K → Jan: $45K
    Status: Stabilized (flat since Dec; every month-over-month change within ±5%)
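
The month-over-month figures behind the "Stabilized" status can be reproduced with a few lines of Python. This is a minimal sketch using only the six monthly totals listed above; the function name `mom_growth` is illustrative, not part of any tool named in this document.

```python
# Monthly totals in $K, copied from the cost trend above.
monthly_spend_k = {"Aug": 42, "Sep": 44, "Oct": 46, "Nov": 47, "Dec": 45, "Jan": 45}

def mom_growth(series):
    """Return {month: percent change vs. the previous month}."""
    months = list(series)
    return {
        cur: round((series[cur] - series[prev]) / series[prev] * 100, 1)
        for prev, cur in zip(months, months[1:])
    }

growth = mom_growth(monthly_spend_k)
for month, pct in growth.items():
    print(f"{month}: {pct:+.1f}%")
```

The largest increase is Aug to Sep (+4.8%), and Dec shows a decrease (-4.3%), so every change stays inside the 5% band.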

FINOPS PILLARS:
  1. INFORM (visibility):
     - Cost allocation tags: 95% coverage (target: 100%)
     - Showback: Department-level cost reports (weekly)
     - Chargeback: None (internal — showback only)
     - Budget alerts: 80%, 90%, 100% (email + Teams)
     - Anomaly detection: AWS Cost Anomaly Detection + Azure Cost Management
  
  2. OPTIMIZE (efficiency):
     - Rightsizing: Monthly review (savings: $3,200/month)
     - Reserved instances: 65% coverage (savings: $8,500/month)
     - Spot instances: 30% of dev/test (savings: $2,100/month)
     - Idle resource cleanup: Weekly (savings: $800/month)
     - Storage tiering: Auto-archive (savings: $1,200/month)
     - Total monthly savings: $15,800 (35% of the $45,000 monthly spend; about 26% of the $60,800 pre-optimization cost)
  
  3. OPERATE (governance):
     - Budget approval: Monthly (Finance + IT leadership)
     - Spend policy: Max $50K/month (alert at $40K)
     - Provisioning approval: >$500/month (manager approval)
     - Tagging policy: Mandatory (department, project, environment)
     - Compliance: Quarterly review (SOC 2, ISO 27001)
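
As an illustration of how the budget alerts relate to the spend policy, the sketch below applies the 80/90/100% thresholds to the $50K monthly cap. Tying the percentages to that cap is an assumption (it matches the stated "alert at $40K"); the function name is hypothetical.

```python
SPEND_CAP = 50_000               # monthly cap from the spend policy
ALERT_PERCENTS = (80, 90, 100)   # budget alert thresholds

def triggered_alerts(spend, cap=SPEND_CAP, percents=ALERT_PERCENTS):
    """Return every alert threshold (percent of cap) that spend has reached."""
    # Integer arithmetic avoids float rounding at exact boundaries.
    return [p for p in percents if spend * 100 >= cap * p]

print(triggered_alerts(45_000))  # [80, 90]
```

Under this reading, the January spend of $45,000 has already reached both the 80% and 90% alerts.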

COST ALLOCATION (Department showback):
  ┌──────────────────────────┬──────────┬──────────┐
  │ Department               │ Cost     │ % Total  │
  ├──────────────────────────┼──────────┼──────────┤
  │ Engineering              │ $18,000  │ 40.0%    │
  │ Data/ML                  │ $7,500   │ 16.7%    │
  │ Sales/CRM                │ $5,400   │ 12.0%    │
  │ Marketing                │ $3,600   │ 8.0%     │
  │ Finance/ERP              │ $3,150   │ 7.0%     │
  │ HR                       │ $1,350   │ 3.0%     │
  │ Operations               │ $2,250   │ 5.0%     │
  │ Shared/Platform          │ $3,750   │ 8.3%     │
  ├──────────────────────────┼──────────┼──────────┤
  │ TOTAL                    │ $45,000  │ 100%     │
  └──────────────────────────┴──────────┴──────────┘

  Reporting: Weekly (department leads) + Monthly (leadership)
  Trending: Tracked monthly (YoY comparison)
  Budget adherence: 98% on-target (2 departments slightly over — flagged)
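
Showback of this kind is typically built by grouping billing line items on the mandatory department tag. The following is a minimal sketch of that grouping; the line-item shape (`cost`, `tags`) and the `UNTAGGED` bucket are assumptions for illustration and do not reflect any provider's billing export format.

```python
from collections import defaultdict

def showback(line_items):
    """Sum cost per 'department' tag; untagged spend is reported separately."""
    totals = defaultdict(float)
    for item in line_items:
        dept = item.get("tags", {}).get("department", "UNTAGGED")
        totals[dept] += item["cost"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Map each department to (cost, percent of total).
    return {d: (c, round(c / grand_total * 100, 1)) for d, c in totals.items()}

# Hypothetical line items, not taken from the report above.
items = [
    {"cost": 300, "tags": {"department": "Engineering"}},
    {"cost": 100, "tags": {"department": "Engineering"}},
    {"cost": 50, "tags": {"department": "HR"}},
    {"cost": 50, "tags": {}},
]
report = showback(items)
print(report)
```

Keeping untagged spend in its own bucket makes the tagging gap (95% coverage against a 100% target) visible in the same report.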

Resource Optimization

Rightsizing & Efficiency

RESOURCE OPTIMIZATION:
══════════════════════

RIGHTSIZING PROGRAM:
  Cadence: Monthly review (automated recommendation + manual approval)
  Tools: AWS Compute Optimizer + Azure Advisor
  Coverage: 100% of EC2 + Azure VMs (75 instances total)
  
  January 2025 recommendations:
    ┌───────────────────────────────┬──────────┬──────────┐
    │ Recommendation                │ Count    │ Savings  │
    ├───────────────────────────────┼──────────┼──────────┤
    │ Downsize (over-provisioned)   │ 12       │ $1,800   │
    │ Upsize (under-provisioned)    │ 3        │ $0*      │
    │ Reserved instance (on-demand) │ 18       │ $4,200   │
    │ Spot instance (dev/test)      │ 8        │ $950     │
    │ Idle (terminate)              │ 5        │ $650     │
    │ Optimal (no change)           │ 29       │ n/a      │
    ├───────────────────────────────┼──────────┼──────────┤
    │ TOTAL                         │ 75       │ $7,600   │
    └───────────────────────────────┴──────────┴──────────┘
  
  *Upsize required for performance (cost-neutral or slight increase)
  
  Implementation:
    Rightsized: 10/12 (2 deferred — maintenance window)
    Reserved: 15/18 (3 deferred — workload uncertainty)
    Spot: 6/8 (2 deferred — workload stability concern)
    Terminated: 5/5 (all confirmed idle — terminated)
  
  Savings realized: $6,200/month (January)
  Savings committed: $4,800/month (reserved instances — 1-year term)
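
The January recommendation and implementation counts can be cross-checked with a short tally. This is a sketch only: the pro-rata estimate assumes savings scale linearly with the share of changes implemented, which the report does not state.

```python
# category: (recommended, implemented, identified monthly savings in $)
JANUARY = {
    "downsize": (12, 10, 1_800),
    "reserve": (18, 15, 4_200),
    "spot": (8, 6, 950),
    "terminate": (5, 5, 650),
}

def summarize(recs):
    recommended = sum(r for r, _, _ in recs.values())
    implemented = sum(i for _, i, _ in recs.values())
    identified = sum(s for _, _, s in recs.values())
    # Assumption: savings scale linearly with the share of changes implemented.
    pro_rata = sum(s * i / r for r, i, s in recs.values())
    return {
        "implemented_pct": round(implemented / recommended * 100, 1),
        "identified": identified,
        "pro_rata_realized": pro_rata,
    }

summary = summarize(JANUARY)
print(summary)
```

Under that assumption the estimate comes to $6,362.50, in the same range as the $6,200 reported as realized.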

RESERVED INSTANCE MANAGEMENT:
  AWS:
    RI coverage: 68% (of EC2 + RDS spend)
    RI terms: 1-year (55% of RI), 3-year (15% of RI)
    RI utilization: 94% (target: >90%) ✓
    Savings vs. on-demand: 35-40% (standard), 45-55% (heavy)
    Expiring RIs (next 90 days): 5 (renewal planned)
    RI optimization: Monthly exchange review (instance type, tenancy; Convertible RIs only)
  
  Azure:
    Savings Plans coverage: 62% (of compute spend)
    Savings Plan terms: 1-year (70%), 3-year (30%)
    Utilization: 91% (target: >90%) ✓
    Savings vs. on-demand: 25-30% (compute), 15-20% (all compute)
    Expiring plans (next 90 days): 3 (renewal planned)
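
Coverage and utilization answer different questions: coverage asks how much eligible usage ran under a commitment, while utilization asks how much of the purchased commitment was actually used. A minimal sketch of both ratios follows; the input numbers are hypothetical and chosen only to land near the AWS figures above.

```python
def coverage_pct(covered, total_eligible):
    """Share of eligible usage that ran under a reservation or savings plan."""
    return round(covered / total_eligible * 100, 1)

def utilization_pct(used_commitment, purchased_commitment):
    """Share of the purchased commitment that was actually consumed."""
    return round(used_commitment / purchased_commitment * 100, 1)

# Hypothetical month, in instance-hours.
print(coverage_pct(6_800, 10_000))    # 68.0
print(utilization_pct(6_800, 7_234))  # 94.0
```

High coverage with low utilization means over-commitment; low coverage with high utilization suggests room to buy more.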

SPOT INSTANCE UTILIZATION:
  Workloads on spot:
    Development environment: 30% of dev instances
    Testing/QA: 40% of test instances
    CI/CD runners: 60% of runners
    Batch processing: 80% of batch jobs
    Data analysis: 50% of analysis instances
  
  Spot interruption rate: 3.2% (AWS), 2.8% (Azure)
  Interruption handling:
    Graceful shutdown: Checkpoint + save on the interruption notice (AWS: 2-minute warning; Azure: 30-second minimum)
    Auto-recovery: Launch replacement (same AZ or different AZ)
    Data persistence: EBS/Snapshot (state saved before shutdown)
    Impact: Minimal (stateless workloads + checkpoint)
  
  Savings: 60-70% vs. on-demand (spot pricing)
  Monthly savings: ~$2,100 (spot optimization)
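
The checkpoint-and-resume pattern described under interruption handling can be sketched as below. This is an illustration under assumptions: `interrupted` is a hypothetical callable standing in for whatever polls the provider's interruption notice, and progress is stored in a local JSON file rather than on EBS or a snapshot.

```python
import json
import os
import tempfile

def run_batch(items, checkpoint_path, handle, interrupted=lambda: False):
    """Process items in order, saving progress so a replacement can resume."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items)):
        if interrupted():
            return False  # progress up to item i is already on disk
        handle(items[i])
        # Write to a temp file, then rename, so the checkpoint is never half-written.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp, checkpoint_path)
    return True

# Simulate an interruption after two items, then a replacement instance resuming.
done = []
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
signals = iter([False, False, True])
first = run_batch(list(range(5)), path, done.append, interrupted=lambda: next(signals))
second = run_batch(list(range(5)), path, done.append)
print(first, second, done)
```

Because the checkpoint is written after every item, the shutdown path has nothing left to save, which matters when the notice window is short.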

STORAGE OPTIMIZATION:
  AWS S3:
    Total storage: 18 TB
    ┌──────────────────────────┬──────────┬──────────┐
    │ Storage Tier             │ Volume   │ Cost/Mo  │
    ├──────────────────────────┼──────────┼──────────┤
    │ S3 Standard              │ 8 TB     │ $232     │
    │ S3 Intelligent-Tiering   │ 5 TB     │ $165     │
    │ S3 Standard-IA           │ 3 TB     │ $60      │
    │ S3 Glacier               │ 1.5 TB   │ $15      │
    │ S3 Glacier Deep Archive  │ 0.5 TB   │ $3       │
    ├──────────────────────────┼──────────┼──────────┤
    │ TOTAL                    │ 18 TB    │ $475     │
    └──────────────────────────┴──────────┴──────────┘
  
    Lifecycle policies:
      30 days: Standard → Intelligent-Tiering
      90 days: → Standard-IA
      180 days: → Glacier
      365 days: → Glacier Deep Archive
      730 days: → Delete (if no compliance requirement)
  
    Savings: $1,200/month (vs. all-standard tier)
    Compliance: 15 TB (retention override — legal hold)
  
  Azure Blob:
    Total storage: 8 TB
    Tiering: Hot (4 TB), Cool (3 TB), Archive (1 TB)
    Lifecycle: Auto-transition (30/90/180 day policy)
    Savings: $450/month (vs. all-hot tier)
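
The S3 lifecycle schedule above maps object age to a target tier. A minimal sketch of that mapping follows; it models the schedule as written and does not check which transitions the S3 lifecycle API accepts. The `legal_hold` flag is an assumed stand-in for the retention override.

```python
import bisect

# (minimum age in days, tier) pairs from the S3 lifecycle schedule.
LIFECYCLE = [
    (0, "Standard"),
    (30, "Intelligent-Tiering"),
    (90, "Standard-IA"),
    (180, "Glacier"),
    (365, "Glacier Deep Archive"),
    (730, "Delete"),
]

def tier_for_age(age_days, legal_hold=False):
    """Return the tier an object of the given age should be in."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    ages = [age for age, _ in LIFECYCLE]
    tier = LIFECYCLE[bisect.bisect_right(ages, age_days) - 1][1]
    # Objects under a retention override are kept in the coldest tier instead.
    if tier == "Delete" and legal_hold:
        return "Glacier Deep Archive"
    return tier

for age in (29, 30, 400, 800):
    print(age, tier_for_age(age))
```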

IDLE RESOURCE DETECTION:
  Weekly scan (automated):
    Unattached EBS volumes: 3 (250 GB — $25/month)
    Unassociated Elastic IPs: 2 ($8/month)
    Idle RDS instances: 1 (dev DB — $120/month)
    Idle load balancers: 1 ($16/month)
    Stopped EC2 instances: 2 ($0 compute; attached EBS volumes still bill)
    Unused NAT Gateways: 1 ($32/month)
  
  Total idle cost: $201/month
  Action: Terminate/archive (weekly cleanup — auto-approved)
  Prevention: Auto-shutdown (off-hours, dev/test)

WASTE REDUCTION SUMMARY:
  Monthly waste identified: ~$3,500
  Monthly waste eliminated: ~$2,800
  Remaining waste: ~$700 (under review)
  Waste rate: 7.8% of total spend (target: <5%)
  Trend: Improving (March: 12% → Dec: 8% → Jan: 7.8%)
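
The waste figures above fit together as follows. This sketch assumes the 7.8% waste rate is identified waste divided by total monthly spend, which is the reading that reproduces the reported number.

```python
def waste_summary(identified, eliminated, total_spend, target_pct=5.0):
    """Derive waste rate, remaining waste, and the ceiling implied by the target."""
    remaining = identified - eliminated
    return {
        "waste_rate_pct": round(identified / total_spend * 100, 1),
        "remaining": remaining,
        "remaining_pct": round(remaining / total_spend * 100, 1),
        "max_waste_for_target": total_spend * target_pct / 100,
    }

waste = waste_summary(3_500, 2_800, 45_000)
print(waste)
```

At $45,000 of monthly spend, the <5% target corresponds to identified waste below $2,250.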

Cloud Security Posture

CSPM & Governance

CLOUD SECURITY POSTURE MANAGEMENT (CSPM):
═══════════════════════════════════════════

CSPM TOOLS:
  AWS: AWS Security Hub + AWS Config + AWS Trusted Advisor
  Azure: Microsoft Defender for Cloud
  GCP: Security Command Center
  Cross-cloud: Wiz (consolidated view)

SECURITY POSTURE SCORE:
  AWS: 92/100 (target: ≥90) ✓
  Azure: 89/100 (target: ≥90; 1 point short)
  GCP: 95/100 (target: ≥90) ✓
  Overall: 91/100 (improving trend)
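
The overall score of 91 is consistent with weighting each provider by its share of the workload (65/30/5). That weighting is an assumption, since the report does not say how the overall figure is derived; a simple mean of the three scores would give 92 instead.

```python
scores = {"AWS": 92, "Azure": 89, "GCP": 95}
workload_share = {"AWS": 65, "Azure": 30, "GCP": 5}  # percent of workload

# Weighted mean: each provider counts in proportion to its workload share.
weighted = sum(scores[p] * workload_share[p] for p in scores) / sum(workload_share.values())
simple = sum(scores.values()) / len(scores)

print(weighted)  # 91.25
print(simple)    # 92.0
```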

COMPLIANCE FRAMEWORKS (Cloud):
  ┌──────────────────────────┬──────────┬──────────┬──────────┐
  │ Framework                │ AWS      │ Azure    │ GCP      │
  ├──────────────────────────┼──────────┼──────────┼──────────┤
  │ CIS Foundations          │ 95%      │ 92%      │ 97%      │
  │ SOC 2                    │ 100%     │ 98%      │ 100%     │
  │ ISO 27001                │ 98%      │ 95%      │ 99%      │
  │ NIST 800-53              │ 92%      │ 90%      │ 95%      │
  │ PCI DSS (cloud scope)    │ 100%     │ N/A      │ N/A      │
  ├──────────────────────────┼──────────┼──────────┼──────────┤
  │ Average                  │ 97%      │ 94%      │ 98%      │
  └──────────────────────────┴──────────┴──────────┴──────────┘

  Gap remediation:
    Azure CIS gap: 2 findings (network config — remediation in progress)
    Azure ISO gap: 3 findings (logging config — remediation in progress)
    NIST gap: 5 findings (encryption, access — planned for February)
  
  Target: 95%+ across all frameworks (all providers)

CLOUD CONFIGURATION STANDARDS:
  Automated enforcement (AWS Config + Azure Policy):
    1. Storage encryption: 100% (AES-256, KMS managed)
    2. Network encryption: 100% (TLS 1.2+, no unencrypted endpoints)
    3. Public access: 0 S3 buckets publicly accessible (enforced)
    4. Security groups: No 0.0.0.0/0 on sensitive ports (22, 3389)
    5. MFA: Required for all root/admin accounts (enforced)
    6. Logging: Enabled for all services (CloudTrail, Azure Activity Log)
    7. Tagging: Mandatory tags (department, environment, owner)
    8. Backup: Enabled for all production databases (automated)
    9. VPC flow logs: Enabled (all VPCs)
    10. IAM: No inline policies (managed policies only)
  
  Compliance rate: 96% (automated checks)
  Non-compliant: 12 resources (remediation in progress)
  Auto-remediation: 8 of 10 rules (auto-correct)
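
Rule 4 amounts to flagging any ingress rule that is open to the whole internet (0.0.0.0/0 or ::/0) and covers port 22 or 3389. A minimal sketch of that check follows; the rule dictionary shape is an assumption for illustration, not the format returned by AWS Config or any cloud API.

```python
import ipaddress

SENSITIVE_PORTS = (22, 3389)

def is_world_open(cidr):
    """True when the CIDR block matches every address (prefix length 0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0

def violations(rules):
    """Return rules that expose a sensitive port to the whole internet."""
    return [
        r for r in rules
        if is_world_open(r["cidr"])
        and any(r["from_port"] <= p <= r["to_port"] for p in SENSITIVE_PORTS)
    ]

# Hypothetical ingress rules.
rules = [
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22},      # violation
    {"cidr": "10.0.0.0/8", "from_port": 22, "to_port": 22},     # internal only
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},    # not a sensitive port
    {"cidr": "::/0", "from_port": 3000, "to_port": 4000},       # range includes 3389
]
flagged = violations(rules)
print(flagged)
```

Checking port ranges rather than exact ports catches wide rules that include a sensitive port without naming it.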

CLOUD GOVERNANCE:
  Guardrails (prevention):
    - Max instance type: m5.2xlarge (larger requires approval)
    - Max monthly spend per project: $5K (budget alert at $4K)
    - Region restriction: us-east-1, us-west-2 (AWS); eastus, westeurope (Azure)
    - No production deletion: Delete protection (Terraform + cloud)
    - No public IP: Private endpoints (NAT, VPC endpoints)
    - No root usage: Root account disabled (IAM only)
  
  Monitoring (detection):
    - Cost anomaly: AWS Cost Anomaly Detection + Azure Cost Alerts
    - Security finding: Security Hub + Defender (real-time)
    - Configuration drift: AWS Config + Azure Policy (hourly)
    - Usage spike: CloudWatch + Azure Monitor (15-minute)
  
  Review cadence:
    Weekly: Cost + security (automated report)
    Monthly: Governance review (IT + Finance + Security)
    Quarterly: Cloud strategy review (architecture, multi-cloud)
    Annually: Cloud provider evaluation (negotiation, migration)
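
Several of the guardrails above are simple enough to express as a pre-provisioning check. The sketch below covers the region restriction, the per-project budget ($5K with an alert at $4K), and the >$500/month approval rule from the spend governance policy; the function and message names are hypothetical, and real enforcement would live in policy tooling rather than application code.

```python
APPROVED_REGIONS = {"us-east-1", "us-west-2", "eastus", "westeurope"}
PROJECT_BUDGET = 5_000      # max monthly spend per project
PROJECT_ALERT = 4_000       # budget alert level
APPROVAL_THRESHOLD = 500    # monthly cost above which manager approval is needed

def check_request(region, monthly_cost, current_project_spend):
    """Return guardrail findings for a provisioning request (empty list = pass)."""
    findings = []
    if region not in APPROVED_REGIONS:
        findings.append("region not approved")
    if monthly_cost > APPROVAL_THRESHOLD:
        findings.append("manager approval required")
    projected = current_project_spend + monthly_cost
    if projected > PROJECT_BUDGET:
        findings.append("exceeds project budget")
    elif projected >= PROJECT_ALERT:
        findings.append("project budget alert")
    return findings

print(check_request("us-east-1", 300, 3_000))
print(check_request("eu-central-1", 600, 3_500))
```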

MULTI-CLOUD STRATEGY:
  Workload distribution:
    AWS (65%): Core applications, databases, storage, CDN
    Azure (30%): Microsoft ecosystem (365, AD, SQL Server), HR/Finance
    GCP (5%): ML/AI workloads (specific models — TensorFlow)
  
  Multi-cloud benefits:
    - Best-of-breed (per workload)
    - Vendor diversification (risk reduction)
    - Cost optimization (price comparison)
    - Compliance (data residency, regulatory)
  
  Multi-cloud challenges:
    - Complexity (multiple consoles, APIs)
    - Skill requirements (AWS + Azure + GCP)
    - Cost visibility (consolidated billing)
    - Security consistency (unified policy)
  
  Mitigation:
    - Terraform (unified IaC across providers)
    - Wiz (unified security view)
    - CloudHealth (unified cost view)
    - Training (multi-cloud certification)

Output

Cloud Optimization Dashboard

CLOUD OPTIMIZATION DASHBOARD — Jan 2025
════════════════════════════════════

Cost Overview:
  Monthly spend: $45,000
  Annual run rate: ~$540K
  Cost trend: Stabilized (<5% MoM growth)
  Budget adherence: 98% on-target
  
  AWS: $29,250 (65%)
  Azure: $13,500 (30%)
  GCP: $2,250 (5%)

Optimization:
  Monthly savings: $15,800 (35% of current spend; about 26% of pre-optimization cost)
  RI/SP coverage: 68% (AWS), 62% (Azure)
  Spot utilization: 30-60% (dev/test/CI/CD)
  Rightsizing: $3,200/month savings
  Storage tiering: $1,650/month savings
  Waste rate: 7.8% (target: <5%)

Efficiency:
  Instance utilization: 55-65% (CPU), 60-70% (memory)
  Storage optimization: Automated lifecycle tiering (S3: 5 tiers, Azure Blob: 3 tiers)
  Idle resources: $201/month (weekly cleanup)
  Auto-shutdown: Dev/test (off-hours)

Security:
  CSPM score: 91/100 (AWS: 92, Azure: 89, GCP: 95)
  Compliance: 95% (CIS), 99% (SOC 2), 97% (ISO 27001)
  Config standards: 96% compliant
  Non-compliant resources: 12 (remediation in progress)

Governance:
  Guardrails: 6 preventive controls + 10 config rules (8 auto-remediated)
  Budget alerts: 80%, 90%, 100%
  Tagging: 95% coverage (target: 100%)
  Region: 4 approved regions (all workloads)
  Root access: Disabled (IAM only)

Actions:
  1. Improve tagging (95% → 100% coverage)
  2. Reduce waste rate (7.8% → <5%)
  3. Azure CSPM improvement (89 → 90+)
  4. RI renewal (5 expiring — next 90 days)
  5. Cloud strategy review (quarterly — Q1)

Integration Points

Edge Cases