Capacity Planning Infrastructure

Plan and manage IT infrastructure capacity including compute, storage, network, and cloud resources. Use when forecasting infrastructure needs, right-sizing resources, planning data center expansion, optimizing cloud capacity, conducting capacity reviews, o...

IT Infrastructure Capacity Planning

Forecast, plan, and optimize IT infrastructure capacity to support current and future business needs.

Workflow

  1. Establish capacity baselines: current utilization for compute, storage, network, and applications.
  2. Collect historical data: 12-month utilization trends, growth rates, seasonal patterns.
  3. Forecast demand: business-driven (user growth, transaction volume) and technology-driven (new initiatives).
  4. Analyze capacity gaps: compare projected demand against current and planned capacity.
  5. Develop capacity plan: procurement timeline, budget, implementation schedule, risk mitigation.
  6. Implement right-sizing: optimize current resources before expanding (eliminate waste first).
  7. Monitor capacity thresholds: automated alerts at 70%, 80%, 90% utilization.
  8. Conduct quarterly capacity reviews: update forecasts, validate assumptions, adjust plans.
  9. Report capacity posture: executive dashboard, procurement recommendations, risk assessment.
  10. Execute capacity improvements: procurement, deployment, configuration, validation.

Compute Capacity Planning

COMPUTE CAPACITY FRAMEWORK
============================

Current State Assessment:

  On-Premises Servers:
    Total physical servers:       [X]
    Average utilization:           CPU [Y]%, Memory [Z]%
    Server density:               [X] VMs per physical host (average)
    Host capacity headroom:       [X] additional VMs per host
    Servers at > 80% utilization:  [X] (need attention)
    Idle/underutilized servers:    [X] (candidates for consolidation)
    End-of-life within 12 months:  [X] (need replacement budget)

  Virtual Machines:
    Total VMs:                     [X] production, [Y] non-production
    Average VM specs:              [X] vCPU, [Y] GB RAM
    Over-provisioned VMs:          [X] (CPU < 10% for 30 days)
    Under-provisioned VMs:         [Y] (CPU > 80% for 30 days)
    Snapshot count:                [X] (clean up old snapshots)
    Unused VMs:                    [X] (powered off > 30 days)

  Cloud Compute:
    Total instances:               [X] (AWS EC2 + Azure VMs + GCP GCE)
    Average utilization:           CPU [Y]%, Memory [Z]%
    Reserved instances:            [X]% of total (target: > 70% for stable workloads)
    Spot instances:                [X]% of total (for fault-tolerant workloads)
    Idle instances:                [X] (no network traffic for 7+ days)
    Zombie instances:              [X] (running but no attached EBS/network)

Compute Growth Projection:

  Demand drivers:
    User growth:                   [X]% per quarter → [Y] additional VMs/instances
    Application growth:            [X] new services planned → [Y] additional compute
    Batch/ETL growth:              [X]% data growth → [Y] additional compute
    Peak factor:                   [X]x average (size for the 95th-percentile load, not the average)

  12-month projection:

    Quarter    Current VMs    Projected    Growth    Headroom    Action Required
    ────────   ───────────    ─────────   ────────  ──────────  ───────────────────
    Q1 (now)   500            500         —         15%         None
    Q2         500            550         +10%      10%         Monitor
    Q3         550            610         +11%      4%          ⚠️ Plan expansion
    Q4         610            670         +10%      -2%         🔴 Execute expansion

    Action: Procure 3 new hosts (Q2 delivery) or increase cloud budget by 20%
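
The projection above can be sketched in a few lines of Python. This is a minimal illustration, not the method behind the table: the capacity of 588 VMs is a hypothetical figure chosen so that Q1 headroom comes out near 15%, growth is held at a flat 10% per quarter, and the names `project_vm_demand` and `first_quarter_below` are made up for this example. Because of those simplifications the output will not match the sample table row for row.

```python
from dataclasses import dataclass


@dataclass
class QuarterProjection:
    quarter: str
    vms: int
    headroom_pct: float


def project_vm_demand(current_vms, capacity_vms, quarterly_growth, quarters):
    """Compound VM demand each quarter and report headroom against fixed capacity."""
    rows = []
    vms = float(current_vms)
    for i in range(quarters):
        if i > 0:
            vms *= 1 + quarterly_growth
        # Headroom is the unused share of total capacity; negative means a shortfall.
        headroom = (capacity_vms - vms) / capacity_vms * 100
        rows.append(QuarterProjection(f"Q{i + 1}", round(vms), round(headroom, 1)))
    return rows


def first_quarter_below(rows, min_headroom_pct):
    """Return the first quarter whose headroom falls below the floor, or None."""
    for row in rows:
        if row.headroom_pct < min_headroom_pct:
            return row.quarter
    return None


# Hypothetical inputs: 500 VMs today, room for 588, 10% growth per quarter.
rows = project_vm_demand(current_vms=500, capacity_vms=588, quarterly_growth=0.10, quarters=4)
for row in rows:
    print(row.quarter, row.vms, f"{row.headroom_pct}%")
print("Plan expansion by:", first_quarter_below(rows, min_headroom_pct=5.0))
```

Keeping the headroom floor as a parameter makes it easy to test how much earlier procurement has to start when lead times are long.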

Right-Sizing Recommendations:

  Over-provisioned (downsize):
    VM-001: 8 vCPU, 32 GB RAM → 4 vCPU, 16 GB RAM (CPU avg: 8%, Mem avg: 15%)
    VM-002: 4 vCPU, 16 GB RAM → 2 vCPU, 8 GB RAM (CPU avg: 5%, Mem avg: 20%)
    VM-003: 4 vCPU, 8 GB RAM → 2 vCPU, 4 GB RAM (CPU avg: 12%, Mem avg: 30%)
    Estimated savings: 8 vCPU, 28 GB RAM → redeploy to other workloads

  Under-provisioned (upsize):
    VM-010: 2 vCPU, 4 GB RAM → 4 vCPU, 8 GB RAM (CPU avg: 85%, Mem avg: 90%)
    VM-011: 2 vCPU, 4 GB RAM → 4 vCPU, 8 GB RAM (CPU avg: 78%, Mem avg: 85%)
    Action: immediate resize to prevent performance degradation

  Consolidation opportunities:
    10 small VMs (1 vCPU each, < 10% CPU) → 1 medium VM (4 vCPU) with containers
    Savings: 9 VMs retired and 6 vCPU reclaimed; reduced management overhead
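
As a rough sketch of how the right-sizing rules could be automated, the Python below classifies VMs by their 30-day CPU average and totals the resources freed by a set of downsizes. The 10% and 80% cut-offs come from the Virtual Machines assessment earlier in this framework; the `VM` class and function names are hypothetical, and a real review would presumably also weigh memory, I/O, and peak load.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpu: int
    ram_gb: int
    cpu_avg_pct: float  # 30-day average CPU utilization


def classify(vm, low_pct=10.0, high_pct=80.0):
    """Label a VM using CPU-only thresholds (a deliberate simplification)."""
    if vm.cpu_avg_pct < low_pct:
        return "over-provisioned"
    if vm.cpu_avg_pct > high_pct:
        return "under-provisioned"
    return "ok"


def reclaimed(resizes):
    """Sum the vCPU and RAM freed by a list of (before, after) VM pairs."""
    vcpu = sum(before.vcpu - after.vcpu for before, after in resizes)
    ram = sum(before.ram_gb - after.ram_gb for before, after in resizes)
    return vcpu, ram


# The three downsize recommendations listed above.
resizes = [
    (VM("VM-001", 8, 32, 8.0), VM("VM-001", 4, 16, 8.0)),
    (VM("VM-002", 4, 16, 5.0), VM("VM-002", 2, 8, 5.0)),
    (VM("VM-003", 4, 8, 12.0), VM("VM-003", 2, 4, 12.0)),
]
freed_vcpu, freed_ram = reclaimed(resizes)
print(f"Reclaimed: {freed_vcpu} vCPU, {freed_ram} GB RAM")
print(classify(VM("VM-010", 2, 4, 85.0)))
```

Under a strict CPU-only rule, VM-003 (12% CPU) and VM-011 (78% CPU) fall inside the "ok" band, which suggests the recommendations above also take memory utilization into account.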

Storage Capacity Planning

STORAGE CAPACITY FRAMEWORK
============================

Current Storage Inventory:

  On-Premises Storage:
    SAN arrays:                    [X] arrays, [Y] TB total raw, [Z] TB usable
    NAS/file servers:              [X] TB total
    Tape library:                  [X] TB (for backup archive)
    Average utilization:           [X]%
    Growth rate:                   [Y] TB per month

  Cloud Storage:
    Block storage (EBS/Managed Disks):  [X] TB, $[Y]/month
    Object storage (S3/Blob):           [X] TB, $[Y]/month
    File storage (EFS/Azure Files):     [X] TB, $[Y]/month
    Database storage:                   [X] TB
    Backup storage:                     [X] TB
    Archive storage (Glacier):          [X] TB
    Total cloud storage:                [X] TB, $[Y]/month

  Storage by tier:
    Hot (frequent access):         [X] TB — $[Y]/TB/month
    Warm (occasional access):      [X] TB — $[Y]/TB/month
    Cool (rare access):            [X] TB — $[Y]/TB/month
    Archive (annual access):       [X] TB — $[Y]/TB/month

Storage Growth Analysis:

  Historical growth (last 12 months):
    Month    Total TB    Growth TB    Growth %    Cost/Month    Trend
    ──────   ──────────  ──────────   ─────────   ────────────  ─────
    Jan      500         +8           +1.6%       $5,000        ↑
    Feb      508         +10          +2.0%       $5,080        ↑
    Mar      518         +12          +2.3%       $5,180        ↑
    Apr      530         +15          +2.9%       $5,300        ↑↑
    ...
    Dec      650         +20          +3.2%       $6,500        ↑↑

  Growth rate: accelerating (monthly growth rose from 1.6% in Jan to 3.2% in Dec)
  Projected 12-month: 650 TB → 950 TB (+46% growth, assuming 3.2% per month compounded)
  Alert: 80% threshold reached in [X] months
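
The 12-month figure above is consistent with compounding the December growth rate of 3.2% per month. A minimal Python sketch of that arithmetic follows, along with an estimate of the months remaining before a utilization threshold is reached. The 1,000 TB usable capacity is a hypothetical value used only to make the example concrete, and the function names are illustrative.

```python
import math


def project_storage(current_tb, monthly_growth, months):
    """Compound monthly growth: current * (1 + g) ** months."""
    return current_tb * (1 + monthly_growth) ** months


def months_until_threshold(used_tb, capacity_tb, monthly_growth, threshold=0.80):
    """Months until used capacity reaches threshold * capacity at a constant growth rate."""
    limit_tb = capacity_tb * threshold
    if used_tb >= limit_tb:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    return math.log(limit_tb / used_tb) / math.log(1 + monthly_growth)


projected = project_storage(650, 0.032, 12)
print(f"12-month projection: {projected:.0f} TB")

# Hypothetical array with 1,000 TB usable capacity.
months = months_until_threshold(used_tb=650, capacity_tb=1000, monthly_growth=0.032)
print(f"80% threshold reached in about {months:.1f} months")
```

A constant rate understates demand if growth keeps accelerating, so the result is better read as a latest-case date than as a forecast.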

Storage Optimization:

  Data lifecycle policies:
    0–90 days:     Hot storage (SSD/NVMe) - active data
    90–180 days:   Warm storage (S3 Standard-IA) - reduced access
    180–365 days:  Cool storage (S3 Glacier Flexible Retrieval) - archive
    365+ days:     Deep archive (S3 Glacier Deep Archive) - compliance
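
The age bands above map naturally onto a lookup function. The sketch below is one possible reading: the tier labels are shorthand, and because the listed ranges share their boundary days (90, 180, 365), it assumes a boundary day belongs to the colder tier. That tie-break is an assumption, not something the policy states.

```python
# Upper age bound in days (exclusive) for each tier, coldest tier as the fallback.
TIERS = [
    (90, "hot"),
    (180, "warm"),
    (365, "cool"),
]


def tier_for_age(age_days):
    """Return the storage tier for data of the given age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for upper_bound, tier in TIERS:
        if age_days < upper_bound:
            return tier
    return "deep-archive"


for age in (10, 90, 200, 400):
    print(age, "days ->", tier_for_age(age))
```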

  Deduplication and compression:
    Backups: 3:1 to 10:1 dedup ratio typical
    VM images: 2:1 to 5:1 dedup ratio
    Logs: 5:1 to 10:1 compression ratio
    Estimated savings: 40–60% with proper lifecycle management

  Wasted storage:
    Unattached volumes:           [X] TB ($[Y]/month wasted)
    Old snapshots:                [X] TB ($[Y]/month wasted)
    Empty buckets/containers:     [X] buckets
    Duplicate files:              [X] TB (identify with dedup tools)
    Oversized disks:              [X] disks > 80% free space
    Total reclaimable:            [X] TB ($[Y]/month savings)

  Storage alerts:
    70% utilization:  WARNING — begin planning expansion
    80% utilization:  PLANNING — submit procurement request
    90% utilization:  CRITICAL — immediate action required
    95% utilization:  EMERGENCY — emergency procurement; may impact operations
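
The alert ladder above could be encoded as a simple threshold lookup. This is a minimal sketch: the `"OK"` label for utilization below 70% is an assumption, since the list does not name that state, and `storage_alert` is a hypothetical function name.

```python
# Thresholds in descending order so the most severe matching level wins.
ALERT_LEVELS = [
    (95, "EMERGENCY"),
    (90, "CRITICAL"),
    (80, "PLANNING"),
    (70, "WARNING"),
]


def storage_alert(utilization_pct):
    """Map a utilization percentage to the alert level from the ladder above."""
    for threshold, level in ALERT_LEVELS:
        if utilization_pct >= threshold:
            return level
    return "OK"  # assumed label for anything below the first threshold


for pct in (65, 72, 85, 92, 97):
    print(f"{pct}% -> {storage_alert(pct)}")
```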

Network Capacity Planning

NETWORK CAPACITY FRAMEWORK
============================

WAN Capacity:

  Current WAN links:
    Primary ISP:         [X] Gbps, average [Y]%, peak [Z]%
    Secondary ISP:       [X] Gbps, average [Y]%, peak [Z]%
    Backup (4G/5G):      [X] Mbps (emergency only)
    Direct Connect:      [X] Gbps (cloud interconnect)

  Utilization trend:
    Month    Avg Util    Peak (95th)   Growth     Status
    ──────   ──────────  ───────────   ─────────  ────────
    Jan      45%         65%           baseline   🟢
    Apr      52%         72%           +2.3%/mo   🟡 Watch
    Jul      60%         80%           +2.7%/mo   🟡 Plan
    Oct      68%         88%           +2.7%/mo   🔴 Act

  Projected: 80% average utilization in [X] months
  Recommendation: upgrade to [Y] Gbps or add secondary [Z] Gbps link
  Budget: $[X]/month for upgraded link
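
One way to fill in the "80% in [X] months" estimate is a least-squares line through the average-utilization samples. The sketch below uses the four sample points from the trend table, with months counted from January; it assumes growth stays linear, which is a simplification, and the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until(target_pct, xs, ys):
    """Months after the last sample until the fitted line reaches target_pct."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None  # utilization is flat or falling; no crossing to project
    return max(0.0, (target_pct - intercept) / slope - xs[-1])


months = [0, 3, 6, 9]          # Jan, Apr, Jul, Oct
avg_util = [45, 52, 60, 68]    # average utilization (%) from the trend table

slope, _ = fit_line(months, avg_util)
remaining = months_until(80, months, avg_util)
print(f"Trend: +{slope:.2f} points/month; 80% average in about {remaining:.1f} months")
```

With these samples the fit lands a little under five months after October. The same function can be pointed at the 95th-percentile column when peak load is the binding constraint.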

LAN/Datacenter Network:

  Core switch capacity:
    Switching capacity:   [X] Tbps
    Port density:         [X] ports (1G/10G/25G/40G/100G)
    Utilization:          [X]% of switching capacity
    Available ports:      [X] (for new servers)

  Top-of-rack (ToR) switches:
    Switches:             [X] × 48-port 10G/25G switches
    Uplinks:              [X] × 40G/100G to core
    Uplink utilization:   [X]% (alert > 70%)
    Available ports:      [X] per switch average

  Network growth drivers:
    New servers:          [X] planned → [Y] additional switch ports needed
    Bandwidth growth:     [X]% per quarter (video, backups, replication)
    Cloud traffic:        [X] Gbps to cloud (growing [Y]% per quarter)

Wireless Capacity:

  Access points:          [X] APs covering [Y] sq ft
  Clients per AP:         Average [Z] (target: < 30 for performance)
  Channel utilization:    Average [X]% (target: < 70%)
  Growth:                 [X] new clients per quarter
  Upgrade plan:           Wi-Fi 6/6E/7 upgrade in [X] months
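
For wireless, the clients-per-AP target gives a quick way to estimate when more access points are needed. The Python sketch below treats 30 clients per AP as a hard ceiling and assumes clients spread evenly across APs, which real deployments rarely achieve. The client and AP counts are hypothetical.

```python
import math


def aps_required(clients, max_clients_per_ap=30):
    """Minimum AP count that keeps the average at or below the per-AP ceiling."""
    return math.ceil(clients / max_clients_per_ap)


def quarters_until_ap_limit(clients, aps, new_clients_per_quarter, max_clients_per_ap=30):
    """Quarters until the client count reaches the ceiling for the current AP count."""
    capacity = aps * max_clients_per_ap
    if clients >= capacity:
        return 0
    if new_clients_per_quarter <= 0:
        return None  # no growth, so the ceiling is never reached
    return math.ceil((capacity - clients) / new_clients_per_quarter)


# Hypothetical site: 40 APs, 1,000 clients, 50 new clients per quarter.
print("APs needed today:", aps_required(1000))
print("Quarters until ceiling:", quarters_until_ap_limit(1000, 40, 50))
```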

Cloud Capacity Planning

CLOUD CAPACITY AND COST FORECASTING
=====================================

Current Cloud Spend:

  Provider    Monthly     Annual    YoY Growth    % of Budget    Trend
  ─────────   ──────────  ────────  ────────────  ─────────────  ─────
  AWS         $45,000     $540,000  +25%          53%            ↑↑
  Azure       $25,000     $300,000  +15%          29%            ↑
  GCP         $10,000     $120,000  +40%          12%            ↑↑↑
  Other       $5,000      $60,000   +10%          6%             →
  ────────────────────────────────────────────────────────────────────────
  Total       $85,000     $1,020,000 +22%         100%           ↑
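
The share and blended-growth columns can be derived from the monthly spend and per-provider growth figures. A small Python sketch follows, using the sample values from the table. It assumes the "% of Budget" column means share of total monthly spend and that the blended growth rate is current total spend divided by the implied prior-year total.

```python
# Monthly spend (USD) and year-over-year growth per provider, from the sample table.
SPEND = {
    "AWS": (45_000, 0.25),
    "Azure": (25_000, 0.15),
    "GCP": (10_000, 0.40),
    "Other": (5_000, 0.10),
}


def budget_shares(spend):
    """Each provider's share of total monthly spend, rounded to whole percent."""
    total = sum(monthly for monthly, _ in spend.values())
    return {name: round(monthly / total * 100) for name, (monthly, _) in spend.items()}


def blended_yoy_growth(spend):
    """Overall growth: current total divided by the implied prior-year total, minus 1."""
    total_now = sum(monthly for monthly, _ in spend.values())
    total_prior = sum(monthly / (1 + growth) for monthly, growth in spend.values())
    return total_now / total_prior - 1


shares = budget_shares(SPEND)
growth = blended_yoy_growth(SPEND)
print(shares)
print(f"Blended YoY growth: {growth:.1%}")
```

Independently rounded shares do not always sum to exactly 100%, so a report may need to adjust the largest share by a point.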

Cost by service category:

  Category            Monthly     % of Total    Growth Rate    Optimization
  ──────────────────  ──────────  ────────────  ────────────   ──────────────
  Compute (EC2/VMs)   $30,000     35%           +20%           Right-size, RI
  Database (RDS/etc)  $15,000     18%           +25%           Read replicas
  Storage (S3/Blob)   $10,000     12%           +30%           Lifecycle policies
  Network (data xfer) $8,000      9%            +35%           VPC endpoints
  Container (EKS/etc) $7,000      8%            +40%           Auto-scaling
  Managed Services    $8,000      9%            +15%           Evaluate build vs buy
  Other               $7,000      9%            +10%           Review quarterly

Capacity projection (12 months):

  Quarter    Compute    Storage    Network    Total      Headroom    Action
  ────────   ─────────  ─────────  ─────────  ─────────  ──────────  ────────
  Q1         $30K       $10K       $8K        $48K       26%         None
  Q2         $33K       $11K       $9K        $53K       18%         None
  Q3         $36K       $12K       $10K       $58K       11%         ⚠️
  Q4         $40K       $14K       $12K       $66K       -2%         🔴

  Budget cap: $65K/month (current); headroom is the unspent share of this cap
  Gap: Q4 projected at $66K, $1K over the budget cap
  Action: implement cost optimization initiatives by Q2
  Potential savings: $10K–$15K/month with RI + right-sizing + lifecycle
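
A short Python sketch of the budget check: it sums the compute, storage, and network columns for each quarter and measures the total against the $65K/month cap. It assumes headroom is defined as the unspent share of the cap; the function names are illustrative.

```python
BUDGET_CAP_K = 65  # monthly budget cap in $K

# (compute, storage, network) in $K per month, from the projection table.
PROJECTION = {
    "Q1": (30, 10, 8),
    "Q2": (33, 11, 9),
    "Q3": (36, 12, 10),
    "Q4": (40, 14, 12),
}


def budget_headroom(projection, cap_k):
    """Return {quarter: (total_k, headroom_pct)} measured against the budget cap."""
    rows = {}
    for quarter, parts in projection.items():
        total = sum(parts)
        rows[quarter] = (total, round((cap_k - total) / cap_k * 100))
    return rows


def first_over_cap(rows, cap_k):
    """First quarter whose projected total exceeds the cap, or None."""
    for quarter, (total, _) in rows.items():
        if total > cap_k:
            return quarter
    return None


rows = budget_headroom(PROJECTION, BUDGET_CAP_K)
for quarter, (total, headroom) in rows.items():
    print(f"{quarter}: ${total}K, headroom {headroom}%")
print("First quarter over cap:", first_over_cap(rows, BUDGET_CAP_K))
```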

Cloud capacity optimization:

  Reserved Instances / Savings Plans:
    Current coverage:    [X]% of eligible spend
    Target coverage:     > 75% for stable workloads
    Savings potential:   $[X]/month (15–30% discount)
    Implementation:      commit to 1-year or 3-year terms

  Auto-scaling:
    Current:             [X] instances running 24/7
    With auto-scaling:   [Y] instances during off-hours
    Savings:             [X-Y] instances × $[Z]/month

  Right-sizing:
    Over-provisioned:    [X] instances (CPU < 20% for 30 days)
    Right-size candidates: [X] instances → smaller types
    Savings:             $[Y]/month

  Spot instances:
    Eligible workloads:  [X]% (batch, CI/CD, testing, stateless)
    Savings:             60–90% vs. on-demand
    Risk:                instance interruption (handle gracefully)
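
The savings lines above reduce to simple arithmetic, sketched here in Python. All inputs are hypothetical. The `off_hours_fraction` parameter is an added assumption for scaling the auto-scaling estimate by the share of the month spent at reduced capacity; leaving it at 1.0 reproduces the `[X-Y] instances × $[Z]/month` formula as written.

```python
def commitment_savings(eligible_spend, current_coverage, target_coverage, discount):
    """Monthly savings from raising RI / Savings Plan coverage to the target."""
    additional_coverage = max(0.0, target_coverage - current_coverage)
    return eligible_spend * additional_coverage * discount


def autoscaling_savings(always_on, off_hours, cost_per_instance_month, off_hours_fraction=1.0):
    """Monthly savings from running fewer instances during off-hours."""
    return (always_on - off_hours) * cost_per_instance_month * off_hours_fraction


# Hypothetical inputs: $30K eligible spend, coverage raised from 40% to 75%, 30% discount.
ri = commitment_savings(30_000, current_coverage=0.40, target_coverage=0.75, discount=0.30)

# Hypothetical fleet: 20 instances always on, 8 needed off-hours, $150 per instance-month.
scale = autoscaling_savings(20, 8, 150.0, off_hours_fraction=0.5)

print(f"Commitment savings: ${ri:,.0f}/month")
print(f"Auto-scaling savings: ${scale:,.0f}/month")
```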

Integration Points

Edge Cases