---
name: infrastructure-provisioning
description: Automate infrastructure provisioning using Infrastructure as Code, cloud resource management, environment deployment, and configuration management. Use when provisioning cloud resources, managing IaC templates, automating environment setup, or standardizing infrastructure deployment. Triggers on phrases like "infrastructure as code", "IaC", "cloud provisioning", "Terraform", "environment deployment", "configuration management", "auto-scaling", "resource tagging", "infrastructure template".
---

# Infrastructure Provisioning & IaC

Automate infrastructure provisioning and management using Infrastructure as Code principles for consistency and scalability.

## Workflow

### 1. Infrastructure Architecture & Design

1. **Infrastructure blueprint development**:
   - Environment architecture (dev, staging, production, disaster recovery)
   - Network topology design (VPC/VNet, subnets, security groups, routing)
   - Compute resource planning (VM sizes, container orchestration, serverless)
   - Storage architecture (block, object, file, database)
   - High availability and redundancy design

2. **Infrastructure standards definition**:
   - Naming conventions for all resources
   - Tagging strategy (owner, environment, cost center, application)
   - Security baseline (encryption, access controls, monitoring)
   - Compliance requirements mapping (SOC 2, HIPAA, PCI-DSS)
   - Cost optimization standards (right-sizing, reserved instances, spot)

3. **Platform and tool selection**:
   - Cloud provider strategy (single, multi-cloud, hybrid)
   - IaC tool selection (Terraform, CloudFormation, Pulumi, Bicep)
   - Configuration management (Ansible, Chef, Puppet)
   - Container orchestration (Kubernetes, ECS, AKS)
   - Infrastructure state management
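
The naming and tagging standards above can be enforced mechanically before any resource is created. The following is a minimal sketch, not a definitive implementation: the required tag keys mirror the tagging strategy listed above, while the `<application>-<environment>-<component>` naming pattern, the allowed environment names, and the function name `validate_resource` are assumptions made for illustration.

```python
import re

# Required tag keys, mirroring the tagging strategy above.
REQUIRED_TAGS = ("owner", "environment", "cost_center", "application")

# Assumption: short environment names used across all resources.
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod", "dr"}

# Assumption: names follow <application>-<environment>-<component>, lowercase.
NAME_PATTERN = re.compile(
    r"^[a-z0-9]+(?:-[a-z0-9]+)*-(?:dev|staging|prod|dr)-[a-z0-9]+$"
)


def validate_resource(name: str, tags: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(
            f"name '{name}' does not match <application>-<environment>-<component>"
        )
    for key in REQUIRED_TAGS:
        # A tag that is absent or blank counts as missing.
        if not tags.get(key, "").strip():
            problems.append(f"missing required tag '{key}'")
    env = tags.get("environment", "").strip()
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment '{env}'")
    return problems
```

A check like this fits naturally into a pre-merge step, so that non-compliant names and tags are rejected before a plan is ever generated.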

### 2. IaC Development & Management

1. **Module and template development**:
   - Reusable module/library creation
   - Parameterization and variable management
   - Input/output definition and documentation
   - Version control and branching strategy
   - Module testing and validation

2. **State management and collaboration**:
   - Remote state storage and locking
   - State file security and access control
   - State migration and import procedures
   - Workspace/environment isolation
   - State backup and recovery

3. **Code review and governance**:
   - IaC code review process (pull request workflow)
   - Security scanning (Checkov, tfsec, Terrascan)
   - Cost estimation before deployment
   - Policy compliance validation (Sentinel, OPA)
   - Documentation and runbook maintenance
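
As one way to support the review and governance steps above, a pipeline can summarize a Terraform plan before a human looks at it. This sketch assumes the plan has been exported with `terraform show -json`, whose output contains a `resource_changes` list with an `address` and a `change.actions` array per resource. The rule that destructive changes in `prod` need manual review is an assumed policy, and the function names are hypothetical.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> dict:
    """Count planned actions and collect addresses of resources that will be deleted."""
    plan = json.loads(plan_json)
    counts = Counter()
    destructive = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        for action in actions:
            counts[action] += 1
        # A replacement shows up as both "delete" and "create" on one resource.
        if "delete" in actions:
            destructive.append(rc["address"])
    return {"counts": dict(counts), "destructive": destructive}


def requires_manual_review(summary: dict, environment: str) -> bool:
    """Assumed policy: any delete or replace in prod needs a human reviewer."""
    return environment == "prod" and bool(summary["destructive"])
```

Dedicated policy engines such as Sentinel or OPA cover far more than this; the sketch only illustrates the kind of signal a reviewer benefits from seeing up front.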

### 3. Environment Deployment & Orchestration

1. **CI/CD for infrastructure**:
   - Pipeline design for IaC deployment
   - Automated testing (syntax, security, policy compliance)
   - Plan/preview before apply
   - Approval gates for production changes
   - Rollback procedures and automation

2. **Deployment execution**:
   - Environment provisioning sequence and dependency management
   - Blue/green and canary deployment for infrastructure
   - Zero-downtime deployment techniques
   - Deployment validation and health checks
   - Post-deployment verification

3. **Configuration drift management**:
   - Drift detection and alerting
   - Drift remediation (reconcile vs ignore)
   - Manual change documentation and import
   - Compliance reporting on drift
   - Prevention through access controls
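
Drift detection boils down to comparing what the code declares with what actually exists. The sketch below shows that comparison on plain dictionaries; the input shape (resource ID mapped to an attribute dictionary) and the status labels `missing`, `modified`, and `unmanaged` are assumptions for illustration. With Terraform, the same signal is usually obtained from `terraform plan -detailed-exitcode`, which exits with code 2 when changes are pending.

```python
def detect_drift(declared: dict[str, dict], observed: dict[str, dict]) -> dict[str, dict]:
    """Compare declared and observed resources and classify each difference."""
    drift = {}
    for resource_id, want in declared.items():
        have = observed.get(resource_id)
        if have is None:
            drift[resource_id] = {"status": "missing"}
            continue
        # Only attributes managed in code are compared; extra observed
        # attributes (provider defaults, computed values) are ignored.
        changed = {k: (v, have.get(k)) for k, v in want.items() if have.get(k) != v}
        if changed:
            drift[resource_id] = {"status": "modified", "attributes": changed}
    # Resources that exist but are not declared were created manually.
    for resource_id in observed.keys() - declared.keys():
        drift[resource_id] = {"status": "unmanaged"}
    return drift


def drift_percent(declared: dict[str, dict], drift: dict[str, dict]) -> float:
    """Share of declared resources that are missing or modified, in percent."""
    if not declared:
        return 0.0
    drifted = sum(1 for resource_id in declared if resource_id in drift)
    return round(100.0 * drifted / len(declared), 1)
```

Resources reported as `unmanaged` are candidates for import into state, while `modified` ones need a decision between reconciling the code and reverting the manual change.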

### 4. Cloud Resource Optimization

1. **Cost optimization**:
   - Right-sizing analysis and implementation
   - Reserved instance and savings plan optimization
   - Spot instance utilization for fault-tolerant workloads
   - Auto-scaling configuration and tuning
   - Resource utilization monitoring and reporting

2. **Performance optimization**:
   - Compute performance tuning
   - Database performance optimization
   - Network latency reduction
   - CDN and caching strategy
   - Storage tier optimization

3. **Resource lifecycle management**:
   - Automated cleanup of unused resources
   - Environment teardown for non-production
   - Resource expiration and scheduling
   - Decommissioning procedures
   - Cost allocation and showback/chargeback
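
Resource expiration for non-production environments can be driven by tags. The following sketch assumes a hypothetical `expires_on` tag holding an ISO 8601 date and a resource record shaped as `{"id": ..., "tags": {...}}`; neither is prescribed by any cloud provider. Production resources are skipped on purpose, so that a stray tag can never schedule them for teardown.

```python
from datetime import date


def find_expired(resources: list[dict], today: date) -> list[str]:
    """Return IDs of non-production resources whose expiry date has been reached."""
    expired = []
    for resource in resources:
        tags = resource.get("tags", {})
        # Never flag production, even if it carries an expiry tag.
        if tags.get("environment") == "prod":
            continue
        raw = tags.get("expires_on")  # hypothetical tag name
        if raw is None:
            continue
        if date.fromisoformat(raw) <= today:
            expired.append(resource["id"])
    return expired
```

The returned IDs would typically feed a notification or an approval step first, with actual deletion as a separate, audited action.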

### 5. Security & Compliance Automation

1. **Security by default**:
   - Encryption at rest and in transit (default enabled)
   - Network segmentation and zero-trust principles
   - Identity and access management automation
   - Secret management integration
   - Security group and firewall rule management

2. **Compliance automation**:
   - Compliance policy as code
   - Continuous compliance monitoring
   - Audit trail and logging configuration
   - Remediation automation for compliance violations
   - Compliance reporting and evidence collection

3. **Disaster recovery automation**:
   - Automated backup configuration
   - Cross-region replication setup
   - DR failover automation
   - Recovery testing automation
   - RTO/RPO validation
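
RTO/RPO validation can be reduced to two timestamp comparisons: how old the newest recoverable copy of the data is, and how long a failover test took. This is a minimal sketch with hypothetical function names; where the timestamps come from (backup service API, replication lag metrics, DR test logs) is left open.

```python
from datetime import datetime, timedelta


def rpo_violations(
    last_backup: dict[str, datetime], rpo: timedelta, now: datetime
) -> list[str]:
    """Return names of data stores whose newest backup is older than the RPO."""
    return sorted(name for name, taken_at in last_backup.items() if now - taken_at > rpo)


def rto_met(failover_started: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if a recovery test restored service within the RTO."""
    return service_restored - failover_started <= rto
```

Running the first check on a schedule and the second after every recovery test gives continuous evidence that the stated objectives are actually achievable.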

## Templates & Frameworks

### Infrastructure Module Template

```
TERRAFORM MODULE — [Service Name]
===================================

MODULE: modules/[service-name]
  Purpose: [description of what this module provisions]
  Owner: [team/person]
  Last updated: [date]

INPUT VARIABLES:
  environment       (string)      — dev, staging, prod
  region            (string)      — primary AWS/Azure/GCP region
  instance_type     (string)      — compute instance size
  min_instances     (number)      — minimum running instances
  max_instances     (number)      — maximum auto-scaling instances
  enable_monitoring (bool)        — enable CloudWatch/Datadog monitoring
  tags              (map(string)) — resource tags (owner, cost_center, application)

OUTPUT VALUES:
  endpoint       — service endpoint URL
  security_group_id — SG ID for resource access
  alb_dns_name   — load balancer DNS name
  rds_endpoint   — database endpoint

PROVISIONED RESOURCES:
  — Auto Scaling Group (min: ${min_instances}, max: ${max_instances})
  — Application Load Balancer (HTTPS, path-based routing)
  — RDS Instance (Multi-AZ, encrypted, automated backups)
  — Security Groups (restricted ingress/egress)
  — CloudWatch Alarms (CPU, memory, error rate)
  — S3 Bucket (logs, encrypted, versioned)
  — IAM Roles (least privilege, assumed by instances)

USAGE EXAMPLE:
  module "web-app" {
    source            = "./modules/web-app"
    environment       = "prod"
    region            = "us-east-1"
    instance_type     = "t3.medium"
    min_instances     = 2
    max_instances     = 10
    enable_monitoring = true
    tags = {
      owner       = "platform-team"
      cost_center = "engineering"
      application = "web-frontend"
    }
  }

DEPLOYMENT NOTES:
  — Estimated monthly cost: ~$450 (dev), ~$2,100 (prod)
  — RTO: 15 minutes (auto-scaling + ALB)
  — RPO: 5 minutes (RDS automated backups)
  — Requires: VPC module, DNS module
```
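
The input variables in the template above imply a few constraints that are worth checking before a module is called. The sketch below mirrors those inputs in a Python dataclass purely for illustration; the specific rules (allowed environment names taken from the variable description, `min_instances` not exceeding `max_instances`, at least one running instance) are assumptions, and in Terraform itself such rules would normally live in `validation` blocks on the variables.

```python
from dataclasses import dataclass, field

# Allowed values taken from the "environment" variable description.
ALLOWED_ENVIRONMENTS = ("dev", "staging", "prod")


@dataclass
class ModuleInputs:
    """Hypothetical mirror of the module's input variables."""

    environment: str
    region: str
    instance_type: str
    min_instances: int
    max_instances: int
    enable_monitoring: bool = True
    tags: dict[str, str] = field(default_factory=dict)

    def errors(self) -> list[str]:
        """Return validation problems; an empty list means the inputs are usable."""
        problems = []
        if self.environment not in ALLOWED_ENVIRONMENTS:
            problems.append(
                f"environment must be one of {', '.join(ALLOWED_ENVIRONMENTS)}"
            )
        if self.min_instances < 1:
            problems.append("min_instances must be at least 1")
        if self.min_instances > self.max_instances:
            problems.append("min_instances must not exceed max_instances")
        return problems
```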

### Infrastructure Deployment Pipeline

```
INFRASTRUCTURE CI/CD PIPELINE
===============================

STAGE 1: PRE-DEPLOYMENT VALIDATION
  — terraform fmt -check (code formatting)
  — terraform validate (syntax and internal consistency check)
  — Checkov/tfsec (security scanning)
  — Cost estimation (infracost)
  — Policy compliance check (Sentinel/OPA)

STAGE 2: PLAN
  — terraform plan -out=<plan file> (preview changes and save the plan)
  — Diff review and annotation
  — Cost impact assessment
  — Stakeholder notification (if production)

STAGE 3: APPROVAL
  — Development: auto-approve
  — Staging: team lead approval
  — Production: 2-person approval + change manager

STAGE 4: APPLY
  — terraform apply (execute the plan that was reviewed and approved)
  — State update and backup
  — Deployment log capture

STAGE 5: POST-DEPLOYMENT VALIDATION
  — Health check execution
  — Smoke test execution
  — Monitoring alert verification
  — Performance baseline comparison
  — Compliance scan post-deployment

STAGE 6: DOCUMENTATION
  — Change log update
  — Runbook update (if needed)
  — Stakeholder notification
  — Rollback procedure verification

ROLLBACK TRIGGER:
  — More than 3 consecutive health check failures
  — Error rate > 5% for 5 minutes
  — Latency p95 > 2x baseline
  — Manual trigger by on-call engineer
```
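
The rollback triggers listed in the pipeline above translate directly into a small decision function. This is a sketch under stated assumptions: error rates are sampled once per minute as fractions (0.05 means 5%), the latest five samples represent the "5 minutes" window, and the function and parameter names are hypothetical.

```python
def rollback_reasons(
    consecutive_health_failures: int,
    error_rates_per_minute: list[float],
    p95_latency_ms: float,
    baseline_p95_ms: float,
    manual_trigger: bool = False,
) -> list[str]:
    """Return every rollback trigger that currently fires; empty means keep going."""
    reasons = []
    if consecutive_health_failures > 3:
        reasons.append("health check failures > 3 consecutive")
    # The error-rate trigger needs a full five-minute window above 5%.
    recent = error_rates_per_minute[-5:]
    if len(recent) == 5 and all(rate > 0.05 for rate in recent):
        reasons.append("error rate > 5% for 5 minutes")
    if p95_latency_ms > 2 * baseline_p95_ms:
        reasons.append("p95 latency > 2x baseline")
    if manual_trigger:
        reasons.append("manual trigger")
    return reasons
```

Returning all firing triggers rather than a single boolean keeps the deployment log informative when a rollback is started.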

## Integration Points

- Cloud platforms (AWS, Azure, GCP): Infrastructure resources
- IaC tools (Terraform, CloudFormation, Pulumi): Infrastructure definition
- CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, CircleCI): Pipeline execution
- Security scanning (Checkov, tfsec, Terrascan): Code security validation
- Policy engines (Sentinel, OPA): Governance enforcement
- State management (Terraform Cloud, S3+DynamoDB): State storage and locking
- Cost management (CloudHealth, Cloudability, native tools): Cost tracking
- Configuration management (Ansible, Chef): Post-provisioning configuration
- Monitoring (Datadog, Prometheus, CloudWatch): Resource monitoring

## Edge Cases

- **Multi-cloud infrastructure**: Consistent IaC abstraction layer; provider-specific module management; cross-cloud networking; unified monitoring; cost comparison
- **Legacy on-premises integration**: Hybrid connectivity setup; configuration management for physical servers; gradual migration planning; coexistence architecture
- **Frequent infrastructure changes**: Automated testing suite; rapid deployment pipeline; feature flag infrastructure; canary deployment for infrastructure
- **Strict compliance environments**: Enhanced approval workflow; change window enforcement; audit trail completeness; evidence automation; compliance pre-check
- **State file corruption or loss**: State backup and recovery procedure; state import for manual changes; state migration planning; disaster recovery for state

## Output

### Infrastructure Provisioning Dashboard

```
INFRASTRUCTURE STATUS — April 2025
===================================

ENVIRONMENT INVENTORY:
  Development: 12 environments (47 resources)
  Staging: 3 environments (134 resources)
  Production: 2 environments (387 resources)
  DR: 1 environment (387 resources — synchronized)

PROVISIONING METRICS:
  IaC coverage: 94% (target: >95% ⚠)
  Manual changes detected: 7 (remediation in progress)
  Configuration drift: 2.3% (target: <1% ⚠)
  Deployments this month: 47 (34 successful, 1 rollback, 12 pending)

CLOUD RESOURCE COST:
  Monthly spend: $127,400 (↓ 4.2% from last month ✓)
  Reserved instance coverage: 78%
  Spot instance utilization: 23% (batch workloads)
  Right-sizing savings identified: $8,400/month
  Unused resources flagged: 12 ($2,300/month waste)

SECURITY POSTURE:
  Encryption at rest: 99.2% resources ✓
  Security group compliance: 97.8% ✓
  IAM policy compliance: 95.4% ✓
  Security scan issues (critical): 0 ✓
  Open vulnerabilities (medium): 4 (patching scheduled)

COMPLIANCE STATUS:
  SOC 2 compliance: 98.7% ✓
  HIPAA compliance: 99.1% ✓
  PCI-DSS scope: 3 systems (all compliant ✓)
  Audit evidence automated: 94%
  Last compliance scan: April 14 (passed)

DEPLOYMENT PERFORMANCE:
  Avg deployment time: 8.4 minutes
  Deployment success rate: 97.1% (34 of 35 completed) ✓
  Mean time to rollback: 3.2 minutes
  Zero-downtime deployments: 92%

AUTOMATION METRICS:
  Auto-scaling events: 234 this month
  Automated remediation: 67 actions
  Scheduled maintenance: 4 completed (0 issues)
  Backup success rate: 99.7% ✓
```
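
The status markers in the dashboard above follow a simple rule: a metric gets ✓ when it meets its target and ⚠ when it does not. The helper below is a hypothetical sketch of how such lines could be rendered, assuming all values are percentages and that each target is a strict lower or upper bound.

```python
def metric_line(name: str, value: float, target: float, higher_is_better: bool = True) -> str:
    """Render one dashboard line with a status marker against its target."""
    met = value > target if higher_is_better else value < target
    op = ">" if higher_is_better else "<"
    mark = "✓" if met else "⚠"
    # The "g" format drops trailing zeros, so 94.0 renders as 94.
    return f"{name}: {value:g}% (target: {op}{target:g}% {mark})"
```

For example, `metric_line("Configuration drift", 2.3, 1, higher_is_better=False)` flags the metric because 2.3% is not below the 1% target.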

## Trigger Phrases

"infrastructure as code", "IaC", "cloud provisioning", "Terraform", "environment deployment", "configuration management", "auto-scaling", "resource tagging", "infrastructure template", "CI/CD infrastructure", "drift detection", "right-sizing", "multi-cloud", "infrastructure pipeline"
