Infrastructure As Code
Design, implement, and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, Pulumi, or similar tools. Manage state, modules, workspaces, drift detection, and IaC best practices. Use when provisioning infrastructure via code, managing Terra...
Infrastructure as Code (IaC)
Design, implement, and maintain Infrastructure as Code using Terraform, CloudFormation, and similar tools.
Workflow
1. IaC Architecture & Organization
TERRAFORM PROJECT STRUCTURE
═══════════════════════════════════════
infrastructure/
├── modules/
│ ├── network/
│ │ ├── main.tf (VPC, subnets, route tables)
│ │ ├── variables.tf (input variables)
│ │ ├── outputs.tf (output values)
│ │ └── versions.tf (provider constraints)
│ ├── compute/
│ ├── database/
│ ├── storage/
│ └── security/
│
├── environments/
│ ├── development/
│ │ ├── main.tf (module calls)
│ │ ├── variables.tf
│ │ ├── terraform.tfvars (env-specific values)
│ │ └── backend.tf (state config)
│ ├── staging/
│ └── production/
│
├── scripts/
│ ├── validate.sh
│ └── policy-check.sh
│
├── policies/
│ ├── sentinel/ (HashiCorp Sentinel policies)
│ └── opa/ (Open Policy Agent)
│
└── README.md
STATE MANAGEMENT:
═══════════════════════════════════════
Backend: S3 + DynamoDB lock table (AWS) / GCS with built-in locking (GCP)
═══════════════════════════════════════
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
    encrypt        = true
    acl            = "bucket-owner-full-control"
  }
}
State isolation:
→ Per-environment state files
→ Per-module state (optional, for large projects)
→ Workspace separation (dev/staging/prod)
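The per-environment and per-module isolation above comes down to a naming convention for the backend `key`. A minimal Python sketch, assuming the `<environment>/<module>/terraform.tfstate` layout seen in the backend example; `state_key` is a hypothetical helper, not part of Terraform:

```python
def state_key(environment: str, module: str) -> str:
    """Build the backend key for one environment/module pair.

    Assumes the '<environment>/<module>/terraform.tfstate' layout from the
    backend example; the helper name is hypothetical.
    """
    for part in (environment, module):
        # Reject empty or nested components so two stacks cannot collide.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{environment}/{module}/terraform.tfstate"


if __name__ == "__main__":
    print(state_key("production", "network"))
```

Generating the key from one place keeps every environment and module on its own state file, which is what limits the blast radius of a bad apply.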
2. Module Design
MODULE DESIGN BEST PRACTICES
═══════════════════════════════════════
Module: VPC Network
═══════════════════════════════════════
variables.tf:
═══════════════════════════════════════
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for VPC"
  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be valid CIDR"
  }
}
variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod"
  }
}
variable "az_count" {
  type    = number
  default = 3
}
# Referenced by the tags in main.tf, so it must be declared here
variable "common_tags" {
  type        = map(string)
  description = "Tags applied to every resource in the module"
  default     = {}
}
main.tf:
═══════════════════════════════════════
# Availability zones used by the subnets below
data "aws_availability_zones" "available" {
  state = "available"
}
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = merge(var.common_tags, {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  })
}
resource "aws_subnet" "public" {
  count                   = var.az_count
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.environment}-public-${count.index + 1}" }
}
outputs.tf:
═══════════════════════════════════════
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}
output "public_subnet_ids" {
  description = "Public subnet IDs"
  value       = aws_subnet.public[*].id
}
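The subnet addressing in main.tf relies on `cidrsubnet(var.vpc_cidr, 8, count.index)`, which is easier to follow with concrete numbers. The Python sketch below approximates that function for illustration only. The `10.0.0.0/16` VPC range is an assumed example value, and this is not Terraform's own implementation:

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Approximate Terraform's cidrsubnet() for illustration.

    Extends the prefix length by `newbits` and returns subnet number `netnum`.
    """
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("newbits extends past the address length")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    # Each subnet spans 2**(address bits - new prefix length) addresses.
    size = 2 ** (network.max_prefixlen - new_prefix)
    base = network.network_address + netnum * size
    return f"{base}/{new_prefix}"


if __name__ == "__main__":
    # Assumed example: a /16 VPC with az_count = 3 gives three /24 subnets.
    for index in range(3):
        print(cidrsubnet("10.0.0.0/16", 8, index))
```

With a /16 VPC and 8 new bits, each public subnet is a /24, so `az_count = 3` yields `10.0.0.0/24`, `10.0.1.0/24`, and `10.0.2.0/24`.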
MODULE DOCUMENTATION:
═══════════════════════════════════════
→ terradoc / terraform-docs (auto-generate)
→ README.md with:
· Usage example
· Inputs (description, type, default, required)
· Outputs (description, sensitive)
· Dependencies
· Assumptions
3. IaC CI/CD Pipeline
IaC CI/CD PIPELINE
═══════════════════════════════════════
GitHub Actions Workflow:
═══════════════════════════════════════
name: Terraform CI/CD
on:
  pull_request:
    paths: ["infrastructure/**"]
  push:
    branches: [main]
    paths: ["infrastructure/**"]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive   # Format check
      - run: terraform init                    # Initialize
      - run: terraform validate -no-color      # Syntax and consistency check
  security:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes the scanners are installed on the runner
      - run: tfsec .                           # Security scan
      - run: checkov -d .                      # Compliance check
      - run: terrascan scan -i terraform       # Policy check
  plan:
    needs: security
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=tfplan
      - run: terraform show -no-color tfplan > plan.txt
      # Post plan.txt to the PR as a comment
  apply:
    needs: security   # plan is skipped on push; depending on it would skip apply too
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production                    # Requires approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=tfplan
      - run: terraform apply tfplan
GUARDRAILS:
═══════════════════════════════════════
Pre-apply checks:
→ terraform fmt: Code formatting
→ terraform validate: Syntax validity
→ tfsec: Security vulnerabilities
→ checkov: Compliance policies
→ infracost: Cost estimation (budget alert)
→ Sentinel/OPA: Custom policies
Policies (Sentinel):
→ No public S3 buckets
→ No default security groups
→ No r5.4xlarge instances without approval
→ Tags required on all resources
→ Encryption enabled on all storage
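Policies such as "Tags required on all resources" would normally be written in Sentinel or OPA. To show the shape of such a check, here is a Python sketch that inspects a parsed plan (the JSON from `terraform show -json`) for missing tags. The required tag names and the `missing_tags` helper are assumptions for illustration:

```python
def missing_tags(plan: dict, required=("Name", "Environment")) -> dict:
    """Map resource address -> required tags absent after the change.

    `plan` is the parsed output of `terraform show -json`. The required tag
    names are assumed values, not taken from any real policy set.
    """
    violations = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        # Skip deletions (after is null) and resources without a tags attribute.
        if not isinstance(after, dict) or "tags" not in after:
            continue
        tags = after["tags"] or {}
        missing = [tag for tag in required if tag not in tags]
        if missing:
            violations[rc["address"]] = missing
    return violations


if __name__ == "__main__":
    plan = {"resource_changes": [
        {"address": "aws_subnet.public[0]",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Name": "dev-public-1"}}}},
    ]}
    print(missing_tags(plan))
```

A pipeline step could fail the build whenever the returned mapping is non-empty.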
4. State Management & Drift Detection
STATE MANAGEMENT
═══════════════════════════════════════
State Operations:
═══════════════════════════════════════
Import existing resource:
terraform import aws_instance.web i-0abc123def456
Move resource (rename):
terraform state mv aws_instance.old aws_instance.new
Remove from state (Terraform stops managing the resource; the resource itself is not destroyed):
terraform state rm aws_instance.deprecated
List resources:
terraform state list
Show resource details:
terraform state show aws_instance.web
DRIFT DETECTION:
═══════════════════════════════════════
Scheduled drift detection (nightly):
terraform plan -detailed-exitcode
Exit codes:
0 = No changes
1 = Error
2 = Changes detected (drift)
Notification on drift:
→ Slack alert to #infra-drift
→ Include plan output (what changed)
→ Auto-ticket creation (Jira)
Drift remediation:
→ Option 1: terraform apply (overwrite manual changes)
→ Option 2: Update code to match (adopt manual changes)
→ Option 3: Import resources created outside Terraform into state (terraform import)
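The nightly check can be wrapped in a small script that turns the exit code into a status for alerting. A minimal Python sketch, assuming `terraform` is on the PATH and the working directory is already initialized. The function names and status strings are hypothetical, and the Slack or Jira notification is left out:

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {0: "in_sync", 1: "error", 2: "drift"}


def classify_exit_code(code: int) -> str:
    """Translate a plan exit code into a status; unknown codes count as errors."""
    return EXIT_MEANINGS.get(code, "error")


def detect_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report in_sync, drift, or error."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_exit_code(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, classify_exit_code(code))
```

Exit code 2 also fires when the code has changed but has not been applied yet, so the alert is most meaningful when run against the commit that was last applied.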
5. IaC Best Practices
IaC BEST PRACTICES
═══════════════════════════════════════
DO:
═══════════════════════════════════════
→ Use modules for reusability
→ Version all providers and modules
→ Store state remotely with locking
→ Encrypt state files
→ Use variables (no hardcoded values)
→ Use workspaces or separate directories per environment
→ Document modules (inputs, outputs, usage)
→ Use terraform_data (null_resource on versions before 1.4) for triggers
→ Implement guardrails (policies)
→ Review terraform plan before apply
→ Use a refresh-only plan (terraform plan -refresh-only) to detect drift
→ Pin provider versions (compatibility)
DON'T:
═══════════════════════════════════════
→ Hardcode secrets (use environment variables, uncommitted var files, or Vault)
→ Use count/index for complex logic (use for_each)
→ Commit state files to version control
→ Use root module for everything
→ Skip terraform plan (always review first)
→ Share state files between environments
→ Leave provider versions unconstrained or open-ended (e.g., >= with no upper bound)
→ Ignore dependency warnings
→ Apply without testing in staging first
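The advice to prefer for_each over count is easiest to see by looking at resource addresses. The Python sketch below is only a simulation of how addresses are derived, not Terraform's planner. It shows that removing one item from a count-indexed list shifts every later address, while key-based addresses stay put. The subnet names are made up:

```python
def count_addresses(names):
    """Addresses as count would assign them: by list position."""
    return {f"aws_subnet.public[{i}]": name for i, name in enumerate(names)}


def for_each_addresses(names):
    """Addresses as for_each would assign them: by key."""
    return {f'aws_subnet.public["{name}"]': name for name in names}


def churn(before, after):
    """Addresses whose bound item changed or disappeared."""
    return sorted(addr for addr in before if before[addr] != after.get(addr))


if __name__ == "__main__":
    old, new = ["a", "b", "c"], ["b", "c"]  # the first subnet is removed
    print(churn(count_addresses(old), count_addresses(new)))
    print(churn(for_each_addresses(old), for_each_addresses(new)))
```

In the count case all three addresses are affected by removing one subnet; in the for_each case only the removed key is.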
Edge Cases
- Large state files: Split into multiple state files
- State corruption: Backup and restore procedures
- Provider migration: State migration between providers
- Multi-cloud: Cross-cloud resource management
- Import existing: Migrating manual infrastructure to IaC
Integration Points
- IaC tools: Terraform, CloudFormation, Pulumi
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- Policy: Sentinel, OPA, Conftest
- Security: tfsec, checkov, terrascan
- Cost: infracost, cloud-native tools
- State: S3, GCS, Terraform Cloud, Atlas
Output
IaC Status
INFRASTRUCTURE AS CODE STATUS
═══════════════════════════════════════
Modules: 15 (network, compute, database, etc.)
Environments: 3 (dev, staging, prod)
Drift: 0 resources (all in sync)
Security: 0 critical, 2 warnings (remediating)
State: Remote (S3 + DynamoDB lock)
CI/CD: GitHub Actions (automated plan/apply)
Coverage: 95% of infrastructure managed by IaC