IT AI Skill

Automation Orchestration

Design, build, and manage IT automation workflows and orchestration pipelines for infrastructure provisioning, incident response, configuration management, and operational tasks. Use when creating automation playbooks, building orchestration workflows, impl...

Automation & Orchestration

Design and implement automated workflows that reduce manual operations and enable self-healing infrastructure.

Workflow

1. Automation Strategy & Framework

  1. Automation opportunity identification:
  1. Automation framework design:
  1. Automation governance:

2. Infrastructure as Code (IaC) Automation

  1. IaC development and management:
  1. IaC CI/CD pipeline:
  1. Drift management:

3. Configuration Management Automation

  1. Configuration baseline enforcement:
  1. Patch and update automation:
  1. Application deployment automation:

4. Incident Response Automation

  1. Automated incident playbooks:
  1. Self-healing workflows:
  1. Security response automation (SOAR):

5. Operational Task Automation

  1. Provisioning and deprovisioning:
  1. Routine maintenance automation:
  1. Report and notification automation:

Templates & Frameworks

Automation Pipeline Structure

AUTOMATION PIPELINE — Infrastructure Change
============================================

Stage 1: Code Commit
  → Git push triggers CI pipeline
  → Linting and formatting validation
  → Unit tests execution

Stage 2: Plan & Validate
  → terraform plan (dry run)
  → Security policy scan (tfsec + OPA)
  → Cost impact estimation
  → Approval workflow (if production change)

Stage 3: Test Environment
  → Apply to test environment
  → Integration tests
  → Smoke tests against live system
  → Performance validation

Stage 4: Staging Environment
  → Apply to staging
  → Full regression test suite
  → Security scan of deployed infrastructure
  → Stakeholder approval

Stage 5: Production Deployment
  → Apply to production (maintenance window)
  → Health check validation
  → Monitoring alert review (30 min observation)
  → Success notification or auto-rollback

Stage 6: Post-Deployment
  → Update documentation
  → Verify drift detection baseline
  → Archive deployment artifacts
  → Generate change report

Self-Healing Workflow Examples

SELF-HEALING WORKFLOWS
=======================

Service Down → Auto-Restart:
  Trigger: Health check failure (3 consecutive failures)
  Action: Restart service via systemd/container runtime
  Validation: Health check passes within 60 seconds
  Escalation: If restart fails after 3 attempts → page on-call engineer

Disk Space Critical (>90%):
  Trigger: Disk usage alert
  Action: Rotate logs → Clear temp files → Archive old data
  Validation: Disk usage drops below 80%
  Escalation: If disk remains >90% after cleanup → page on-call

Memory Pressure (>85% for 5 min):
  Trigger: Memory utilization alert
  Action: Identify top memory consumers → Gracefully restart non-critical services
  Validation: Memory drops below 75%
  Escalation: If OOM killer triggered → page on-call immediately

Database Connection Pool Exhaustion:
  Trigger: Connection pool at 95% capacity
  Action: Kill idle connections (>30 min) → Increase pool size by 20%
  Validation: Pool utilization drops below 70%
  Escalation: If pool exhaustion persists → page DBA on-call

Integration Points

Edge Cases

Output

Automation Dashboard

AUTOMATION OPS — Real-Time
===========================

ACTIVE WORKFLOWS:
  Running: 23
  Completed (last 24h): 187
  Failed (last 24h): 3 (1.6% failure rate)
  Self-healing triggers (last 24h): 12

IaC STATUS:
  Drift detected: 2 resources (auto-remediation in progress)
  Pending changes: 4 (in approval)
  Last full sync: 2025-04-15 06:00 UTC

CONFIGURATION COMPLIANCE:
  Systems compliant: 94% (1,028/1,093)
  Drift events (24h): 7
  Auto-remediated: 5
  Manual review needed: 2

AUTOMATION COVERAGE:
  Processes automated: 47/72 (65%)
  Estimated hours saved/week: 142
  MTTR reduction: 67%

FAILED WORKFLOWS REQUIRING ATTENTION:
  🔴 Production DB migration — failed at step 3 (rollback executed)
  ⚠  Certificate deployment to staging — retry scheduled
  ⚠  Log archival for archive server — disk full, needs manual cleanup

Trigger Phrases

"automation", "orchestration", "infrastructure as code", "IaC", "Terraform", "Ansible", "automated remediation", "self-healing", "configuration management", "playbook", "drift detection", "auto-provisioning", "automated deployment", "SOAR", "incident automation", "patch automation", "self-service provisioning", "workflow automation"