---
name: cloud-governance-management
description: Establish and enforce cloud governance policies including resource tagging, cost allocation, security baselines, compliance automation, and operational standards across AWS, Azure, and GCP. Use when defining cloud governance frameworks, implementing guardrails, establishing resource tagging standards, configuring policy-as-code, managing cloud compliance, conducting cloud governance audits, or building cloud center of excellence. Triggers on phrases like "cloud governance", "cloud guardrails", "resource tagging policy", "cloud compliance", "policy as code", "cloud center of excellence", "cloud standards", "cloud operating model", "multi-account strategy", "cloud landing zone", "cloud policy enforcement".
---

# Cloud Governance & Management

Establish and enforce cloud governance frameworks that balance agility with control, enabling self-service while maintaining security, compliance, and cost management across multi-cloud environments.

## Workflow

1. Define cloud governance model: identify governance body (cloud center of excellence, steering committee), decision rights, policy ownership, and escalation paths.
2. Establish resource naming and tagging standards: mandatory tags (cost center, owner, environment, project), naming conventions, enforcement mechanisms.
3. Implement organizational structure: AWS Organizations / Azure Management Groups / GCP Folder hierarchy; multi-account/subscription strategy per workload and team.
4. Deploy cloud landing zone: foundational networking (VPC/VNet design, transit gateway, DNS), identity management (SSO, federation, break-glass accounts), logging and monitoring (CloudTrail, Activity Log, audit logs).
5. Configure policy-as-code enforcement: AWS Config Rules / Azure Policy / GCP Organization Policies; guardrails for security, cost, and compliance; auto-remediation where possible.
6. Implement cost governance: budget alerts, cost allocation tags, showback/chargeback models, resource optimization recommendations, reserved instance management.
7. Define security baselines: CIS benchmarks, encryption requirements, network segmentation standards, IAM least privilege, VPC flow log requirements.
8. Establish compliance automation: continuous compliance monitoring, evidence collection, audit preparation, regulatory framework mapping (SOC 2, ISO 27001, HIPAA).
9. Create operational procedures: incident response for cloud, change management, backup/DR, access reviews, break-glass procedures.
10. Conduct quarterly governance reviews: policy compliance rates, cost optimization opportunities, security posture assessment, governance maturity scoring.

## Cloud Organization Structure

```
MULTI-ACCOUNT / MULTI-SUBSCRIPTION STRATEGY
=============================================

AWS MULTI-ACCOUNT STRUCTURE:

  Organization Root
  ├── OU: Management (AWS Control Tower managed)
  │   ├── Account: Management (organizations, SSO, billing)
  │   ├── Account: Logging (CloudTrail, VPC Flow Logs, GuardDuty central)
  │   └── Account: Audit (AWS Config, Security Hub, compliance)
  │
  ├── OU: Sandbox
  │   └── Account: Sandbox (developer experimentation; low quotas, no prod data)
  │
  ├── OU: Development
  │   ├── SCP: Restrict regions to us-east-1, us-west-2; limit instance types
  │   ├── Account: Dev-Team-A (application workloads)
  │   ├── Account: Dev-Team-B (application workloads)
  │   └── Account: Dev-Shared (shared services, CI/CD runners)
  │
  ├── OU: Staging
  │   ├── SCP: Same region restrictions; allow larger instances
  │   ├── Account: Staging-Prod-Mirror (production-like environment)
  │   └── Account: Staging-Performance (load testing)
  │
  ├── OU: Production
  │   ├── SCP: Restricted regions; require MFA for privileged actions; deny termination of prod resources
  │   ├── Account: Prod-App-Primary (primary application workload)
  │   ├── Account: Prod-App-Secondary (DR region failover)
  │   ├── Account: Prod-Data (databases, data warehouse, data lake)
  │   └── Account: Prod-Network (transit gateway, VPN, Direct Connect)
  │
  └── OU: Shared Services
      ├── Account: Networking (shared VPCs, transit, DNS)
      ├── Account: Identity (IAM Identity Center, SCIM provisioning)
      └── Account: Landing Zone (Control Tower, account factory)

AZURE MANAGEMENT GROUP HIERARCHY:

  Tenant Root Group
  ├── Management Group: Governance (policies, Blueprints applied here)
  │   ├── MG: Platform (baseline policies, RBAC, logging)
  │   │   ├── MG: Production
  │   │   │   ├── Subscription: Prod-App-1
  │   │   │   ├── Subscription: Prod-App-2
  │   │   │   └── Subscription: Prod-Data
  │   │   ├── MG: Non-Production
  │   │   │   ├── Subscription: Dev-Team-A
  │   │   │   ├── Subscription: Dev-Team-B
  │   │   │   └── Subscription: Staging
  │   │   └── MG: Corporate
  │   │       ├── Subscription: Landing-Zone
  │   │       └── Subscription: Management
  │   │
  │   └── MG: Compliance (HIPAA, SOC 2, PCI-DSS workloads)
  │       ├── Subscription: HIPAA-App
  │       └── Subscription: PCI-App

GCP FOLDER HIERARCHY:

  Organization
  ├── Folder: Environment-Production
  │   ├── Project: prod-app-us (primary region)
  │   ├── Project: prod-app-eu (EU region)
  │   └── Project: prod-data
  │
  ├── Folder: Environment-NonProd
  │   ├── Project: dev-team-a
  │   ├── Project: dev-team-b
  │   └── Project: staging
  │
  ├── Folder: Shared-Infrastructure
  │   ├── Project: shared-network (shared VPC host)
  │   ├── Project: shared-logging (central logging)
  │   └── Project: shared-identity (IAM, SSO)
  │
  └── Folder: Security-Sandbox
      └── Project: sandbox (restricted quotas, no external IP)

CROSS-CLOUD PRINCIPLES:
    → Separate accounts/subscriptions by environment (dev, staging, prod)
    → Separate accounts by team for cost allocation and autonomy
    → Centralized logging, identity, and networking in shared accounts
    → Production isolated with stricter policies (Service Control Policies / Azure Policy / Org Policies)
    → Billing: separate cost centers per account; consolidated billing at root
```
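Each OU-level SCP in the tree is a small JSON policy document. The sketch below, in Python with only the standard library, generates a Development-OU policy that matches the annotation above: it denies requests outside the approved regions and denies `ec2:RunInstances` for instance types outside an allow list. The exempted global-service actions and the instance-type patterns are illustrative assumptions, not a complete or official list; AWS's published region-deny example SCP exempts more services.

```python
import json

# Illustrative subset of global-service actions exempted from the region deny.
# Assumption: AWS's published region-deny example SCP exempts more services.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*", "organizations:*", "sts:*", "route53:*", "cloudfront:*",
    "support:*", "budgets:*", "ce:*", "health:*",
]


def dev_ou_scp(allowed_regions, allowed_instance_patterns):
    """Build a Development-OU SCP that denies requests outside the approved
    regions and denies launching instance types matching none of the patterns."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny applies to every action except global services.
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            },
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Negated operator with several values: true only when the
                    # requested type matches none of the allowed patterns.
                    "StringNotLike": {"ec2:InstanceType": list(allowed_instance_patterns)}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical allow list for development accounts.
    policy = dev_ou_scp(["us-east-1", "us-west-2"], ["t3.*", "m5.large", "m5.xlarge"])
    print(json.dumps(policy, indent=2))
```

Because `StringNotLike` with several values matches only when the requested type fits none of them, adding a pattern widens the allow list. The Staging OU variant would differ only in the patterns passed in.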

## Resource Tagging Standards

```
TAGGING POLICY AND ENFORCEMENT
================================

MANDATORY TAGS:

  All Resources (enforced via policy):
    → Environment: [production | staging | development | sandbox]
    → CostCenter: [department code, e.g., ENG-001, MKT-003]
    → Owner: [team email or Slack channel, e.g., platform-team@company.com]
    → Project: [project name or JIRA epic, e.g., proj-checkout-redesign]
    → ManagedBy: [terraform | cloudformation | pulumi | portal | cli]

  Compute Resources (EC2, VMs, Instances):
    → Application: [application name, e.g., api-gateway, payment-service]
    → Tier: [frontend | backend | database | cache | worker]
    → AutoScaling: [true | false]
    → BackupSchedule: [daily | weekly | none]

  Storage Resources (S3, Blob, Cloud Storage, EBS, Disks):
    → DataClassification: [public | internal | confidential | restricted]
    → RetentionPeriod: [30d | 90d | 1y | 3y | 7y | indefinite]
    → Encryption: [kms | service-managed | customer-managed]

  Network Resources (VPC, Subnets, Load Balancers, Firewalls):
    → NetworkTier: [public | private | management | data | dmz]
    → SecurityZone: [high-trust | medium-trust | low-trust | untrusted]

  Database Resources:
    → DatabaseType: [postgresql | mysql | mongodb | redis | elasticsearch]
    → HARequired: [true | false]
    → BackupRetention: [7d | 30d | 90d]

ENFORCEMENT MECHANISMS:

  AWS:
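    → AWS Config managed rule: required-tags (evaluates supported resource types on each configuration change)
    → Auto-remediation: Config remediation action (SSM Automation or Lambda) adds default tags or triggers a non-compliance alert
    → Service Control Policy: Deny resource creation without required tags via the aws:RequestTag/<key> condition key (only for actions that support tagging on create)
    → CDK/CloudFormation: Stack-level tags propagate to supported resources; CDK Tags.of(scope).add() tags every construct in a scope
    → Terraform: AWS provider default_tags block; missing-tag checks in CI (e.g., tflint-ruleset-aws rule aws_resource_missing_tags)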

  Azure:
    → Azure Policy: "Require a tag on resources" (built-in policy definition)
    → Effect: deny (block creation) or modify (add or inherit default tags, e.g., "Inherit a tag from the resource group")
    → Initiative: Group multiple tag policies into a governance initiative
    → Remediation: Create remediation tasks (modify effect) for existing non-compliant resources

  GCP:
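    → Organization Policy: Custom constraint that requires label keys on resource types supporting custom constraints
    → Policy: Custom constraints are named custom.<constraint-name> and defined with CEL conditions on resource fields
    → Enforcement: Enforce at folder level; all projects inherit
    → Validation: Security Command Center and Cloud Asset Inventory label searches for compliance auditing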

TAGGING COMPLIANCE METRICS:
    → Target: 98%+ resource compliance with tagging policy
    → Weekly audit: Automated scan of all resources for tag compliance
    → Monthly report: Resources missing tags → owner notified via email
    → Quarterly cleanup: Untagged resources → "unclaimed" tag → cost allocation review
    → Cost tracking: 100% of cloud spend allocated to cost centers via tags
```
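The mandatory and category-specific tags above translate directly into a validation table. This is a minimal sketch of the weekly audit check, assuming resources arrive as `(category, tags)` pairs from some inventory source (hypothetical; in practice something like AWS Config, Azure Resource Graph, or Cloud Asset Inventory). The `CostCenter` pattern is an assumption inferred from the `ENG-001` examples.

```python
import re

MANDATORY_TAGS = {"Environment", "CostCenter", "Owner", "Project", "ManagedBy"}

# Additional required keys per resource category, taken from the tagging standard.
CATEGORY_TAGS = {
    "compute": {"Application", "Tier", "AutoScaling", "BackupSchedule"},
    "storage": {"DataClassification", "RetentionPeriod", "Encryption"},
    "network": {"NetworkTier", "SecurityZone"},
    "database": {"DatabaseType", "HARequired", "BackupRetention"},
}

# Keys with enumerated values; all other keys are treated as free-form.
ALLOWED_VALUES = {
    "Environment": {"production", "staging", "development", "sandbox"},
    "ManagedBy": {"terraform", "cloudformation", "pulumi", "portal", "cli"},
    "DataClassification": {"public", "internal", "confidential", "restricted"},
}

# Assumption: cost-center codes follow the ABC-123 shape of the examples (ENG-001).
COST_CENTER_RE = re.compile(r"^[A-Z]{3}-\d{3}$")


def tag_violations(category, tags):
    """Return human-readable violations for one resource's tags (empty if compliant)."""
    required = MANDATORY_TAGS | CATEGORY_TAGS.get(category, set())
    problems = [f"missing tag {key}" for key in sorted(required - set(tags))]
    for key, allowed in ALLOWED_VALUES.items():
        if key in tags and tags[key] not in allowed:
            problems.append(f"invalid {key}={tags[key]!r}")
    if "CostCenter" in tags and not COST_CENTER_RE.match(tags["CostCenter"]):
        problems.append(f"invalid CostCenter={tags['CostCenter']!r}")
    return problems


def compliance_rate(resources):
    """Fraction of (category, tags) pairs with no violations; 1.0 for an empty list."""
    if not resources:
        return 1.0
    clean = sum(1 for category, tags in resources if not tag_violations(category, tags))
    return clean / len(resources)
```

A weekly job would compare `compliance_rate(...)` against the 98% target and send each non-empty `tag_violations` result to the address in the resource's `Owner` tag, when one is present.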
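For the SCP enforcement path, the `aws:RequestTag/<key>` condition key combined with the `Null` operator denies a create request that omits a tag. The sketch below assumes EC2 instance launches are the scope; other services would need their own `Action`/`Resource` pairs and only work where the service supports tagging on create. The chosen tag keys are an illustrative subset.

```python
import json

# Illustrative subset of mandatory tags to enforce at creation time.
REQUIRED_ON_CREATE = ["CostCenter", "Owner", "Environment"]


def require_tags_on_create_scp(tag_keys):
    """Deny ec2:RunInstances whenever any required tag is absent from the request.

    One statement per key: keys listed inside a single Null condition are ANDed,
    so one combined statement would deny only when *all* tags were missing.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"DenyRunInstancesWithout{key}",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "true" means: the tag key is not present in the request.
                "Condition": {"Null": {f"aws:RequestTag/{key}": "true"}},
            }
            for key in tag_keys
        ],
    }


if __name__ == "__main__":
    print(json.dumps(require_tags_on_create_scp(REQUIRED_ON_CREATE), indent=2))
```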

## Policy-as-Code Guardrails

```
POLICY-AS-CODE FRAMEWORK
==========================

SECURITY GUARDRAILS (MANDATORY — VIOLATIONS BLOCKED):

  Identity and Access:
    → Root user MFA: Must be enabled on all AWS accounts (detected via AWS Config rule root-account-mfa-enabled); privileged Entra ID roles require MFA via Conditional Access
    → Password policy: Minimum 14 characters, complexity required, 90-day rotation
    → Break-glass accounts: Maximum 2 per account; usage logged and alerted
    → IAM policy: No wildcard (*) actions; no wildcard resources except documented, reviewed exceptions
    → Standing privileges: Production access requires JIT (just-in-time) approval
    → External account sharing: Disabled; use IAM Identity Center / Enterprise App

  Network Security:
    → Security groups/firewall: No 0.0.0.0/0 inbound on SSH (22), RDP (3389), database ports
    → VPC/VNet: All production resources in private subnets; NAT for outbound only
    → Flow logs: Enabled for all production VPCs/VNets; 90-day retention minimum
    → DNS: Private DNS zones for internal resolution; no public DNS for internal services
    → Internet gateways: Require approval for production IGW creation

  Data Protection:
    → Encryption at rest: All storage encrypted (EBS, S3, RDS, managed disks, blob storage)
    → Encryption in transit: TLS 1.2+ required; no SSLv3/TLS 1.0/1.1
    → KMS keys: Customer-managed keys for production; automatic rotation enabled
    → Public access: S3 buckets, blob containers default to private; public access blocked at account level
    → Backup: Automated backups for all production databases and storage; cross-region replication

  Compute Security:
    → Instance metadata: IMDSv2 required (AWS); managed identity for Azure
    → EBS encryption: Default encryption enabled at account level
    → Container images: Scanned for vulnerabilities before deployment (Trivy, Clair)
    → Container runtime: No privileged containers in production
    → Secrets: No secrets in environment variables; use Secrets Manager / Key Vault / Secret Manager

COST GUARDRAILS:

  Budget Controls:
    → Budget alerts: 50%, 75%, 90%, 100% of monthly budget (Slack + email + PagerDuty at 100%)
    → Instance type restrictions: No GPU instances such as p4d.24xlarge (A100) or p5.48xlarge (H100) without finance approval
    → Region restrictions: Resources only in approved regions (us-east-1, us-west-2, eu-west-1)
    → Idle resource detection: Auto-flag EC2/VMs with <5% CPU for 7+ days
    → Unattached volumes: Auto-delete EBS/disk volumes unattached for 14+ days
    → Snapshots: Limit to 30 retained snapshots per volume; lifecycle policy enforced

  Reserved Commitment Management:
    → RI/Savings Plan coverage target: 70%+ of baseline compute spend
    → Expiring commitments: Alert 60, 30, 14 days before expiration
    → Utilization monitoring: RIs with <80% utilization → flag for adjustment
    → Purchase approval: RI purchases require finance + engineering sign-off

COMPLIANCE GUARDRAILS:

  SOC 2 / ISO 27001:
    → Logging: CloudTrail/Activity Log enabled; log integrity protection; 1-year retention
    → Change management: IAM policy changes logged; config change notifications
    → Access review: Quarterly access certification; unused access removed
    → Encryption compliance: Continuous scan for unencrypted resources
    → Vulnerability management: Quarterly vulnerability scan; critical patched within 7 days

  HIPAA (if applicable):
    → BAA: Business Associate Agreement signed with the provider (AWS via AWS Artifact; Microsoft via its Product Terms); PHI only on HIPAA-eligible services
    → Audit logs: 6-year retention for PHI-related systems
    → Access controls: Role-based access; minimum necessary access principle
    → Encryption: AES-256 at rest; TLS 1.2+ in transit
    → Breach detection: GuardDuty / Microsoft Defender for Cloud enabled

POLICY ENFORCEMENT TOOLS:
    → AWS: Service Control Policies (SCP), AWS Config, AWS Policy Generator, Control Tower guardrails
    → Azure: Azure Policy (deny, modify, deployIfNotExists effects), Blueprints, Management Group policy assignments
    → GCP: Organization Policies, Security Command Center, Cloud Asset Inventory
    → Cross-cloud: Open Policy Agent (OPA), HashiCorp Sentinel, Checkov, Terrascan
```
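The network guardrail against internet-exposed SSH, RDP, and database ports reduces to an interval check over each ingress rule. This sketch assumes a hypothetical `IngressRule` record normalized from security groups, NSGs, or VPC firewall rules; the database-port list is an illustrative assumption.

```python
from dataclasses import dataclass

# Ports that must never be open to the internet.
# Assumption: the database ports listed are illustrative; extend for your engines.
RESTRICTED_PORTS = {
    22: "SSH", 3389: "RDP", 1433: "SQL Server", 1521: "Oracle",
    3306: "MySQL", 5432: "PostgreSQL", 6379: "Redis", 27017: "MongoDB",
}
INTERNET_CIDRS = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical normalized ingress rule."""
    cidr: str
    protocol: str  # "tcp", "udp", or "-1" for all traffic (AWS convention)
    from_port: int = 0
    to_port: int = 65535


def exposed_restricted_ports(rule):
    """Return, sorted, the restricted ports this rule opens to the internet."""
    if rule.cidr not in INTERNET_CIDRS:
        return []
    if rule.protocol == "-1":  # all protocols, all ports
        return sorted(RESTRICTED_PORTS)
    if rule.protocol not in ("tcp", "udp"):
        return []
    return sorted(p for p in RESTRICTED_PORTS if rule.from_port <= p <= rule.to_port)
```

Matching only the literal `0.0.0.0/0` and `::/0` strings keeps the sketch short; a stricter check would also flag other very broad source ranges.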
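The budget and reserved-commitment thresholds above can be evaluated by a small daily job. This is a minimal sketch; it reads the budget line as "Slack and email at every threshold, PagerDuty added at 100%", which is an interpretation, and it assumes spend, budget, and utilization figures are fed in from your billing export.

```python
from datetime import date

BUDGET_THRESHOLDS = (0.50, 0.75, 0.90, 1.00)
COMMITMENT_ALERT_DAYS = (60, 30, 14)
RI_UTILIZATION_FLOOR = 0.80


def budget_alerts(spend, budget):
    """Return (threshold, channels) for every threshold the spend has crossed.

    Assumption: Slack and email fire at every threshold; PagerDuty is added at 100%.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    alerts = []
    for threshold in BUDGET_THRESHOLDS:
        if ratio >= threshold:
            channels = ["slack", "email"] + (["pagerduty"] if threshold >= 1.0 else [])
            alerts.append((threshold, channels))
    return alerts


def commitment_alert_due(today, expires_on):
    """True on the days a daily job should warn about an expiring RI or Savings Plan."""
    return (expires_on - today).days in COMMITMENT_ALERT_DAYS


def underutilized(utilization):
    """Flag a reservation whose utilization (0.0 to 1.0) is below the 80% floor."""
    return utilization < RI_UTILIZATION_FLOOR


if __name__ == "__main__":
    print(budget_alerts(950, 1000))
    print(commitment_alert_due(date.today(), date(2030, 1, 1)))
```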
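The idle-resource and unattached-volume rules are simple window checks. In this sketch, the CPU series and volume records are hypothetical inputs. EC2's DescribeVolumes reports attach times but no detach time, so `detached_at` is assumed to come from your own inventory or from CloudTrail `DetachVolume` events.

```python
from datetime import datetime, timedelta, timezone

IDLE_CPU_PERCENT = 5.0
IDLE_DAYS = 7
UNATTACHED_DAYS = 14


def is_idle(daily_avg_cpu):
    """daily_avg_cpu: hypothetical list of daily average CPU %, oldest first.

    Idle means at least IDLE_DAYS of history and every one of the most
    recent IDLE_DAYS values below IDLE_CPU_PERCENT.
    """
    recent = daily_avg_cpu[-IDLE_DAYS:]
    return len(recent) == IDLE_DAYS and all(c < IDLE_CPU_PERCENT for c in recent)


def deletion_candidates(volumes, now):
    """volumes: hypothetical dicts with 'id' and 'detached_at' (None if attached).

    Returns IDs of volumes unattached for at least UNATTACHED_DAYS.
    """
    cutoff = now - timedelta(days=UNATTACHED_DAYS)
    return [v["id"] for v in volumes
            if v["detached_at"] is not None and v["detached_at"] <= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(deletion_candidates([{"id": "vol-example", "detached_at": now - timedelta(days=30)}], now))
```

The functions only produce candidate lists; the flagging and deletion steps belong to whatever remediation job consumes them.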

## Cloud Center of Excellence (CCoE)

```
CLOUD CENTER OF EXCELLENCE FRAMEWORK
======================================

CCOE STRUCTURE:

  Core Team:
    → Cloud Architect (lead): Strategy, standards, hands-on design review
    → Security Engineer: Security baselines, compliance automation, threat modeling
    → DevOps Engineer: Landing zone, CI/CD, IaC standards, platform services
    → FinOps Analyst: Cost optimization, budgeting, chargeback/showback
    → Program Manager: Governance processes, stakeholder communication, training

  Extended Team (as needed):
    → Application engineers (representative from each major team)
    → Data engineer (data platform standards)
    → Network engineer (hybrid connectivity, transit design)
    → Compliance officer (regulatory requirements mapping)

CCOE RESPONSIBILITIES:

  1. Strategy and Roadmap:
     → Cloud adoption roadmap (6-18 month horizon)
     → Multi-cloud strategy (if applicable)
     → Migration methodology (rehost, refactor, rebuild, replace)
     → Technology selection (PaaS services vs. self-managed)

  2. Standards and Patterns:
     → Reference architectures (web app, microservices, data lake, batch processing)
     → Well-Architected reviews (AWS/Azure/GCP framework)
     → IaC modules and templates (Terraform registry, Bicep modules)
     → CI/CD pipeline templates (GitHub Actions, GitLab CI, Azure DevOps)
     → Logging/monitoring standards (centralized logging, alerting thresholds)

  3. Enablement and Training:
     → Cloud onboarding for new teams (2-week program)
     → Monthly architecture review board (ARB) meetings
     → Quarterly cloud skills training (workshops, certification support)
     → Internal documentation portal (runbooks, architecture decisions, best practices)
     → Office hours: Weekly drop-in session for cloud questions

  4. Governance and Compliance:
     → Policy creation and enforcement (policy-as-code)
     → Compliance monitoring and reporting
     → Cost governance (budgets, alerts, chargeback)
     → Security posture management (CIS score, security hub findings)
     → Incident response coordination for cloud events

CCOE MATURITY LEVELS:

  Level 1 — Foundation (Months 1-3):
    → Landing zone deployed
    → Basic identity and access management
    → Centralized logging enabled
    → Billing alerts configured
    → Tagging policy established

  Level 2 — Standardization (Months 4-6):
    → Policy-as-code enforcement active
    → Cost allocation and showback running
    → Security baselines automated
    → CI/CD pipeline templates available
    → Self-service portal for common resources

  Level 3 — Optimization (Months 7-12):
    → Automated compliance reporting
    → FinOps culture established (cost ownership by teams)
    → Well-Architected reviews for all production workloads
    → Self-healing infrastructure (auto-remediation for common issues)
    → Cost optimization automation (rightsizing, RI management)

  Level 4 — Innovation (Months 13+):
    → AI/ML-driven operations (predictive scaling, anomaly detection)
    → Multi-region/multi-cloud active-active deployments
    → Zero-trust architecture fully implemented
    → Automated security patching and compliance
    → Platform engineering: internal developer platform (IDP)
```
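For the governance maturity scoring in the quarterly review, the levels can be treated as cumulative checklists: a team sits at the highest level whose items, and those of every lower level, are all in place. The capability identifiers below are hypothetical shorthand for the bullets above.

```python
# Capability checklist per level; identifiers are hypothetical shorthand.
MATURITY_LEVELS = [
    (1, "Foundation", {"landing_zone", "basic_iam", "central_logging",
                       "billing_alerts", "tagging_policy"}),
    (2, "Standardization", {"policy_as_code", "showback", "security_baselines",
                            "cicd_templates", "self_service_portal"}),
    (3, "Optimization", {"compliance_reporting", "finops_culture",
                         "well_architected_reviews", "auto_remediation", "cost_automation"}),
    (4, "Innovation", {"aiops", "active_active", "zero_trust",
                       "auto_patching", "internal_developer_platform"}),
]


def maturity_level(achieved):
    """Return (level, name) for the highest fully achieved level, counting
    only levels whose predecessors are also complete; (0, "None") otherwise."""
    current = (0, "None")
    for level, name, required in MATURITY_LEVELS:
        if not required <= achieved:
            break
        current = (level, name)
    return current


def gaps(achieved):
    """Sorted capabilities still missing for the next level ([] at the top level)."""
    level, _ = maturity_level(achieved)
    if level == len(MATURITY_LEVELS):
        return []
    return sorted(MATURITY_LEVELS[level][2] - achieved)
```

Stopping at the first incomplete level means advanced items (for example, zero trust) do not raise the score while foundational gaps remain.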

## Integration Points

- **AWS Control Tower**: Landing zone automation; guardrails (preventive and detective); management account setup; organizational structure; Account Factory; no additional charge for Control Tower itself, but you pay for the services it enables (AWS Config, CloudTrail, etc.)
- **Azure Landing Zones (Enterprise-Scale)**: Reference architecture; management group hierarchy; policy assignment; landing zone accelerators (portal, Bicep, Terraform) on GitHub
- **Google Cloud Deployment Manager / Terraform**: Landing zone deployment; folder hierarchy; organization policies; shared VPC; central logging
- **Open Policy Agent (OPA)**: Cross-cloud policy engine; Rego language for policy definition; integrates with Kubernetes, Terraform, CI/CD
- **HashiCorp Sentinel**: Policy-as-code for Terraform; soft-mandatory (warn) and hard-mandatory (enforce) policies; integrates with Terraform Cloud/Enterprise
- **Checkov / Terrascan**: IaC security scanning; pre-commit hooks; CI/CD integration; supports Terraform, CloudFormation, Kubernetes, ARM
- **CloudHealth by VMware / Apptio**: Multi-cloud cost management; optimization recommendations; budget tracking; anomaly detection; chargeback/showback
- **Wiz / Lacework / Prisma Cloud**: Cloud security posture management; continuous compliance; vulnerability management; cloud-native workload protection

## Edge Cases

- **Multi-cloud governance**: Different policy languages per provider; unified dashboard required; consider cross-cloud policy engines (OPA, custom); separate landing zones per cloud; federated identity across clouds
- **Government cloud (AWS GovCloud, Azure Government, Google Cloud Assured Workloads)**: Different region availability; customer-managed encryption keys required; FedRAMP High or DoD IL5/IL6 authorization; enhanced auditing; personnel with clearance for support
- **Startups (minimal governance initially)**: Start with basics: single account with team separation, budget alerts, MFA enforcement, basic logging; add governance as team and complexity grow; avoid over-engineering
- **Highly regulated workloads in same account**: Use separate accounts/subscriptions for regulated workloads; enforce stricter SCPs/policies; dedicate compliance team; separate logging and audit trails
- **Self-service vs. control tension**: Too strict = teams bypass governance (shadow IT); too loose = security/cost risks; solution: "guardrails not gateways" — enable safe self-service with automated policy enforcement and clear documentation
- **Legacy on-prem + cloud (hybrid)**: Hybrid connectivity governance (Direct Connect/ExpressRoute); consistent identity (AD sync to cloud IAM); consistent security policies across on-prem and cloud; unified monitoring
- **Cost governance pushback**: Engineering teams resist cost controls; solution: educate with FinOps principles; showback before chargeback; team-level budgets with autonomy within budget; celebrate savings
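Multi-cloud governance also means one tagging standard must survive each provider's syntax. The standard above uses CamelCase keys and values such as email addresses, but Google Cloud labels accept only lowercase letters, digits, underscores, and dashes (plus international characters), keys must start with a lowercase letter, and keys and values are capped at 63 characters. A sketch of one normalization rule, assuming snake_case keys and `_` as the replacement character (both are choices made for this example, not a Google convention):

```python
import re

# Split points in CamelCase: lower/digit->Upper, or Upper->Upper+lower (HARequired).
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
# This sketch permits only ASCII label characters.
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")
MAX_LABEL_LEN = 63


def to_gcp_label_key(tag_key):
    """CostCenter -> cost_center, HARequired -> ha_required."""
    key = _CAMEL_BOUNDARY.sub("_", tag_key).lower()
    key = _INVALID_LABEL_CHARS.sub("_", key)[:MAX_LABEL_LEN]
    if not key or not key[0].isalpha():
        raise ValueError(f"cannot form a label key starting with a letter from {tag_key!r}")
    return key


def to_gcp_label_value(tag_value):
    """Lowercase, replace disallowed characters with '_', truncate to 63 chars."""
    return _INVALID_LABEL_CHARS.sub("_", tag_value.lower())[:MAX_LABEL_LEN]


def to_gcp_labels(tags):
    """Map a tag dict from the standard onto label-safe keys and values."""
    labels = {}
    for tag_key, tag_value in tags.items():
        key = to_gcp_label_key(tag_key)
        if key in labels:
            raise ValueError(f"tag keys collide after normalization: {key}")
        labels[key] = to_gcp_label_value(tag_value)
    return labels
```

Because the value mapping is lossy (`platform-team@company.com` becomes `platform-team_company_com`), a unified dashboard would join on the normalized form rather than compare raw strings across clouds.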
