---
name: configuration-management-dcm
description: Manage and control IT infrastructure configurations using configuration management databases (CMDB), automated configuration tools, and change tracking. Use when implementing CMDB, automating configuration drift detection, managing baseline configurations, tracking configuration changes, enforcing configuration standards, or integrating configuration data with ITSM workflows. Triggers on phrases like "configuration management", "CMDB", "configuration drift", "baseline configuration", "configuration item", "config management", "Ansible", "Puppet", "Chef", "configuration audit", "configuration baseline", "drift detection", "infrastructure configuration".
---

# Configuration Management & Desired Configuration Management (DCM)

Establish and maintain consistent, compliant infrastructure configurations across on-premises and cloud environments using automated configuration management tools and centralized configuration databases.

## Workflow

1. Define configuration management scope: identify all configuration items (CIs) to manage — servers, network devices, databases, applications, middleware, cloud resources, endpoints.
2. Select configuration management tools based on environment: Ansible (agentless, broad support), Puppet (agent-based, large scale), Chef (Ruby-based, developer-friendly), SaltStack (event-driven, high performance).
3. Build CMDB (Configuration Management Database): discover CIs automatically via agents/APIs; define CI relationships and dependencies; establish data ownership and quality standards.
4. Establish baseline configurations: OS hardening baselines (CIS benchmarks), application configuration standards, network device templates, cloud resource defaults.
5. Implement automated configuration enforcement: schedule regular runs (daily/weekly); detect and remediate drift; report non-compliant configurations; integrate with change management.
6. Integrate with ITSM workflows: link configuration changes to change requests; impact analysis using CMDB relationships; automated change validation.
7. Monitor configuration drift: continuous comparison of actual state vs. desired state; alert on unauthorized changes; trend analysis of drift frequency.
8. Conduct quarterly configuration audits: validate CMDB accuracy, review baseline compliance, assess configuration risk, verify CI relationships.
9. Maintain configuration documentation: architecture diagrams, dependency maps, runbooks tied to CI configurations.
10. Report on configuration health: compliance scores, drift metrics, change success rates, configuration-related incidents.
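
The health metrics in step 10 can be computed from raw scan results. The sketch below is a minimal illustration, not a prescribed implementation: the `HostScan` record, its fields, and the host names are hypothetical, and it assumes each scan reports how many baseline checks ran and how many passed.

```python
from dataclasses import dataclass


@dataclass
class HostScan:
    """Result of one compliance scan for one host (hypothetical shape)."""
    host: str
    checks_total: int
    checks_passed: int
    drifted: bool  # True if any managed setting differed from the baseline


def compliance_score(scans: list[HostScan]) -> float:
    """Percentage of baseline checks passed across all scanned hosts."""
    total = sum(s.checks_total for s in scans)
    if total == 0:
        return 0.0
    passed = sum(s.checks_passed for s in scans)
    return round(100.0 * passed / total, 1)


def drift_rate(scans: list[HostScan]) -> float:
    """Percentage of scanned hosts on which drift was detected."""
    if not scans:
        return 0.0
    return round(100.0 * sum(s.drifted for s in scans) / len(scans), 1)


# Illustrative data only; real input would come from the CM tool's reports.
scans = [
    HostScan("web-01", checks_total=40, checks_passed=40, drifted=False),
    HostScan("web-02", checks_total=40, checks_passed=36, drifted=True),
    HostScan("db-01", checks_total=20, checks_passed=19, drifted=True),
]
print(compliance_score(scans), drift_rate(scans))
```

Weighting every check equally is a simplification; a real report would likely weight checks by severity.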

## Configuration Management Tool Selection

```
CONFIGURATION MANAGEMENT TOOLS COMPARISON
============================================

ANSIBLE:

  Architecture: Agentless (SSH for Linux, WinRM for Windows); push model
  Language: YAML playbooks; Jinja2 templates; Python modules
  Inventory: Static (INI/YAML) or dynamic (AWS EC2, Azure, custom scripts)
  Idempotency: Built-in (tasks only execute if state change needed)

  Strengths:
    → Zero agent deployment; works over SSH/WinRM
    → Low learning curve (YAML is human-readable)
    → Large module library (6,000+ community modules)
    → Strong ad-hoc command capability (ansible all -m ping)
    → Role-based organization (ansible-galaxy roles)

  Weaknesses:
    → Performance degrades beyond roughly 2,000 hosts (per-task SSH overhead; parallelism capped by the forks setting, default 5)
    → No real-time state tracking (pull model not native)
    → Complex workflows require custom modules or ansible-runner

  Best for: Small to medium environments (up to 2,000 hosts); heterogeneous environments; teams new to configuration management
  Licensing: Ansible Engine free (GPL); Ansible Automation Platform $6,500/year/controller + $475/host/year

PUPPET:

  Architecture: Agent-master (client-server); pull model with push capability (Puppet Bolt)
  Language: Puppet DSL (domain-specific); supports Hiera for data separation
  Inventory: Puppet DB stores node state; automated discovery via Puppet Dashboard

  Strengths:
    → Near-real-time state tracking (agents report state every 30 minutes by default)
    → Scalable to 50,000+ nodes (proven at large enterprises)
    → Puppet Forge: 6,000+ community modules
    → Declarative language (state-based, not procedure-based)
    → Strong reporting and compliance features (Puppet Enterprise)

  Weaknesses:
    → Higher complexity (agent deployment, master management, Puppet DB)
    → Slower iteration cycle (changes wait for the next agent run)
    → Puppet Enterprise licensing cost significant at scale

  Best for: Large enterprise environments (5,000+ nodes); environments requiring real-time compliance reporting; organizations wanting declarative approach
  Licensing: Puppet Open Source free; Puppet Enterprise $4,500/year + $150/node/year

CHEF:

  Architecture: Agent-server (client-server); push/pull hybrid (Chef Push Jobs)
  Language: Ruby (cookbooks, recipes); supports Policyfiles
  Inventory: Chef Server stores node state, cookbooks, environments

  Strengths:
    → Code-centric (appeals to developers; Ruby is full programming language)
    → Chef Supermarket: 6,000+ community cookbooks
    → Strong testing framework (Test Kitchen, InSpec)
    → Chef InSpec for compliance as code
    → Flexible: imperative style allows complex logic

  Weaknesses:
    → Ruby dependency (higher learning curve for non-developers)
    → Chef Server management overhead
    → Imperative style can lead to non-idempotent recipes if not careful

  Best for: Developer-centric teams; organizations already using Ruby; environments needing strong compliance testing (InSpec)
  Licensing: Chef Workstation free; Chef Automate $5,000/year + $150/node/year

SALTSTACK (SALT):

  Architecture: Master-minion; supports push, pull, and reactive (event-driven)
  Language: YAML state files; Python modules; Jinja2 templates
  Inventory: Master tracks minion state; supports external job cache

  Strengths:
    → Extremely fast (ZeroMQ messaging; handles 100,000+ nodes)
    → Event-driven architecture (reactive configuration changes)
    → Salt SSH for agentless mode
    → Strong orchestration capabilities (compound commands)
    → Low resource footprint on minions

  Weaknesses:
    → Smaller community and module ecosystem vs. Ansible/Puppet
    → Less mature reporting and compliance features
    → Smaller enterprise adoption

  Best for: Very large environments (10,000+ nodes); environments needing real-time reactive configuration; high-performance requirements
  Licensing: Salt Project free; Salt Enterprise $5,000/year + $125/node/year

TOOL SELECTION MATRIX:

  Factor                  | Ansible  | Puppet   | Chef     | Salt
  ────────────────────────|──────────|──────────|──────────|────────
  Ease of Learning        | ★★★★★  | ★★★☆☆  | ★★★☆☆  | ★★★★☆
  Scalability             | ★★★☆☆  | ★★★★★  | ★★★★☆  | ★★★★★
  Agentless Option        | Yes      | Limited  | No       | Yes (SSH)
  Compliance Reporting    | ★★★☆☆  | ★★★★★  | ★★★★☆  | ★★★☆☆
  Community Size          | ★★★★★  | ★★★★☆  | ★★★★☆  | ★★★☆☆
  Cost (per node)         | $0-$475  | $150/yr  | $150/yr  | $125/yr
  Speed                   | ★★★☆☆  | ★★★☆☆  | ★★★★☆  | ★★★★★
```
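
One way to apply the selection matrix is a weighted score per tool. The sketch below transcribes the star ratings from the matrix above; the weights and the `rank_tools` helper are hypothetical and should be replaced with priorities that reflect the actual environment.

```python
# Star ratings transcribed from the selection matrix (1-5 scale).
RATINGS = {
    "Ansible": {"ease": 5, "scalability": 3, "compliance": 3, "community": 5, "speed": 3},
    "Puppet": {"ease": 3, "scalability": 5, "compliance": 5, "community": 4, "speed": 3},
    "Chef": {"ease": 3, "scalability": 4, "compliance": 4, "community": 4, "speed": 4},
    "Salt": {"ease": 4, "scalability": 5, "compliance": 3, "community": 3, "speed": 5},
}


def rank_tools(weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (tool, weighted average rating) pairs, best first."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = {
        tool: sum(ratings[factor] * w for factor, w in weights.items()) / total_weight
        for tool, ratings in RATINGS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical priorities: a small team new to configuration management.
small_team = {"ease": 3, "community": 2, "scalability": 1, "compliance": 1, "speed": 1}
# Hypothetical priorities: a large, compliance-driven enterprise.
large_regulated = {"scalability": 3, "compliance": 3, "ease": 1, "community": 1, "speed": 1}

for name, weights in (("small team", small_team), ("large regulated", large_regulated)):
    best, score = rank_tools(weights)[0]
    print(f"{name}: {best} ({score:.2f})")
```

The score is only a starting point for discussion; licensing cost and the agentless requirement are deliberately left out because they are not star-rated in the matrix.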

## Configuration Baseline Standards

```
OPERATING SYSTEM BASELINE CONFIGURATIONS
==========================================

LINUX BASELINE (RHEL 8/9, Ubuntu 22.04, Amazon Linux 2023):

  System Hardening (CIS Level 1 aligned):
    → Kernel parameters (sysctl.conf):
       net.ipv4.ip_forward = 0
       net.ipv4.conf.all.accept_redirects = 0
       net.ipv4.conf.all.send_redirects = 0
       net.ipv4.conf.all.accept_source_route = 0
       net.ipv4.tcp_syncookies = 1
       kernel.randomize_va_space = 2
       fs.suid_dumpable = 0

    → SSH Configuration (/etc/ssh/sshd_config):
       # Protocol 2 is implicit: current OpenSSH supports only SSHv2 and the directive is deprecated
       PermitRootLogin no
       PasswordAuthentication no
       PubkeyAuthentication yes
       MaxAuthTries 3
       ClientAliveInterval 300
       ClientAliveCountMax 2
       AllowGroups ssh-users
       X11Forwarding no

    → User Account Policies:
       Password minimum length: 14 characters
       Password complexity: require upper, lower, digit, special
       Password expiration: 90 days
       Account lockout: 5 failed attempts; 30-minute lockout
       Maximum login sessions: 10 per user
       Root access: via sudo only; root login disabled

    → File System Security:
       /tmp: noexec,nosuid,nodev mount options
       /dev/shm: noexec,nosuid,nodev mount options
       World-writable directories: audited and approved only
       SUID/SGID binaries: baseline established; deviations alert

    → Logging (rsyslog/journald):
       Auditd enabled: all privilege escalation, file access, network changes
       Log retention: 90 days minimum on host; forwarded to central SIEM
       Syslog: forwarded to central SIEM server (TLS-encrypted)
       Log format: JSON for SIEM parsing compatibility

    → Package Management:
       Unnecessary packages removed: telnet, rsh, ftp, talk, xinetd
       Security patches: applied within 7 days (critical), 30 days (all others)
       Package signing: verify GPG signatures on all packages
       APT/YUM repos: only approved internal mirrors

    → Firewall (firewalld/ufw):
       Default policy: DENY all inbound; ALLOW all outbound
       Allowed inbound: SSH (22) from management subnet only
       Logging: denied connections logged
       Zone-based: separate zones for management, application, database

WINDOWS SERVER BASELINE (Server 2019/2022):

  System Hardening (CIS Level 1 aligned):
    → Local Security Policy:
       Password complexity: enabled; minimum 14 characters
       Password age: maximum 90 days; minimum 1 day
       Account lockout: 5 attempts; 30-minute duration; 30-minute reset
       LSA protection: Run as PPL enabled
       UAC: Level 4 (always notify)

    → Windows Defender:
       Real-time protection: enabled
       Cloud-delivered protection: enabled
       Automatic sample submission: enabled
       Exclusions: only documented and approved paths
       Tamper protection: enabled
       Update schedule: daily

    → Event Log Configuration:
       Security log: 32,768 MB; overwrite as needed; forward to SIEM
       System log: 4,096 MB; overwrite as needed
       Application log: 4,096 MB; overwrite as needed
       Audit policy: logon/logoff, privilege use, object access, policy change

    → Network Security:
       Windows Firewall: enabled on all profiles (Domain, Private, Public)
       SMB: SMBv1 disabled; SMB signing required
       WinRM: HTTPS only; authorized listeners only
       Remote Desktop: Network Level Authentication required; restricted to admin group

    → PowerShell:
       Execution Policy: RemoteSigned (or AllSigned for production)
       Script Block Logging: enabled
       Module Logging: enabled for all modules
       Transcript logging: enabled for administrative sessions
```
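
A baseline is only useful if hosts can be checked against it. The following sketch compares the kernel parameters from the Linux baseline above with the text output of `sysctl -a` (lines of the form `key = value`). The helper names and the sample output are hypothetical; in practice this check would normally be expressed in the chosen configuration management tool or a compliance scanner.

```python
# Kernel parameters copied from the Linux baseline above.
SYSCTL_BASELINE = {
    "net.ipv4.ip_forward": "0",
    "net.ipv4.conf.all.accept_redirects": "0",
    "net.ipv4.conf.all.send_redirects": "0",
    "net.ipv4.conf.all.accept_source_route": "0",
    "net.ipv4.tcp_syncookies": "1",
    "kernel.randomize_va_space": "2",
    "fs.suid_dumpable": "0",
}


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def find_deviations(actual: dict[str, str], baseline: dict[str, str] = SYSCTL_BASELINE):
    """Map each non-compliant key to (expected, found); found is None if unset."""
    return {
        key: (expected, actual.get(key))
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }


# Hypothetical output captured from a host; fs.suid_dumpable is missing on purpose.
sample_output = """
net.ipv4.ip_forward = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.tcp_syncookies = 1
kernel.randomize_va_space = 2
"""
print(find_deviations(parse_sysctl(sample_output)))
```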

## CMDB Design and Implementation

```
CONFIGURATION MANAGEMENT DATABASE (CMDB) DESIGN
=================================================

CI CATEGORIES AND ATTRIBUTES:

  Server CI:
    → Identifier: hostname, IP address, asset tag
    → Type: physical, virtual, container, cloud instance
    → OS: name, version, architecture (x86_64, ARM64)
    → Hardware: CPU cores, RAM, storage, model (if physical)
    → Environment: production, staging, development
    → Owner: team, contact email
    → Location: data center, rack, U position (if physical)
    → Lifecycle: procured date, warranty expiry, planned retirement
    → Relationships: hosted applications, connected networks, dependent services

  Application CI:
    → Identifier: application name, version, deployment ID
    → Type: web app, API, batch job, microservice, database
    → Technology: framework, language, runtime
    → Deployment: servers/containers running the app
    → Dependencies: upstream applications, downstream consumers, databases
    → Support: support team, SLA tier, on-call schedule
    → Configuration: config file locations, environment variables, secrets location

  Network Device CI:
    → Identifier: hostname, serial number, IP address (management)
    → Type: router, switch, firewall, load balancer, wireless AP
    → Model: manufacturer, model number, firmware version
    → Configuration: config file hash, last change date
    → Connectivity: upstream links, downstream links, VLANs
    → Relationships: connected servers, protected segments

  Database CI:
    → Identifier: instance name, connection string (masked)
    → Type: RDBMS (PostgreSQL, MySQL, SQL Server), NoSQL (MongoDB, Redis), data warehouse
    → Version: database engine version, patch level
    → Size: current size, growth rate, storage allocated
    → Performance: max connections, current connections, avg query time
    → Backup: backup schedule, retention, last successful backup
    → Relationships: dependent applications, replication partners

CMDB RELATIONSHIP MODEL:

  → Supports: Server CI supports Application CI
  → Depends On: Application CI depends on Database CI
  → Connects To: Network Device CI connects to Server CI
  → Part Of: Subnet CI is part of VPC/VNet CI
  → Provides: Database CI provides data to Application CI
  → Monitored By: Monitoring CI monitors Server CI

CMDB POPULATION METHODS:

  Automated Discovery:
    → Agent-based: agents report CI attributes (Puppet, Chef, custom agent); Ansible facts gathered over SSH/WinRM can feed the same pipeline without an agent
    → Agentless: API-based discovery (AWS Config, Azure Resource Graph, GCP Asset Inventory)
    → Network scan: SNMP, WMI, SSH probes for on-prem discovery
    → Cloud provider APIs: automatic CI creation for new cloud resources
    → Container orchestration: Kubernetes API for pod/service/deployment tracking

  Manual Entry:
    → New CI templates in ServiceNow/ITSM
    → Approval workflow for CI creation
    → Data validation rules (required fields, format checks)
    → Change request links for CI modifications

CMDB DATA QUALITY:

  Accuracy target: 95%+ CI attributes accurate (validated quarterly)
  Completeness target: 100% of in-scope infrastructure in CMDB
  Timeliness: CI updates within 4 hours of change
  Reconciliation: monthly automated reconciliation vs. actual infrastructure
  Ownership: each CI has assigned owner; orphaned CIs flagged monthly

CMDB PLATFORMS:
    → ServiceNow CMDB: Enterprise CMDB; automated discovery (ServiceNow Discovery module); relationship mapping; $15,000+/year base + per-CI pricing
    → BMC Helix CMDB: Enterprise; discovery integration; ITIL alignment
    → BigPanda / Moogsoft: AIOps event-correlation platforms rather than CMDBs; useful for enriching CI data in near real time from monitoring events
    → Custom: Terraform state + cloud provider APIs + Elasticsearch (for lean organizations)
```
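
The relationship model above is what makes impact analysis possible: given a CI, walk the relationships to find everything that relies on it. The sketch below is a minimal in-memory illustration, assuming only the "Supports", "Provides", and "Depends On" relationships carry impact. The `Cmdb` class, its methods, and the CI identifiers are hypothetical; a production CMDB would expose this through its own query API.

```python
from collections import defaultdict, deque


class Cmdb:
    """Tiny in-memory CI store with dependency tracking (illustrative only)."""

    def __init__(self):
        self._cis = {}
        self._dependents = defaultdict(set)  # CI id -> CIs that rely on it

    def add_ci(self, ci_id: str, ci_type: str, owner: str) -> None:
        self._cis[ci_id] = {"type": ci_type, "owner": owner}

    def relate(self, source: str, relation: str, target: str) -> None:
        """Record 'source <relation> target' using the model's relationship names."""
        for ci_id in (source, target):
            if ci_id not in self._cis:
                raise KeyError(f"unknown CI: {ci_id}")
        if relation in ("supports", "provides"):
            self._dependents[source].add(target)  # target relies on source
        elif relation == "depends_on":
            self._dependents[target].add(source)  # source relies on target
        else:
            raise ValueError(f"unsupported relation: {relation}")

    def impacted_by(self, ci_id: str) -> list[str]:
        """Breadth-first search for every CI that directly or indirectly relies on ci_id."""
        seen = set()
        queue = deque([ci_id])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent != ci_id and dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return sorted(seen)


cmdb = Cmdb()
cmdb.add_ci("srv-01", "server", "platform-team")
cmdb.add_ci("db-orders", "database", "data-team")
cmdb.add_ci("app-orders", "application", "orders-team")
cmdb.add_ci("app-web", "application", "web-team")
cmdb.relate("srv-01", "supports", "app-orders")
cmdb.relate("app-orders", "depends_on", "db-orders")
cmdb.relate("app-web", "depends_on", "app-orders")
print(cmdb.impacted_by("db-orders"))
```

The same traversal can back the impact analysis step in the ITSM workflow: a change request against `db-orders` would list both applications as affected.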

## Configuration Drift Management

```
CONFIGURATION DRIFT DETECTION AND REMEDIATION
===============================================

DRIFT TYPES:

  1. Unauthorized Changes:
     → Admin manually modifies config file via SSH
     → Developer changes settings directly on production server
     → Misconfigured backup alters system settings
     → Response: Alert immediately; revert to baseline; investigate root cause

  2. Gradual Drift:
     → Package updates change default configurations
     → Application self-modifies config files (auto-tuning)
     → Log rotation changes file permissions over time
     → Response: Detected in scheduled compliance scans; auto-remediated

  3. Intentional but Undocumented Changes:
     → Emergency fix applied outside change management process
     → Temporary change not reverted after incident resolution
     → Response: Flag for documentation; backfill change request; add to audit trail

  4. Cloud-Specific Drift:
     → Developer modifies security group via console (not via Terraform)
     → Auto-scaling launches instances without proper configuration
     → AWS Config rules detect non-compliant resources
     → Response: Auto-remediation Lambda; console modifications restricted via SCP

DRIFT DETECTION CONFIGURATION:

  Ansible:
    → ansible-lint: Static checks on playbooks for non-idempotent patterns (e.g., bare command/shell tasks); it does not inspect hosts
    → --check --diff mode: Dry-run that reports what would change on each host
    → Scheduled runs: Cron job runs playbooks nightly in check mode
    → Report: Generates drift report (hosts with configuration differences)

  Puppet:
    → Agents report state every 30 minutes; master detects drift
    → Puppet Dashboard/Console: Real-time compliance dashboard
    → Auto-correct: Agents apply catalog on next run (self-healing)
    → Reporting: Compliance reports generated per node per run

  Chef:
    → chef-client runs (scheduled via cron or a systemd timer): detect and correct drift
    → Chef Compliance/InSpec: Audit profiles run against nodes
    → Reporting: Chef Automate dashboards show compliance scores

  Cloud (AWS Config / Azure Policy / GCP):
    → AWS Config Rules: Continuous evaluation; custom Lambda rules
    → Azure Policy: Continuous assessment; auto-remediation tasks
    → GCP Security Command Center: Continuous compliance monitoring
    → Alerting: SNS/Logic Apps/PubSub notifications on non-compliance

DRIFT RESPONSE PROCEDURES:

  Automated Remediation:
    → Low-risk drift: Auto-correct on next configuration management run
    → Example: File permission drift, missing packages, config file changes
    → Audit log: Before/after state recorded for compliance

  Manual Review Required:
    → Medium-risk drift: Alert configuration management team; review within 4 hours
    → Example: Kernel parameter changes, firewall rule additions, user account changes
    → Decision: Accept change (update baseline) or revert (auto-remediate)

  Immediate Action:
    → High-risk drift: Alert security team; contain within 1 hour
    → Example: SSH config weakened, firewall rules opened to 0.0.0.0/0, new admin accounts
    → Response: Revert immediately; lock affected system if needed; incident investigation
```
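
The three response tiers above can be encoded as a simple triage table so that every drift finding is routed consistently. This is a sketch under stated assumptions: the category names are hypothetical labels for the examples listed in each tier, and routing unknown categories to manual review is a design choice, not something the procedures above prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftResponse:
    tier: str
    action: str
    deadline_hours: Optional[int]  # None means "next scheduled run"


# Tier -> (action, deadline in hours), taken from the response procedures above.
RESPONSES = {
    "high": ("alert security team; revert immediately; investigate", 1),
    "medium": ("alert configuration management team; accept or revert", 4),
    "low": ("auto-correct on next configuration management run", None),
}

# Hypothetical category labels for the examples given in each tier.
TIER_BY_CATEGORY = {
    "ssh_config_weakened": "high",
    "firewall_open_to_any": "high",
    "new_admin_account": "high",
    "kernel_parameter": "medium",
    "firewall_rule_added": "medium",
    "user_account_change": "medium",
    "file_permission": "low",
    "missing_package": "low",
    "config_file_change": "low",
}


def triage(category: str) -> DriftResponse:
    """Pick the response tier; unknown categories go to manual review (assumption)."""
    tier = TIER_BY_CATEGORY.get(category, "medium")
    action, deadline = RESPONSES[tier]
    return DriftResponse(tier, action, deadline)


for finding in ("firewall_open_to_any", "missing_package", "unrecognized_change"):
    print(finding, "->", triage(finding))
```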

## Integration Points

- **ServiceNow CMDB**: Enterprise configuration database; automated discovery; CI relationship mapping; change management integration; $15K+/year base
- **Ansible Automation Platform**: Agentless configuration management; role-based automation; automation controller (formerly Tower) for scheduling and reporting; Red Hat certified content
- **Puppet Enterprise**: Agent-based configuration management; real-time compliance reporting; Puppet Bolt for push operations; Puppet Forge modules
- **Chef Automate**: Ruby-based configuration management; Chef InSpec compliance testing; Policyfiles for policy-as-code; Chef Supermarket
- **AWS Config / Azure Policy**: Cloud-native configuration compliance; continuous monitoring; auto-remediation; integration with cloud control tools
- **HashiCorp Consul**: Service discovery and configuration management for microservices; KV store for dynamic configuration; health checking
- **Icinga / Nagios**: Infrastructure monitoring with configuration validation plugins; alerting on configuration anomalies
- **Terraform**: Infrastructure as code; configuration files define the desired state and the state file tracks managed resources; drift detection via terraform plan; remediation via terraform apply

## Edge Cases

- **Legacy systems unable to run agents**: Use agentless tools (Ansible SSH, Salt SSH); SNMP-based discovery for network devices; manual CI entry with quarterly validation
- **Containerized ephemeral environments**: CMDB must track short-lived containers; Kubernetes API as source of truth; link pods to long-lived CI (deployment/service); retention policies for CI history
- **Multi-cloud CMDB unification**: Different APIs per provider; custom aggregator service; common CI schema across providers; single pane of glass dashboard
- **High-frequency configuration changes (canary deployments)**: Distinguish between "drift" and "intended state change"; integrate with deployment pipeline; configuration management aware of deployment cycles
- **Regulated environments requiring change approval**: Every configuration change tied to change request; CMDB update triggers compliance report; audit trail maintained for 7+ years
- **Configuration management performance at scale**: Ansible with 2,000+ hosts → raise forks, batch rollouts with serial, use the free strategy, enable SSH pipelining; Puppet with 50,000+ nodes → compile masters, caching
- **Conflicting configuration sources**: Terraform manages infrastructure, Ansible manages OS config → establish clear boundaries; Terraform = infrastructure provisioning, Ansible = configuration; document in runbooks
