---
name: incident-response-management
description: Manage security and operational incidents through structured response processes including detection, triage, containment, eradication, recovery, and post-incident analysis. Use when developing incident response plans, conducting tabletop exercises, managing incident communication, coordinating cross-functional response teams, performing root cause analysis, maintaining incident playbooks, or improving incident response maturity. Triggers on phrases like "incident response", "IR plan", "incident management", "incident response team", "IRT", "incident commander", "tabletop exercise", "playbook", "runbook", "root cause analysis", "RCAs", "post-incident review", "war room", "incident severity", "incident escalation".
---

# Incident Response Management

Structured approach to detecting, responding to, and recovering from security and operational incidents with defined roles, playbooks, communication procedures, and continuous improvement processes.

## Workflow

1. Establish Incident Response Team (IRT): define core team (CISO, security engineers, IT ops, legal, communications), expandable team (external IR firm, law enforcement, regulatory contacts), and on-call rotation.
2. Develop incident response plan: incident classification and severity levels, roles and responsibilities, escalation matrices, communication templates, legal/regulatory notification requirements.
3. Create incident playbooks for common scenarios: ransomware, data breach, DDoS, insider threat, cloud compromise, supply chain attack, business email compromise (BEC), credential compromise.
4. Implement detection and alerting: SIEM correlation rules, EDR alerts, network IDS/IPS, cloud security alerts, threat intelligence feeds, user-reported incidents.
5. Establish incident command structure: incident commander (IC), technical lead, communications lead, scribe; bridge call setup; status tracking tool.
6. Define severity levels and response SLAs: P1 (critical — 15-minute response), P2 (high — 30 minutes), P3 (medium — 2 hours), P4 (low — 24 hours).
7. Conduct quarterly tabletop exercises: realistic scenarios, cross-functional participation, after-action review, playbook updates.
8. Maintain incident response tooling: forensic workstations, memory analysis tools, network capture tools, log access, ticketing system, communication channels.
9. Perform post-incident reviews: root cause analysis (5-Whys, fishbone), timeline reconstruction, lessons learned, remediation tracking, metric reporting.
10. Report to leadership and board: quarterly incident reports, trend analysis, security posture updates, investment recommendations.

## Incident Classification and Severity

```
INCIDENT SEVERITY CLASSIFICATION
==================================

SEVERITY 1 — CRITICAL (P1):

  Criteria (ANY of the following):
    → Active data exfiltration or confirmed data breach (PII, PHI, financial data)
    → Ransomware or destructive malware affecting production systems
    → Complete outage of revenue-critical system (e-commerce, payment processing)
    → Compromised domain admin / root credentials with active lateral movement
    → Active attack on customer-facing infrastructure (DDoS causing service disruption)
    → Executive credential compromise (CEO fraud / BEC in progress)
    → Regulatory-mandated notification event detected

  Response SLA:
    → Detection to acknowledgment: 15 minutes
    → Incident commander assigned: 30 minutes
    → Bridge call activated: 30 minutes
    → Containment action: 1 hour
    → Executive notification: 1 hour
    → Regulatory notification (if required): within 72 hours of becoming aware (GDPR Art. 33); otherwise per applicable regulation

  Response Team:
    → Incident Commander: CISO or delegate
    → Technical Lead: Senior security engineer
    → Communications Lead: Corporate communications / PR
    → Legal Counsel: In-house + external (if data breach)
    → Full IRT on call; external IR firm on standby

SEVERITY 2 — HIGH (P2):

  Criteria (ANY of the following):
    → Suspected data breach (unconfirmed)
    → Malware detected on multiple systems (not ransomware)
    → Unauthorized access to sensitive system (access since restricted; exfiltration not confirmed)
    → Significant system degradation affecting customer experience
    → Insider threat indicator (unusual data access patterns by employee)
    → Phishing campaign with multiple successful credential submissions
    → Vulnerability actively exploited in the wild affecting your systems

  Response SLA:
    → Detection to acknowledgment: 30 minutes
    → Incident commander assigned: 1 hour
    → Bridge call activated: 1 hour
    → Containment action: 4 hours
    → Executive notification: 4 hours

  Response Team:
    → Incident Commander: Security manager or senior engineer
    → Technical Lead: Security engineer
    → IT Operations Lead: Systems engineer
    → Legal: Notified if potential data breach

SEVERITY 3 — MEDIUM (P3):

  Criteria (ANY of the following):
    → Single system compromised (contained, no lateral movement)
    → Policy violation with moderate risk (unauthorized software, shadow IT)
    → Attempted attack blocked by security controls (failed exploitation)
    → Non-critical system outage (internal tools, non-customer-facing)
    → Suspicious activity under investigation (unconfirmed)

  Response SLA:
    → Detection to acknowledgment: 2 hours
    → Investigation and containment: 8 hours (business hours)
    → Resolution target: 48 hours

  Response Team:
    → Incident Commander: Security analyst (senior)
    → Technical Lead: Assigned security analyst
    → IT Operations: As needed

SEVERITY 4 — LOW (P4):

  Criteria (ANY of the following):
    → False positive alerts (confirmed after investigation)
    → Minor policy violations (non-security-impactful)
    → Informational security events (port scans from external sources)
    → Single phishing email reported (no interaction)

  Response SLA:
    → Detection to acknowledgment: 24 hours
    → Investigation and resolution: 5 business days
    → Documentation: Completed by the time of resolution

  Response Team:
    → Assigned to security analyst queue
    → No bridge call required
    → Resolution tracked in ticketing system
```
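
The response SLAs above can be turned into concrete deadlines once a detection timestamp is known. The following is a minimal sketch, not a definitive implementation: the `SLA` table, `sla_deadlines`, and `overdue` are hypothetical names, only the acknowledgment and containment windows are modeled, and the "business hours" qualifier on the P3 window is ignored.

```python
from datetime import datetime, timedelta, timezone

# Acknowledgment and containment windows copied from the severity table.
# P4 has no containment SLA, so only acknowledgment is tracked for it.
SLA = {
    "P1": {"acknowledge": timedelta(minutes=15), "contain": timedelta(hours=1)},
    "P2": {"acknowledge": timedelta(minutes=30), "contain": timedelta(hours=4)},
    "P3": {"acknowledge": timedelta(hours=2), "contain": timedelta(hours=8)},
    "P4": {"acknowledge": timedelta(hours=24)},
}


def sla_deadlines(severity, detected_at):
    """Return {milestone: deadline} for a severity label such as "P1"."""
    if severity not in SLA:
        raise ValueError(f"unknown severity: {severity!r}")
    return {name: detected_at + window for name, window in SLA[severity].items()}


def overdue(deadlines, now, completed=()):
    """List milestones that are past their deadline and not yet completed."""
    return sorted(
        name for name, due in deadlines.items()
        if name not in completed and now > due
    )


detected = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
deadlines = sla_deadlines("P1", detected)
print(deadlines["acknowledge"].isoformat())  # 2024-03-01T09:15:00+00:00
print(overdue(deadlines, detected + timedelta(minutes=20)))  # ['acknowledge']
```

A check like `overdue` could feed an escalation policy, for example paging the next tier when a P1 acknowledgment deadline passes.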

## Incident Command Structure

```
INCIDENT COMMAND STRUCTURE (NIMS-ALIGNED)
============================================

INCIDENT COMMANDER (IC):

  Role: Overall incident leadership and decision-making
  Responsibilities:
    → Declares incident severity and activates appropriate response
    → Sets incident objectives and priorities
    → Authorizes containment, eradication, and recovery actions
    → Manages resource allocation (personnel, tools, budget)
    → Primary contact for executive leadership
    → Approves external communications (customer notifications, regulatory filings)
    → Declares incident closed

  Qualifications:
    → P1: CISO, VP of Security, or designated senior leader
    → P2: Security Manager or senior security engineer
    → P3: Senior security analyst
    → P4: Security analyst (on-call)

  Cannot simultaneously serve as: technical lead or scribe (separation of duties)

TECHNICAL LEAD:

  Role: Technical investigation and response execution
  Responsibilities:
    → Leads forensic investigation and evidence collection
    → Directs technical containment and eradication actions
    → Coordinates with IT operations for system access and changes
    → Manages external IR firm (if engaged)
    → Provides technical status updates to IC every 30-60 minutes
    → Validates eradication and recovery readiness

  Team:
    → Security engineers (2-4 for P1, 1-2 for P2/P3)
    → IT operations engineers (as needed for system changes)
    → Forensic analyst (for data breach incidents)

COMMUNICATIONS LEAD:

  Role: All internal and external communications
  Responsibilities:
    → Drafts internal notifications (employees, leadership)
    → Drafts external communications (customers, partners, media)
    → Manages stakeholder updates (on schedule: every 2 hours for P1, every 4 hours for P2)
    → Coordinates with legal for regulatory notifications
    → Manages press inquiries and media relations
    → Maintains communication log (who was told what and when)

  Templates Pre-Prepared:
    → Employee notification (data breach, system outage, phishing alert)
    → Customer notification (service impact, data breach)
    → Regulatory notification (GDPR Art. 33 to the supervisory authority, state breach notification)
    → Executive briefing template
    → Press statement template

SCRIBE / DOCUMENTATION LEAD:

  Role: Incident documentation and timeline maintenance
  Responsibilities:
    → Maintains real-time incident timeline (actions, decisions, findings)
    → Records all communications (decisions, approvals, notifications)
    → Tracks open action items and ownership
    → Documents evidence chain of custody
    → Compiles post-incident review materials

  Tools:
    → Shared document (Google Docs / Confluence) with real-time collaboration
    → Incident ticket (ServiceNow / Jira) with detailed notes
    → Bridge call recording (saved for post-incident review)

LAWYER / LEGAL COUNSEL (P1/P2 or potential data breach):

  Responsibilities:
    → Advises on legal obligations (regulatory notifications, litigation holds)
    → Reviews all external communications before release
    → Coordinates with law enforcement (if applicable)
    → Manages attorney-client privilege for investigation
    → Assesses regulatory risk and potential fines
    → Engages external breach counsel if needed

LAW ENFORCEMENT LIAISON (if applicable):

  When engaged:
    → Active crime in progress (threat actor actively targeting your organization)
    → Financial fraud (BEC, wire fraud)
    → Threat to public safety
    → Regulatory requirement to report

  Coordination:
    → Legal counsel coordinates all LE interaction
    → Ensure LE involvement does not disrupt containment and investigation
    → Evidence sharing under legal guidance (preservation of privilege)
```
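
The separation-of-duties rule for the incident commander lends itself to a simple roster check before a bridge call starts. This is a rough sketch under stated assumptions: `Assignment` and `validate` are hypothetical names, and people are identified by plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    incident_commander: str
    technical_lead: str
    scribe: str
    communications_lead: str


def validate(assignment):
    """Return a list of separation-of-duties violations (empty if none)."""
    problems = []
    ic = assignment.incident_commander
    # The IC may not also act as technical lead or scribe.
    if ic == assignment.technical_lead:
        problems.append(f"{ic} cannot be both incident commander and technical lead")
    if ic == assignment.scribe:
        problems.append(f"{ic} cannot be both incident commander and scribe")
    return problems


roster = Assignment("alice", "bob", "alice", "carol")
print(validate(roster))  # ['alice cannot be both incident commander and scribe']
```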

## Incident Response Playbooks

```
PLAYBOOK: RANSOMWARE INCIDENT
===============================

  Detection Indicators:
    → Files encrypted with unknown extensions (.locked, .encrypted, .crypt)
    → Ransom note on desktop or in every directory
    → Rapid file modification events (SIEM alert: >1,000 files modified in 5 minutes)
    → EDR alert: known ransomware process detected (e.g., LockerGoga, Ryuk, BlackCat)
    → Backup files being deleted
    → Unusual SMB activity (lateral file access across shares)
    → Users reporting inaccessible files

  Immediate Containment (First 30 Minutes):
    1. Isolate infected systems from network (physically unplug or disable NIC via EDR)
    2. Disable compromised user accounts (especially if domain admin was compromised)
    3. Block ransomware C2 IPs/domains at firewall and DNS level
    4. Disable shared drives being targeted (take offline if critical)
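    5. Capture a memory dump of each infected system for forensics (before isolation where this does not delay containment; otherwise immediately after, and always before any reboot or shutdown)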
    6. Identify scope: which systems, which users, which data stores affected

  Investigation (First 2-4 Hours):
    1. Determine initial entry vector (phishing email, RDP compromise, vulnerability)
    2. Map lateral movement path (from initial foothold to encrypted systems)
    3. Identify all compromised credentials
    4. Determine if data was exfiltrated (ransom gangs often threaten to publish data)
    5. Check backup integrity (are backups clean or also encrypted?)
    6. Identify critical systems affected (production, databases, file servers)

  Eradication (Hours 4-24):
    1. Reset ALL compromised credentials (prioritize admin accounts)
    2. Patch exploited vulnerability (if known)
    3. Remove persistence mechanisms (scheduled tasks, services, registry keys)
    4. Scan all systems for ransomware artifacts
    5. Engage IR firm for advanced forensics (if needed)

  Recovery (Hours 24-72+):
    1. Restore from clean backups (verify backup integrity first)
    2. Prioritize critical systems: domain controllers → email → file servers → workstations
    3. Monitor restored systems for re-infection (24-48 hour watch period)
    4. Gradually reconnect systems to network (staged, not all at once)
    5. Verify business functionality (test critical workflows)
    6. Consider decryption tools (No More Ransom project; free for some ransomware families)

  Decision Points:
    → Pay ransom? Generally NO (FBI and CISA advise against; no guarantee of decryption; funds criminal activity)
    → Exceptions: Life-critical systems (healthcare) or no usable backups, and only after extensive assessment of alternatives
    → If paying: only through an IR firm experienced in ransomware negotiation; never directly; consider legal implications (e.g., sanctions exposure)

  Post-Incident:
    → Root cause analysis: How did ransomware get in? What controls failed?
    → Backup improvement: Immutable backups, air-gapped copies, regular restore testing
    → Network segmentation: Prevent lateral movement (zero trust, micro-segmentation)
    → User training: Anti-phishing, report suspicious activity
    → EDR enhancement: Improved detection rules, behavioral monitoring
    → Board report: Impact assessment, lessons learned, investment recommendations

PLAYBOOK: DATA BREACH
======================

  Detection Indicators:
    → DLP alert: sensitive data exfiltration attempt (PII, PHI, credit cards)
    → Unusual data access patterns (employee accessing records outside normal scope)
    → Large data export/download detected
    → External report (customer, security researcher, regulatory body)
    → Third-party notification (processor breach affecting your data)
    → SIEM alert: anomalous cloud storage access (S3 bucket, Azure Blob)

  Immediate Actions (First 1 Hour):
    1. Preserve evidence (do NOT delete logs or modify systems)
    2. Contain: block suspected exfiltration path (revoke access, block IP, disable account)
    3. Engage legal counsel immediately (attorney-client privilege)
    4. Activate IRT bridge call
    5. Begin scoping investigation (what data, how many records, when)

  Investigation (First 24-48 Hours):
    1. Identify categories of data exposed (GDPR special categories trigger higher risk)
    2. Count affected individuals (exact or approximate)
    3. Determine breach timeline (when did unauthorized access start/end)
    4. Assess likelihood of harm to affected individuals
    5. Check if data was encrypted (encrypted data may reduce notification obligations)
    6. Engage forensic firm (if internal capability insufficient)

  Notification Timeline:
    → Internal: leadership notified within 1 hour; wider staff within 24 hours on a need-to-know basis
    → Regulatory: GDPR supervisory authority within 72 hours; state AGs per applicable law
    → Individuals: "Without undue delay" if high risk (GDPR); per state breach notification law
    → Credit monitoring: Offer 12-24 months of identity protection if SSN/financial data exposed
    → Business partners: Notify if shared data was affected

  Documentation:
    → Breach assessment report (for regulatory submission)
    → Individual notification letters (reviewed by legal)
    → Evidence preservation log
    → Remediation plan and timeline
    → Insurance claim documentation (cyber insurance)
```
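
The ransomware playbook's detection indicator "more than 1,000 files modified in 5 minutes" is a sliding-window threshold. In practice this would be a SIEM correlation rule. The sketch below only illustrates the logic, with a hypothetical `FileModificationBurstDetector` class and the assumption that events for each host arrive in timestamp order (seconds).

```python
from collections import deque


class FileModificationBurstDetector:
    """Flags a host when more than `threshold` events fall within the window."""

    def __init__(self, threshold=1000, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = {}  # host -> deque of event timestamps in seconds

    def observe(self, host, timestamp):
        """Record one file modification; return True if the host is over threshold."""
        events = self._events.setdefault(host, deque())
        events.append(timestamp)
        # Discard events that have aged out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.threshold


# Small threshold and window so the behavior is easy to follow.
detector = FileModificationBurstDetector(threshold=3, window_seconds=10)
alerts = [detector.observe("fs01", t) for t in (0, 1, 2, 3, 20)]
print(alerts)  # [False, False, False, True, False]
```

The defaults match the playbook figures. A real rule would also need tuning for legitimate bulk operations such as backups or migrations.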

## Post-Incident Review

```
POST-INCIDENT REVIEW (PIR) PROCESS
=====================================

TIMING:

  → PIR initiated: Within 5 business days of incident closure
  → PIR meeting: Within 10 business days of incident closure
  → PIR report published: Within 15 business days of incident closure
  → Remediation items tracked: Until completion (typical 30-90 days)

PIR AGENDA:

  1. Incident Summary (15 minutes):
     → What happened? (narrative of events)
     → Timeline of detection, containment, eradication, recovery
     → Impact assessment (systems affected, data exposed, business impact)

  2. Detection Analysis (20 minutes):
     → How was the incident detected? (automated, user report, external)
     → Detection time: When did the threat enter vs. when was it detected? (dwell time)
     → Could it have been detected earlier? (gap analysis)
     → Which detection rules worked? Which failed?

  3. Response Analysis (20 minutes):
     → Was the correct severity assigned? (too high = wasted resources; too low = insufficient response)
     → Was the right playbook followed? (gaps, deviations)
     → Were roles and responsibilities clear? (confusion, overlap)
     → Was communication effective? (timely, accurate, appropriate audience)
     → What tools worked? What tools were missing?

  4. Root Cause Analysis (30 minutes):
     → 5-Whys methodology:
        Why did the incident occur? → Phishing email clicked by employee
        Why was the phishing email clicked? → Employee not trained on latest phishing tactics
        Why was training insufficient? → Training was generic; not specific to BEC/phishing trends
        Why not specific? → Training content not updated quarterly
        Root cause: Outdated security awareness training program
     → Fishbone diagram (optional for complex incidents):
        Categories: People, Process, Technology, Environment, Management

  5. Lessons Learned (20 minutes):
     → What worked well? (strengths to maintain)
     → What could be improved? (specific, actionable items)
     → What would you do differently? (hindsight analysis)

  6. Remediation Plan (20 minutes):
     → Action items: Specific, assigned, with due date
     → Priority: Based on risk reduction impact
     → Resources: Budget, headcount, tools needed
     → Tracking: Jira/ServiceNow tickets linked to PIR report

PIR REPORT TEMPLATE:

  Document Classification: Internal — Confidential
  Incident Reference: IR-[YYYY]-[NNN]
  Date: [PIR Report Date]
  Incident Period: [Start Date] to [End Date]
  Severity: [P1/P2/P3/P4]

  Executive Summary: 2-3 paragraph overview of incident, impact, and outcome
  Timeline: Detailed chronological log of events (detection through recovery)
  Impact Assessment: Systems, data, business impact, financial impact
  Root Cause: Primary and contributing causes
  Detection and Response Evaluation: What worked, what didn't
  Lessons Learned: Strengths and improvement areas
  Remediation Action Items: [ ] Action, Owner, Due Date, Status
  Metrics: Detection time, response time, containment time, recovery time, total downtime
  Appendices: Evidence screenshots, forensic reports, communication logs

PIR METRICS TO TRACK:

  → Mean Time to Detect (MTTD): Target < 1 hour for P1/P2
  → Mean Time to Respond (MTTR-Respond): Target < 4 hours for P1/P2
  → Mean Time to Contain (MTTC): Target < 8 hours for P1/P2
  → Mean Time to Recover (MTTR-Recover): Target < 24 hours for P1/P2
  → Dwell Time: Time between threat entry and detection
  → Incident Cost: Downtime cost + response cost + remediation cost + reputational cost
  → Playbook Effectiveness: % of incidents where playbook was followed correctly
  → Training Gaps: Incidents attributable to insufficient training
```
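
The time-based PIR metrics can be derived from the timeline the scribe already maintains. The sketch below assumes each incident is a dict of `datetime` values under the hypothetical keys `entered`, `detected`, `responded`, `contained`, and `recovered`. It also assumes that detection time is measured from threat entry (so its per-incident value equals dwell time) and that the other clocks start at detection. Adjust both to your own definitions.

```python
from datetime import datetime, timedelta
from statistics import mean


def hours_between(start, end):
    return (end - start).total_seconds() / 3600


def pir_metrics(incidents):
    """Return mean durations in hours across a list of incident timelines."""
    def avg(start_key, end_key):
        return mean(hours_between(i[start_key], i[end_key]) for i in incidents)

    return {
        "mean_time_to_detect_h": avg("entered", "detected"),
        "mean_time_to_respond_h": avg("detected", "responded"),
        "mean_time_to_contain_h": avg("detected", "contained"),
        "mean_time_to_recover_h": avg("detected", "recovered"),
    }


base = datetime(2024, 1, 1)
h = lambda n: base + timedelta(hours=n)
incidents = [
    {"entered": h(0), "detected": h(1), "responded": h(2), "contained": h(5), "recovered": h(21)},
    {"entered": h(0), "detected": h(3), "responded": h(4), "contained": h(6), "recovered": h(13)},
]
metrics = pir_metrics(incidents)
print(metrics)
```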

## Integration Points

- **TheHive / Cortex**: Open-source incident response platform; case management; observable analysis; automated response via Cortex analyzers and responders
- **Splunk SOAR (formerly Phantom)**: Security orchestration, automation, and response; playbooks for automated containment; integration with 200+ security tools
- **Microsoft Sentinel**: Cloud-native SIEM with SOAR capabilities; automated response playbooks; incident management; integration with Microsoft Defender XDR (formerly Microsoft 365 Defender)
- **ServiceNow SecOps**: Incident management integrated with ITSM; vulnerability response; threat intelligence; orchestration for automation
- **PagerDuty / Opsgenie**: On-call management; incident alerting; escalation policies; incident collaboration; integrations with monitoring and IR tools
- **Jira Service Management**: Incident ticketing; SLA tracking; post-incident review templates; integration with Confluence for documentation
- **Velociraptor**: Endpoint forensic and live response; memory analysis; artifact collection; threat hunting; open-source
- **GRR (Google Rapid Response)**: Open-source live response framework; remote file collection; memory acquisition; process listing; cross-platform
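
As one illustration of wiring severity levels into an alerting integration, the sketch below builds the JSON body for a PagerDuty Events API v2 trigger event. The P1-P4 to PagerDuty severity mapping, the function name, and the routing key are assumptions made for this example. The sketch only builds the body and leaves the HTTP call to the caller, so verify field names and the endpoint against current PagerDuty documentation before use.

```python
import json

# Assumed mapping from this document's priorities to the severity values
# accepted by the Events API v2 payload (critical, error, warning, info).
PD_SEVERITY = {"P1": "critical", "P2": "error", "P3": "warning", "P4": "info"}


def build_trigger_event(routing_key, incident_ref, priority, summary, source):
    """Build the body of a 'trigger' event for one incident."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the incident reference lets repeat triggers deduplicate.
        "dedup_key": incident_ref,
        "payload": {
            "summary": f"[{priority}] {summary}",
            "source": source,
            "severity": PD_SEVERITY[priority],
        },
    }


event = build_trigger_event(
    "HYPOTHETICAL_ROUTING_KEY",  # placeholder, not a real integration key
    "IR-2024-001",
    "P1",
    "Ransomware detected on file server",
    "siem",
)
print(json.dumps(event, indent=2))
```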

## Edge Cases

- **Supply chain attack (SolarWinds-style)**: Compromise via trusted third-party software; detection extremely difficult; requires behavioral rather than signature-based analysis; response involves all customers of the compromised vendor; coordinate with the vendor's IR team
- **Cloud provider outage vs. cloud account compromise**: Distinguish between AWS/Azure regional outage (monitor status page) and your account being compromised (unauthorized API calls); different response for each
- **Insider threat (malicious employee)**: Legal and HR coordination critical before containment; attorney-client privilege; avoid tipping off suspect prematurely; preserve all digital evidence; HR may need parallel investigation
- **Ransomware with data exfiltration threat**: Modern ransomware gangs (LockBit, BlackCat) often steal data before encrypting; threat to publish data adds negotiation complexity; engage IR firm experienced in ransomware negotiation; do NOT pay ransom as first option
- **Multi-day incident spanning weekends/holidays**: On-call coverage must be sufficient; burnout management for IR team (rotate every 12 hours for sustained P1); pre-arranged external IR firm retainer for extended incidents
- **Regulated industry (healthcare, finance, government)**: Specific notification requirements (HIPAA 60-day notification, GLBA, FFIEC); regulatory examination may follow; enhanced documentation requirements; law enforcement coordination may be mandatory
- **International incidents**: Multiple regulatory jurisdictions (GDPR for EU data, CCPA for California data, etc.); different notification timeframes; language requirements for customer notifications; time zone challenges for IR team coordination
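
When several regimes apply to one incident, it helps to compute every notification deadline up front and work to the earliest. The sketch below uses only the two outer limits named in this document (GDPR 72 hours, HIPAA 60 days). The table and function names are hypothetical, and actual obligations depend on the facts of the breach and on legal counsel's assessment.

```python
from datetime import datetime, timedelta, timezone

# Outer limits quoted in this document; illustrative, not legal advice.
NOTIFICATION_WINDOWS = {
    "GDPR supervisory authority": timedelta(hours=72),
    "HIPAA individual notice": timedelta(days=60),
}


def notification_deadlines(aware_at, regimes):
    """Return (regime, deadline) pairs with the earliest deadline first."""
    deadlines = [(r, aware_at + NOTIFICATION_WINDOWS[r]) for r in regimes]
    return sorted(deadlines, key=lambda pair: pair[1])


aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
result = notification_deadlines(
    aware, ["HIPAA individual notice", "GDPR supervisory authority"]
)
for regime, deadline in result:
    print(regime, deadline.isoformat())
# GDPR supervisory authority 2024-03-04T09:00:00+00:00
# HIPAA individual notice 2024-04-30T09:00:00+00:00
```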
