Support AI Skill

Proactive Service Notifications

Proactively notify customers about outages, maintenance, and known issues before they report them. Manage incident communications, status pages, and post-mortem sharing to maintain trust and reduce inbound support volume. Use when setting up outage alerts, managing maintenance windows, publishing status updates, conducting post-incident reviews, or reducing support ticket spikes during incidents. Triggers on phrases like "service notification", "outage alert", "maintenance window", "incident communication", "status page", "system status", "proactive alert", "post-mortem", "incident update", "service degradation".

Workflow

Incident Communication Process

Trigger: Service degradation detected, scheduled maintenance, or customer-impacting incident.

Incident detection: Engineering or monitoring system detects issue; classify severity (P1–P4); identify affected services and customer segments.
Initial assessment: Within 15 minutes — root cause hypothesis; estimated resolution time; impact scope (% of customers affected).
Notification drafting: Use approved template; include: what's affected, impact description, ETA, what customer should do; maintain transparent but reassuring tone.
Multi-channel broadcast:

P1 (Critical): Email + in-app banner + status page + SMS (enterprise customers only) + social media
P2 (High): Email + in-app banner + status page + social media
P3 (Medium): In-app banner + status page
P4 (Low): Status page only

Update cadence: P1 — every 30 minutes; P2 — every 60 minutes; P3 — every 2 hours; all updates include progress, revised ETA, next update time.
Resolution notification: "All clear" message — what was affected, root cause (brief), what was fixed, prevention steps, any compensation (if SLA breached).
Post-incident review: Detailed post-mortem (timeline, root cause, impact, action items) within the deadline for the severity level (48 hours for P1, 72 hours for P2, 1 week for P3); share publicly for P1/P2; keep internal for P3; none required for P4.
Support coordination: Alert support team of incident; prepare canned responses; flag incoming related tickets as "known issue"; auto-close tickets when resolved.
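
The broadcast and cadence rules above can be sketched as a small lookup. This is a minimal illustration, not a definitive implementation: the channel identifiers and the function names `channels_for` and `next_update_at` are assumptions made for this example, and only the severity-to-channel mapping and the update intervals come from the process above.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Channel sets per severity, copied from the broadcast list above.
# The string identifiers are hypothetical names chosen for this sketch.
CHANNELS = {
    "P1": ["email", "in_app_banner", "status_page", "sms_enterprise", "social_media"],
    "P2": ["email", "in_app_banner", "status_page", "social_media"],
    "P3": ["in_app_banner", "status_page"],
    "P4": ["status_page"],
}

# Update cadence in minutes; P4 has no recurring updates.
UPDATE_INTERVAL_MIN = {"P1": 30, "P2": 60, "P3": 120, "P4": None}


def channels_for(severity: str, is_enterprise: bool) -> List[str]:
    """Return the channels for one customer; SMS goes to enterprise customers only."""
    channels = CHANNELS[severity]
    if not is_enterprise:
        channels = [c for c in channels if c != "sms_enterprise"]
    return list(channels)


def next_update_at(severity: str, last_update: datetime) -> Optional[datetime]:
    """Return when the next update is due, or None if the severity has no cadence."""
    interval = UPDATE_INTERVAL_MIN[severity]
    if interval is None:
        return None
    return last_update + timedelta(minutes=interval)
```

Keeping the rules in plain dictionaries means a change to the policy (for example, a different P3 cadence) is a one-line edit rather than a code change in several places.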

Notification Templates

INCIDENT NOTIFICATION TEMPLATES
=================================

Template 1: Initial Alert (P1 Critical)
Subject: [Service Alert] We're experiencing issues with [Service Name]

Body:
Hi [Customer Name],

We're currently experiencing an issue with [Service Name] that is affecting [describe impact — e.g., "the ability to process payments"].

What we know:
  - Started: [Time, timezone]
  - Affected: [X]% of users / [Specific region/product]
  - Status: Investigating

What we're doing:
  Our engineering team is actively investigating and working on a resolution.

Next update: Within 30 minutes, by [Time + 30 min]

You can track progress on our status page: [link]

We apologize for the inconvenience and appreciate your patience.

— [Company] Support Team

Template 2: Progress Update
Subject: [Update] [Service Name] — Investigation in Progress

Body:
Hi [Customer Name],

Here's an update on the [Service Name] issue:

What changed:
  - We've identified the root cause: [brief explanation]
  - Impact: [updated scope]
  - ETA for resolution: [Time] or "Still investigating"

What you can do:
  [Workaround if available, or "No action needed — we're working on it"]

Next update: [Time]

Status page: [link]

— [Company] Support Team

Template 3: Resolution Notification
Subject: [Resolved] [Service Name] is back to normal

Body:
Hi [Customer Name],

The issue with [Service Name] has been resolved. Service is back to normal as of [Time].

What happened:
  [Brief, non-technical explanation of root cause]

What we fixed:
  [Brief description of fix]

What we're doing to prevent this:
  [1–2 action items from post-mortem]

[If SLA breached]:
  Your SLA credit of [amount] will be applied to your account within [X] business days.

We appreciate your patience and understanding. If you experience any ongoing issues, please reply to this email or contact support.

— [Company] Support Team

Template 4: Scheduled Maintenance
Subject: [Notice] Scheduled maintenance on [Date] — [Service Name]

Body:
Hi [Customer Name],

We'll be performing scheduled maintenance on [Service Name] on:
  Date: [Date]
  Time: [Start time] – [End time] [Timezone]

Expected impact:
  [Service will be unavailable / Degraded performance / No impact]

What you should do:
  [Save work before start time / No action needed / Alternative process]

If this maintenance window doesn't work for you, please contact us by [deadline] to discuss options.

Details: [link to maintenance page]

— [Company] Support Team
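
One way to use the templates above safely is to fill the bracketed placeholders and then check that none were left behind before sending. The sketch below assumes placeholders are written exactly as `[Name]` with no nesting. The helper names `render` and `unfilled` are hypothetical, and the set of literal subject tags is taken from the four subject lines above.

```python
import re
from typing import Dict, List

# Matches one [Placeholder]; nested brackets are not handled in this sketch.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")

# Bracketed subject tags that are literal text in the templates, not placeholders.
LITERAL_TAGS = {"Service Alert", "Update", "Resolved", "Notice"}


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each [Key] that has a value; leave everything else untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def unfilled(text: str) -> List[str]:
    """List placeholder names still present in the text, ignoring literal tags."""
    return [name for name in PLACEHOLDER.findall(text) if name not in LITERAL_TAGS]


subject = "[Service Alert] We're experiencing issues with [Service Name]"
ready = render(subject, {"Service Name": "Payments API"})
if unfilled(ready):
    raise ValueError(f"Template still has placeholders: {unfilled(ready)}")
```

A send step that refuses to proceed while `unfilled` returns anything guards against a customer receiving a message that still reads "[Customer Name]".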

Incident Severity Classification

INCIDENT SEVERITY MATRIX
==========================

P1 — Critical
  Criteria: Complete service outage; data loss risk; security breach; >25% of customers affected
  Response: War room assembled within 15 minutes; CEO/CTO notified; hourly executive updates
  Communication: Email + in-app + status page + SMS (enterprise) + social media
  Update frequency: Every 30 minutes
  Target resolution: 2 hours
  Post-mortem: Public; within 48 hours

P2 — High
  Criteria: Major feature unavailable; degraded performance; 10–25% of customers affected
  Response: Engineering on-call engaged within 30 minutes; VP notified
  Communication: Email + in-app + status page + social media
  Update frequency: Every 60 minutes
  Target resolution: 4 hours
  Post-mortem: Public; within 72 hours

P3 — Medium
  Criteria: Minor feature issue; limited customer impact; <10% affected
  Response: Engineering team triaged within 2 hours
  Communication: In-app banner + status page
  Update frequency: Every 2 hours
  Target resolution: 8 hours
  Post-mortem: Internal; within 1 week

P4 — Low
  Criteria: Cosmetic issue; very limited impact; workarounds available
  Response: Normal triage; bug logged
  Communication: Status page only
  Update frequency: None (resolved in next deployment)
  Target resolution: Next release cycle
  Post-mortem: None required
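
The matrix above can be expressed as a classification function. This is a sketch under stated assumptions: the `IncidentSignals` fields are hypothetical names, any single P1 criterion is treated as sufficient, and the boundary values (exactly 25% and exactly 10%) are assigned to P2 because the matrix lists P2 as 10–25%.

```python
from dataclasses import dataclass


@dataclass
class IncidentSignals:
    """Inputs to classification; field names are illustrative assumptions."""
    pct_customers_affected: float
    complete_outage: bool = False
    data_loss_risk: bool = False
    security_breach: bool = False
    major_feature_unavailable: bool = False
    cosmetic_only: bool = False


def classify(s: IncidentSignals) -> str:
    """Map incident signals to P1-P4, checking the most severe criteria first."""
    if (s.complete_outage or s.data_loss_risk or s.security_breach
            or s.pct_customers_affected > 25):
        return "P1"
    if s.major_feature_unavailable or s.pct_customers_affected >= 10:
        return "P2"
    if s.cosmetic_only:
        return "P4"
    return "P3"
```

Checking from most to least severe means an incident that meets several criteria always receives the highest applicable severity.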

Edge Cases

Extended outage (no ETA after 4+ hours):
Communication: Shift from "we're working on it" to "here's what we know and don't know"; be transparent about uncertainty
Escalation: Executive involvement (CEO/CTO sends personal note to enterprise customers)
Compensation: Pre-approve SLA credits; consider proactive credits (don't wait for customer to ask)
Alternatives: Provide workaround or alternative service; temporary access to competitor service (rare but seen in extreme cases)
Cadence: Increase update frequency to every 15 minutes to show active engagement

Cascading incidents (one incident triggers multiple system failures):
Communication: Single incident thread with sub-issues; avoid multiple separate emails
Priority: Address customer-facing impact first (even if root cause is in backend system A)
Coordination: Single incident commander; unified communication channel
Example: Database migration fails → API down → payments fail → reporting dashboard errors
Customer message: "We're experiencing issues with our payment processing system" (not technical details)
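
The single-thread guidance for cascading incidents can be modeled as one incident record that holds its sub-issues. The class and field names below are assumptions for illustration. The sketch orders the work so customer-facing sub-issues come first, using the example chain above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubIssue:
    component: str
    customer_facing: bool


@dataclass
class IncidentThread:
    """One thread per incident; cascading failures are recorded as sub-issues."""
    incident_id: str
    sub_issues: List[SubIssue] = field(default_factory=list)

    def add(self, component: str, customer_facing: bool) -> None:
        self.sub_issues.append(SubIssue(component, customer_facing))

    def work_order(self) -> List[SubIssue]:
        # sorted() is stable, so detection order is kept within each group;
        # customer-facing sub-issues (key False) sort ahead of backend ones.
        return sorted(self.sub_issues, key=lambda s: not s.customer_facing)


thread = IncidentThread("INC-EXAMPLE")  # hypothetical incident ID
thread.add("database migration", customer_facing=False)
thread.add("API", customer_facing=True)
thread.add("payments", customer_facing=True)
thread.add("reporting dashboard", customer_facing=True)
```

Customers receive updates on the one thread, while the incident commander works through `work_order()` internally.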

Maintenance window overrun (maintenance takes longer than expected):
Communication: Alert 30 minutes before planned end time if overrun expected
Transparency: Explain why overrun occurred; provide revised ETA
Compensation: Consider credit if significant overrun (> 2 hours) even for P3
Prevention: Add 50% buffer to maintenance window estimates; schedule during lowest-traffic hours
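
The timing rules for maintenance windows reduce to simple date arithmetic. In this minimal sketch the function names are hypothetical. The 50% buffer, the 30-minute warning lead, and the 2-hour credit threshold are the values stated above.

```python
from datetime import datetime, timedelta

BUFFER = 0.5                                 # add 50% to the engineering estimate
OVERRUN_WARNING_LEAD = timedelta(minutes=30)
CREDIT_THRESHOLD = timedelta(hours=2)        # "significant overrun" per the text


def announced_end(start: datetime, estimate: timedelta) -> datetime:
    """End time to announce to customers: the estimate plus the buffer."""
    return start + estimate * (1 + BUFFER)


def overrun_warning_at(planned_end: datetime) -> datetime:
    """Latest time to warn customers if an overrun is expected."""
    return planned_end - OVERRUN_WARNING_LEAD


def credit_worth_considering(planned_end: datetime, actual_end: datetime) -> bool:
    """True when the overrun exceeds the 2-hour threshold."""
    return (actual_end - planned_end) > CREDIT_THRESHOLD


start = datetime(2024, 1, 1, 2, 0)
end = announced_end(start, timedelta(hours=2))  # 2-hour estimate gives a 3-hour window
```

For a 2-hour estimate starting at 02:00, the announced window ends at 05:00 and any overrun warning should go out by 04:30.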

Regional outage (only certain geographies affected):
Targeting: Use customer data to segment notifications (only notify affected regions)
Accuracy: Verify affected regions before broadcasting; notifying unaffected customers creates unnecessary concern
Time zone: Send notifications in local time when possible; status page always available
Example: "This issue affects users in the EU region"
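
Regional targeting can be sketched as a filter over customer records, with timestamps shown in the customer's local offset. The `Customer` fields and both function names are assumptions for this example. In practice the region and offset would come from the CRM or customer data store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Customer:
    email: str
    region: str              # e.g. "EU", "US"; hypothetical field
    utc_offset_hours: float  # hypothetical field holding the customer's UTC offset


def recipients(customers: Iterable[Customer], affected_regions: Iterable[str]) -> List[Customer]:
    """Keep only customers whose region is in the verified affected set."""
    affected = {r.upper() for r in affected_regions}
    return [c for c in customers if c.region.upper() in affected]


def local_time_label(ts_utc: datetime, utc_offset_hours: float) -> str:
    """Format a timezone-aware UTC timestamp in the customer's local offset."""
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%H:%M UTC%z")


customers = [
    Customer("a@example.com", "EU", 1),
    Customer("b@example.com", "US", -5),
    Customer("c@example.com", "eu", 2),
]
eu_only = recipients(customers, ["EU"])
```

Matching regions case-insensitively avoids silently dropping customers whose records use a different capitalization.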

False positive detection (monitoring alerts but no actual customer impact):
Verification: Confirm customer impact before sending notifications (avoid "boy who cried wolf")
Threshold: Send notifications only when confirmed customer-facing impact exists
Internal vs. external: Use internal alerts for potential issues; external notifications only for confirmed impact
Recovery: If notification sent but impact was minimal, send brief "false alarm" message and apologize
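
The internal-versus-external rule above amounts to a small routing decision. In this sketch the `Audience` enum and the `route_alert` function are hypothetical names. The logic reflects only the stated policy: external notifications require confirmed customer impact.

```python
from enum import Enum


class Audience(Enum):
    NONE = "none"
    INTERNAL = "internal"
    EXTERNAL = "external"


def route_alert(alert_fired: bool, impact_confirmed: bool) -> Audience:
    """Decide who hears about a signal, based on confirmed customer impact."""
    if impact_confirmed:
        # Confirmed customer-facing impact always goes to customers.
        return Audience.EXTERNAL
    if alert_fired:
        # Unconfirmed monitoring alert: keep it internal until verified.
        return Audience.INTERNAL
    return Audience.NONE
```

A monitoring alert with no confirmed impact stays internal, which prevents repeated false alarms from eroding customer trust.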

Integration Points

Monitoring tools: Datadog, PagerDuty, New Relic, Sentry — incident detection, alerting
Status page platforms: Atlassian Statuspage, Better Uptime, Statuspal — public status page, subscriber notifications
Email platforms: SendGrid, Mailgun, Amazon SES — notification delivery
In-app messaging: Intercom, Customer.io, Iterable — in-app banners, contextual notifications
Social media: Twitter/X, LinkedIn — public incident updates
Help desk: Zendesk, Freshdesk — ticket flagging, canned responses, auto-close related tickets
CRM: Salesforce, HubSpot — customer segmentation, enterprise customer identification
Collaboration: Slack, Teams — internal incident war room, engineering coordination
Incident management: Jira Service Management, ServiceNow ITSM — incident tracking, post-mortem
Data warehouse: Snowflake, BigQuery — incident analytics, MTTR tracking, trend analysis

Disclaimer: All rights reserved by Circulos AI. These skills are specifically designed for Claude Code, Claude Cowork, Codex, and OpenClaw. When using or referencing any skill, please provide proper attribution to Circulos AI.