---
name: itsm-service-desk
description: Manage IT service management and service desk operations including incident management, problem management, change management, request fulfillment, knowledge management, asset management, and ITIL compliance. Use when managing IT service requests, handling incidents, processing changes, maintaining knowledge base, or tracking SLA compliance. Triggers on phrases like "ITSM", "ITIL", "service desk", "incident management", "problem management", "change management", "request fulfillment", "knowledge base", "SLA management", "ticket management", "root cause analysis", "change advisory board", "CAB", "service catalog", "asset management", "IT asset", "configuration item", "CI", "CMDB".
---

# IT Service Management (ITSM)

Deliver and manage IT services using ITIL-aligned processes, automated workflows, and measurable SLAs.

## Incident Management

### Incident Lifecycle & Operations

```
INCIDENT MANAGEMENT FRAMEWORK:
═══════════════════════════════

ITSM PLATFORM: ServiceNow (ITSM module) + Freshservice (backup)
  Service desk team: 8 analysts (L1) + 4 specialists (L2) + 2 engineers (L3)
  Operating hours: 24/7 (L1: 8 AM - 8 PM, L2: 24/7 on-call, L3: on-call)
  Support channels: Email, web portal, phone, chat (Teams integration)
  Average tickets/day: 85 (L1) + 15 (L2) + 3 (L3)

INCIDENT CLASSIFICATION:
  ┌────────────────────────┬──────────┬──────────┬──────────────────┐
  │ Category               │ Count    │ Avg Res  │ SLA Target       │
  │                        │ (Jan)    │ Time     │                  │
  ├────────────────────────┼──────────┼──────────┼──────────────────┤
  │ Hardware issues        │ 120      │ 2.5 hrs  │ 4 hours          │
  │ Software issues        │ 185      │ 1.8 hrs  │ 4 hours          │
  │ Network/connectivity   │ 45       │ 3.2 hrs  │ 2 hours          │
  │ Access/authentication  │ 95       │ 0.8 hrs  │ 1 hour           │
  │ Application errors     │ 78       │ 2.1 hrs  │ 4 hours          │
  │ Email/calendar         │ 62       │ 1.2 hrs  │ 2 hours          │
  │ Printer/peripheral     │ 38       │ 1.5 hrs  │ 8 hours          │
  │ Account/identity       │ 55       │ 0.6 hrs  │ 1 hour           │
  │ Cloud service issues   │ 28       │ 2.8 hrs  │ 4 hours          │
  │ Security incidents     │ 8        │ 1.5 hrs  │ Immediate        │
  │ Other/unknown          │ 15       │ 2.0 hrs  │ 8 hours          │
  ├────────────────────────┼──────────┼──────────┼──────────────────┤
  │ TOTAL                  │ 729      │ 1.9 hrs  │ Variable         │
  └────────────────────────┴──────────┴──────────┴──────────────────┘

INCIDENT SEVERITY DEFINITIONS:
  P1 — Critical:
    - Entire service down (all users affected)
    - Revenue-impacting outage
    - Data loss or corruption
    - Security breach (active)
    Response: Immediate (<15 min), Update: Every 30 min, Resolution: 4 hours
  
  P2 — High:
    - Major service degradation (many users affected)
    - Critical function unavailable (workaround exists)
    - Department-level impact
    Response: <30 min, Update: Every 1 hour, Resolution: 8 hours
  
  P3 — Medium:
    - Single user or small group affected
    - Non-critical function unavailable
    - Workaround available
    Response: <2 hours, Update: Every 4 hours, Resolution: 24 hours
  
  P4 — Low:
    - Cosmetic issues
    - Questions/inquiries
    - Minor inconveniences
    Response: <8 hours, Update: As needed, Resolution: 3 business days

INCIDENT STATISTICS (January 2025):
  Total incidents: 729
  ┌─────────────────────────┬──────────┬──────────┬──────────┐
  │ Severity                │ Count    │ SLA Met  │ SLA Miss │
  ├─────────────────────────┼──────────┼──────────┼──────────┤
  │ P1 — Critical           │ 2        │ 2 (100%) │ 0        │
  │ P2 — High               │ 28       │ 27 (96%) │ 1        │
  │ P3 — Medium             │ 185      │ 178 (96%)│ 7        │
  │ P4 — Low                │ 514      │ 498 (97%)│ 16       │
  ├─────────────────────────┼──────────┼──────────┼──────────┤
  │ TOTAL                   │ 729      │ 705 (97%)│ 24 (3%)  │
  └─────────────────────────┴──────────┴──────────┴──────────┘

  First Contact Resolution (FCR): 68% (target: >65%) ✓
  Escalation rate: 18% (L1 → L2/L3)
  Mean Time to Resolve (MTTR): 1.9 hours (all incidents)
  Mean Time to Acknowledge (MTTA): 18 minutes (all incidents)
  Customer satisfaction (CSAT): 4.3/5.0

ESCALATION MATRIX:
  L1 → L2 (technical specialist):
    Criteria: L1 cannot resolve within 30 minutes
    Process: Ticket auto-rerouted + chat handoff + context transfer
    Avg. time: 25 minutes (L2 acknowledgment)
  
  L2 → L3 (engineering):
    Criteria: L2 cannot resolve within 2 hours; code/config change needed
    Process: Ticket created (engineering queue) + detailed notes
    Avg. time: 1 hour (L3 acknowledgment)
  
  L3 → Vendor:
    Criteria: Vendor-specific issue (SaaS, hardware, cloud)
    Process: Vendor ticket + internal ticket linkage
    Avg. time: 4 hours (vendor acknowledgment)
  
  L3 → Management:
    Criteria: Business impact, budget needed, policy exception
    Process: Verbal escalation + email summary
    Avg. time: 2 hours (management response)

KNOWLEDGE-BASED RESOLUTION:
  Knowledge base (KB): 850+ articles (ServiceNow KB)
  KB utilization: 45% of tickets (resolved with a suggested KB article)
  Self-service resolution: 28% of tickets (user resolves via KB)
  KB articles/month: 12-15 (new or updated)
  KB accuracy: 92% (reviewed quarterly)
  
  Auto-suggestion:
    AI-powered: ServiceNow Virtual Agent (chatbot)
    Accuracy: 78% (correct KB article suggested)
    Coverage: 65% of ticket categories
    Continuous learning: Feedback loop (correct/incorrect)
```
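The escalation matrix boils down to a few time-based and condition-based rules. The Python sketch below encodes them in one function. The `Ticket` fields (`minutes_at_tier`, `needs_code_change`, `vendor_issue`, `needs_business_decision`) are hypothetical names and not a ServiceNow schema. The sketch assumes the two L2 → L3 criteria are alternatives, so either the 2-hour limit or a required code/config change is enough to escalate. It also assumes a vendor issue takes precedence over a management escalation when both apply.

```python
from dataclasses import dataclass
from typing import Optional

# Time limits from the escalation matrix, in minutes spent at the current tier.
L1_LIMIT_MIN = 30   # L1 -> L2 if unresolved within 30 minutes
L2_LIMIT_MIN = 120  # L2 -> L3 if unresolved within 2 hours


@dataclass
class Ticket:
    # Hypothetical fields for illustration; not a ServiceNow schema.
    tier: str                              # "L1", "L2" or "L3"
    minutes_at_tier: int
    needs_code_change: bool = False        # code/config change required
    vendor_issue: bool = False             # SaaS, hardware, or cloud vendor problem
    needs_business_decision: bool = False  # business impact, budget, policy exception


def next_escalation(ticket: Ticket) -> Optional[str]:
    """Return the next escalation target, or None if the ticket stays at its tier."""
    if ticket.tier == "L1":
        return "L2" if ticket.minutes_at_tier >= L1_LIMIT_MIN else None
    if ticket.tier == "L2":
        # Assumption: either condition alone is enough to escalate.
        if ticket.needs_code_change or ticket.minutes_at_tier >= L2_LIMIT_MIN:
            return "L3"
        return None
    if ticket.tier == "L3":
        # Assumption: vendor engagement is checked before management escalation.
        if ticket.vendor_issue:
            return "Vendor"
        if ticket.needs_business_decision:
            return "Management"
        return None
    raise ValueError(f"unknown tier: {ticket.tier!r}")
```

For example, an L1 ticket that has been open for 35 minutes returns `"L2"`. An L2 ticket that needs a config change returns `"L3"` straight away, however little time has elapsed.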

## Change Management

### Controlled Change Process

```
CHANGE MANAGEMENT FRAMEWORK:
════════════════════════════

CHANGE TYPES:
  ┌──────────────────────────┬──────────┬────────────────┬──────────────────┐
  │ Type                     │ Count    │ Approval       │ Lead Time        │
  │                          │ (Jan)    │ Required       │                  │
  ├──────────────────────────┼──────────┼────────────────┼──────────────────┤
  │ Standard (pre-approved)  │ 125      │ None           │ Immediate        │
  │   (patch deployment,     │          │                │                  │
  │    user provisioning,    │          │                │                  │
  │    config update)        │          │                │                  │
  │ Normal (reviewed)        │ 42       │ Manager +      │ 24-48 hours      │
  │   (new software,         │          │ change owner   │                  │
  │    infrastructure        │          │                │                  │
  │    change)               │          │                │                  │
  │ Emergency                │ 5        │ Emergency CAB  │ Immediate        │
  │   (critical fix,         │          │ chair          │ (post-review)    │
  │    security patch)       │          │                │                  │
  ├──────────────────────────┼──────────┼────────────────┼──────────────────┤
  │ TOTAL                    │ 172      │ Variable       │ Variable         │
  └──────────────────────────┴──────────┴────────────────┴──────────────────┘

CHANGE PROCESS (Normal):
  1. Request: Change request (CR) submitted (ServiceNow form)
     - Description, scope, impact, risk level
     - Implementation plan, rollback plan
     - Testing evidence, back-out criteria
  
  2. Assessment: Change manager evaluates
     - Risk assessment (low, medium, high)
     - Impact analysis (services, users, dependencies)
     - Resource availability (team, time, budget)
     - Schedule check (maintenance window conflict?)
  
  3. Approval:
     Low risk: Change manager approval (1 person)
     Medium risk: Change manager + technical lead (2 people)
     High risk: CAB (Change Advisory Board — 3-5 members)
  
  4. Scheduling:
     Maintenance window: Sunday 2:00 AM - 6:00 AM (primary)
     Emergency window: As needed (emergency CAB chair approval)
     Business hours: Exception (CAB + management approval)
  
  5. Implementation:
     Pre-check: System health, backup verification
     Execute: Documented runbook + live documentation
     Post-check: Health check, user validation, monitoring
  
  6. Review:
     Successful: Close + knowledge base update (if needed)
     Failed: Rollback + post-incident review + RCA
  
  7. CAB (monthly):
     Review: All changes (success/failure metrics)
     Discussion: Process improvement, lessons learned
     Calendar: First Wednesday of each month

CHANGE STATISTICS (January 2025):
  Total changes: 172
  ┌─────────────────────────┬──────────┬──────────┬──────────┐
  │ Outcome                 │ Count    │ %        │ Flag     │
  ├─────────────────────────┼──────────┼──────────┼──────────┤
  │ Successful              │ 162      │ 94.2%    │          │
  │ Failed (rollback)       │ 7        │ 4.1%     │          │
  │ Failed (no rollback)    │ 1        │ 0.6%     │ ⚠️       │
  │ Cancelled               │ 2        │ 1.2%     │          │
  ├─────────────────────────┼──────────┼──────────┼──────────┤
  │ TOTAL                   │ 172      │ 100%     │          │
  └─────────────────────────┴──────────┴──────────┴──────────┘

  Change success rate: 94.2% (target: >90%) ✓
  Change-related incidents: 3 (1.7% — target: <5%) ✓
  Emergency changes: 5 (2.9% — target: <5%) ✓
  Failed changes without rollback: 1 (investigated)
    Root cause: Runbook gap (documented, fixed)
    Impact: 30-minute service degradation (internal tool only)

STANDARD CHANGES (Pre-approved):
  Inventory: 45 standard change templates
  Examples:
    - Server patch deployment (monthly cycle)
    - User account creation/modification
    - Firewall rule addition (approved vendor)
    - SSL certificate renewal
    - Backup verification (weekly)
    - Monitoring threshold adjustment
    - Application configuration update (low risk)
    - DNS record update (internal)
    - Software license renewal
    - Storage expansion (<10% increase)
  
  Auto-approval: Pre-defined criteria met → auto-approve
  Auto-implementation: 28 of 45 (fully automated via Runbook Automation)
  Monthly volume: ~125 (73% of all changes)

RISK ASSESSMENT MODEL:
  Risk score = Impact × Likelihood × Complexity
  
  Impact:
    Low (1): Single user, non-critical system
    Medium (2): Department, critical system with workaround
    High (3): Enterprise-wide, revenue-impacting
  
  Likelihood:
    Low (1): Well-tested, proven change
    Medium (2): New but tested in staging
    High (3): Unproven, complex, tight timeline
  
  Complexity:
    Low (1): Single system, well-documented
    Medium (2): Multiple systems, dependencies
    High (3): Enterprise-wide, cross-team
  
  Risk score mapping:
    1-4: Low risk (auto-approve or single approval)
    5-9: Medium risk (dual approval)
    10-27: High risk (CAB approval)
```
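The risk model is a product of three 1-3 factors, followed by a banded lookup. This Python sketch implements the formula and maps the score to the approval paths from step 3 of the change process. The function names and returned labels are illustrative only.

```python
# Factor levels from the risk assessment model (1 = low, 2 = medium, 3 = high).
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(impact: str, likelihood: str, complexity: str) -> int:
    """Risk score = Impact x Likelihood x Complexity, giving a value in 1..27."""
    return LEVELS[impact] * LEVELS[likelihood] * LEVELS[complexity]


def approval_route(score: int) -> str:
    """Map a risk score to the approval path from step 3 of the change process."""
    if not 1 <= score <= 27:
        raise ValueError(f"score out of range: {score}")
    if score <= 4:
        return "low: auto-approve or change manager"
    if score <= 9:
        return "medium: change manager + technical lead"
    return "high: CAB"


if __name__ == "__main__":
    # A staging-tested change to a critical system with a workaround,
    # touching a single well-documented system: 2 x 2 x 1 = 4.
    s = risk_score("medium", "medium", "low")
    print(s, approval_route(s))
```

Each factor can only be 1, 2, or 3, so the only possible products are 1, 2, 3, 4, 6, 8, 9, 12, 18, and 27. In practice, the medium band (5-9) therefore means a score of 6, 8, or 9, and the lowest score that can reach the CAB is 12.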

## Problem Management

### Root Cause Analysis & Prevention

```
PROBLEM MANAGEMENT FRAMEWORK:
══════════════════════════════

PROBLEM vs. INCIDENT:
  Incident: Restore service (quick fix)
  Problem: Find and eliminate root cause (permanent fix)

PROBLEM LIFECYCLE:
  1. Identification:
     - Reactive: From recurring incidents (3+ same root cause)
     - Proactive: Trend analysis, monitoring anomaly, error pattern
     - Preventive: Vendor advisory, industry alert, audit finding
  
  2. Logging:
     - Problem record (ServiceNow)
     - Linked incidents (all related)
     - Priority assignment (based on incident severity + frequency)
  
  3. Investigation:
     - Root cause analysis (5-Why, Fishbone, Pareto)
     - Data collection (logs, metrics, user reports)
     - Reproduction (lab environment, if needed)
     - Timeline creation (event sequence)
  
  4. Resolution:
     - Workaround (immediate, if possible)
     - Permanent fix (code change, config update, process change)
     - Verification (test, monitor, validate)
  
  5. Closure:
     - Known Error record (KB update)
     - Change request (if fix requires change)
     - Metrics update (problem stats)
     - Lessons learned (team sharing)

PROBLEM STATISTICS (January 2025):
  Total problems logged: 12
  ┌─────────────────────────┬──────────┬──────────┐
  │ Type                    │ Count    │ Status   │
  ├─────────────────────────┼──────────┼──────────┤
  │ Reactive                │ 8        │ 6 closed │
  │ Proactive               │ 3        │ 2 closed │
  │ Preventive              │ 1        │ 1 closed │
  ├─────────────────────────┼──────────┼──────────┤
  │ TOTAL                   │ 12       │ 9 closed │
  └─────────────────────────┴──────────┴──────────┘

  Open problems: 3 (2 investigating, 1 awaiting vendor fix)
  Mean time to resolve: 5.2 days (target: <7 days) ✓
  Known errors created: 8 (KB updated)
  Recurring incidents reduced: 35% (vs. previous quarter)

ROOT CAUSE ANALYSIS (5-Why Example):
  Problem: API latency spike (daily, 2 PM - 3 PM)
  
  Why 1: Why the latency spike? → Slow database queries
  Why 2: Why the slow queries? → Missing index on a large table
  Why 3: Why the missing index? → Schema change (new column) added without an index
  Why 4: Why was no index added? → Developer didn't know an index was needed
  Why 5: Why didn't the developer know? → No database design review for schema changes
  
  Root cause: Lack of database design review in CI/CD pipeline
  Permanent fix: Add database migration review step (CI pipeline)
  Workaround: Manual index creation (temporary, same day)
  Verification: Latency normalized, no recurrence (2 weeks monitoring)

TREND ANALYSIS:
  Top recurring issues (January):
    1. API timeout (8 incidents → 1 problem → resolved)
    2. Email delivery failure (5 incidents → 1 problem → resolved)
    3. VPN connectivity (12 incidents → 1 problem → workaround applied)
    4. Printer issues (15 incidents → not a problem — user training needed)
    5. Login failures (6 incidents → 1 problem → MFA policy update)
  
  Trend tracking:
    Monthly: Incident categorization + frequency analysis
    Quarterly: Problem trend report + improvement plan
    Annually: Service improvement plan (SIP) + investment plan
```
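The reactive trigger (3+ incidents that share a root cause) is a group-and-threshold rule. Here is a minimal Python sketch of it. It assumes each incident carries a hypothetical `cause_tag` field that analysts fill in during triage.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RECURRENCE_THRESHOLD = 3  # reactive trigger: 3+ incidents with the same cause


def problem_candidates(
    incidents: Iterable[dict], threshold: int = RECURRENCE_THRESHOLD
) -> List[Tuple[str, int]]:
    """Return (cause_tag, count) pairs at or above the threshold, most frequent first.

    Incidents without a cause_tag (a hypothetical field) are ignored.
    """
    counts = Counter(i["cause_tag"] for i in incidents if i.get("cause_tag"))
    flagged = [(tag, n) for tag, n in counts.items() if n >= threshold]
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))
```

Applied to January's trend data, this rule would flag printer issues (15 incidents) together with VPN, API timeout, login, and email delivery. It only nominates candidates. As the printer case shows, triage can still decide that the right response is user training rather than a problem record.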

## Request Fulfillment

### Service Catalog & User Requests

```
SERVICE CATALOG:
════════════════

REQUEST TYPES (ServiceNow Service Catalog):
  ┌──────────────────────────┬──────────┬─────────────┬────────────┐
  │ Request Type             │ Count    │ Auto/Manual │ Avg Time   │
  │                          │ (Jan)    │             │            │
  ├──────────────────────────┼──────────┼─────────────┼────────────┤
  │ New hardware/laptop      │ 12       │ 60% auto    │ 3 days     │
  │ Software installation    │ 28       │ 75% auto    │ 4 hours    │
  │ Access request           │ 85       │ 80% auto    │ 2 hours    │
  │ Password reset           │ 45       │ 95% auto    │ 5 min      │
  │ Account unlock           │ 22       │ 90% auto    │ 10 min     │
  │ Cloud resource request   │ 15       │ 50% auto    │ 1 day      │
  │ Meeting room booking     │ 68       │ 100% auto   │ Instant    │
  │ IT training request      │ 8        │ 30% auto    │ 2 days     │
  │ Equipment return         │ 10       │ 40% auto    │ 1 day      │
  │ Vendor access request    │ 5        │ 20% auto    │ 3 days     │
  │ Other/general inquiry    │ 32       │ 25% auto    │ 4 hours    │
  ├──────────────────────────┼──────────┼─────────────┼────────────┤
  │ TOTAL                    │ 330      │ 63% auto    │ 4.2 hrs    │
  └──────────────────────────┴──────────┴─────────────┴────────────┘

  Service catalog items: 45 (standardized)
  Automated fulfillment: 63% (runbook-driven, no technician)
  Manual fulfillment: 37% (complex requests, approvals)
  Request satisfaction: 4.5/5.0
  Average fulfillment time: 4.2 hours (all requests)

FULFILLMENT WORKFLOW:
  1. User submits request (ServiceNow portal / chatbot)
  2. Auto-validation (eligibility, policy check, approval matrix)
  3. Approval (if required — manager, IT, security)
  4. Fulfillment (automated runbook or manual technician)
  5. Confirmation (email/chat notification)
  6. CSAT survey (post-fulfillment, 48 hours)

SLA PERFORMANCE:
  ┌──────────────────────────┬──────────┬──────────┐
  │ Metric                   │ Target   │ Actual   │
  ├──────────────────────────┼──────────┼──────────┤
  │ Incident resolution      │ 95%      │ 97%      │
  │ Request fulfillment      │ 90%      │ 93%      │
  │ Change success rate      │ 90%      │ 94.2%    │
  │ First contact resolution │ 65%      │ 68%      │
  │ CSAT score               │ 4.0/5.0  │ 4.3/5.0  │
  │ Knowledge base accuracy  │ 90%      │ 92%      │
  ├──────────────────────────┼──────────┼──────────┤
  │ Overall                  │ Pass     │ Pass     │
  └──────────────────────────┴──────────┴──────────┘

  All SLAs met (January 2025) ✓
  Trend: Steady improvement over the past 6 months
```
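Every row of the SLA performance table is a threshold comparison. The Python sketch below recomputes the Pass/Fail column from the January 2025 figures; `SlaMetric` and `scorecard` are illustrative names. It assumes every target is a minimum (higher is better). The change success and FCR targets are treated as strict (`>`), because elsewhere in this document they are written as ">90%" and ">65%". All other targets are treated as `>=`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SlaMetric:
    name: str
    target: float
    actual: float
    strict: bool = False  # True when the target is written as ">" rather than ">="

    def met(self) -> bool:
        # Assumption: every metric here is "higher is better".
        return self.actual > self.target if self.strict else self.actual >= self.target


# January 2025 values from the SLA performance table (percentages; CSAT on a 5-point scale).
JANUARY_2025 = [
    SlaMetric("Incident resolution", 95.0, 97.0),
    SlaMetric("Request fulfillment", 90.0, 93.0),
    SlaMetric("Change success rate", 90.0, 94.2, strict=True),
    SlaMetric("First contact resolution", 65.0, 68.0, strict=True),
    SlaMetric("CSAT score", 4.0, 4.3),
    SlaMetric("Knowledge base accuracy", 90.0, 92.0),
]


def scorecard(metrics: List[SlaMetric]) -> Tuple[List[Tuple[str, str]], str]:
    """Return per-metric Pass/Fail rows and an overall result."""
    rows = [(m.name, "Pass" if m.met() else "Fail") for m in metrics]
    overall = "Pass" if all(status == "Pass" for _, status in rows) else "Fail"
    return rows, overall
```

The `strict` flag matters only at the boundary. An FCR of exactly 65% would fail a ">65%" target but pass a ">=65%" target.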

## Output

### ITSM Operations Dashboard

```
ITSM OPERATIONS DASHBOARD — Jan 2025
══════════════════════════════════

Service Desk:
  Team: 8 (L1) + 4 (L2) + 2 (L3)
  Tickets/day: 85 (L1) + 15 (L2) + 3 (L3)
  Channels: Email, web, phone, chat (Teams)
  CSAT: 4.3/5.0

Incidents:
  Total: 729
  SLA compliance: 97% (705/729)
  P1 incidents: 2 (100% SLA met)
  MTTR: 1.9 hours
  MTTA: 18 minutes
  FCR: 68% (target: >65%) ✓

Changes:
  Total: 172
  Success rate: 94.2% (target: >90%) ✓
  Emergency: 5 (2.9% — target: <5%) ✓
  Standard: 125 (73% — pre-approved)
  Change-related incidents: 3 (1.7% — target: <5%) ✓

Problems:
  Total logged: 12
  Closed: 9 (75%)
  Open: 3 (investigation/vendor)
  MTTR: 5.2 days (target: <7 days) ✓
  Known errors: 8 (KB updated)

Requests:
  Total: 330
  Auto-fulfillment: 63%
  Avg. fulfillment: 4.2 hours
  Satisfaction: 4.5/5.0
  Self-service (KB): 28%

Knowledge Base:
  Articles: 850+
  Utilization: 45% of tickets
  Accuracy: 92% (reviewed quarterly)
  New/updated: 12-15/month

Actions:
  1. Close 3 open problems (2 under investigation, 1 awaiting vendor fix)
  2. KB article expansion (low-coverage categories)
  3. Standard change automation (28 → 35 target)
  4. FCR improvement (68% → 72% target)
  5. CAB review (monthly — first Wednesday)
```
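Most of the dashboard percentages are ratios of raw counts that appear earlier in this document. Computing them in one place, as this Python sketch does, keeps the dashboard in step with the underlying tables. The dictionary keys and function names are hypothetical. The counts are the January 2025 values from the incident, change, and problem sections.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of part in whole, rounded to the given number of decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


# January 2025 raw counts taken from the sections above.
JANUARY_2025 = {
    "incidents": {"total": 729, "sla_met": 705},
    "changes": {"total": 172, "successful": 162, "emergency": 5,
                "standard": 125, "caused_incidents": 3},
    "problems": {"total": 12, "closed": 9},
}


def dashboard_rates(counts: dict) -> dict:
    """Derive the dashboard's percentage figures from raw counts."""
    inc, chg, prb = counts["incidents"], counts["changes"], counts["problems"]
    return {
        "incident_sla_compliance": pct(inc["sla_met"], inc["total"]),
        "change_success_rate": pct(chg["successful"], chg["total"]),
        "emergency_share": pct(chg["emergency"], chg["total"]),
        "standard_share": pct(chg["standard"], chg["total"]),
        "change_related_incident_rate": pct(chg["caused_incidents"], chg["total"]),
        "problems_closed": pct(prb["closed"], prb["total"]),
    }


if __name__ == "__main__":
    for name, value in dashboard_rates(JANUARY_2025).items():
        print(f"{name}: {value}%")
```

With these counts the function returns 96.7% incident SLA compliance (shown on the dashboard as 97%), 94.2% change success, 2.9% emergency changes, 72.7% standard changes, 1.7% change-related incidents, and 75.0% of problems closed.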

## Integration Points

- ITSM platforms (ServiceNow, Freshservice, Jira Service Management): Ticketing, workflows
- CMDB (ServiceNow CMDB): Configuration items, relationships, impact analysis
- Monitoring (Datadog, Prometheus): Alert-to-incident automation
- Automation (ServiceNow Flow, Ansible, Runbook Automation): Self-healing, fulfillment
- Communication (Teams, Slack, email): Notifications, updates, CSAT surveys
- Identity (Okta, Azure AD): User lookup, access management
- Asset management (ServiceNow SAM, Snipe-IT): Hardware, software inventory
- Knowledge base (ServiceNow KB, Confluence): Articles, KB suggestions
- HRIS (Rippling, Workday): Employee lifecycle trigger (onboarding/offboarding IT)
- Vendor management (ServiceNow, Procure): Vendor tickets, SLA tracking
- Reporting (ServiceNow reports, Power BI): Dashboards, analytics
- Change management (ServiceNow Change Management): Change advisory, scheduling

## Edge Cases

- **Major incident (enterprise-wide outage)**: War room activation; communication cadence; cross-team coordination; post-incident review
- **Repeated incident (no permanent fix)**: Problem escalation; vendor engagement; workaround optimization; permanent fix timeline
- **Emergency change (no CAB available)**: Emergency CAB chair; post-review; documentation; risk acceptance
- **Change rollback failure**: Blast radius containment; manual recovery; root cause; process improvement
- **SLA breach (impending)**: Proactive communication; escalation; workaround; customer notification
- **Knowledge gap (no KB article)**: Immediate article creation; peer review; publish; prevention
- **Service catalog overload (high volume)**: Auto-fulfillment expansion; queue management; staffing adjustment
- **Vendor dependency (extended resolution)**: Vendor escalation; contract review; workaround; alternative
- **Security incident (ITSM + SOC)**: Dual workflow; information sharing; coordinated response; documentation
- **System migration (ITSM platform)**: Data migration; process validation; training; parallel run
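One simple way to handle the impending-SLA-breach case is to raise a warning once a ticket has used up a fixed share of its resolution window. This Python sketch uses the resolution targets from the severity definitions: P1 4 hours, P2 8 hours, P3 24 hours, and P4 3 business days. The 75% warning threshold is a hypothetical value. The business-day calculation skips weekends but does not account for public holidays.

```python
from datetime import datetime, timedelta

# Resolution targets from the severity definitions.
RESOLUTION_HOURS = {"P1": 4, "P2": 8, "P3": 24}  # clock hours
P4_BUSINESS_DAYS = 3
WARN_FRACTION = 0.75  # hypothetical warning threshold, not from the source


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by whole business days (Mon-Fri); public holidays are ignored."""
    current, added = start, 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            added += 1
    return current


def resolution_deadline(priority: str, opened: datetime) -> datetime:
    """Deadline for resolving a ticket of the given priority."""
    if priority == "P4":
        return add_business_days(opened, P4_BUSINESS_DAYS)
    return opened + timedelta(hours=RESOLUTION_HOURS[priority])


def sla_status(priority: str, opened: datetime, now: datetime,
               warn_fraction: float = WARN_FRACTION) -> str:
    """Classify a ticket as "ok", "at_risk" or "breached"."""
    deadline = resolution_deadline(priority, opened)
    if now >= deadline:
        return "breached"
    # Elapsed share of the window, measured in calendar time.
    elapsed = (now - opened) / (deadline - opened)
    return "at_risk" if elapsed >= warn_fraction else "ok"
```

For a P4 ticket opened on a Friday at 10:00, the deadline falls on the following Wednesday at 10:00, because Saturday and Sunday are skipped.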