HR AI Skill

Interview Scorecard

Generate, distribute, collect, and aggregate interview scorecards. Use when standardizing post-interview evaluations, collecting structured interviewer feedback, scoring candidates against competency rubrics, identifying rating outliers, or triggering hire/no-hire recommendations. Supports behavioral, technical, cultural, and executive interview scorecards. Triggers on phrases like "interview feedback", "scorecard", "post-interview evaluation", "rate this candidate", "collect interviewer notes", "calibration session".

Standardize post-interview evaluation and aggregate feedback into hire-ready recommendations.

Workflow

Auto-generate a scorecard tailored to the interview type and role level.
Distribute to each interviewer immediately after their interview session (push notification + email).
Set deadline: scorecards must be submitted within 24 hours of interview completion.
Send reminders at 1h (if not started), 12h, and 23h while the scorecard is incomplete.
Once all scorecards are collected, aggregate scores and identify consensus and outliers.
Flag "Strong Yes" and "Strong No" signals for rapid decision-making.
Generate a consolidated recommendation report for the hiring manager.
If scores meet any trigger listed under Calibration Rules, schedule a calibration session with the interview panel.

Scorecard Structure

Every scorecard includes these sections:

Section 1: Interview Metadata

Interviewer: [Name, Title]
Candidate: [Name]
Role: [Title]
Interview Type: [Technical / Behavioral / Culture / Executive]
Date: [auto-filled]
Duration: [auto-tracked from calendar]

Section 2: Competency Ratings

Rate each competency on a 5-point scale. The table gives anchors for 1, 3, and 5; use 2 and 4 for performance that falls between them. No "pass/fail": use the full scale.

| Competency | 1: Does Not Meet | 3: Meets | 5: Exceeds |
|------------|------------------|----------|------------|
| Technical Ability | Cannot perform core tasks | Performs all core tasks independently | Solves novel problems beyond role scope |
| Communication | Unclear, disorganized | Clear, structured, adapts to audience | Persuasive, influences without authority |
| Problem Solving | Cannot break down problems | Methodical, reaches sound conclusions | Innovative approach, considers edge cases |
| Collaboration | Works in silo, dismissive | Cooperative, active listener | Elevates team, mentors others |
| Role-Specific Skill | [custom per role] | [custom per role] | [custom per role] |

Section 3: Evidence-Based Notes

Required: For each competency rated, provide one specific example from the interview.

Example (good):
"Technical Ability: 4 — Candidate designed a caching layer for our API problem.
Correctly chose Redis over Memcached for our use case and handled cache-invalidation
edge cases. Did not consider multi-region replication, which dropped the rating from 5 to 4."

Example (bad — do not accept):
"Technical Ability: 4 — Seemed knowledgeable."

Section 4: Overall Recommendation

| Recommendation | Meaning | When to Use |
|----------------|---------|-------------|
| Strong Hire | No doubts; advance immediately | All competencies ≥ 4, no red flags |
| Hire | Good fit; minor gaps acceptable | Most competencies ≥ 3, at least one 4–5 |
| Lean Hire | Tendency toward yes; needs calibration | Mixed scores, some 3s and 4s |
| Lean No-Hire | Tendency toward no; specific concerns | Some competencies at 2, concerns noted |
| No Hire | Clear mismatch | Any competency at 1, or multiple at 2 |
| Strong No-Hire | Immediate rejection | Critical failure, values misalignment, dishonesty |

Section 5: Open-Ended Fields

Biggest strength observed: [free text]
Biggest concern observed: [free text]
Would you want this person on your team? Yes / No / Unsure
Additional context: [free text]
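
The five sections above map naturally onto a small record type. The following is a minimal Python sketch, not a prescribed schema: the class and field names (`Scorecard`, `CompetencyRating`, `validation_errors`) are hypothetical, and the word-count check is only an assumed stand-in for the "specific example" requirement in Section 3, which ultimately needs human review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Recommendation(Enum):
    STRONG_HIRE = "Strong Hire"
    HIRE = "Hire"
    LEAN_HIRE = "Lean Hire"
    LEAN_NO_HIRE = "Lean No-Hire"
    NO_HIRE = "No Hire"
    STRONG_NO_HIRE = "Strong No-Hire"


@dataclass
class CompetencyRating:
    competency: str
    score: int      # 1-5 scale from Section 2
    evidence: str   # specific example required by Section 3


@dataclass
class Scorecard:
    interviewer: str
    candidate: str
    role: str
    interview_type: str  # Technical / Behavioral / Culture / Executive
    ratings: List[CompetencyRating] = field(default_factory=list)
    recommendation: Optional[Recommendation] = None
    biggest_strength: str = ""
    biggest_concern: str = ""
    want_on_team: str = "Unsure"  # Yes / No / Unsure
    additional_context: str = ""

    def validation_errors(self, min_evidence_words: int = 8) -> List[str]:
        """Return a list of problems; an empty list means the card can be accepted.

        min_evidence_words is an assumed heuristic, not a rule from this document.
        """
        errors = []
        for rating in self.ratings:
            if not 1 <= rating.score <= 5:
                errors.append(f"{rating.competency}: score {rating.score} is outside 1-5")
            if len(rating.evidence.split()) < min_evidence_words:
                errors.append(f"{rating.competency}: evidence is too short to be specific")
        if self.recommendation is None:
            errors.append("overall recommendation is missing")
        return errors
```

Under this heuristic, a note such as "Seemed knowledgeable." is rejected for length, while the longer caching-layer example passes.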

Calibration Rules

When to Trigger Calibration

Auto-trigger a calibration session when:

The score range across interviewers on the same competency is ≥ 2 points (e.g., one rates 2, another rates 5)
Any "Strong No-Hire" mixed with any "Hire" or "Strong Hire"
More than 50% of interviewers select "Lean Hire" or "Lean No-Hire"
Hiring manager's score differs from panel median by ≥ 2 points
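
The four triggers above can be sketched as a single check. This is an illustrative Python sketch under stated assumptions: the function name `calibration_reasons` and the input shapes are hypothetical, the hiring-manager comparison is applied per competency, and the "panel median" is assumed to exclude the hiring manager's own score.

```python
from statistics import median

HIRE_SIDE = {"Hire", "Strong Hire"}
LEAN = {"Lean Hire", "Lean No-Hire"}


def calibration_reasons(scores, recommendations, hiring_manager=None):
    """Return the triggers that fired; an empty list means no calibration is needed.

    scores: {interviewer: {competency: score}}
    recommendations: {interviewer: recommendation label}
    hiring_manager: interviewer name, or None if the hiring manager did not score
    """
    reasons = []
    competencies = sorted({c for card in scores.values() for c in card})
    for comp in competencies:
        values = [card[comp] for card in scores.values() if comp in card]
        # Trigger 1: spread of 2 or more points on one competency
        if max(values) - min(values) >= 2:
            reasons.append(f"range >= 2 on {comp}")
        # Trigger 4: hiring manager vs. median of the other interviewers (assumed reading)
        if hiring_manager in scores and comp in scores[hiring_manager]:
            panel = [card[comp] for name, card in scores.items()
                     if name != hiring_manager and comp in card]
            if panel and abs(scores[hiring_manager][comp] - median(panel)) >= 2:
                reasons.append(f"hiring manager differs from panel median on {comp}")
    labels = set(recommendations.values())
    # Trigger 2: a Strong No-Hire alongside any Hire or Strong Hire
    if "Strong No-Hire" in labels and labels & HIRE_SIDE:
        reasons.append("Strong No-Hire mixed with Hire or Strong Hire")
    # Trigger 3: more than half of the interviewers chose a Lean rating
    lean_count = sum(label in LEAN for label in recommendations.values())
    if recommendations and lean_count / len(recommendations) > 0.5:
        reasons.append("more than 50% Lean recommendations")
    return reasons
```

Returning the list of reasons rather than a bare boolean gives the facilitator a ready-made agenda for the outlier discussion.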

Calibration Meeting Format

Duration: 30 minutes
Participants: All interviewers + Hiring Manager (facilitator)

Agenda:
  1. Each interviewer shares their rating + key evidence (3 min each)
  2. Discuss outliers: "Why did Alex rate technical 5 while Sam rated 2?"
  3. Re-rate if convinced by evidence; otherwise, stand by original score
  4. Converge on recommendation: Strong Hire / Hire / Lean Hire / Lean No-Hire / No Hire / Strong No-Hire
  5. Document final decision + rationale

Aggregation Algorithm

For each competency:
  → Calculate median score (not mean — reduces outlier influence)
  → If range ≥ 2, flag for calibration

Overall recommendation:
  → Count recommendations: [Strong Hire, Hire, Lean Hire, Lean No-Hire, No Hire, Strong No-Hire]
  → If majority (≥ 60%) agree on one band → that's the recommendation
  → If split → recommend calibration session
  → Any "Strong No-Hire" requires explicit discussion and override justification
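
A minimal Python sketch of the aggregation steps above, offered as one possible reading rather than a reference implementation. The function names and the optional `bands` mapping are hypothetical: the text does not define what counts as a "band", so by default each of the six recommendations is treated as its own band, and a caller may group labels explicitly.

```python
from collections import Counter
from statistics import median


def aggregate_competency(scores):
    """Return (median, needs_calibration) for one competency's scores."""
    return median(scores), (max(scores) - min(scores)) >= 2


def overall_recommendation(recommendations, bands=None, threshold=0.6):
    """Return (result, needs_override_discussion).

    recommendations: list of labels, one per interviewer
    bands: optional {label: band} grouping (an assumption; default is no grouping)
    """
    bands = bands or {}
    counts = Counter(bands.get(label, label) for label in recommendations)
    top_band, votes = counts.most_common(1)[0]
    if votes / len(recommendations) >= threshold:
        result = top_band
    else:
        result = "Calibration session"
    # Any Strong No-Hire requires explicit discussion, whatever the majority says
    return result, "Strong No-Hire" in recommendations
```

Note that `statistics.median` averages the two middle values for an even number of interviewers, so four scores of 5, 4, 5, 4 give 4.5. How "band" is defined changes the outcome: a panel of Hire, Hire, Strong Hire, Lean Hire is a split (50% Hire) with no grouping, but reaches 75% if Strong Hire is grouped with Hire via `bands={"Strong Hire": "Hire"}`.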

Scorecard Collection SLA

| Timing | Action |
|--------|--------|
| T+0 (end of interview) | Push scorecard to interviewer's phone/email |
| T+1h | Gentle reminder if not started |
| T+12h | Firm reminder: "Scorecard due in 12 hours" |
| T+23h | Final reminder: "1 hour remaining" |
| T+24h | Auto-escalate to interviewer's manager if still incomplete |
| T+48h | Hiring manager proceeds with available scorecards; note missing feedback |
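
The SLA offsets can be turned into concrete timestamps once the interview end time is known. The sketch below is illustrative only: `SLA_STEPS`, `sla_schedule`, and `actions_due` are hypothetical names, and it tracks a single "submitted" flag rather than distinguishing "not started" from "in progress".

```python
from datetime import datetime, timedelta

# (hours after interview end, action), taken from the SLA table
SLA_STEPS = [
    (0, "Push scorecard to interviewer"),
    (1, "Gentle reminder if not started"),
    (12, "Firm reminder: scorecard due in 12 hours"),
    (23, "Final reminder: 1 hour remaining"),
    (24, "Escalate to interviewer's manager"),
    (48, "Hiring manager proceeds with available scorecards"),
]


def sla_schedule(interview_end):
    """Return [(timestamp, action), ...] for one interviewer's scorecard."""
    return [(interview_end + timedelta(hours=hours), action)
            for hours, action in SLA_STEPS]


def actions_due(interview_end, now, submitted=False):
    """Return the actions whose time has passed; nothing is due once submitted."""
    if submitted:
        return []
    return [action for when, action in sla_schedule(interview_end) if when <= now]
```

For an interview ending at 15:00, the final reminder falls at 14:00 the next day and the escalation at 15:00.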

Output

Consolidated Scorecard Report

CANDIDATE EVALUATION SUMMARY
============================
Name: Jane Doe | Role: Senior Backend Engineer
Interview Date: Jan 23, 2025 | Scorecards: 4/4 complete

COMPETENCY SUMMARY:
┌─────────────────────┬──────┬──────┬──────┬──────┬──────┐
│ Competency          │ Alex │ Sam  │ Lee  │ Zara │ Med  │
├─────────────────────┼──────┼──────┼──────┼──────┼──────┤
│ Technical Ability   │   5  │   4  │   5  │   4  │ 4.5  │
│ Communication       │   4  │   4  │   3  │   5  │  4   │
│ Problem Solving     │   4  │   5  │   4  │   4  │  4   │
│ Collaboration       │   4  │   3  │   4  │   4  │  4   │
│ System Design       │   5  │   4  │   5  │   3* │ 4.5  │
└─────────────────────┴──────┴──────┴──────┴──────┴──────┘
* Zara noted concern about multi-region design — flag for HM discussion

RECOMMENDATIONS:
  Alex: Hire        Sam: Hire        Lee: Strong Hire    Zara: Lean Hire

CONSENSUS: HIRE (3 of 4 interviewers, 75%, at Hire or Strong Hire)
CONCERN: Multi-region system design depth — recommend exploring in HM round

NEXT STEP: Schedule Hiring Manager round or proceed to offer discussion

Bias Guardrails

No anchoring: Interviewers submit scorecards independently before seeing others' ratings
No halo/horn effect: Each competency rated separately with evidence requirement
Calibration training: All interviewers complete scorecard calibration training before first use
Audit: HR reviews scorecard distributions monthly for rater leniency/severity patterns
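
One simple way to surface rater leniency or severity in the monthly audit is to compare each rater's scores with the panel median for the same item. The sketch below is an assumed approach, not one this document prescribes: `rater_bias` is a hypothetical name, and an "item" is taken to be any hashable key such as a (candidate, competency) pair.

```python
from collections import defaultdict
from statistics import mean, median


def rater_bias(ratings):
    """Return {rater: mean signed deviation from the per-item panel median}.

    ratings: iterable of (rater, item, score) tuples.
    Positive values suggest leniency, negative values suggest severity.
    """
    ratings = list(ratings)
    by_item = defaultdict(list)
    for _, item, score in ratings:
        by_item[item].append(score)
    item_median = {item: median(values) for item, values in by_item.items()}

    deviations = defaultdict(list)
    for rater, item, score in ratings:
        deviations[rater].append(score - item_median[item])
    return {rater: mean(values) for rater, values in deviations.items()}
```

A rater who consistently lands a full point above the panel median across many candidates is worth a conversation; a single outlier score is not.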

Integration Points

ATS (Greenhouse, Lever): Auto-create scorecard records, push results back
Email/Slack: Distribution and reminders
Calendar: Trigger scorecard delivery post-interview
HRIS: Store completed scorecards in candidate file
Analytics: Track rater consistency, time-to-complete, recommendation accuracy

Disclaimer: All rights reserved by Circulos AI. These skills are specifically designed for Claude Code, Claude Cowork, Codex, and OpenClaw. When using or referencing any skill, please provide proper attribution to Circulos AI.