Sales AI Skill
Lookalike Audience Generation
Use AI to find prospects similar to your best customers for targeted prospecting. Use when building lookalike audiences, identifying high-potential prospects based on customer similarities, training ML models for prospect scoring, or creating data-driven ta...
Use AI and machine learning to identify and target prospects that mirror the characteristics of your highest-value customers.
Workflow
- Define and export best customer segment (top 20% by revenue, profit, or LTV).
- Analyze customer characteristics across firmographics, technographics, behaviors, and firm dynamics.
- Train ML model to identify predictive patterns in customer success data.
- Score existing database and external prospect lists for similarity.
- Generate ranked prospect list with similarity scores and predicted fit.
- Trigger tailored outreach campaigns referencing similar customer success stories.
- Track conversion rates by similarity score band and refine model.
Customer Analysis Framework
BEST CUSTOMER PROFILE ANALYSIS
══════════════════════════════════════════════════════════════════════
Segment Definition (Choose one primary model):
Model A — Revenue-Based (Top 20% by ARR):
Criteria: Customers generating top 20% of revenue
Typical profile: Larger companies, longer tenure, more products
Strength: Direct revenue correlation
Weakness: May include high-maintenance accounts with low margin
Model B — Profit-Based (Top 20% by Net Margin):
Criteria: Customers with highest profit margin after support/implementation costs
Typical profile: Self-sufficient users, low support volume, efficient implementations
Strength: Focuses on profitability, not just revenue
Weakness: May miss strategic/high-growth accounts
Model C — LTV-Based (Top 20% by Lifetime Value):
Criteria: Customers with highest projected lifetime value
Typical profile: High renewal rates, expansion revenue, low churn
Strength: Long-term revenue prediction
Weakness: Requires historical data (3+ years for accuracy)
Model D — Product-Fit Based (Top 20% by Engagement):
Criteria: Customers with highest product adoption, usage depth, and feature breadth
Typical profile: Active users, champions within organization, high NPS
Strength: Best product-market fit signal
Weakness: Doesn't directly correlate with revenue
Recommended: Hybrid Model (60% Revenue + 20% Profit + 20% Engagement)
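One way to turn the hybrid recommendation into a segment export is to rank customers on each metric and blend the ranks. The sketch below is illustrative only: the `Customer` fields, the percentile-rank normalization, and the `select_best_customers` helper are assumptions, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    arr: float          # annual recurring revenue
    net_margin: float   # profit after support/implementation costs
    engagement: float   # any product-usage index (hypothetical field)


# Hybrid weights taken from the recommendation above.
WEIGHTS = {"arr": 0.60, "net_margin": 0.20, "engagement": 0.20}


def percentile_ranks(values):
    """Rank each value in [0, 1] by the share of other values it exceeds."""
    n = len(values)
    if n < 2:
        return [1.0] * n
    return [sum(other < v for other in values) / (n - 1) for v in values]


def select_best_customers(customers, top_fraction=0.20):
    """Return the top fraction of customers by blended percentile rank."""
    ranks = {
        field: percentile_ranks([getattr(c, field) for c in customers])
        for field in WEIGHTS
    }
    scored = []
    for i, customer in enumerate(customers):
        hybrid = sum(WEIGHTS[field] * ranks[field][i] for field in WEIGHTS)
        scored.append((hybrid, customer))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = max(1, round(len(scored) * top_fraction))
    return [customer for _, customer in scored[:keep]]


customers = [
    Customer("A", 500_000, 100_000, 0.90),
    Customer("B", 400_000, 150_000, 0.50),
    Customer("C", 100_000, 20_000, 0.95),
    Customer("D", 50_000, 10_000, 0.20),
    Customer("E", 300_000, 5_000, 0.40),
]
best = select_best_customers(customers)
```

Ranking before blending keeps a single very large account from dominating the score, which raw dollar values would otherwise do.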
Firmographic Characteristics (Company-Level):
→ Industry/Vertical: [Analyze distribution — top 5 industries]
→ Company Size (Employees): [Distribution — median, quartiles]
→ Annual Revenue: [Distribution — median, quartiles]
→ Company Stage: [Startup, Growth, Scale-up, Enterprise distribution]
→ Geography: [Countries, regions, time zones]
→ Founding Year: [Company age distribution]
→ Ownership: [Public, private, VC-backed, family-owned distribution]
→ Tech Readiness: [Digital maturity score based on tech stack]
Technographic Characteristics (Technology Stack):
→ Current Category Solutions: [What tools do they use in adjacent categories?]
→ CRM Platform: [Salesforce, HubSpot, etc.]
→ Marketing Stack: [Marketo, Pardot, Mailchimp, etc.]
→ Infrastructure: [AWS, Azure, GCP, on-premise]
→ Integrations Used: [Top 10 most common integrations among best customers]
→ Tech Stack Maturity: [Sophisticated, standard, basic]
Behavioral Characteristics (Pre-Purchase):
→ Lead Source: [Inbound, outbound, referral, content, event]
→ Content Consumed: [Top 5 content assets downloaded before purchase]
→ Time to First Contact: [Median days from first touch to first call]
→ Sales Cycle Length: [Median days from first contact to close]
→ Touchpoints Before Close: [Median number of interactions]
→ Decision Process: [Number of stakeholders, approval levels]
Firm Dynamics (Organizational Signals):
→ Hiring Trends: [Growing, stable, shrinking — correlate with success]
→ Funding Events: [Recent funding rounds, acquisition activity]
→ Executive Changes: [Recent C-suite or VP-level hires]
→ Market Position: [Market leader, challenger, follower]
→ Digital Transformation: [Active digital initiatives, modernization projects]
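The bracketed placeholders above ask for distributions (median, quartiles, top categories). A minimal sketch of how those summaries could be computed with the Python standard library follows; the function names and the sample values are made up for illustration.

```python
from collections import Counter
from statistics import quantiles


def profile_numeric(values):
    """Return Q1, median, and Q3 for a numeric attribute such as employee count."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


def top_categories(values, k=5):
    """Return the k most common categories with their share of the segment."""
    counts = Counter(values)
    total = len(values)
    return [(category, count / total) for category, count in counts.most_common(k)]


# Hypothetical best-customer export.
employee_counts = [50, 120, 200, 350, 800]
industries = ["SaaS", "SaaS", "Fintech", "SaaS", "Healthcare"]

size_profile = profile_numeric(employee_counts)
industry_profile = top_categories(industries, k=2)
```

The same two helpers cover most rows in the framework: numeric fields (revenue, sales cycle length, touchpoints) go through `profile_numeric`, and categorical fields (CRM platform, lead source, ownership) go through `top_categories`.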
ML Model Training and Scoring
LOOKALIKE SCORING MODEL
══════════════════════════════════════════════════════════════════════
Feature Engineering:
Category 1 — Firmographics (Weight: 30%):
→ Industry match (exact match = 1.0, related = 0.7, unrelated = 0.2)
→ Employee count similarity (log-scale distance from median best-customer size)
→ Revenue similarity (log-scale distance from median best-customer revenue)
→ Stage match (startup/growth/scale/enterprise classification)
→ Geographic match (same region/country)
Category 2 — Technographics (Weight: 25%):
→ Tech stack overlap (Jaccard similarity between prospect and best-customer stacks)
→ Category solution usage (using competitor = 0.8, using complementary = 0.6)
→ CRM match (using same CRM as best customers)
→ Infrastructure match (same cloud provider)
→ Integration readiness (has key integration platforms)
Category 3 — Behavioral (Weight: 25%):
→ Lead source match (same channel as best customers)
→ Content engagement pattern (similar content consumption before purchase)
→ Engagement velocity (similar response time to outreach)
→ Intent signal score (third-party intent data alignment)
→ Social media engagement (similar digital footprint)
Category 4 — Firm Dynamics (Weight: 20%):
→ Hiring trend alignment (growing companies like best customers)
→ Funding status match (similar stage in growth journey)
→ Market position similarity (leader/challenger status)
→ Digital maturity score (similar transformation stage)
→ Event participation (attending same industry events)
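Three of the features above have a concrete mathematical form: the tiered industry match, the log-scale size distance, and the Jaccard overlap of tech stacks. The sketch below shows one possible implementation. The `max_log_distance` cutoff (how many orders of magnitude away counts as zero similarity) is an assumption, as are the function names.

```python
import math


def industry_match(industry, best_industries, related_industries):
    """Exact match = 1.0, related = 0.7, unrelated = 0.2 (values from the list above)."""
    if industry in best_industries:
        return 1.0
    if industry in related_industries:
        return 0.7
    return 0.2


def log_scale_similarity(value, median_best, max_log_distance=2.0):
    """1.0 at the best-customer median, falling linearly to 0.0 at
    max_log_distance orders of magnitude (base 10) away. The cutoff is assumed."""
    if value <= 0 or median_best <= 0:
        return 0.0
    distance = abs(math.log10(value) - math.log10(median_best))
    return max(0.0, 1.0 - distance / max_log_distance)


def jaccard_similarity(stack_a, stack_b):
    """Size of the intersection divided by size of the union of two tech stacks."""
    a, b = set(stack_a), set(stack_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


overlap = jaccard_similarity(
    {"Salesforce", "AWS", "Slack"},
    {"Salesforce", "AWS", "HubSpot"},
)
size_fit = log_scale_similarity(value=2_000, median_best=200)
```

Working on a log scale means a 2,000-person prospect and a 20-person prospect are treated as equally far from a 200-person median, which matches how company size usually behaves in segmentation.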
Scoring Algorithm:
→ Lookalike Score = Σ (Feature_Score × Feature_Weight) for all features
→ Normalize to 0–100 scale
→ Weight fitting: Train on historical customer data to optimize weights
→ Validation: Hold-out split (80% training, 20% validation)
→ Target performance: AUC-ROC > 0.75 on the validation set for customer prediction
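The scoring steps above can be sketched in a few lines. This example assumes each category has already been reduced to a single score in [0, 1] (for instance by averaging its features, which the framework does not specify), applies the category weights, and scales the result to 0–100. It also includes a small AUC-ROC helper for checking the 0.75 target on held-out data; the function names are illustrative.

```python
# Category weights from the feature engineering section.
CATEGORY_WEIGHTS = {
    "firmographic": 0.30,
    "technographic": 0.25,
    "behavioral": 0.25,
    "firm_dynamics": 0.20,
}


def lookalike_score(category_scores, weights=CATEGORY_WEIGHTS):
    """Weighted sum of category scores (each in [0, 1]), scaled to 0-100."""
    total_weight = sum(weights.values())
    weighted = sum(category_scores[name] * w for name, w in weights.items())
    return 100.0 * weighted / total_weight


def auc_roc(scores, labels):
    """Probability that a random positive outranks a random negative.
    Ties count as half a win. Labels are 1 (customer) or 0 (non-customer)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


score = lookalike_score({
    "firmographic": 1.0,
    "technographic": 0.5,
    "behavioral": 0.5,
    "firm_dynamics": 0.0,
})
validation_auc = auc_roc(scores=[90, 70, 60, 40], labels=[1, 1, 0, 0])
```

The pairwise AUC loop is quadratic in the number of accounts, which is fine for a validation set of a few thousand rows; a library routine is a better fit for anything larger.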
Score Band Definitions (illustrative figures; calibrate probabilities and conversion rates against your own data):
Tier 1 — High Similarity (80–100):
→ 90%+ probability of being a good-fit customer
→ Action: Immediate AE outreach with relevant case study
→ Expected conversion rate: 15–25%
Tier 2 — Strong Similarity (60–79):
→ 70%+ probability of being a good-fit customer
→ Action: SDR outreach with targeted messaging
→ Expected conversion rate: 8–15%
Tier 3 — Moderate Similarity (40–59):
→ 50%+ probability of being a good-fit customer
→ Action: Nurture campaign with relevant content
→ Expected conversion rate: 3–8%
Tier 4 — Low Similarity (< 40):
→ < 50% probability of being a good-fit customer
→ Action: No active outreach; broad nurture only
→ Expected conversion rate: < 3%
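The band definitions above map directly to a lookup. A minimal sketch, assuming fractional scores fall into the band whose lower bound they meet (so 79.5 is Tier 2); the `assign_band` name is hypothetical.

```python
# (lower bound, tier, action) in descending order, taken from the bands above.
SCORE_BANDS = [
    (80, "Tier 1", "Immediate AE outreach with relevant case study"),
    (60, "Tier 2", "SDR outreach with targeted messaging"),
    (40, "Tier 3", "Nurture campaign with relevant content"),
    (0, "Tier 4", "No active outreach; broad nurture only"),
]


def assign_band(score):
    """Return (tier, action) for a lookalike score on the 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, tier, action in SCORE_BANDS:
        if score >= lower_bound:
            return tier, action


tier, action = assign_band(85)
```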
Lookalike Campaign Execution
LOOKALIKE PROSPECTING CAMPAIGN
══════════════════════════════════════════════════════════════════════
Campaign Design:
Step 1 — Audience Segmentation:
→ Segment 1: Tier 1 lookalikes (80–100 score) — Account-Based approach
→ Segment 2: Tier 2 lookalikes (60–79 score) — SDR outbound approach
→ Segment 3: Tier 3 lookalikes (40–59 score) — Nurture approach
Step 2 — Message Personalization:
→ Match messaging to lookalike segment characteristics
→ Reference similar customer success stories
→ Customize value prop based on prospect industry/size
→ Include social proof from most similar customer
Step 3 — Campaign Execution:
Segment 1 — ABM Campaign (Tier 1, 80–100 score):
Timeline:
Week 1: Multi-threaded LinkedIn engagement (3+ team members per account)
Week 1: Personalized video message from AE
Week 2: Email with relevant case study (most similar customer)
Week 2: LinkedIn InMail with ROI data
Week 3: Phone call from AE (after account research)
Week 3: Direct mail piece (personalized to company)
Week 4: Executive alignment meeting invitation
Week 5: Virtual or in-person demo
Week 6+: Continued multi-threaded engagement
Segment 2 — SDR Outbound Campaign (Tier 2, 60–79 score):
Timeline:
Day 1: Email with personalized opening + relevant case study
Day 4: LinkedIn connection request
Day 7: Follow-up email with ROI calculator
Day 10: Voicemail drop + calendar invite
Day 14: SDR callback
Day 21: Case study or webinar invitation
Day 30: Nurture re-engagement or disqualification
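A cadence like the SDR timeline above is easy to express as data, which makes it simple to load into a sequencing tool or to generate per-account task dates. The sketch below assumes "Day 1" is the enrollment date itself and ignores weekends and holidays; `schedule_cadence` is a hypothetical helper.

```python
from datetime import date, timedelta

# (day number, step) pairs copied from the Segment 2 timeline.
SDR_CADENCE = [
    (1, "Email with personalized opening + relevant case study"),
    (4, "LinkedIn connection request"),
    (7, "Follow-up email with ROI calculator"),
    (10, "Voicemail drop + calendar invite"),
    (14, "SDR callback"),
    (21, "Case study or webinar invitation"),
    (30, "Nurture re-engagement or disqualification"),
]


def schedule_cadence(start, cadence):
    """Turn day numbers into calendar dates, treating Day 1 as the start date."""
    return [(start + timedelta(days=day - 1), step) for day, step in cadence]


tasks = schedule_cadence(date(2024, 1, 1), SDR_CADENCE)
```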
Segment 3 — Nurture Campaign (Tier 3, 40–59 score):
Timeline:
Week 1: Email with industry report or benchmark data
Week 2: Email with relevant blog post or guide
Week 3: Email with customer testimonial
Week 4: Webinar invitation or product tour
Week 6: ROI calculator or pricing guide offer
Week 8: Soft demo CTA or discovery call invitation
Week 12+: Switch to monthly nurture cadence
Step 4 — Conversion Tracking:
→ Track by lookalike score band (conversion rate, deal size, cycle length)
→ Compare to non-lookalike prospecting baseline
→ Calculate ROI per score band
→ Feed conversion data back into model for continuous improvement
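Tracking by score band comes down to grouping outcomes and comparing each band's rate with the non-lookalike baseline. The following sketch assumes outcomes are available as `(band, converted)` pairs; the sample data and the baseline rate are invented for illustration.

```python
from collections import defaultdict


def conversion_by_band(outcomes):
    """Return {band: conversion rate} from (band, converted) pairs."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for band, converted in outcomes:
        totals[band] += 1
        wins[band] += int(converted)
    return {band: wins[band] / totals[band] for band in totals}


def lift_over_baseline(rates, baseline_rate):
    """Return each band's conversion rate as a multiple of the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return {band: rate / baseline_rate for band, rate in rates.items()}


# Hypothetical campaign results.
outcomes = [
    ("Tier 1", True), ("Tier 1", False),
    ("Tier 2", False), ("Tier 2", False), ("Tier 2", True), ("Tier 2", False),
]
rates = conversion_by_band(outcomes)
lift = lift_over_baseline(rates, baseline_rate=0.05)
```

If lift does not decrease from Tier 1 to Tier 3, the score is not ranking prospects usefully and the model should be revisited before more budget goes into the top tier.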
Technology and Tools
LOOKALIKE AUDIENCE TOOLS
══════════════════════════════════════════════════════════════════════
Dedicated Lookalike Platforms:
→ 6sense:
Features: AI-powered lookalike modeling, account scoring, intent data
ML Model: Proprietary algorithm trained on B2B buyer behavior
Pricing: $15,000–$50,000/year
Best for: Mid-market to enterprise lookalike prospecting
→ Demandbase:
Features: Account-based lookalike targeting, ad targeting, ABM orchestration
ML Model: Account prediction model with intent scoring
Pricing: $30,000–$100,000/year
Best for: Enterprise ABM with lookalike audiences
→ Gong + 6sense (combined):
Features: Conversation intelligence + lookalike account scoring
ML Model: Intent + account prediction combined
Pricing: Combined $25,000–$75,000/year
Best for: Full-stack lookalike prospecting + sales intelligence
Data Enrichment Platforms (for feature engineering):
→ ZoomInfo: Firmographic, technographic, and contact data; $12,000–$50,000/year
→ Apollo: Contact database, intent data, email sequencing; $49–$249/month per seat
→ Clearbit: Company and contact enrichment API; $250–$2,000/month
→ Datanyze: Technographic data (what software companies use); $10,000–$50,000/year
→ BuiltWith: Technology detection (web-based); $49–$2,000/month
DIY ML Implementation:
→ Data export from CRM (Salesforce, HubSpot)
→ Feature engineering in Python/R (pandas, scikit-learn)
→ Model training: Random Forest, XGBoost, or Logistic Regression
→ Scoring API: Deploy model to cloud (AWS SageMaker, Google Vertex AI, the successor to AI Platform)
→ Integration: Webhook to CRM for automated scoring
→ Cost: $0–$5,000/month (data costs + cloud hosting)
→ Skill required: Data science team or external consultant
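To make the DIY path concrete, here is a dependency-free sketch of the train-and-validate loop. It is a pure-Python stand-in for the scikit-learn Logistic Regression mentioned above, not a replacement for it, and it runs on synthetic data: the two features, the labeling rule, and the hyperparameters are all assumptions chosen so the example is self-contained.

```python
import math
import random


def predict(weights, bias, features):
    """Return the modeled probability that a row is a best customer."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, learning_rate=0.5):
    """Fit weights with full-batch gradient descent on log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Synthetic stand-in for a CRM export: two similarity features in [0, 1].
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if a + b > 1.0 else 0 for a, b in rows]

# 80/20 hold-out split.
split = int(0.8 * len(rows))
weights, bias = train_logistic(rows[:split], labels[:split])

hits = sum(
    (predict(weights, bias, x) >= 0.5) == bool(y)
    for x, y in zip(rows[split:], labels[split:])
)
validation_accuracy = hits / (len(rows) - split)
```

In a real pipeline the rows would come from the CRM export and enrichment data, and a library implementation with regularization and cross-validation would normally take the place of the hand-written loop.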
Model Training Data Requirements:
→ Minimum: 100 best customers + 500 total customers (for baseline)
→ Recommended: 500+ best customers + 5,000+ total customers
→ Data freshness: Updated within last 90 days
→ Feature completeness: 80%+ data coverage on key features
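These requirements can be checked automatically before each training run. A minimal sketch using the minimum thresholds listed above; the function name, argument shapes, and message wording are hypothetical.

```python
from datetime import date

MIN_BEST_CUSTOMERS = 100
MIN_TOTAL_CUSTOMERS = 500
MAX_DATA_AGE_DAYS = 90
MIN_FEATURE_COVERAGE = 0.80


def training_data_issues(n_best, n_total, last_updated, coverage, today):
    """Return a list of human-readable problems; an empty list means ready to train.

    coverage maps feature name -> share of records with a non-missing value.
    """
    issues = []
    if n_best < MIN_BEST_CUSTOMERS:
        issues.append(f"only {n_best} best customers (minimum {MIN_BEST_CUSTOMERS})")
    if n_total < MIN_TOTAL_CUSTOMERS:
        issues.append(f"only {n_total} total customers (minimum {MIN_TOTAL_CUSTOMERS})")
    age_days = (today - last_updated).days
    if age_days > MAX_DATA_AGE_DAYS:
        issues.append(f"data is {age_days} days old (maximum {MAX_DATA_AGE_DAYS})")
    for feature, share in coverage.items():
        if share < MIN_FEATURE_COVERAGE:
            issues.append(f"{feature} coverage is {share:.0%} (minimum 80%)")
    return issues


issues = training_data_issues(
    n_best=120,
    n_total=600,
    last_updated=date(2024, 1, 1),
    coverage={"industry": 0.95, "tech_stack": 0.60},
    today=date(2024, 2, 1),
)
```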
Edge Cases
- Small customer base: Companies with < 100 customers may not have enough data for reliable lookalike modeling.
  - Resolution: Use rule-based ICP scoring instead of ML; leverage industry benchmarks and analyst reports; use platform-built lookalike models (6sense, Demandbase) that train on cross-company data; start with broad segments and narrow over time.
- Model drift: Customer profile characteristics change over time, so the model becomes stale.
  - Resolution: Retrain model quarterly (or after significant customer base changes); monitor conversion rates by score band (declining = model drift); incorporate new customer segments as they emerge; A/B test old vs. new model performance.
- Overfitting to current customers: Model may create lookalikes that are too narrow, missing adjacent markets or new segments.
  - Resolution: Include diversity constraints in model; segment lookalikes by multiple customer archetypes (not just one "best" profile); periodically explore adjacent segments (expand by 1 industry, 1 size band); maintain manual override for strategic accounts.
- Data quality issues: Incomplete or inaccurate data for best customers or prospects reduces model accuracy.
  - Resolution: Implement data quality checks before model training; use multiple data sources for cross-validation; flag low-confidence scores; manual review for high-value prospects with low data quality.
- Privacy and compliance: Lookalike modeling involves processing personal and company data at scale.
  - Resolution: Ensure GDPR/CCPA compliance for data processing; use anonymized/aggregated data where possible; obtain proper consent for data enrichment; regular privacy audits; document data processing activities.
- Cold start problem: New products or new market segments have no historical customer data.
  - Resolution: Use industry proxy data (similar companies in same vertical); leverage platform-built models trained on industry data; start with manual ICP definition; gather data from early customers and retrain after 100+ customers.
- Score calibration issues: Raw ML scores may not align with actual sales conversion rates.
  - Resolution: Calibrate scores against historical conversion data; implement score-to-conversion lookup table; regularly update calibration based on new data; communicate calibrated scores to sales team (not raw model output).
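The score-to-conversion lookup table mentioned under score calibration can be built by binning historical scores and recording the observed conversion rate in each bin. The sketch below assumes 10-point bins and invented history; the function names are illustrative, and bins with no history return `None` rather than a guessed rate.

```python
def bin_index(score, bin_width=10):
    """Map a 0-100 score to its bin; a score of exactly 100 joins the top bin."""
    top_bin = int(100 // bin_width) - 1
    return min(int(score // bin_width), top_bin)


def build_calibration_table(history, bin_width=10):
    """Return {bin index: observed conversion rate} from (score, converted) pairs."""
    totals, wins = {}, {}
    for score, converted in history:
        b = bin_index(score, bin_width)
        totals[b] = totals.get(b, 0) + 1
        wins[b] = wins.get(b, 0) + int(converted)
    return {b: wins[b] / totals[b] for b in totals}


def calibrated_rate(table, score, bin_width=10):
    """Look up the historical conversion rate for a score, or None if unseen."""
    return table.get(bin_index(score, bin_width))


# Hypothetical closed-loop data: (raw model score, did the account convert?).
history = [
    (85, True), (82, False), (88, True), (91, False),
    (45, False), (47, False), (41, True), (49, False),
]
table = build_calibration_table(history)
rate_for_44 = calibrated_rate(table, 44)
```

Bins with very few observations produce noisy rates, so in practice it helps to set a minimum sample size per bin or merge sparse neighbors before sharing the numbers with the sales team.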
Integration Points
- 6sense: AI-powered lookalike modeling with intent data; $15,000–$50,000/year
- Demandbase: Account-based lookalike targeting; $30,000–$100,000/year
- ZoomInfo: Firmographic and technographic enrichment; $12,000–$50,000/year
- Apollo: Contact database and email sequencing; $49–$249/month per seat
- Salesforce CRM: Lead scoring integration, account scoring; $25–$3,000/month per user
- HubSpot: Built-in predictive scoring and list segmentation; $80–$3,200/month
- Outreach.io/SalesLoft: Campaign execution for lookalike audiences; $80–$200/month per user
- AWS SageMaker: Custom ML model deployment; $0.10–$1.00 per instance-hour, depending on instance type
- Google BigQuery: Data warehouse for customer analysis; $6.25 per TiB scanned (on-demand queries)
- Tableau/Looker: Visualization of lookalike score distributions; $70–$1,200/month per user