Why most lead scoring models are wrong
Every scoring model we audit that was built from scratch is wrong in the same ways:
- Weights are guessed, not calibrated. Someone decided a demo request should be worth 50 points and an eBook download 5. They're not wrong directionally, but the 10:1 ratio probably isn't right, and you can't know without data.
- No decay on behavioral signals. A whitepaper download 18 months ago is worth the same as one yesterday. It isn't.
- No negative scoring. Competitor domains, unsubscribes, and explicit non-buying signals don't reduce the score; they just fail to add to it.
- MQL threshold set arbitrarily. “100 points equals MQL” — but at 100 points, what's the actual SQL conversion rate? If you don't know, the threshold is wrong.
- No periodic recalibration. The model was built once and hasn't been touched in 3 years. Buying behavior changed; the model didn't.
The 5-step framework
Step 1: Get historical data
Pull 12–24 months of lead-to-SQL conversion data with all behavioral, demographic, and firmographic dimensions you'll consider scoring on. This data lives across your MAP, CRM, and (often) a data warehouse — pull it once, properly.
Minimum data needed:
- Lead created date
- Lead → SQL conversion date (if applicable)
- Behavioral history (page visits, downloads, email engagement, form fills)
- Demographic data at conversion time (title, function, seniority)
- Firmographic data (company size, industry, tech stack)
- Source / channel
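As a sketch of what "pull it once, properly" can look like, assuming the export lands as a flat file (all column names here are illustrative and should be mapped to your own MAP/CRM fields):

```python
import pandas as pd

# Hypothetical column names -- map these to your own MAP/CRM/warehouse export.
REQUIRED_COLUMNS = [
    "lead_created_date",
    "sql_converted_date",   # null if the lead never converted
    "behavioral_events",    # page visits, downloads, email engagement, form fills
    "title", "function", "seniority",
    "company_size", "industry", "tech_stack",
    "source_channel",
]

leads = pd.read_csv(
    "lead_history.csv",
    parse_dates=["lead_created_date", "sql_converted_date"],
)

# Fail fast if the export is missing a dimension you plan to score on.
missing = [c for c in REQUIRED_COLUMNS if c not in leads.columns]
if missing:
    raise ValueError(f"Export is missing columns: {missing}")

# The binary outcome every later step keys off.
leads["converted_to_sql"] = leads["sql_converted_date"].notna()
```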
Step 2: Identify predictive signals
Run conversion-rate analysis across each signal candidate. The question to answer for each: “What's the SQL conversion rate of leads with this signal vs. without it?”
Keep signals where the high-value group's conversion rate is at least 2x the average. Drop signals where the lift is below 1.5x; they're not worth the modeling complexity. Signals in the 1.5–2x band are judgment calls: keep them only at low weight.
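A minimal sketch of that analysis, assuming each candidate signal exists as a boolean column alongside the `converted_to_sql` flag from Step 1 (signal names are illustrative):

```python
import pandas as pd

def signal_lift(leads: pd.DataFrame, signal_col: str) -> float:
    """Conversion rate of leads WITH the signal, divided by the overall rate."""
    overall_rate = leads["converted_to_sql"].mean()
    with_signal_rate = leads.loc[leads[signal_col], "converted_to_sql"].mean()
    return with_signal_rate / overall_rate

# Candidate signals as boolean columns (illustrative names).
candidates = ["visited_pricing_page", "downloaded_whitepaper", "director_plus_title"]
lifts = {s: signal_lift(leads, s) for s in candidates}

keep = {s: lift for s, lift in lifts.items() if lift >= 2.0}  # clearly predictive
drop = {s: lift for s, lift in lifts.items() if lift < 1.5}   # not worth the complexity
```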
Step 3: Set weights based on actual lift
Weight each signal proportionally to its conversion lift, not based on intuition. If “Director+ title” converts at 4x the average and “visited pricing page” at 8x, the pricing-page signal should be weighted roughly 2x the title signal (a sketch of this scaling follows the table below).
Sample weighting structure for a typical B2B SaaS:
| Signal | Type | Sample weight | Decay window |
|---|---|---|---|
| Demo request | Behavioral | +50 | 30 days |
| Pricing page visit | Behavioral | +25 | 60 days |
| Whitepaper download | Behavioral | +10 | 90 days |
| Email click | Behavioral | +5 | 90 days |
| Director+ title | Demographic | +20 | None |
| Marketing/RevOps function | Demographic | +15 | None |
| 500+ employees | Firmographic | +15 | None |
| Has Marketo/HubSpot in stack | Firmographic | +20 | None |
| Competitor domain | Negative | -50 | None |
| Student/researcher title | Negative | -50 | None |
| Unsubscribed | Negative | -100 | None |
These are illustrative — your specific weights depend on your historical data.
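One way to turn measured lifts into point values, sketched below: anchor the strongest signal at a familiar weight and scale everything else proportionally. The function and the 50-point anchor are illustrative choices, not a standard.

```python
# Scale weights so the strongest signal lands at a familiar anchor (e.g. 50 points).
ANCHOR_POINTS = 50

def weights_from_lift(lifts: dict[str, float], anchor: int = ANCHOR_POINTS) -> dict[str, int]:
    """Assign points proportional to each signal's conversion lift."""
    max_lift = max(lifts.values())
    return {signal: round(anchor * lift / max_lift) for signal, lift in lifts.items()}

# Illustrative lifts: pricing page at 8x, Director+ title at 4x.
weights = weights_from_lift({"visited_pricing_page": 8.0, "director_plus_title": 4.0})
# -> {"visited_pricing_page": 50, "director_plus_title": 25}, the ~2:1 ratio from the text
```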
Step 4: Calibrate the MQL threshold
Run scored leads through the model retroactively and plot score against SQL conversion rate. Set the MQL threshold at the score where:
- SQL conversion rate plateaus or peaks (the top 10–25% of leads)
- The volume above that threshold is what your sales team can handle
- SQL conversion rate at the threshold is acceptable to Sales
The score number itself is meaningless. What matters: at this score, X% of leads convert to SQL. That's what Sales is buying when they accept your MQLs.
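A sketch of the retroactive calibration, assuming each historical lead already carries a `score` computed with the Step 3 weights. It buckets leads by score band and reports conversion rate and volume per band, which is what you need to test the three criteria above (the 0–200 range is illustrative):

```python
import pandas as pd

# Bucket historical leads into 20-point score bands (adjust the range to your model).
buckets = pd.cut(leads["score"], bins=range(0, 201, 20))
calibration = leads.groupby(buckets, observed=True).agg(
    leads=("converted_to_sql", "size"),
    sql_rate=("converted_to_sql", "mean"),
)

# Pick the threshold where sql_rate plateaus or peaks and the lead
# volume above it matches what Sales can actually work.
print(calibration)
```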
Step 5: Add decay and review quarterly
Behavioral signals decay over 30–90 days. Demographic and firmographic scores generally don't decay — but should be re-checked annually for changes (job changes, company size changes, tech stack changes).
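A sketch of the step-decay schedule the FAQ below recommends (25% off after 30 days, 50% after 60, 75% after 90). Keeping 25% of the value past 90 days is one reasonable choice; zeroing it out instead is equally defensible.

```python
def decayed_points(base_points: int, age_days: int) -> float:
    """Step decay for behavioral signals: full value under 30 days, then 75% / 50% / 25%."""
    if age_days < 30:
        return base_points
    if age_days < 60:
        return base_points * 0.75
    if age_days < 90:
        return base_points * 0.50
    return base_points * 0.25

# A 10-point whitepaper download that is 45 days old now contributes 7.5 points.
print(decayed_points(10, 45))
```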
Quarterly review: pull the last 90 days of scored leads, check whether MQL→SQL conversion rate at threshold is still acceptable. Annual recalibration: full re-run of the model against rolling 12-month data.
Need help calibrating your scoring?
Most B2B teams' scoring models are 60% built and 40% guessed. The 30-min scoping call covers what a calibration project would look like for your instance.
Common scoring mistakes (and how to avoid them)
Mistake 1: Scoring inflation
If 60% of contacts sit above the MQL threshold, scoring is broken, usually because the threshold is too low or behavioral decay is missing. Sales starts ignoring MQLs because too many are unqualified, and the whole program loses credibility. Fix: recalibrate the threshold, add decay.
Mistake 2: Scoring under-coverage
If only 5% of contacts sit above the MQL threshold, scoring is too conservative. Leads that eventually converted were never flagged as MQLs, so Sales worked them late or not at all. Fix: lower the threshold, broaden the positive signal set.
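Both failure modes reduce to one number: the share of the database above the MQL threshold. A quick health check, sketched with the 60% and 5% rules of thumb from above (your healthy range may differ):

```python
MQL_THRESHOLD = 100  # substitute your calibrated threshold

share_above = (leads["score"] >= MQL_THRESHOLD).mean()

if share_above > 0.60:
    print(f"{share_above:.0%} above threshold: inflation. Recalibrate the threshold, add decay.")
elif share_above < 0.05:
    print(f"{share_above:.0%} above threshold: under-coverage. Lower the threshold, broaden signals.")
else:
    print(f"{share_above:.0%} above threshold: within a plausible range.")
```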
Mistake 3: No model documentation
Scoring rules live in Marketo or HubSpot but no one knows why they exist. When the original builder leaves, the model becomes untouchable — too risky to change without understanding the original logic. Fix: write down the model, the rationale, the calibration data, and the review schedule.
Mistake 4: Treating account scoring as contact scoring
Summing contact scores at the account level doesn't produce a meaningful account score: companies with more contacts always score higher, regardless of actual intent. Use a weighted average, max-of-key-roles, or proper account-level scoring tools (6sense, Demandbase) for ABM scoring.
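Two of the aggregations named above, sketched in pandas, assuming contact-level scores carry an `account_id` and a Director+ flag (both column names illustrative):

```python
# Naive summing rewards headcount; these aggregations don't.

# Average contact score per account.
account_avg = leads.groupby("account_id")["score"].mean()

# Max score among key buying roles only (Director+ as the illustrative filter).
key_roles = leads[leads["director_plus_title"]]
account_key_max = key_roles.groupby("account_id")["score"].max()
```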
Industry benchmarks
Per Salesforce's State of Marketing report, high-performing B2B teams report:
- MQL → SQL conversion rate: 25–40% (vs. 10–15% for typical mid-market)
- SQL → SAO (Sales Accepted Opportunity): 50–70%
- SAO → Won: 15–25% (varies wildly by deal size and industry)
If your MQL→SQL rate is below 15%, scoring is the most likely culprit — followed by lead routing and SLA adherence. A MOPs audit separates the causes.
Frequently Asked Questions
How many points should an MQL threshold be?
No universal answer. Set the threshold where the top 15–25% of leads sit, then adjust based on sales capacity and SQL conversion rate. The number is meaningless; the SQL conversion rate at that threshold is what matters.
Should we use behavioral, demographic, or firmographic scoring?
All three. Typical mid-market B2B model: ~40% behavioral (intent), ~30% demographic (contact fit), ~30% firmographic (account fit). Proportions vary by company.
How fast should scores decay?
Behavioral: 25% decay after 30 days, 50% after 60 days, 75% after 90 days. Demographic/firmographic: no decay, but re-check annually for changes.
Should we score in Marketo/HubSpot or in a separate tool?
Native scoring works for most B2B teams. Add 6sense / Demandbase / MadKudu when you need predictive AI, third-party intent data, or sophisticated multi-touch attribution scoring. Start native; layer in only when needed.
How often should we recalibrate the scoring model?
Review quarterly, full recalibration annually. Sooner if business model changes, sales process changes, or MQL→SQL conversion drops 20%+ from baseline.
What about negative scoring?
Essential. Common rules: -50 for competitor domains, -25 for non-business email domains, -50 for student/researcher titles, -100 for unsubscribe events.
Should we score by individual contact or by account?
Both. Contact-level for individual intent. Account-level for ABM and buying-readiness. Most B2B ABM teams need both. Native tooling (Marketo account-based smart lists, HubSpot company scoring) supports this.
What's the SLA between Marketing and Sales for MQL follow-up?
High-priority leads (demo requests): 5-minute SLA. Standard MQLs: 24-hour SLA. Per InsideSales research, conversion drops 80% past 5 minutes. SLA enforcement (Slack alerts, escalation) is high leverage.
Want help building or rebuilding your scoring model?
The 30-min scoping call covers what a calibration would look like for your team — timeline, cost, and expected lift in MQL→SQL conversion.
Related: MOPs Audit service, Marketo vs HubSpot vs Pardot, What is a MOPs audit?.