
Most AI outbound calling dashboards are cluttered with the wrong numbers. Teams track call volume, average handle time, and total minutes consumed, and miss the handful of metrics that actually tell them whether the campaign is working. A campaign that looks great on call volume can be hemorrhaging money on low connection rates or bad targeting, and the dashboard will never surface the problem until the quarterly review.
This article is a focused list of the eight KPIs that matter for an AI outbound calling campaign, why each one matters, how to calculate it, and what a healthy number looks like. If you track these eight and act on them, you have most of what you need.
1. Connection Rate
What it measures: The percentage of attempted calls that connect to a real human being.
Formula: (Calls answered by a person / Total calls attempted) × 100.
Why it matters: Connection rate is the first domino. If only 15% of your calls reach a human, every downstream metric is starved of the sample size it needs to be meaningful. Low connection rate is usually a combination of two things: bad number quality (disconnected lines, wrong numbers, landlines that nobody picks up) and bad caller ID reputation (your number is being flagged as spam by the recipient's carrier).
What good looks like: 25–40% for cold outbound to a rented list. 50–70% for warm leads or existing customers. Below 15% indicates a real problem with either the list or the caller ID — investigate before spending more on the campaign.
What to do if it is low: Audit caller ID reputation first. Check whether your numbers are registered for branded caller ID and STIR/SHAKEN attestation. Audit the list for stale numbers. Consider rotating numbers if any single number is showing “spam likely” labels on recipient carriers.
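The formula is trivial to compute from call logs. Here is a minimal Python sketch, assuming a hypothetical record shape with an "outcome" field — the field name and its values are illustrative, not a real platform schema:

```python
# Connection rate: percentage of attempted calls answered by a person.
# The record shape ("outcome" field and its values) is an assumption.

def connection_rate(calls: list[dict]) -> float:
    """(Calls answered by a person / Total calls attempted) x 100."""
    attempted = len(calls)
    if attempted == 0:
        return 0.0
    answered = sum(1 for c in calls if c["outcome"] == "answered_human")
    return answered / attempted * 100

calls = [
    {"outcome": "answered_human"},
    {"outcome": "voicemail"},
    {"outcome": "no_answer"},
    {"outcome": "answered_human"},
]
print(round(connection_rate(calls), 1))  # 50.0
```

The same shape works for every percentage metric below; only the numerator's predicate changes.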
2. Qualification Rate
What it measures: Of the calls that connect, what percentage meet your defined qualification criteria.
Formula: (Qualified leads / Connected calls) × 100.
Why it matters: This is the first quality signal in the funnel. A high connection rate with a low qualification rate means you are reaching people, but they are not the right people. Either your list is wrong, your qualification criteria are too narrow, or your script is scaring off legitimate prospects before they reveal enough information to qualify.
What good looks like: Varies enormously by industry and list quality, but 20–40% is a healthy range for most B2B outbound. Below 10% usually indicates a list problem or a script problem. Above 60% suggests your qualification criteria may be too loose to be meaningful.
What to do if it is low: Listen to ten connected calls that failed to qualify. The answer is almost always obvious: wrong audience, wrong opening, wrong qualifying questions, or criteria that are too strict.
3. Appointment/Meeting Booking Rate
What it measures: Of qualified leads, what percentage agree to a next step (demo, consultation, appointment).
Formula: (Appointments booked / Qualified leads) × 100.
Why it matters: This is where the AI agent's conversational quality shows up. A qualified lead that does not book a meeting is a failure of the close. Either the AI is not asking for the meeting clearly, it is not handling the common objections well, or the value proposition is not landing with the specific lead segment.
What good looks like: 40–60% for well-qualified B2B leads. Lower for cold outbound, higher for leads already in the buying process.
What to do if it is low: Review the transcripts of 20 qualified calls that did not result in bookings. Look for patterns in how the AI is asking for the meeting, how it is handling the common objections (“I'm busy,” “send me an email,” “I'll think about it”), and whether the proposed times are working for the prospect. The fix is usually a prompt adjustment, not a strategic rethink.
4. Show Rate
What it measures: Of booked appointments, what percentage actually happen.
Formula: (Meetings held / Meetings booked) × 100.
Why it matters: Booked meetings are a vanity metric if half of them do not happen. Show rate is where the AI campaign's real impact on the sales pipeline becomes visible. Low show rate usually means the AI is booking meetings too aggressively — pressuring prospects into commitments they do not intend to keep — or the meetings are being booked too far out (prospects forget or lose interest).
What good looks like: 60–80% for meetings booked within one week. Lower for meetings booked further out.
What to do if it is low: Two fixes. First, add reminder calls (or SMS) 24 hours before each meeting — this alone typically lifts show rate by 15–25 percentage points. Second, tighten the AI's booking criteria: require prospects to confirm interest explicitly rather than letting them agree to a meeting just to end the call.
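Because show rate degrades as meetings are booked further out, it is worth bucketing it by booking lead time rather than reporting one blended number. A sketch, with an assumed record shape (field names are illustrative):

```python
from datetime import date

# Show rate split by booking lead time: meetings booked within a week
# versus further out. Record fields are assumptions, not a real schema.

def show_rate_by_lead_time(meetings: list[dict]) -> dict[str, float]:
    buckets = {"within_1_week": [0, 0], "beyond_1_week": [0, 0]}  # [held, booked]
    for m in meetings:
        lead_days = (m["meeting_date"] - m["booked_date"]).days
        key = "within_1_week" if lead_days <= 7 else "beyond_1_week"
        buckets[key][1] += 1
        if m["held"]:
            buckets[key][0] += 1
    return {k: (held / booked * 100 if booked else 0.0)
            for k, (held, booked) in buckets.items()}

meetings = [
    {"booked_date": date(2024, 5, 1), "meeting_date": date(2024, 5, 3), "held": True},
    {"booked_date": date(2024, 5, 1), "meeting_date": date(2024, 5, 6), "held": True},
    {"booked_date": date(2024, 5, 1), "meeting_date": date(2024, 5, 20), "held": False},
    {"booked_date": date(2024, 5, 1), "meeting_date": date(2024, 5, 15), "held": True},
]
print(show_rate_by_lead_time(meetings))
# {'within_1_week': 100.0, 'beyond_1_week': 50.0}
```

If the beyond-one-week bucket is dragging the blended number down, the fix is tighter booking windows plus reminders, not a better script.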
5. Conversion to Close
What it measures: Of meetings held, what percentage convert to a signed deal, purchase, or whatever the campaign's actual outcome is.
Formula: (Closed deals / Meetings held) × 100.
Why it matters: This is the one that actually pays for the campaign. Every other metric is a leading indicator of this number. If your AI campaign has high connection, good qualification, strong booking rate, solid show rate, and then the closes never materialize, something is wrong upstream — probably the qualification is not matching the real buying criteria.
What good looks like: Varies wildly by industry and sales motion. What matters is not the absolute number but the trend: is the conversion rate of AI-sourced meetings matching or beating the conversion rate of meetings sourced through other channels? If yes, the campaign is working. If no, go upstream to qualification.
What to do if it is low: Compare the AI-sourced meetings that did close to the ones that did not. The pattern is usually obvious after a dozen examples. Adjust qualification criteria to match the profile of closers.
6. Cost Per Qualified Lead (CPQL)
What it measures: Total campaign cost divided by qualified leads generated.
Formula: (Total campaign cost including platform fees, list cost, and per-minute charges) / Qualified leads.
Why it matters: This is the ROI dial you turn. Lower CPQL means the campaign is more efficient. It is also the number that lets you compare AI outbound against every other acquisition channel (paid ads, content marketing, trade shows). If your AI CPQL is meaningfully better than your other channels, expand the AI program. If it is worse, figure out why before doubling down.
What good looks like: Depends entirely on the deal size. For a B2B deal with $10K+ ACV, a CPQL under $50 is excellent. For a consumer product, it should be much lower.
What to do if it is high: Work backwards. If CPQL is high because connection rate is low, fix caller ID reputation. If it is high because qualification rate is low, fix the list or the script. If it is high because the per-minute model cost is eating the budget, consider moving to a cheaper AI model or a BYOK (bring-your-own-key) arrangement.
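CPQL is easier to act on when the cost components are itemized, so you can see which one dominates before deciding what to fix. A sketch with made-up numbers:

```python
# Cost per qualified lead, with the cost components kept separate.
# All figures here are illustrative.

def cost_per_qualified_lead(platform_fees: float, list_cost: float,
                            total_minutes: float, per_minute_rate: float,
                            qualified_leads: int) -> float:
    total_cost = platform_fees + list_cost + total_minutes * per_minute_rate
    if qualified_leads == 0:
        return float("inf")  # no leads: CPQL is unbounded, stop the campaign
    return total_cost / qualified_leads

cpql = cost_per_qualified_lead(
    platform_fees=500.0,   # monthly platform subscription
    list_cost=300.0,       # rented list
    total_minutes=4000.0,  # total AI talk time
    per_minute_rate=0.10,  # model + telephony per-minute charge
    qualified_leads=30,
)
print(round(cpql, 2))  # 40.0
```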
7. Average Call Duration
What it measures: How long the average outbound call lasts.
Formula: Total call seconds / Total calls.
Why it matters: This is a diagnostic metric, not a goal metric. Very short calls (under 30 seconds) suggest recipients are hanging up early — either because the AI is opening poorly or because recipients recognize it as automated and reject it. Very long calls (over 5 minutes) suggest the AI is taking too long to qualify or close, which eats per-minute budget without improving outcomes.
What good looks like: 1.5 to 3 minutes for a typical qualification call. Under 1 minute is a hangup problem. Over 4 minutes is an efficiency problem.
What to do if it is off: If too short, review the opening and the first 30 seconds of the call. The AI needs to establish credibility fast. If too long, review the middle of the call. The AI is probably asking too many questions or failing to move toward a close. Both are prompt adjustments.
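Because the average hides both failure modes, it helps to flag the share of calls outside the healthy band rather than watching the mean alone. A sketch (durations in seconds; the cutoffs mirror the ranges above):

```python
# Flag calls whose duration falls outside the healthy band.
# Cutoffs follow the text: under 1 minute is a hangup problem,
# over 4 minutes is an efficiency problem.

def flag_durations(durations_sec: list[int],
                   short_cutoff: int = 60,
                   long_cutoff: int = 240) -> dict[str, float]:
    total = len(durations_sec)
    if total == 0:
        return {"avg_duration_sec": 0.0, "pct_too_short": 0.0, "pct_too_long": 0.0}
    short = sum(1 for d in durations_sec if d < short_cutoff)
    long_ = sum(1 for d in durations_sec if d > long_cutoff)
    return {
        "avg_duration_sec": sum(durations_sec) / total,
        "pct_too_short": short / total * 100,
        "pct_too_long": long_ / total * 100,
    }

stats = flag_durations([25, 95, 130, 300, 110])
print(stats)  # avg 132.0 sec, 20% too short, 20% too long
```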
8. Sentiment and Transcript Flags
What it measures: The quality signals buried in the call transcripts — caller sentiment, rude or hostile interactions, compliance issues, and any calls where the AI gave information it should not have.
Formula: Harder to quantify. Use post-call analysis to classify every transcript across sentiment (positive/neutral/negative), compliance (did the AI stay inside its boundaries?), and anomalies (did the AI commit to something it should not have?).
Why it matters: The other seven metrics tell you whether the campaign is performing. This one tells you whether it is safe. An AI outbound campaign with strong conversion numbers but a growing sentiment problem or a few compliance slip-ups is building toward a lawsuit, a regulator complaint, or a reputational incident. This is the metric that catches those before they become visible from the outside.
What good looks like: Sentiment should skew neutral-to-positive across the sample. Compliance flags should be near zero — any non-zero rate deserves immediate investigation. Hostile interaction rate (recipients actively annoyed at being called) should be under 5%; higher means the list or the approach is wrong.
What to do if it is off: Sentiment problems usually trace back to list quality (calling people who should not be on the list) or opening problems (the AI is being too pushy in the first 10 seconds). Compliance flags should trigger an immediate script audit. Hostile interaction problems often mean the call cadence is too aggressive — you are calling the same numbers too often.
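Once post-call analysis has labeled each transcript, rolling those labels into the safety summary is a simple tally. A sketch, with assumed label names (the classification itself — sentiment, hostility, compliance — comes from whatever post-call analysis pass you run):

```python
# Aggregate per-call analysis labels into the safety metrics above.
# Label names and values are assumptions, not a fixed schema.

def safety_summary(analyses: list[dict]) -> dict[str, float]:
    total = len(analyses)
    if total == 0:
        return {"negative_sentiment_pct": 0.0, "hostile_pct": 0.0, "compliance_flags": 0}
    neg = sum(1 for a in analyses if a["sentiment"] == "negative")
    hostile = sum(1 for a in analyses if a.get("hostile", False))
    compliance = sum(1 for a in analyses if a.get("compliance_flag", False))
    return {
        "negative_sentiment_pct": neg / total * 100,
        "hostile_pct": hostile / total * 100,
        "compliance_flags": compliance,  # absolute count: any non-zero is urgent
    }

sample = [
    {"sentiment": "positive"},
    {"sentiment": "neutral"},
    {"sentiment": "negative", "hostile": True},
    {"sentiment": "neutral", "compliance_flag": True},
]
print(safety_summary(sample))
# negative 25.0%, hostile 25.0%, 1 compliance flag
```

Note that compliance is reported as a count, not a rate — the text's point is that any non-zero value warrants investigation, so a percentage would understate it.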
How to actually use these KPIs
A few rules about the dashboard itself.
Track them as a funnel, not a list. Connection rate feeds qualification rate feeds booking rate feeds show rate feeds close rate. The leaks happen at the transitions, not in the totals. A dashboard that shows each metric in isolation hides the funnel shape.
Compare week over week, not against an absolute target. Targets in outbound calling are almost always wrong at first — the realistic numbers depend on your specific list, product, and market. The right first move is to measure the baseline, then improve from it. “Better than last week” is a more useful goal than “hit 30% connection rate” at the start.
Review the transcripts, not just the dashboard. Numbers tell you something is wrong; transcripts tell you why. Every week, have someone (or an LLM pass) review 20–30 failed calls to find the pattern. This is the feedback loop that actually improves the campaign over time.
Do not optimize all eight at once. Pick the one that is most broken and fix it. Then re-measure everything, because improving one often cascades to others. Trying to tune all eight in parallel produces changes that are hard to attribute to any single fix.
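The funnel framing above can be made concrete: compute each stage-to-stage conversion and flag the transitions that fall below the healthy lower bounds from the earlier sections. The counts here are illustrative:

```python
# Funnel view: stage-to-stage conversion rates, flagged against the
# lower bound of each healthy range from this article. Counts are made up.

funnel = [
    ("attempted", 10000),
    ("connected", 2200),
    ("qualified", 550),
    ("booked", 175),
    ("held", 120),
    ("closed", 24),
]

# Healthy lower bounds per transition; close rate has no universal
# benchmark ("varies wildly"), so it is omitted.
benchmarks = {
    "attempted->connected": 25.0,  # connection rate
    "connected->qualified": 20.0,  # qualification rate
    "qualified->booked": 40.0,     # booking rate
    "booked->held": 60.0,          # show rate
}

rates = {}
for (prev, prev_n), (stage, n) in zip(funnel, funnel[1:]):
    rates[f"{prev}->{stage}"] = n / prev_n * 100

for key, rate in rates.items():
    floor = benchmarks.get(key)
    flag = "  <-- below healthy range" if floor is not None and rate < floor else ""
    print(f"{key}: {rate:.1f}%{flag}")
```

With these numbers, the connection and booking transitions get flagged — and per the rule above, you would fix only the most broken one, then re-measure the whole funnel.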
The one KPI you should not track
Total calls made. It is the most common metric on AI outbound dashboards and it is the least useful. Volume without outcome is a cost, not a performance signal. Teams that track total calls made as their primary KPI tend to optimize for more calls, which produces worse quality, which produces worse outcomes, which produces a worse campaign. Track outcomes, not effort.
Further reading
- AI Outbound Calls: Build Automated Calling Campaigns — the end-to-end campaign patterns these KPIs measure.
- Post-Call Analysis — BubblyPhone Agents Glossary — how to extract the transcript-level metrics (#8) automatically.
- Call Analytics — BubblyPhone Agents Glossary — the three-layer framework that separates operational metrics from outcome metrics.
- AI Cold Calling: How AI Phone Agents Are Replacing Manual Dialers — the operational playbook.
Ready to run a campaign with real measurement? Sign up for BubblyPhone Agents — call logs, transcripts, and structured call data are available via the API for building any of these KPIs into your dashboard.