AI Phone Agent Operations

Concurrent Calls

By Vadim Kouznetsov, Founder of BubblyPhone · Last updated April 5, 2026

Concurrent calls is the number of phone calls a system is handling at the same moment. The limit on that number is the capacity ceiling: it determines how many callers can be connected at once before new calls are rejected, queued, or sent to overflow. It is the single most important capacity metric for any voice infrastructure, and the one that most often surprises teams moving from a prototype to production.

Why concurrent capacity is different from throughput

A system that handles 1,000 calls per day is not the same as a system that handles 1,000 concurrent calls. The difference is how the calls are spread over time and how long each one lasts. 1,000 calls spread evenly over 24 hours is one call every 86.4 seconds, so as long as each call finishes in under 86 seconds, a single line could handle them without ever carrying two calls at once. 1,000 calls all arriving in the same minute can need up to 1,000 simultaneous lines.

The relationship between throughput and concurrency is given by Little’s Law: average concurrency equals arrival rate times average duration. A system receiving 10 calls per minute where each call averages 3 minutes will have, on average, 30 concurrent calls. Peaks run higher than the average, usually 2 to 3x for voice traffic, so the capacity you actually need to provision is larger than the average concurrency.
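The arithmetic above is small enough to script. The following is a minimal sketch, not a sizing tool: the function names are made up for illustration, and the default peak factor of 2.5 is simply an assumed midpoint of the 2 to 3x range mentioned above.

```python
import math


def average_concurrency(calls_per_minute: float, avg_duration_minutes: float) -> float:
    """Little's Law: average concurrency = arrival rate x average duration."""
    return calls_per_minute * avg_duration_minutes


def provisioned_concurrency(avg_concurrency: float, peak_factor: float = 2.5) -> int:
    """Scale the average by an assumed peak factor and round up to whole lines."""
    return math.ceil(avg_concurrency * peak_factor)


# The example from the text: 10 calls per minute, 3 minutes each.
avg = average_concurrency(calls_per_minute=10, avg_duration_minutes=3)
needed = provisioned_concurrency(avg)
print(avg, needed)
```

For the example in the text this gives an average of 30 concurrent calls and a provisioning target of 75. Replace the peak factor with a measured one as soon as you have real traffic data.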

Where the limits actually live

A concurrent call limit can come from multiple points in the stack, and the one that bites first is rarely the one you expected.

  • Carrier SIP trunk capacity. The carrier selling you origination or termination caps how many simultaneous calls that trunk can carry. For smaller plans this is often 10 or 100; for enterprise it is typically thousands. If your trunk is full, the carrier rejects new calls, typically with a SIP 503 (Service Unavailable) response.
  • Per-number limits. Even within a trunk with high capacity, individual phone numbers usually have per-number caps on concurrent calls and calls per hour. These are anti-fraud measures: a single number placing 200 simultaneous calls looks like a robocall operation to carrier abuse systems.
  • Platform limits. The telephony API platform sitting between your code and the carrier has its own account-level concurrency caps. These are usually higher than the carrier cap but exist to protect the platform itself.
  • AI model rate limits. This is the new one that catches teams off guard in 2026. Every concurrent call in streaming mode holds open a WebSocket session with an LLM provider. Those providers have their own concurrency limits (often 100 or 200 per API key by default). If you hit the LLM provider cap, the telephony leg still connects, but no model session is available to answer, so the caller gets silence. This is often the bottleneck before any telephony cap.
  • Application bottlenecks. The code that handles tool calls, database lookups, and webhooks has its own concurrency limits. A tool handler that holds a database connection per call will run out of connections before any other limit is reached.
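Because the first limit reached decides what callers experience, the effective ceiling is the minimum across all of these layers. A minimal sketch of that comparison follows; the layer names and every number in it are hypothetical placeholders, not figures from any carrier, platform, or LLM provider.

```python
def effective_ceiling(limits: dict[str, int]) -> tuple[str, int]:
    """Return the layer with the smallest concurrency limit and that limit."""
    layer = min(limits, key=limits.get)
    return layer, limits[layer]


# Hypothetical limits for illustration only; substitute your own.
limits = {
    "carrier_trunk": 100,
    "per_number_total": 60,   # per-number cap x number of phone numbers
    "platform_account": 500,
    "llm_sessions": 100,
    "db_connections": 40,     # one pooled connection held per call
}

bottleneck, ceiling = effective_ceiling(limits)
print(bottleneck, ceiling)
```

With these placeholder values the database pool, not the trunk, is the bottleneck, which is the kind of result that is easy to miss when only the telephony limits are reviewed.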

Planning capacity for AI phone agents

The practical approach, in order:

  • Measure the peak, not the average. Take your expected call volume, compute the average concurrency via Little’s Law, and multiply by 2.5 to cover peaks. That is the concurrency ceiling you need.
  • Find the lowest cap in the stack. Walk down the list above and identify the smallest limit. That is your actual ceiling, regardless of how high the others are.
  • Add capacity where it is cheapest. Buying more phone numbers is usually the cheapest way to raise per-number caps. Requesting a higher rate limit from the LLM provider is usually the cheapest way to raise the AI cap. Provisioning more database connections is usually the cheapest way to raise the app cap.
  • Test before you need it. Run load tests against your system at 1.5x your expected peak. The failures you see there are the ones you will see in production unless fixed first.
  • Plan for graceful degradation. When the cap is reached, the correct behaviour depends on the use case. For inbound, queue the call with a short hold message. For outbound campaigns, back off and retry later. Silently failing is the worst outcome.
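The last step, degrading explicitly when the cap is reached, can be sketched as a small admission gate. This is an illustrative simulation built on Python's asyncio, with assumptions throughout: AdmissionGate, handle_call, and the outcome strings are hypothetical names, and asyncio.sleep stands in for a live call session. A real system would play a hold message or schedule a retry where this sketch just returns a label.

```python
import asyncio


class AdmissionGate:
    """Counts active calls and refuses new ones once the ceiling is reached."""

    def __init__(self, ceiling: int) -> None:
        self.ceiling = ceiling
        self.active = 0

    def try_admit(self) -> bool:
        if self.active >= self.ceiling:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


async def handle_call(gate: AdmissionGate, direction: str, duration_s: float) -> str:
    if not gate.try_admit():
        # At the cap: degrade explicitly rather than failing silently.
        return "queued" if direction == "inbound" else "retry_later"
    try:
        await asyncio.sleep(duration_s)  # stands in for the live call session
        return "completed"
    finally:
        gate.release()


async def simulate(directions: list[str], ceiling: int) -> list[str]:
    gate = AdmissionGate(ceiling)
    calls = (handle_call(gate, d, 0.01) for d in directions)
    return await asyncio.gather(*calls)


directions = ["inbound", "inbound", "outbound", "inbound", "outbound"]
outcomes = asyncio.run(simulate(directions, ceiling=3))
print(outcomes)
```

With a ceiling of 3, the first three calls complete, the fourth (inbound) is queued, and the fifth (outbound) is told to retry later. The plain counter is safe here only because asyncio runs on a single thread; a multi-threaded or multi-process deployment would need a lock or a shared store instead.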

Concurrent call limits in BubblyPhone Agents

BubblyPhone Agents applies per-number outbound limits (up to 30 per hour per phone number) and account-level concurrency limits that scale with your plan. Inbound concurrency is governed primarily by the phone number itself and the upstream carrier trunk. For detailed limits and how to request higher quotas, see the API documentation.
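If you drive outbound campaigns from your own code, it can help to enforce the per-number hourly cap on your side so calls are not rejected upstream. The sketch below is a client-side illustration only and is not part of the BubblyPhone API: the class name is hypothetical, and it assumes the 30-per-hour limit behaves like a sliding one-hour window, which the text above does not specify.

```python
from collections import defaultdict, deque


class HourlyOutboundLimiter:
    """Client-side guard: at most `max_per_hour` calls per number per window."""

    def __init__(self, max_per_hour: int = 30, window_s: float = 3600.0) -> None:
        self.max_per_hour = max_per_hour
        self.window_s = window_s
        self._placed = defaultdict(deque)  # number -> timestamps of recent calls

    def try_place(self, number: str, now: float) -> bool:
        placed = self._placed[number]
        # Drop timestamps that have aged out of the window.
        while placed and now - placed[0] >= self.window_s:
            placed.popleft()
        if len(placed) >= self.max_per_hour:
            return False  # caller should back off or use another number
        placed.append(now)
        return True


limiter = HourlyOutboundLimiter()
# One call per second from the same (fictional) number.
results = [limiter.try_place("+15550100", now=float(t)) for t in range(31)]
print(results.count(True), results.count(False))
```

In this run the first 30 attempts are allowed and the 31st is refused until the oldest call ages out of the window. Timestamps are passed in explicitly so the logic can be tested without waiting; in production you would pass a monotonic clock reading.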

Further reading