Interactive Voice Response (IVR)
By Vadim Kouznetsov, Founder of BubblyPhone · Last updated April 5, 2026
Interactive Voice Response (IVR) is an automated phone system that interacts with callers through pre-recorded prompts and collects input via keypad tones (DTMF) or, in newer systems, speech recognition — routing the caller, collecting data, or completing transactions without a human agent. It is the technology behind every “press 1 for sales, press 2 for support” menu you have ever encountered.
A 50-year history
Commercial IVR systems appeared in the 1970s. Early deployments ran on proprietary hardware from companies like Periphonics and Intervoice, and they could do surprisingly little: play a recorded prompt, wait for DTMF tones, and branch on the digit received. The 1990s saw the rise of speech-enabled IVR, led by Nuance and SpeechWorks. In the 2000s the industry standardised on VoiceXML, an XML dialect for describing call flows, which remained the dominant authoring format for the next two decades.
For most of that history, IVR did what it was designed to do: reduce call center labor costs. A well-designed IVR could deflect 40 to 60% of calls that would otherwise need a human agent. That was an enormous operational win, even if callers never liked it.
Anatomy of a traditional IVR
Every IVR system, whether built on 1990s hardware or a modern cloud platform, has the same basic components:
- Prompts. Pre-recorded audio files (or TTS output) that play to the caller.
- Input collection. Listeners for DTMF tones or speech recognition results.
- Grammars. Definitions of what inputs are valid at each step (“one, two, three, or star”, or a list of accepted spoken phrases).
- Call flow logic. The state machine that decides what prompt to play next based on input received.
- Backend integration. Connections to databases, CRMs, or APIs so the IVR can look up account information or process transactions.
- Transfer paths. Routes to a human agent when the IVR cannot handle the request.
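The components above can be sketched as a small state machine. This is a minimal illustration, not a production design: the menu contents, the `Node` and `run_flow` names, and the retry limit are all assumptions made for the example, and prompts are plain strings standing in for recorded audio or TTS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    prompt: str  # text standing in for a recorded audio file or TTS output
    # Grammar: the DTMF digits valid at this step, mapped to the next node.
    grammar: dict = field(default_factory=dict)


# Hypothetical two-level menu. Nodes with an empty grammar are transfer paths.
MENU = {
    "main": Node("Press 1 for sales, press 2 for support.",
                 {"1": "sales", "2": "support"}),
    "sales": Node("Transferring you to sales."),
    "support": Node("Press 1 for billing, press 2 for technical help.",
                    {"1": "billing", "2": "tech"}),
    "billing": Node("Transferring you to billing."),
    "tech": Node("Transferring you to technical support."),
}


def run_flow(digits, menu=MENU, start="main", max_retries=2):
    """Walk the menu with a sequence of DTMF digits.

    Returns (prompts_played, final_node_name). After too many invalid
    inputs at one step, the caller is routed to "agent".
    """
    played = []
    current = start
    retries = 0
    keys = iter(digits)
    while True:
        node = menu[current]
        played.append(node.prompt)
        if not node.grammar:          # leaf node: nothing more to collect
            return played, current
        digit = next(keys, None)
        if digit is None:             # no more input (caller went silent)
            return played, current
        if digit in node.grammar:     # input matches the grammar: branch
            current = node.grammar[digit]
            retries = 0
        else:                         # input outside the grammar: re-prompt
            retries += 1
            if retries > max_retries:
                return played, "agent"
            played.append("Sorry, that is not a valid option.")
```

Calling `run_flow("21")` walks main, then support, then billing. Note how the whole behaviour is determined by the `MENU` table: any input outside a node's grammar can only trigger a re-prompt or a fallback to an agent.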
Why everyone hates it
Research has been consistent for two decades: callers prefer human agents to IVRs by overwhelming margins. The specific complaints show up in every study:
- Deep menu trees that take minutes to navigate
- Options that do not match the caller’s actual need
- The inability to skip ahead if you know what you want
- Speech recognition that fails on accents, noise, or phrasing outside the trained grammar
- Being sent back to the main menu after providing information
- Being transferred to an agent who asks you to repeat everything you just told the IVR
The last complaint is particularly telling. IVR systems almost never pass context to the receiving agent — the transfer is a cold transfer that throws away all the information the IVR collected.
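The alternative to a cold transfer is to carry the collected information along with the call. The sketch below shows one way that hand-off data could be structured; the `CallContext` fields, the `build_handoff` and `agent_summary` helpers, and the JSON shape are hypothetical and not taken from any particular IVR or contact-center product.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class CallContext:
    """What the IVR learned before transferring (illustrative fields only)."""
    caller_number: str
    intent: Optional[str] = None
    collected: dict = field(default_factory=dict)  # e.g. account ID, order number
    path: list = field(default_factory=list)       # menu nodes the caller visited


def build_handoff(ctx: CallContext) -> str:
    """Serialise the context so it can travel with the transfer."""
    return json.dumps(asdict(ctx), sort_keys=True)


def agent_summary(ctx: CallContext) -> str:
    """One-line summary an agent could see before greeting the caller."""
    details = ", ".join(f"{k}={v}" for k, v in sorted(ctx.collected.items()))
    return f"{ctx.caller_number} | intent: {ctx.intent or 'unknown'} | {details or 'nothing collected'}"
```

With something like this attached to the transfer, the agent can open with the caller's account already on screen instead of asking for it again.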
What replaced it
The practical answer, as of 2026, is LLM-driven call flows running on speech-to-speech models like GPT Realtime and Gemini Live. These systems do the same job as an IVR — understand what the caller wants and handle or route the call — but without the menu tree, without the rigid grammar, and without the 2-second pauses between prompts.
The trade-offs are real and worth naming. Traditional IVRs are deterministic: you can predict exactly what happens for any given input. LLM-driven flows are probabilistic: they cope with almost any phrasing, but they occasionally go off-script. For compliance-heavy workflows (payment capture, identity verification, regulated disclosures) the deterministic IVR is still the right answer. For open-ended customer interactions the LLM flow wins on almost every metric.
The hybrid reality
Most production systems in 2026 are not pure IVR or pure LLM. They are hybrids: an LLM-driven opening that figures out what the caller wants, a deterministic IVR-style module for any step that needs PCI-DSS or HIPAA guarantees, and an LLM again for anything open-ended. The combination is better than either alone: the flexibility of the LLM with the provability of the state machine.
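The routing decision at the heart of a hybrid can be sketched in a few lines. Everything here is an assumption for illustration: the intent labels, the `COMPLIANCE_INTENTS` set, and especially `classify_intent`, which uses keyword matching as an offline stand-in for the LLM that would do this job in a real system.

```python
# Intents that must run through a deterministic, auditable module
# (hypothetical labels chosen for this example).
COMPLIANCE_INTENTS = {"pay_bill", "verify_identity"}


def classify_intent(utterance: str) -> str:
    """Stand-in for the LLM-driven opening.

    A real system would ask a model; keyword matching is used here only
    so the example runs without any external service.
    """
    text = utterance.lower()
    if "pay" in text or "card" in text:
        return "pay_bill"
    if "verify" in text or "identity" in text:
        return "verify_identity"
    return "open_ended"


def route(utterance: str):
    """Return (intent, handler), where handler is 'deterministic' or 'llm'."""
    intent = classify_intent(utterance)
    handler = "deterministic" if intent in COMPLIANCE_INTENTS else "llm"
    return intent, handler
```

The design choice worth noting is that the set of compliance intents is a fixed, reviewable list: the model decides what the caller wants, but it never decides whether a regulated step may skip the state machine.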
Building the replacement in BubblyPhone Agents
BubblyPhone Agents is a platform for building LLM-driven replacements for IVR. You configure a phone number, a system prompt, and a set of tools, and that is the whole setup. If you need the deterministic parts, you build them in webhook mode, where you control the pipeline. For the full conceptual background on the shift from menu trees to LLM flows, see the glossary entry on call flow.
Further reading
- W3C, Voice Extensible Markup Language (VoiceXML) 2.1 — the standard that defined IVR authoring for two decades.
- Nielsen Norman Group, Phone IVR Usability — summary of the long-standing research on why IVRs underperform with callers.