Conversation Intelligence
Dialogue Management
By Vadim Kouznetsov, Founder of BubblyPhone · Last updated April 5, 2026
Dialogue management is the component of a conversational system that decides what to do next at each turn — what to say, what to ask, when to take an action, and when to end the conversation — based on the state of the conversation so far and the goal the system is trying to accomplish. It is the brain that sits between understanding the caller and producing a response.
The classical pipeline
In the classical architecture of a spoken dialogue system, from the 2000s through the early 2020s, dialogue management was a distinct module in a four-stage pipeline:
- Speech recognition (ASR) turned audio into text.
- Natural language understanding (NLU) turned text into structured intents and entities — see intent detection.
- Dialogue manager took the NLU output and the current dialogue state, decided on an action, and produced a semantic response.
- Natural language generation (NLG) and TTS turned the semantic response into spoken audio.
In this architecture the dialogue manager was doing real work. It had to maintain a dialogue state (what has been said, what slots have been filled, what the caller wants), select the next action from a policy, and handle things like clarification, confirmation, and error recovery. Implementations ranged from handwritten finite state machines to statistical models trained on dialogue data to reinforcement learning agents.
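The handwritten finite-state approach can be made concrete with a small sketch. The states, slot names, and thresholds below are illustrative, not drawn from any particular system: the manager fills required slots in order, confirms when everything is collected, and escalates after repeated understanding failures.

```python
# Minimal sketch of a classical finite-state dialogue manager for a
# slot-filling task. Slot names, actions, and thresholds are illustrative.
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)  # filled slots, e.g. {"date": "Friday"}
    failed_turns: int = 0                      # consecutive turns with no usable NLU result

REQUIRED_SLOTS = ["date", "time", "party_size"]

def next_action(state: DialogueState, nlu: dict) -> str:
    """Policy: fill slots in order, confirm when complete, escalate on repeated failure."""
    if not nlu:  # NLU produced nothing usable this turn
        state.failed_turns += 1
        if state.failed_turns >= 3:
            return "transfer_to_human"
        return "ask_repeat"
    state.failed_turns = 0
    state.slots.update(nlu)
    for slot in REQUIRED_SLOTS:
        if slot not in state.slots:
            return f"request_{slot}"
    return "confirm_booking"

state = DialogueState()
print(next_action(state, {"date": "Friday"}))                 # request_time
print(next_action(state, {"time": "7pm", "party_size": 4}))   # confirm_booking
```

Everything here — the state object, the policy function, the error-recovery counter — is code a team had to design and maintain by hand, which is exactly the work the LLM architecture absorbs.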
What LLMs did to the module
The classical pipeline still exists in places, but it no longer describes how most new voice systems are built. In an LLM-driven AI phone agent, dialogue management is not a separate module. It is a property of the language model itself.
The LLM reads the entire conversation history (or a sliding window of it), reads the system prompt that defines the agent’s goals, and produces the next response. Dialogue state is implicit in the conversation history. The policy is implicit in the prompt. There is no separate state machine to design, no separate policy to train. Whether this is an improvement or a loss of control is a real debate.
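The turn loop this describes is simple enough to sketch. The `call_llm` function below is a stand-in for whatever chat-completion API you use; the system prompt is an invented example. Note that there is no state object at all — the message history is the state.

```python
# Sketch of an LLM-driven turn loop: dialogue state is the message history,
# policy is the system prompt. `call_llm` stands in for a real model call.
SYSTEM_PROMPT = (
    "You are a phone agent for a restaurant. Collect a date, time, and "
    "party size, confirm the booking, then end the call politely."
)

def call_llm(messages):
    # Placeholder for a real chat-completion call (an HTTP request to
    # whichever model provider you use).
    return "What date would you like to book?"

def take_turn(history, caller_utterance):
    history.append({"role": "user", "content": caller_utterance})
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    reply = call_llm(messages)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
take_turn(history, "Hi, I'd like a table.")
```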
What still needs explicit management even with an LLM
Treating dialogue management as “the LLM handles it” gets you surprisingly far, but it is not the whole story. A few things still need explicit handling:
- Turn-taking and interruption. When does the agent start speaking? When does it stop because the caller is speaking? This is handled by voice activity detection and conversation-level logic outside the model, not by the model itself.
- Tool invocation timing. The LLM decides whether to call a tool, but the runtime around it decides when to call it, how to handle failures, and what happens while the tool is running. Filler phrases, retry logic, and timeout handling all live outside the model.
- Guardrails. Some decisions should not be left to the model. Hanging up on abusive callers, transferring to a human after a certain number of failed turns, refusing to discuss specific topics — these are usually enforced by code that wraps the LLM, not by the LLM itself.
- Conversation length control. An LLM left to its own devices will happily have a 15-minute conversation. If your business logic requires calls to wrap up in 3 minutes, you need explicit dialogue management to enforce that.
Hybrid dialogue management
The most robust production systems in 2026 are hybrid: an LLM handles the flexible conversational parts, and a thin layer of explicit dialogue management enforces the things the LLM cannot be trusted with alone. The explicit layer is usually small — a few rules, a handful of tool-invocation helpers, maybe a timer — but it does the heavy lifting of turning a chatty model into a reliable agent.
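A thin explicit layer of this kind might look like the sketch below. The thresholds, rules, and the `generate_reply` stub are all illustrative; the point is that the wrapper's checks run before the model is consulted, so guardrails and the length limit cannot be talked around.

```python
# Sketch of a hybrid layer: the LLM produces the reply, a thin wrapper
# enforces rules the model is not trusted with alone. All thresholds and
# the generate_reply stub are illustrative.
import time

MAX_CALL_SECONDS = 180   # business rule: wrap up within 3 minutes
MAX_FAILED_TURNS = 3     # escalate to a human after repeated failures

def generate_reply(history):
    return "Sure, I can help with that."  # stand-in for the LLM call

def managed_turn(history, utterance, call_start, failed_turns):
    # Explicit rules first: they outrank the model.
    if time.monotonic() - call_start > MAX_CALL_SECONDS:
        return ("end_call", "I'll have to wrap up now, thanks for calling!")
    if failed_turns >= MAX_FAILED_TURNS:
        return ("transfer", "Let me connect you with a colleague.")
    # Otherwise defer to the LLM for the flexible conversational part.
    history.append({"role": "user", "content": utterance})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return ("speak", reply)
```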
Dialogue management in BubblyPhone Agents
In streaming mode, BubblyPhone Agents relies on the LLM’s own dialogue management plus a small amount of platform-level control for turn-taking and transfers. In webhook mode, you build your own dialogue manager on top of the transcription events and respond with actions. Most teams on webhook mode end up writing a thin explicit layer around whatever LLM they prefer, rather than a full classical dialogue manager — the LLM handles enough of the problem that the explicit code becomes small. See the call flow entry for the related concept of how the whole conversation is structured.
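The thin explicit layer that webhook-mode teams end up writing tends to follow the same shape: explicit rules checked first, the LLM consulted otherwise. The event and action dictionaries below are invented for this sketch and are not BubblyPhone's actual webhook schema; consult the platform documentation for the real shapes.

```python
# Illustration of a thin explicit layer over transcription events.
# NOTE: the event and action shapes here are hypothetical, not
# BubblyPhone's actual webhook schema.
def handle_transcription_event(event, history):
    utterance = event["transcript"]
    history.append({"role": "user", "content": utterance})
    # Explicit guardrail checked before the model sees the turn.
    if "speak to a human" in utterance.lower():
        return {"action": "transfer"}
    # Otherwise defer to whatever LLM you prefer (stubbed here).
    reply = "Happy to help with that."
    history.append({"role": "assistant", "content": reply})
    return {"action": "say", "text": reply}
```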
Further reading
- Jurafsky & Martin, Chatbots & Dialogue Systems — the standard textbook chapter covering classical dialogue management and the shift to LLM-based systems.