harsh-harlequin, 2d ago

Which server message should I listen for to know when the assistant finishes speaking?

Hey there, I'm working with Server Messages and wanted to know if there is a way to listen for a particular event message that tells me when the last model output happens. I tried transcriptType="final", model-output, and conversation-update, but none of them seems reliable because they are sent multiple times within the same AI turn. Here is an example of model-output events:
{
"message": {
"timestamp": 1764796598671,
"type": "model-output",
"output": " It’s a photo of a family hanging a framed picture on a light-colored wall, working together as a team.<flush />",
...
}
}
{
"message": {
"timestamp": 1764796598671,
"type": "model-output",
"output": " It’s a photo of a family hanging a framed picture on a light-colored wall, working together as a team.<flush />",
...
}
}
{
"message": {
"timestamp": 1764796599003,
"type": "model-output",
"output": " What kinds of things do you usually notice when people are decorating their homes?<flush />",
...
}
}
{
"message": {
"timestamp": 1764796599003,
"type": "model-output",
"output": " What kinds of things do you usually notice when people are decorating their homes?<flush />",
...
}
}
As you can see, none of them indicates which one is the last one before the user starts speaking.
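The repeated events in the sample above share the same timestamp and output, so one way to collapse them before doing anything else is to key on that pair. This is only a sketch: `dedupe_model_output` is a hypothetical helper, and it assumes repeats are always identical in both fields, which the sample suggests but does not guarantee. It removes the duplicates but does not, by itself, say which event is the last one in the turn.

```python
from typing import Iterable, Iterator


def dedupe_model_output(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each model-output event once, skipping exact repeats.

    Assumption: a repeat has the same "timestamp" and "output" as the original.
    """
    seen = set()
    for event in events:
        message = event.get("message", {})
        if message.get("type") != "model-output":
            continue  # ignore other server message types
        key = (message.get("timestamp"), message.get("output"))
        if key in seen:
            continue  # exact repeat of an event already yielded
        seen.add(key)
        yield event
```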
5 Replies
Vapi Support Bot
To reliably detect when the assistant has finished generating its response, listen for the following speech-update event, which signals the assistant is starting to speak:
{
"message": {
"type": "speech-update",
"status": "started",
"role": "assistant"
}
}
When you receive this event, you can treat it as the end of the model's output for that turn. This approach is recommended in the documentation for tracking the assistant's speaking state and is more reliable than using model-output alone.
Source: Vapi Server Events Documentation
harsh-harlequin (OP), 2d ago
But can speech-update fire multiple times in the same turn?
deep-jade, 2d ago
You’re right: model-output and conversation-update fire multiple times in a single turn, so they’re not reliable end-of-turn markers. The only dependable signal is the final speech-update, which fires once per turn, right before the user can speak. I can help you structure your Vapi event flow so your app cleanly detects that final event. Quick question: are you handling these events client-side or through a server webhook? @adrivelasco
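As a rough sketch of that check in a server-side handler, the function below tests whether a payload is the assistant's end-of-speech speech-update. The thread only shows the "started" status; the "stopped" value used here is an assumption to verify against the Vapi docs, and `assistant_stopped_speaking` is a hypothetical name.

```python
def assistant_stopped_speaking(payload: dict) -> bool:
    """Return True if the payload marks the end of assistant speech.

    Assumption: the end-of-speech event uses status "stopped"
    (only "started" appears in this thread).
    """
    message = payload.get("message", {})
    return (
        message.get("type") == "speech-update"
        and message.get("role") == "assistant"
        and message.get("status") == "stopped"  # assumed status value
    )
```

A handler would call this on every incoming server message and run its end-of-turn logic only when it returns True.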
harsh-harlequin (OP), 2d ago
@Tremix Server webhook. speech-update seems to work as I expected, although with a bit of delay, since it fires when the assistant finishes speaking and not when the LLM outputs its last tokens.
@Tremix The problem with speech-update is that it's sent too late for me (when TTS finishes). I would like to run a process as soon as I know the last LLM output has happened.
deep-jade, 10h ago
I can help you handle this by detecting the real final LLM token instead of relying on TTS timing. There are reliable ways to mark end-of-turn using buffer tracking + custom server events. Your use case makes sense, and I can walk you through the setup privately. Happy to jump in and help you implement the cleanest approach. @adrivelasco
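The reply above does not spell out what "buffer tracking" means, so the following is only one possible reading: collect model-output chunks, skip exact repeats, and treat the turn as finished once no new chunk has arrived for a quiet window. `TurnBuffer`, `quiet_ms`, and the 1500 ms default are all hypothetical, the quiet window is a heuristic that adds its own delay, and stripping `<flush />` assumes it is a control marker rather than spoken text.

```python
from typing import List, Optional


class TurnBuffer:
    """Collects model-output chunks and guesses when the turn has ended."""

    def __init__(self, quiet_ms: int = 1500) -> None:
        self.quiet_ms = quiet_ms  # hypothetical idle threshold, tune per app
        self.chunks: List[str] = []
        self.last_key: Optional[tuple] = None
        self.last_seen_ms: Optional[int] = None

    def add(self, message: dict, now_ms: int) -> None:
        """Record one model-output message received at now_ms (server clock)."""
        self.last_seen_ms = now_ms
        key = (message.get("timestamp"), message.get("output"))
        if key == self.last_key:
            return  # exact repeat of the previous chunk
        self.last_key = key
        # Assumption: "<flush />" is a control marker, not part of the text.
        self.chunks.append(message.get("output", "").replace("<flush />", ""))

    def is_quiet(self, now_ms: int) -> bool:
        """True once no chunk has arrived for at least quiet_ms."""
        return (
            self.last_seen_ms is not None
            and now_ms - self.last_seen_ms >= self.quiet_ms
        )

    def take(self) -> str:
        """Return the buffered text for the turn and reset the buffer."""
        text = "".join(self.chunks).strip()
        self.chunks = []
        self.last_key = None
        self.last_seen_ms = None
        return text
```

In a webhook, `add` would run on each model-output message, and a periodic check of `is_quiet` would decide when to call `take` and start the downstream process. A user interruption or a slow LLM can make the quiet window fire early, so the threshold needs testing against real calls.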
