Voice Agent Cutoff Issue - Call Suddenly Stops Mid-Conversation
Hello VAPI Support Team,
I'm experiencing intermittent voice cutoff issues with my voice agent where the assistant suddenly stops responding mid-conversation, leading to timeouts and call termination. I'd appreciate your guidance on debugging and resolving this issue.
Issue Details:
Call ID: 118caaf5-57d8-4790-ab8f-4f9cacf0a6ea
Timestamp: Approximately 12:26:05 AM (please note timezone if relevant)
Behavior: The voice agent cuts off unexpectedly during conversation, provides no response, and eventually times out, causing the call to end
Frequency: This happens occasionally, not consistently
Current Setup:
- Using dynamic system prompt generation with approximately 2,700 tokens in the system prompt
- Assistant overrides are applied per-call with custom system prompts
- Integration includes conversation memory, user context, and behavioral adaptations
- Server-side webhook processing for call events and custom tools
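Since the prompt is generated dynamically per call, its size varies from call to call. For context, here is the kind of rough pre-call size guard I could add on my side; the 4-characters-per-token ratio is only a heuristic of mine, not the model's real tokenizer, so the count is approximate:

```python
# Rough prompt-size guard before each call. The 4-chars-per-token ratio
# is a heuristic (my assumption); an exact count would require the
# model's own tokenizer.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def check_prompt(prompt: str, budget: int = 3000) -> bool:
    """Return True if the dynamically generated prompt fits the budget."""
    n = approx_tokens(prompt)
    if n > budget:
        print(f"warning: prompt ~{n} tokens exceeds budget {budget}")
        return False
    return True

# A prompt on the same scale as mine (~2,700 tokens):
prompt = "You are a helpful assistant. " * 400
print(approx_tokens(prompt), check_prompt(prompt))  # → 2900 True
```

If there is a recommended ceiling on system prompt size, I would set `budget` to that value.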
System Prompt Size: My system prompt is quite large (~2,700 tokens), and the model is Groq GPT-OSS-120B. Could this be hitting token limits or causing processing delays? Are there recommended limits for system prompt size? Could the large context be causing the underlying model to time out or fail? From my performance-metrics analysis, the call artifact shows avg_model_latency = 905 ms, while the Groq latency shown in the config was 280 ms. Is the discrepancy related to communication with Groq's servers?
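To separate Groq round-trip time from VAPI pipeline overhead, I'm considering timing direct model calls myself and comparing the distribution against the 905 ms avg_model_latency. A minimal sketch of the timing harness (the `fake_model_call` stub is a stand-in; in practice it would be a POST to Groq's OpenAI-compatible chat-completions endpoint with my actual system prompt, which I haven't wired up here):

```python
import time
import statistics

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0

def fake_model_call():
    # Stand-in for the real network round trip; a real measurement
    # would POST the same ~2,700-token system prompt to Groq directly.
    time.sleep(0.01)
    return {"choices": [{"message": {"content": "ok"}}]}

# Collect per-call latencies and summarize; comparing p50/p95 here
# against VAPI's reported avg_model_latency shows how much of the
# 905 ms is Groq itself vs. everything around it.
samples = [timed(fake_model_call)[1] for _ in range(20)]
print(f"p50={statistics.median(samples):.0f}ms "
      f"p95={sorted(samples)[int(0.95 * len(samples))]:.0f}ms "
      f"max={max(samples):.0f}ms")
```

If the direct p95 sits near the configured 280 ms while calls through the pipeline average 905 ms, the gap is presumably outside Groq.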
What metrics or logs should I monitor on my end to better diagnose these cutoffs? Are there specific webhook events that indicate processing failures?
Should I implement any fallback logic when the assistant becomes unresponsive?
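This is a minimal sketch of the watchdog-style fallback I have in mind, assuming I reset it on every assistant speech/transcript event; what `on_timeout` should actually do (inject a filler message, transfer, or end the call gracefully) is exactly what I'd like your advice on:

```python
import threading

class ResponseWatchdog:
    """Fire a fallback if the assistant stays silent longer than timeout_s.

    Sketch only: the right on_timeout action (filler message, transfer,
    graceful hangup) depends on what VAPI supports mid-call.
    """
    def __init__(self, timeout_s: float, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self._timer = None

    def reset(self):
        # Call this on every assistant speech/transcript event.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout_s, self.on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()

# Demo with a short timeout: the assistant "speaks" once, then goes
# silent, so the fallback fires.
fired = threading.Event()
dog = ResponseWatchdog(0.05, fired.set)
dog.reset()
fired.wait(1.0)
print("fallback fired:", fired.is_set())  # → fallback fired: True
dog.stop()
```

In production the timeout would be a few seconds rather than 50 ms, tuned against the observed model latency.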