Audio Synthesis Issue – Agents Speaking Gibberish Intermittently

Technical Details:
  • Agents affected: Multiple, but not all. Example configs and logs attached.
  • Voice provider: All agents use "provider": "vapi" and "voiceId": "Spencer" (see attached configs).
  • TTS/Transcriber: Deepgram for ASR (model: nova-2), Vapi/Spencer for TTS.
  • Observed behavior:
    • Call begins normally, correct greeting is transcribed.
    • Telemetry shows ElevenLabs WebSockets being opened for TTS, even though every agent is configured with "provider": "vapi".
    • Audio output is sometimes correct, but sometimes garbled or nonsensical.
    • No errors or warnings in our logs at the TTS/ASR setup or API request stage.
    • No race conditions or shared state identified on our backend (see below).
Our Investigation:
  • Compared agent configs and call telemetry logs for both good and bad calls.
  • Confirmed that our backend is stateless and delegates all TTS/streaming to Vapi’s API.
  • Ensured that every API call payload to Vapi is well-formed, consistent, and per-session/per-call.
  • Verified that our backend has no local concurrency issues or race conditions.
  • The issue appears to be upstream, possibly in the Vapi platform or voice mapping.
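
For reference, the good-call vs. bad-call comparison described above can be reproduced with something like the sketch below. It is a minimal illustration, not our production tooling: the helper names (flatten, diff_records) are hypothetical, and it assumes each config or telemetry record has already been loaded as a Python dict (for example via json.load). It reports every leaf field whose value differs between the two records.

```python
from typing import Any

ABSENT = "<absent>"  # placeholder shown when a field exists in only one record


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value} pairs."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def diff_records(good: dict, bad: dict) -> dict[str, tuple[Any, Any]]:
    """Return {path: (good_value, bad_value)} for every differing leaf field."""
    flat_good, flat_bad = flatten(good), flatten(bad)
    diffs: dict[str, tuple[Any, Any]] = {}
    for path in sorted(flat_good.keys() | flat_bad.keys()):
        good_value = flat_good.get(path, ABSENT)
        bad_value = flat_bad.get(path, ABSENT)
        if good_value != bad_value:
            diffs[path] = (good_value, bad_value)
    return diffs


if __name__ == "__main__":
    # Illustrative records only; "tts_backend" is a hypothetical telemetry field
    # with placeholder values, not data taken from the attached calls.
    good_call = {
        "voice": {"provider": "vapi", "voiceId": "Spencer"},
        "transcriber": {"provider": "deepgram", "model": "nova-2"},
        "telemetry": {"tts_backend": "backend-a"},
    }
    bad_call = {
        "voice": {"provider": "vapi", "voiceId": "Spencer"},
        "transcriber": {"provider": "deepgram", "model": "nova-2"},
        "telemetry": {"tts_backend": "backend-b"},
    }
    for path, (good_value, bad_value) in diff_records(good_call, bad_call).items():
        print(f"{path}: good={good_value!r} bad={bad_value!r}")
```

An empty result for the agent configs together with a non-empty result for the telemetry would be consistent with the payloads we send being identical while something differs downstream of our API call.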
Attached for your review:
  • Call ID 1: 0199e7eb-91ab-7446-b301-02d59059e843
  • Call ID 2: 0199e3d2-49b6-7996-93d9-5bb43ad8016b

Questions:
  • Has anything changed recently in the Vapi infrastructure (voice models, session management, streaming, etc.) that could explain this?
  • Are there any known issues with the "Spencer" voice or the "vapi" provider?
  • Is there additional debugging or logging we can enable on your end to help pinpoint the issue?