Audio Synthesis Issue – Agents Speaking Gibberish Intermittently
Technical Details:
Call ID 1: 0199e7eb-91ab-7446-b301-02d59059e843
Call ID 2: 0199e3d2-49b6-7996-93d9-5bb43ad8016b
Questions:
- Agents affected: Multiple, but not all. Example configs and logs attached.
- Voice provider: All agents use "provider": "vapi" and "voiceId": "Spencer" (see attached configs).
- TTS/Transcriber: Deepgram for ASR (model: nova-2), Vapi/Spencer for TTS.
- Observed behavior:
- Call begins normally, correct greeting is transcribed.
- Telemetry shows ElevenLabs WebSockets being opened for TTS.
- Audio output is sometimes correct, but sometimes garbled or nonsensical.
- No errors or warnings in our logs at the TTS/ASR setup or API request stage.
- No race conditions or shared state identified on our backend (see below).
- Compared agent configs and call telemetry logs for both good and bad calls.
- Confirmed that our backend is stateless and delegates all TTS/streaming to Vapi’s API.
- Ensured that every API call payload to Vapi is well-formed, consistent, and per-session/per-call.
- Verified that no local concurrency or race conditions can occur in our backend.
- The issue appears to be upstream, possibly in the Vapi platform or voice mapping.
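The good-call vs. bad-call comparison can be made repeatable with a small script that flattens each call's agent config and lists every key that differs. The TypeScript sketch below assumes the configs are exported as JSON, with the voice and transcriber settings nested under `voice` and `transcriber` objects. That nesting and the helper names `flatten` and `diffConfigs` are illustrative assumptions, not Vapi's documented schema. An empty result means the two configs are identical, which would point away from configuration as the cause.

```typescript
// Any JSON value, used as the type for exported agent configs.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten a nested config into "a.b.c" -> serialized value pairs.
// Arrays and primitives are treated as leaves and serialized with JSON.stringify.
function flatten(value: Json, prefix = "", out: Map<string, string> = new Map()): Map<string, string> {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, JSON.stringify(value));
  }
  return out;
}

// Return one "key: goodValue -> badValue" line for every key that differs,
// including keys present in only one of the two configs.
function diffConfigs(good: Json, bad: Json): string[] {
  const a = flatten(good);
  const b = flatten(bad);
  const keys = new Set<string>([...Array.from(a.keys()), ...Array.from(b.keys())]);
  const diffs: string[] = [];
  for (const key of Array.from(keys).sort()) {
    const left = a.get(key) ?? "<missing>";
    const right = b.get(key) ?? "<missing>";
    if (left !== right) diffs.push(`${key}: ${left} -> ${right}`);
  }
  return diffs;
}

// Hypothetical configs for one good and one bad call; the values come from
// the details above, the object shape is an assumption.
const goodCall: Json = {
  voice: { provider: "vapi", voiceId: "Spencer" },
  transcriber: { provider: "deepgram", model: "nova-2" },
};
const badCall: Json = {
  voice: { provider: "vapi", voiceId: "Spencer" },
  transcriber: { provider: "deepgram", model: "nova-2" },
};

console.log(diffConfigs(goodCall, badCall));
```

Running the same diff over each pair of exported configs, and over the telemetry fields for each call, would show whether anything besides the audio output varies between good and bad calls.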
Questions:
- Has anything changed recently in the Vapi infrastructure (voice models, session management, streaming, etc.) that could explain this?
- Are there any known issues with the "Spencer" voice or the "vapi" provider?
- Is there additional debugging or logging we can enable on your end to help pinpoint the issue?