VAPI•3mo ago

Audio Synthesis Issue – Agents Speaking Gibberish Intermittently

distinguished-brown · 2025-10-16T19:17:59.569Z

**Technical Details:**
- **Agents affected:** Multiple, but not all. Example configs and logs attached.
- **Voice provider:** All agents use `"provider": "vapi"` and `"voiceId": "Spencer"` (see attached configs).
- **TTS/Transcriber:** Deepgram for ASR (`model: nova-2`), Vapi/Spencer for TTS.
- **Observed behavior:**
  - Call begins normally, correct greeting is transcribed.
  - ElevenLabs WebSockets (as shown in telemetry) are opened for TTS.
  - Audio output is sometimes correct, but sometimes garbled or nonsensical.
  - No errors or warnings in our logs at the TTS/ASR setup or API request stage.
  - No race conditions or shared state identified on our backend (see below).

**Our Investigation:**
- Compared agent configs and call telemetry logs for both good and bad calls.
- Confirmed that our backend is stateless and delegates all TTS/streaming to Vapi’s API.
- Ensured that every API call payload to Vapi is well-formed, consistent, and per-session/per-call.
- Verified that no local concurrency or race conditions can occur in our backend.
- The issue appears to be upstream, possibly in the Vapi platform or voice mapping.

**Attached for your review:**
- Call ID 1: 0199e7eb-91ab-7446-b301-02d59059e843
- Call ID 2: 0199e3d2-49b6-7996-93d9-5bb43ad8016b

**Questions:**
- Has anything changed recently in the Vapi infrastructure (voice models, session management, streaming, etc.) that could explain this?
- Are there any known issues with the `"Spencer"` voice or the `"vapi"` provider?
- Is there additional debugging or logging we can enable on your end to help pinpoint the issue?
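A config comparison like the one described under Our Investigation could be sketched roughly as follows. This is only an illustration, not the tooling actually used: the `diff_configs` helper, the nested `voice`/`transcriber` layout, and the `"deepgram"` provider string are assumptions. Only `"provider": "vapi"`, `"voiceId": "Spencer"`, and `model: nova-2` come from the details above.

```python
from typing import Any

MISSING = "<missing>"  # placeholder for a key present in only one config


def diff_configs(good: dict[str, Any], bad: dict[str, Any], prefix: str = "") -> dict[str, tuple[Any, Any]]:
    """Return {dotted_path: (good_value, bad_value)} for every differing leaf."""
    diffs: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(good) | set(bad)):
        path = f"{prefix}.{key}" if prefix else key
        g = good.get(key, MISSING)
        b = bad.get(key, MISSING)
        if isinstance(g, dict) and isinstance(b, dict):
            # Recurse into nested sections such as "voice" or "transcriber".
            diffs.update(diff_configs(g, b, path))
        elif g != b:
            diffs[path] = (g, b)
    return diffs


# Hypothetical config shape, filled with the values quoted in this post.
good_call_config = {
    "voice": {"provider": "vapi", "voiceId": "Spencer"},
    "transcriber": {"provider": "deepgram", "model": "nova-2"},
}
bad_call_config = {
    "voice": {"provider": "vapi", "voiceId": "Spencer"},
    "transcriber": {"provider": "deepgram", "model": "nova-2"},
}

if __name__ == "__main__":
    differences = diff_configs(good_call_config, bad_call_config)
    print(differences or "configs are identical")
```

An empty result for a good/bad pair would be consistent with the cause lying outside the agent config itself.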
