# How to Enable Real-time TTS Chunk Streaming to the Caller

## Issue Summary

I'm experiencing significant delays in TTS output delivery to callers. My AI agent takes approximately 7 seconds to start responding, of which 3-4 seconds are spent on speech generation, even though my custom TTS server is configured for real-time streaming.

## Current Behavior

  • Total response delay: ~7 seconds
  • TTS generation time: 3-4 seconds of the total delay
  • Issue: Vapi appears to wait for complete TTS generation before streaming audio to the caller

## Expected Behavior

Real-time streaming of TTS chunks as they're generated, without waiting for complete speech synthesis.

## Technical Details

### My TTS Server Configuration

  • Chunk duration: 0.2 seconds per chunk
  • Chunk interval: Generated every 0.061 seconds
  • First chunk delay: 1 second
  • Output format: Streaming chunks
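For reference, the chunking behavior described above can be expressed as a small generator. The 16 kHz sample rate, 16-bit mono PCM format, and function name here are illustrative assumptions, not values taken from Vapi's requirements:

```python
def chunk_pcm(audio: bytes, sample_rate: int = 16000,
              chunk_seconds: float = 0.2) -> list[bytes]:
    """Split raw 16-bit mono PCM audio into fixed-duration chunks.

    Mirrors the configuration above: 0.2 s of audio per chunk.
    The 16 kHz / 16-bit mono format is an assumption for illustration.
    """
    bytes_per_chunk = int(sample_rate * chunk_seconds) * 2  # 2 bytes per sample
    return [audio[i:i + bytes_per_chunk]
            for i in range(0, len(audio), bytes_per_chunk)]

# One second of silence at 16 kHz, 16-bit mono -> 32000 bytes
chunks = chunk_pcm(b"\x00" * 32000)
print(len(chunks))     # 5 chunks of 0.2 s each
print(len(chunks[0]))  # 6400 bytes per chunk
```

With these numbers, each 0.2 s chunk is 6400 bytes, so the server generating a chunk every 0.061 s produces audio roughly 3x faster than real time.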

### Problem Description

Although my TTS server outputs audio chunks in real time, Vapi appears to collect all of the chunks before streaming them to the caller, adding unnecessary latency.
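One way to confirm where the buffering happens is to time chunk arrival on the consuming side. This is a generic diagnostic sketch (not a Vapi API); the function names and the simulated producer are hypothetical:

```python
import time
from typing import Iterable, Iterator

def time_chunks(stream: Iterable[bytes]) -> list[tuple[float, int]]:
    """Return (seconds_since_start, chunk_size) for each chunk consumed.

    If the producer truly streams, the first entry appears almost
    immediately; if something upstream buffers, every entry clusters
    near the total generation time.
    """
    start = time.monotonic()
    return [(time.monotonic() - start, len(chunk)) for chunk in stream]

def fake_tts(n_chunks: int = 5, interval: float = 0.05) -> Iterator[bytes]:
    """Simulated TTS server emitting one chunk every `interval` seconds."""
    for _ in range(n_chunks):
        time.sleep(interval)
        yield b"\x00" * 6400

timings = time_chunks(fake_tts())
# With real streaming, the first chunk arrives after ~one interval,
# not after the full generation time.
```

Pointing a timer like this at the TTS server directly, and then at the audio Vapi delivers, would show on which hop the chunks are being held back.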

## Reference Data

Conversation ID: 04e2c190-981b-4da1-a5bf-c7e8e63300a8

Please review this conversation to see the timing issues in action.

## Questions

  1. How can I configure Vapi to stream TTS chunks immediately as they're received from my custom TTS server?
  2. Is there a configuration setting or parameter that controls this buffering behavior?
  3. Are there any specific requirements for the TTS streaming protocol that I might be missing?
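On question 3: one common cause of this symptom, independent of any specific Vapi requirement, is the HTTP response itself defeating incremental delivery — a fixed `Content-Length` means the whole body was assembled before sending, and some reverse proxies buffer streamed responses unless told not to. A small checklist helper, with an assumed function name and the usual suspect headers:

```python
def streaming_red_flags(headers: dict[str, str]) -> list[str]:
    """Flag response headers that commonly prevent incremental delivery.

    This is a generic HTTP check, not a documented Vapi requirement:
    a fixed Content-Length implies the full body was known up front,
    and nginx-style proxies buffer unless X-Accel-Buffering is "no".
    """
    h = {k.lower(): v.lower() for k, v in headers.items()}
    flags = []
    if "content-length" in h:
        flags.append("Content-Length set: body was assembled before sending")
    if h.get("transfer-encoding") != "chunked":
        flags.append("Transfer-Encoding is not 'chunked'")
    if h.get("x-accel-buffering", "no") != "no":
        flags.append("X-Accel-Buffering enabled: proxy may buffer the stream")
    return flags

print(streaming_red_flags({"Content-Type": "audio/pcm",
                           "Transfer-Encoding": "chunked"}))  # []
```

Checking the TTS server's actual response headers against a list like this is a cheap first step before digging into Vapi-side configuration.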
Any guidance on optimizing this streaming configuration would be greatly appreciated!