metropolitan-bronze (5d ago)

Very slow speaking response times

Hey VAPI folks - I'm trying to figure out settings that produce natural-feeling conversation times. Right now I talk and it can take 2-3 seconds before the AI responds, which is really slow. I'm using claude-sonnet-4 as my production completions model, but even when I use Groq gpt-oss-120B it's not much better. Is it the STT provider I'm using (Speechmatics)? Is it Cartesia? Is there some observability where I can see, for each turn, what is taking the most time? Here are my current settings. Any help would be appreciated; I may end up spending a lot with you guys, so I'm happy to hop on a call:
// Transient assistant configuration defaults
export const ASSISTANT_CONFIG: Partial<Vapi.CreateAssistantDto> = {
  name: 'AI Coach',

  model: {
    provider: PRODUCTION_MODEL.provider,
    model: PRODUCTION_MODEL.model,
    maxTokens: 1200,
    temperature: PRODUCTION_MODEL.temperature,
    ...(PRODUCTION_MODEL.provider === 'custom-llm'
      ? { url: PRODUCTION_MODEL.url, metadataSendMode: 'off' }
      : {}),
  } as any,

  voice: {
    provider: 'cartesia',
    voiceId: 'b7d50908-b17c-442d-ad8d-810c63997ed9', // Professional female voice
  },

  // OPTIMIZED: Much more responsive interruption handling
  startSpeakingPlan: {
    waitSeconds: 0.05, // REDUCED from 0.12 - start speaking faster
    transcriptionEndpointingPlan: {
      onPunctuationSeconds: 0.03, // REDUCED from 0.05
      onNoPunctuationSeconds: 0.25, // REDUCED from 0.38
      onNumberSeconds: 0.2, // REDUCED from 0.3
    },
    smartEndpointingPlan: {
      provider: 'livekit',
      waitFunction: '200 + 800 * max(0, x - 0.3)', // FASTER response curve
    },
  },

  // OPTIMIZED: More sensitive interruption detection
  stopSpeakingPlan: {
    numWords: 1, // REDUCED from 2 - interrupt faster
    voiceSeconds: 0.12, // REDUCED from 0.18 - more sensitive
    backoffSeconds: 0.4, // REDUCED from 0.6 - shorter pause after interruption
  },

  backgroundSpeechDenoisingPlan: {
    smartDenoisingPlan: {
      enabled: true,
    },
    // SIMPLIFIED: Reduce processing overhead
    fourierDenoisingPlan: {
      enabled: false, // DISABLED - can add latency
    },
  },

  messagePlan: {
    idleMessages: ['Are you still there?', "I'm here whenever you're ready to continue."],
    idleTimeoutSeconds: 8,
    idleMessageMaxSpokenCount: 3,
    idleMessageResetCountOnUserSpeechEnabled: true,
  },

  transcriber: {
    provider: 'speechmatics',
    model: 'default', // Use 'enhanced' for better accuracy or 'standard' for cost efficiency
    language: 'en',
    fallbackPlan: {
      transcribers: [
        {
          provider: 'assembly-ai',
          language: 'en',
        },
      ],
    },
  },

  silenceTimeoutSeconds: 20,
  maxDurationSeconds: 600, // 10 minutes max

  // Subscribe to important events only (exclude speech-update to reduce spam)
  serverMessages: ['end-of-call-report', 'hang'],
  backgroundSound: 'off',
  backgroundDenoisingEnabled: true, // Enable background noise reduction
};
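For what it's worth, the waitFunction curve in the config above is easy to reason about numerically. A minimal sketch of evaluating it, assuming x is the smart-endpointing model's score in [0, 1] and the result is a wait in milliseconds (the exact semantics of x are Vapi's; check their docs):

```typescript
// Sketch: evaluate the smart-endpointing wait curve '200 + 800 * max(0, x - 0.3)'.
// Assumption: x is the endpointing model's score in [0, 1] and the result is
// the time in milliseconds to wait before the assistant starts speaking.
function waitMs(x: number): number {
  return 200 + 800 * Math.max(0, x - 0.3);
}

// Below the 0.3 knee the curve floors at 200 ms; above it, each 0.1 of score
// adds 80 ms of wait, topping out at 760 ms when x = 1.
for (const x of [0, 0.3, 0.5, 1]) {
  console.log(`x=${x} -> wait ${waitMs(x)} ms`);
}
```

So this curve by itself caps the endpointing wait well under a second; if turns are taking 2-3 s, the rest of the budget is presumably going to STT finalization, LLM time-to-first-token, and TTS time-to-first-audio.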
5 Replies
Shubham Bajaj (5d ago)
Hi there, Thank you for your message. Our team is currently out of the office. We operate Monday through Friday, from 9:00 AM to 8:00 PM Pacific Standard Time (PST). We’ll get back to you as soon as possible during our normal business hours. If your message is urgent, please mark it accordingly or include “URGENT” in the subject line, and we’ll do our best to respond promptly. Warm regards,
Vapi
Customer Support Team
metropolitan-bronze (OP, 5d ago)
Okay, I see the call logs do have some of this. It seems like endpointing is the big one. Also, claude-sonnet-4 is not great. Does VAPI have common presets of good combinations with different models? Do I just nuke smart endpointing altogether? How is anyone possibly using it when it appears to add 1+ second to each response? Does VAPI support gemini-2.5-flash-preview-native-audio-dialog? And when I'm using gpt-4o-realtime, am I able to customize the voice? Man, even with gpt-4o-realtime it's so bad: it talks over me all the time or double-streams. Do you guys just have a "good" preset to use with different models? I'm twisting every little knob just to get it to output something decent, because so far pretty much every variation I've tried is unusable. Is there someone I can hop on a call with? I'm building a consumer app here with the potential to scale up quite large, and I would like some help getting the realtime voice settings usable.
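One way to sanity-check where the 2-3 seconds goes is to budget each turn as a sum of pipeline stages. A rough sketch with illustrative placeholder numbers only (not measurements, and the stage names are the usual voice-pipeline components, not Vapi log fields):

```typescript
// Rough per-turn latency budget for a voice pipeline. The numbers below are
// illustrative placeholders, not measurements; replace them with the per-turn
// timings from your call logs.
const turnBudgetMs = {
  endpointingWait: 700, // deciding the user is done speaking
  sttFinalize: 150,     // transcriber emitting the final transcript
  llmFirstToken: 900,   // model time-to-first-token
  ttsFirstAudio: 200,   // voice provider time-to-first-audio
};

const total = Object.values(turnBudgetMs).reduce((a, b) => a + b, 0);
console.log(`estimated response latency: ${total} ms`);
```

If endpointing really dominates in your logs, tightening the wait curve helps; if llmFirstToken dominates, no endpointing setting will fix it.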
foreign-sapphire (4d ago)
Man, I feel your frustration. It might sound a little crazy, but I'd suggest reducing the voice response time in the assistant's web Dashboard, under Advanced settings.
metropolitan-bronze (OP, 3d ago)
bueller
Shubham Bajaj (3d ago)
Hey Franny, we understand your frustration with configuring a natural-sounding assistant. Our default settings are usually as follows:
- Voice: VAPI
- LLM: gpt-4o mini cluster
- Transcriber: Deepgram nova-2
This setup, despite being simple, is quite effective and natural sounding. Let us know if you have any questions. Also, for the LLM choices selectable on our platform, please see the API documentation for the enum values for each model/provider: https://docs.vapi.ai/api-reference/assistants/create#request.body.model
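For reference, the defaults Shubham lists could be expressed in the same transient-config shape as the original post. A sketch only: the field names follow the config above, and the provider/model strings (e.g. 'gpt-4o-mini', 'nova-2') are assumptions to verify against the enum values in the linked API reference:

```typescript
// Sketch of the suggested default combination in the same shape as
// ASSISTANT_CONFIG above. Verify provider/model enum values against
// https://docs.vapi.ai/api-reference/assistants/create before using.
const DEFAULT_COMBO = {
  voice: { provider: 'vapi' },                                  // Vapi-hosted voice
  model: { provider: 'openai', model: 'gpt-4o-mini' },          // suggested LLM
  transcriber: { provider: 'deepgram', model: 'nova-2', language: 'en' },
};

console.log(JSON.stringify(DEFAULT_COMBO, null, 2));
```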