Custom LLM streaming responses not getting voiced when expected

I'm running one of the Node.js streaming samples posted in other support threads, and I'm struggling to get it to trigger TTS correctly.

I've confirmed via curl that the responses are streaming back when expected (the log timestamps below show this too), but they're being voiced inconsistently. Sometimes one of the three responses gets voiced, but usually at least two or three of them aren't voiced until the end of the call. What am I doing wrong?
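For context, this is roughly what my endpoint does. A minimal sketch, assuming an Express-style handler returning OpenAI-compatible chat-completion chunks over SSE (the function names and payload shape here are my own, not from the sample), where each delta is written out as soon as it's ready rather than buffered until the end of the turn:

```javascript
// Sketch of an OpenAI-compatible streaming endpoint (assumed shape,
// not the exact sample code). Each text delta becomes one SSE event.

// Build a single SSE event for one text delta.
function sseChunk(delta) {
  const payload = {
    id: "chatcmpl-sketch", // placeholder id for illustration
    object: "chat.completion.chunk",
    choices: [{ index: 0, delta: { content: delta }, finish_reason: null }],
  };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Express-style handler: write each chunk immediately. Note that
// compression or buffering middleware can hold chunks back so the
// voice pipeline only sees them when the stream ends.
function streamHandler(req, res, deltas) {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  for (const delta of deltas) {
    res.write(sseChunk(delta));
  }
  res.write("data: [DONE]\n\n");
  res.end();
}

module.exports = { sseChunk, streamHandler };
```

As far as I can tell the chunks go out one at a time like this, which is why the staggered log timestamps below confuse me.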

Call ID: a2098cfd-3f76-435c-b94a-ae8aea6a52e8

03:30:53:917
[CHECKPOINT]
Model request started

03:30:53:918
[LOG]
Model request started (gpt-4o, custom-llm)

03:31:04:034
[CHECKPOINT]
Model sent start token

03:31:04:035
[LOG]
Model output: Let me think.

03:31:04:035
[CHECKPOINT]
Model sent first output token

03:31:04:037
[LOG]
Voice input: Let me think.

03:31:09:033
[LOG]
Model output: still thinking.

03:31:09:034
[LOG]
Voice input: still thinking.

03:31:13:830
[LOG]
Model output: still thinking.

03:31:13:832
[LOG]
Voice input: still thinking.

03:31:13:834
[CHECKPOINT]
Model sent end token

03:31:14:322
[CHECKPOINT]
11labs: audio received

03:31:14:361
[CHECKPOINT]
Assistant speech started

03:31:14:361
[INFO]
Turn Latency: 20446ms (Endpointing 3ms, Model 10118ms, Voice: 10288ms)

03:31:17:269
[CHECKPOINT]
Assistant speech ended