Custom LLM + tool calls: stream chunks are getting enqueued but not spoken

We have a Custom LLM implementation that VAPI calls over an SSE connection.

We want this sequence:
  1. POST /vapi/llm to start the stream
  2. the agent generates a response
  3. when a tool call starts, we speak "hang on, let me check that for you"
  4. the tool call runs
  5. VAPI speaks the tool result to the user
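For reference, here's a minimal sketch of how our endpoint streams the filler message before running the tool, using the OpenAI-compatible streaming-chunk format that Custom LLM endpoints return. The `chatcmpl-filler` id and the `run_tool` callback are placeholders, not real names from our code:

```python
import json

def sse_chunk(text, role=None, finish_reason=None):
    """Format one OpenAI-style streaming chunk as an SSE `data:` line."""
    delta = {}
    if role:
        delta["role"] = role
    if text:
        delta["content"] = text
    payload = {
        "id": "chatcmpl-filler",  # placeholder id
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"

def stream_with_filler(run_tool):
    # 1. Emit the filler *before* the slow tool call, as a complete
    #    sentence (ending punctuation included) so it can be spoken alone.
    yield sse_chunk("Hang on, let me check that for you.", role="assistant")
    # 2. Blocking tool call (placeholder callback).
    result = run_tool()
    # 3. Stream the real answer, then terminate the stream.
    yield sse_chunk(str(result))
    yield sse_chunk(None, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

The intent is that step 1's chunk is flushed to VAPI (and spoken) while step 2 is still running, rather than being buffered until the stream ends.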
When I test the endpoint locally, I can see the data: {"id": ...} chunks being streamed through fine.

However, on the voice side, VAPI queues all of the chunks up and speaks them together at the end, which defeats the point of the waiting message.

How do you get VAPI to speak designated chunks as they arrive, rather than waiting for the entire stream to finish?
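One thing worth ruling out on the server side is mid-sentence fragments: if the streamed chunks never end on sentence-final punctuation, a TTS pipeline that buffers until it has a speakable unit can look like it's "queueing everything up". A hedged sketch (my own helper, not a VAPI API) of buffering raw tokens into complete sentences before emitting them:

```python
import re

# A sentence ends at . ! or ? followed by whitespace or end-of-buffer.
SENTENCE_END = re.compile(r"([.!?])(\s|$)")

def sentences_from_tokens(tokens):
    """Buffer streamed tokens and yield only complete sentences,
    so each emitted chunk is a unit a TTS layer can speak
    immediately instead of a mid-sentence fragment."""
    buf = ""
    for tok in tokens:
        buf += tok
        while True:
            m = SENTENCE_END.search(buf)
            if not m:
                break
            end = m.end(1)
            yield buf[:end].strip()
            buf = buf[end:]
    if buf.strip():
        yield buf.strip()  # flush any trailing partial sentence
```

Wrapping the model's token stream in something like this at least guarantees the filler message reaches VAPI as a standalone, speakable sentence.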

Alternatively, I've tried using the controlClient to say a response when the tool starts, but VAPI drops the connection when I do.