rare-sapphire•7mo ago
[Need Help] Custom LLM. Multiple Assistant messages in a row.
I'm using https://github.com/VapiAI/advanced-concepts-custom-llm, but with PydanticAI and Groq. I don't think that matters, though; I can make simple chat work with these. What I can't do is this:
- Assistant says its initial message, defined in the UI
- User explains their problem
- I want the assistant to say "One moment", as there is an agentic workflow to craft the answer and it might take a few seconds
- I want the agentic AI to respond with the crafted answer about the user's problem
-- This may include further questions for the user, so the loop continues.
I can do one or the other ("One moment, please" or the agentic response), but not both. I guess once I send one of the responses back to Vapi, Vapi expects user input.
How do I send two or more assistant messages, have them spoken to the user, and then wait for the user's answer?
12 Replies
Hi there! 👋 
I understand you're trying to have your assistant say "One moment" before delivering an agentic response, without requiring user input in between. 
I can help with this! Based on the information I found, there are several approaches you can try:
1. Using the triggerResponseEnabled Flag
You can use the triggerResponseEnabled flag to control whether a message triggers an LLM response.
2. Using the .send() Method
During a call, you can send intermediate messages using the .send() method.
3. Sending Silent System Messages Server-Side
According to information I found, you can "send silent system messages server side" using a "control url" to inject messages from the backend.
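As a rough sketch (not from the thread), server-side injection via the control URL might look like the following. The location of the control URL (monitor.controlUrl on the call object) and the {"type": "say", ...} payload shape are assumptions to verify against Vapi's live-call-control docs:

```python
import requests

def say_now(control_url: str, text: str) -> None:
    """Inject a spoken message mid-call via the call's control URL.

    The payload shape is an assumption based on Vapi's live-call-control docs.
    """
    resp = requests.post(
        control_url,
        json={"type": "say", "message": text},  # assumed payload shape
        timeout=10,
    )
    resp.raise_for_status()

# Example (hypothetical): say_now(call["monitor"]["controlUrl"], "One moment, please.")
```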
4. Using Webhooks as a Workaround
Some users have reported success using webhooks instead of the Vapi module to send multiple messages.
5. Building Workflows with Multiple "Say" Tasks
If you're using Vapi's workflow builder, you can add multiple "say" tasks to send multiple messages in sequence, according to the Vapi workflows documentation.
6. Adjusting Conversation Flow in Your Script
You can modify your script to include "additional prompts or logic that triggers the next part of the conversation automatically, without waiting for a response," as mentioned in this discussion.
Would you like me to elaborate on any of these approaches?
Sources:
- Vapi Web SDK Documentation
- Discord Discussion on Turn-Taking Behavior
- Discord Discussion on Silent System Messages
- Discord Discussion on Webhooks
- Vapi Workflows Documentation
ratty-blush•7mo ago
@Pigeon let me check what can be done.
1. Use streaming responses in your custom-llm webhook:
   - Immediately stream "One moment" as the first part of your response
   - Keep the connection open (don't send the stop signal)
   - Continue processing your agentic workflow
   - Stream the full response when ready
   - Finally send the stop signal
2. Implementation would look like:
   - Return a streaming response using OpenAI's format
   - First chunk contains "One moment"
   - Subsequent chunks contain your full response
   - This creates the appearance of a continuous assistant message
3. Benefits:
   - No need to track conversation state
   - No complex auto-triggering mechanism
   - Works naturally with Vapi's existing architecture
   - Appears as a single assistant turn from Vapi's perspective
This approach leverages the streaming nature of LLM responses, which Vapi is already designed to handle. Your custom-llm endpoint just needs to properly implement the OpenAI-compatible streaming format.
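A minimal sketch of that approach, assuming an OpenAI-compatible FastAPI endpoint like the one in the custom-llm example repo (sse_chunk, run_agentic_workflow, and the model name are placeholder names, not from the thread):

```python
import asyncio
import json
import time
import uuid

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse_chunk(content: str | None = None, finish_reason: str | None = None) -> str:
    """Format one OpenAI-style chat.completion.chunk as an SSE event."""
    payload = {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "custom-llm",  # placeholder model name
        "choices": [{
            "index": 0,
            "delta": ({"content": content} if content is not None else {}),
            "finish_reason": finish_reason,
        }],
    }
    return f"data: {json.dumps(payload)}\n\n"

async def run_agentic_workflow(messages: list[dict]) -> str:
    # Stand-in for the PydanticAI/Groq workflow that takes a few seconds.
    await asyncio.sleep(3)
    return "Thanks for waiting. Here's what I found about your problem..."

@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()

    async def stream():
        # 1. Stream the filler immediately; the connection stays open.
        yield sse_chunk("One moment, please.")
        # 2. Run the slow agentic workflow while the stream is still live.
        answer = await run_agentic_workflow(body.get("messages", []))
        # 3. Stream the crafted answer as another delta chunk.
        yield sse_chunk(answer)
        # 4. Only now send the stop chunk and the SSE terminator.
        yield sse_chunk(finish_reason="stop")
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")
```

The catch, as the rest of the thread shows, is when Vapi actually flushes buffered text to TTS.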
ratty-blush•7mo ago
@Pigeon Let me know if you require further help, happy to contribute.
rare-sapphireOP•7mo ago
So I've sent a custom message like:
content_only = f"data:{json.dumps(first_chunk.model_dump())}\n\n"
which doesn't have any end markers.
Then the LLM workflow generated another message with stop and [DONE].
Result: after the initial Vapi message, a long wait as before, then "One moment" + the LLM message at the same time. So no "One moment" straight away with the LLM message a few seconds later; everything is TTSed after a few seconds, probably when [DONE] and stop are sent.
If I send a custom message like:
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: [DONE]\n\n"
which sends [DONE] but not stop:
nothing initially; after the LLM runs, I only hear "One moment".
And if I send the custom message with both the stop chunk and [DONE], I get "One moment" only after the LLM workflow as well.
I don't think Vapi starts TTS without [DONE] and/or stop?
Can you confirm: once Vapi starts getting the SSE stream, does it send received chunks straight to TTS, or does it only send them after receiving [DONE] and/or stop?
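For reference, a rough sketch of the three variants described above, assuming the OpenAI chat.completions streaming format (the dicts stand in for what first_chunk.model_dump() is assumed to produce):

```python
import json

# Shape of one OpenAI-style streaming chunk carrying content.
first_chunk = {
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {"content": "One moment, please."},
                 "finish_reason": None}],
}
# Shape of the final chunk carrying only the stop signal.
stop_chunk = {
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
}

# Variant 1: content only, no end markers (connection stays open).
v1 = f"data: {json.dumps(first_chunk)}\n\n"

# Variant 2: content followed by the SSE terminator, but no stop chunk.
v2 = f"data: {json.dumps(first_chunk)}\n\ndata: [DONE]\n\n"

# Variant 3: content, then the stop chunk, then the SSE terminator.
v3 = (f"data: {json.dumps(first_chunk)}\n\n"
      f"data: {json.dumps(stop_chunk)}\n\n"
      "data: [DONE]\n\n")
```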
Maybe I'm sending "One moment" in a wrong way?
ratty-blush•7mo ago
Hey, apologies for the delay. Could you share the call ID for the mentioned issues, so I could take a look at what chunk stream was sent out to us and what action was associated with it?
rare-sapphireOP•7mo ago
Hey, yep:
1. Sending a "One moment please." chunk before running the LLM workflow, which sends its own chunk after; no interference in between, I think:
Call ID: 71b7e0d2-6fb8-4380-b20c-b43b7e9637d8
In my terminal I can see the chunk being created and streamed (terminal output omitted here). So: no [DONE], and no final chunk with stop.
All of this happens before running the LLM workflow, which takes 3-5 seconds.
Then the LLM workflow generates its response and streams it with a final chunk that has [DONE] and finish_reason="stop".
Result: I hear "One moment please." only AFTER the LLM response is sent, together with the LLM response.
I could be doing something wrong. It would be amazing if this worked; I'd have everything I need to build AI agents for a huge logistics company with Vapi.
It kinda seems that Vapi doesn't do TTS without getting [DONE] or something, but I might be wrong.
Would be nice to have some [INTERRUPT] marker or something in that case.
ratty-blush•7mo ago
When sending responses from your Custom LLM, insert the flush tag at points where you want text to be immediately processed by TTS:
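A sketch of what that streamed content might look like, assuming Vapi's <flush /> tag syntax for custom LLM responses (the final sentence here is a placeholder):

```python
# Assumed behavior: everything before a <flush /> tag is sent to TTS
# immediately, instead of being buffered until the stop chunk / [DONE].
content = (
    "Hello, I'm your virtual assistant. <flush /> "
    "Let me check that information for you. <flush /> "
    "Here is the crafted answer..."  # placeholder continuation
)
# Stream `content` as delta chunks exactly as in the earlier sketches.
```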
This will cause "Hello, I'm your virtual assistant." to be immediately sent to TTS, followed by "Let me check that information for you." as soon as it's generated, without waiting for the rest of the response.
When VAPI detects this tag, it immediately sends all text before the tag to TTS.
Give it a try and let me know how it goes for you.
rare-sapphireOP•7mo ago
Yes, it's a lot better. Not as soon as I'd like, but I guess that's mostly network/TTS latency. Thanks a lot, @Shubham Bajaj!!!
Marking this ticket as Solved ✅