rare-sapphire · 7mo ago

[Need Help] Custom LLM. Multiple Assistant messages in a row.

I'm using https://github.com/VapiAI/advanced-concepts-custom-llm, but with PydanticAI and Groq. I don't think that matters; I can make simple chat work with these. What I can't do is this:
- Assistant says its initial message, defined in the UI
- User explains their problem
- I want the assistant to say "One moment", since there is an agentic workflow to craft the answer and it might take a few seconds
- I want the agentic AI to respond with the crafted answer about the user's problem
-- This may include further questions for the user, so the loop continues.
I can do one or the other ("one moment please" or the agentic response), but not both. I guess once I send one of the responses back to Vapi, Vapi expects user input. How do I send two or more assistant messages, have them spoken to the user, and then wait for the user's answer?
12 Replies
Vapi Support Bot
Hi there! 👋 I understand you're trying to have your assistant say "One moment" before delivering an agentic response, without requiring user input in between. I can help with this! Based on the information I found, there are several approaches you can try:

1. Using the triggerResponseEnabled flag
You can use the triggerResponseEnabled flag to control whether a message triggers an LLM response:
vapi.send({
type: "add-message",
message: {
role: "assistant",
content: "One moment, I'm thinking...",
},
triggerResponseEnabled: false // Prevents this message from triggering an LLM response
});

// Then after your agentic processing is complete:
vapi.send({
type: "add-message",
message: {
role: "assistant",
content: "Here's your answer...",
}
});
2. Using the .send() method
During a call, you can send intermediate messages using the .send() method:
vapi.send({
type: "add-message",
message: {
role: "system",
content: "The assistant should say 'One moment' and then continue without waiting for user input",
},
});
3. Sending silent system messages server-side
According to information I found, you can "send silent system messages server side" using a "control url" to inject messages from the backend (a minimal sketch follows at the end of this message).
4. Using webhooks as a workaround
Some users have reported success using webhooks instead of the Vapi module to send multiple messages.
5. Building workflows with multiple "say" tasks
If you're using Vapi's workflow builder, you can add multiple "say" tasks to send multiple messages in sequence according to the Vapi workflows documentation.
6. Adjusting conversation flow in your script
You can modify your script to include "additional prompts or logic that triggers the next part of the conversation automatically, without waiting for a response" as mentioned in this discussion. Would you like me to elaborate on any of these approaches?

Sources:
- Vapi Web SDK Documentation
- Discord Discussion on Turn-Taking Behavior
- Discord Discussion on Silent System Messages
- Discord Discussion on Webhooks
- Vapi Workflows Documentation
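For approach 3, a minimal server-side sketch might look like the following. It assumes you have already captured the call's control URL (e.g. from the call's monitor data), and the payload fields shown here are an assumption, so verify them against the Live Call Control documentation before relying on them.

import requests

def say_via_control_url(control_url: str, text: str) -> None:
    # Inject a spoken message into a live call from the backend.
    # The payload shape ("type"/"content") is an assumption; check the
    # Live Call Control docs for your Vapi version.
    payload = {"type": "say", "content": text}
    response = requests.post(control_url, json=payload, timeout=10)
    response.raise_for_status()

# Example usage (control_url comes from the call object):
# say_via_control_url(control_url, "One moment, I'm looking into that.")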
⤵️ Mention @Vapi for follow-ups 🐥
rare-sapphire (OP) · 7mo ago
I want it done from the server side with https://github.com/VapiAI/advanced-concepts-custom-llm

bump
ratty-blush · 7mo ago
@Pigeon let me check what can be done.
Kyle · 7mo ago
1. Use streaming responses in your custom-llm webhook:
- Immediately stream "One moment" as the first part of your response
- Keep the connection open (don't send the stop signal)
- Continue processing your agentic workflow
- Stream the full response when ready
- Finally send the stop signal
2. The implementation would look like this (a sketch follows below):
- Return a streaming response using OpenAI's format
- The first chunk contains "One moment"
- Subsequent chunks contain your full response
- This creates the appearance of a continuous assistant message
3. Benefits:
- No need to track conversation state
- No complex auto-triggering mechanism
- Works naturally with Vapi's existing architecture
- Appears as a single assistant turn from Vapi's perspective
This approach leverages the streaming nature of LLM responses, which Vapi is already designed to handle. Your custom-llm endpoint just needs to properly implement the OpenAI-compatible streaming format.
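A minimal sketch of that shape, assuming a FastAPI custom-llm endpoint and a placeholder run_agentic_workflow coroutine (both names are illustrative, not from the advanced-concepts-custom-llm repo). As the rest of this thread shows, the filler may still be buffered until a flush tag or end-of-stream marker arrives, so see the <flush /> suggestion further down as well.

import json
import time
from typing import Optional

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse_chunk(content: Optional[str], finish_reason: Optional[str] = None) -> str:
    # Build one OpenAI-style chat.completion.chunk line in SSE framing.
    delta = {"role": "assistant", "content": content} if content is not None else {}
    chunk = {
        "id": "chatcmpl-custom",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": "custom-llm",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(chunk)}\n\n"

async def run_agentic_workflow(messages: list) -> str:
    # Placeholder for the PydanticAI/Groq workflow that takes a few seconds.
    return "Here's the crafted answer about your problem."

@app.post("/chat/completions")
async def chat_completions(request: Request):
    payload = await request.json()

    async def stream():
        # 1. Filler goes out immediately; no stop signal, connection stays open.
        yield sse_chunk("One moment, let me look into that.")
        # 2. Slow agentic work happens while the connection stays open.
        answer = await run_agentic_workflow(payload.get("messages", []))
        # 3. Real answer, then the stop chunk and the [DONE] sentinel.
        yield sse_chunk(answer)
        yield sse_chunk(None, finish_reason="stop")
        yield "data: [DONE]\n\n"

    return StreamingResponse(stream(), media_type="text/event-stream")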
ratty-blush · 7mo ago
@Pigeon Let me know if you require further help, happy to contribute.
rare-sapphire (OP) · 7mo ago
So I've sent a custom message like:
content_only = f"data:{json.dumps(first_chunk.model_dump())}\n\n"
which doesn't have any end markers. Then the LLM workflow generated the other message with stop and [DONE].
Result: after the initial Vapi message there's a long wait as before, and then "one moment" + the LLM message at the same time. So no "One moment" straight away with the LLM message a few seconds later; everything is TTS'ed after a few seconds, probably when [DONE] and stop are sent.

If I send a custom message like:
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: [DONE]\n\n"
which only sends [DONE] but not stop, I hear nothing initially, and after the LLM runs I only hear "One moment".

And if I send the custom message with the full stop and [DONE], I get "One moment" only after the LLM workflow as well.

I don't think Vapi starts TTS without [DONE] and/or stop? Can you confirm: once Vapi starts getting the SSE stream, does it send received chunks straight to TTS, or does it only send them after receiving [DONE] and/or stop? Maybe I'm sending "One moment" in the wrong way?
Kyle · 7mo ago
Hey, apologies for the delay. Could you share the call ID for the mentioned issues so I could take a look at the chunk stream sent out to us and what action was associated with it?
rare-sapphire (OP) · 7mo ago
Hey, yep:
1. Sending the "One moment please." chunk before running the LLM workflow, which sends its own chunk after; no interference between them, I think.
Call ID: 71b7e0d2-6fb8-4380-b20c-b43b7e9637d8
In my terminal I see:
-----------Message chunk:
data: {"id": "x", "object": "chat.completion.chunk", "created": x, "model": "llama-3.3-70b-versatile", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "One moment please."}, "finish_reason": null}]}
From code:
async for chunk in self._send_message(
"One moment please.",
completion_id,
vapi_payload,
send_stop=False,
send_end_only=False,
):
print("-----------Message chunk:")
print(chunk)
yield chunk
So it created:
first_chunk = ChatCompletionChunk(
id=completion_id,
model=model_name,
choices=[Choice(delta=DeltaContent(role="assistant", content=message))],
)
And is sent/streamed like this:
content_only = f"data: {json.dumps(first_chunk.model_dump())}\n\n"
yield content_only
So no [DONE], no final chunk with stop. All of this is called before running the LLM workflow, which takes 3-5 seconds. Then the LLM workflow generates a response and it is sent like this:
final_chunk = ChatCompletionChunk(
id=completion_id,
model=model_name,
choices=[Choice(delta=DeltaContent(), finish_reason="stop")],
)

combined = f"data: {json.dumps(first_chunk.model_dump())}\n\ndata: {json.dumps(final_chunk.model_dump())}\n\ndata: [DONE]\n\n"
yield combined
So the LLM response is sent with a final chunk that has [DONE] and finish_reason="stop".
Result: I hear "One moment please." AFTER the LLM response is sent, together with the LLM response.
I could be doing something wrong. It would be amazing if this worked; I'd have everything I need to build AI agents for a huge logistics company with Vapi. It kind of seems that Vapi doesn't do TTS without getting [DONE] or something, but I might be wrong. Would be nice to have some [INTERRUPT] marker or something for that case.
Kyle · 7mo ago
When sending responses from your Custom LLM, insert the flush tag at points where you want text to be immediately processed by TTS:
Hello, I'm your virtual assistant.<flush />
Let me check that information for you.<flush />
I've found what you're looking for.

This will cause "Hello, I'm your virtual assistant." to be immediately sent to TTS, followed by "Let me check that information for you." as soon as it's generated, without waiting for the rest of the response.

When VAPI detects this tag, it immediately sends all text before the tag to TTS.

Give it a try and let me know how it goes for you.
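Applied to the streaming sketch earlier in the thread, that means appending the tag to the filler chunk before the slow workflow runs; a minimal sketch, reusing the hypothetical sse_chunk helper and run_agentic_workflow placeholder from above:

# Filler first, with the flush tag so the text before it goes to TTS immediately
yield sse_chunk("One moment please.<flush />")

# Slow agentic workflow runs while the caller hears the filler
answer = await run_agentic_workflow(payload.get("messages", []))

# Final answer, then the stop chunk and the [DONE] sentinel close the turn
yield sse_chunk(answer)
yield sse_chunk(None, finish_reason="stop")
yield "data: [DONE]\n\n"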
rare-sapphire (OP) · 7mo ago
Yes, it's a lot better. Not as soon as I'd like, but I guess that's mostly network/TTS latency. Thanks a lot, @Shubham Bajaj !!!
Kyle · 7mo ago
Marking this ticket as Solved ✅