Custom LLM getting partial messages, with significantly delayed full messages
Hey all. We're having an issue with our chat/completions API (custom LLM).
We've been relying on the messages sent as part of the chat/completions payload instead of managing state ourselves, because it seems the messages get rewritten. We're hitting an issue when the user's response is very short, like a one-word "yes".
When the response is very short, it seems that the assistant response in the message payload is somehow cut off. For example, we'll get a call with:
assistant: "Let's begin our interview. Are you qualified to work?"
user: "Yes"
The VAPI messages payload cuts off the second sentence and sends us only "Let's begin our interview." Then, oddly, about 5 seconds later, we get another chat/completions payload with the full "Let's begin our interview. Are you qualified to work?" from the assistant.
This consistently repros when the user's response is short; with longer user responses we don't see the issue.
We're not keeping state on our end because we appear to get opportunistic completion-endpoint calls, and VAPI discards the opportunistic completions that never actually get sent to the user. So we have to rely on the VAPI messages as the canonical transcript, but if VAPI is sending truncated messages, we have a problem.
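In the meantime, one workaround we're considering is a small reconciliation step on our side: if a later payload's assistant turn is a strict extension of an earlier truncated one, prefer the longer version. This is just a sketch under our assumptions about the payload shape (OpenAI-style `role`/`content` message dicts); the function name and structure are ours, not part of VAPI's API:

```python
# Hypothetical sketch: reconcile a truncated assistant turn with the
# fuller version that arrives in a later chat/completions payload.
# Assumes OpenAI-style message dicts with "role" and "content" keys.

def reconcile(history, incoming):
    """Prefer the newer assistant message whenever it strictly
    extends the older one (i.e. the old text is a prefix of it)."""
    merged = []
    for old, new in zip(history, incoming):
        if (old["role"] == "assistant"
                and new["role"] == "assistant"
                and new["content"].startswith(old["content"])
                and len(new["content"]) > len(old["content"])):
            merged.append(new)  # later payload carried the full text
        else:
            merged.append(old)
    # Keep any extra turns present only in the newer payload.
    merged.extend(incoming[len(history):])
    return merged

truncated = [
    {"role": "assistant", "content": "Let's begin our interview."},
    {"role": "user", "content": "Yes"},
]
full = [
    {"role": "assistant",
     "content": "Let's begin our interview. Are you qualified to work?"},
    {"role": "user", "content": "Yes"},
]
print(reconcile(truncated, full)[0]["content"])
# prints: Let's begin our interview. Are you qualified to work?
```

Obviously this only papers over the symptom, since it forces us to cache the previous payload, which is exactly the state management we were trying to avoid.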
It's kind of a weird scenario and hoping someone can help. Thanks!