rival-black•2mo ago
Why do fast GPT models show high latency in Vapi assistant AI model listing?
For example, "GPT o4 Mini Cluster" shows 1640ms latency, but it's quite fast according to OpenAI.
I wanted to understand how you are using these models such that their latency is this high.
And, if there's anything we can do on our end to lower the latencies (by providing our own clusters, API keys, etc.).
12 Replies
To reduce latency, you can bring your own API key and set up a custom LLM endpoint for Vapi. Here’s a quick example using Python and Flask:
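(A minimal sketch, assuming the openai v1 Python SDK with OPENAI_API_KEY set in the environment; the route path, port, and fallback model name below are placeholders to adapt.)

```python
# Minimal sketch of an OpenAI-compatible chat completions endpoint for Vapi.
import os
from flask import Flask, Response, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    # Vapi sends an OpenAI-style chat completion request; forward it to
    # OpenAI and stream the chunks back as server-sent events.
    stream = client.chat.completions.create(
        model=body.get("model", "gpt-4o-mini"),  # fallback model is a placeholder
        messages=body["messages"],
        stream=True,
    )

    def generate():
        for chunk in stream:
            yield f"data: {chunk.model_dump_json()}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=5000)
```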
Expose this server with ngrok and add the public URL in Vapi’s dashboard under Custom LLM. This setup lets you control the connection and may lower latency compared to shared clusters.
See the full guide. Source: Connecting Your Custom LLM to Vapi: A Comprehensive Guide
⤵️ Mention @Vapi Support Bot for follow-ups 🐥
rival-blackOP•2mo ago
@Vapi Support Bot If I provide my own OpenAI API keys, then on selecting "GPT o4 Mini Cluster", will it send requests via my OpenAI key? If so, why does it still show the same latency in the listing?
Thank you for your question! To clarify, are you adding your OpenAI API key under "Provider Keys" and selecting the default "GPT o4 Mini Cluster" from the model list, or are you configuring a "Custom LLM" endpoint as described in the documentation? This will help determine how requests are routed and why the latency display may not update.
Sources:
- Connecting Your Custom LLM to Vapi: A Comprehensive Guide
- Provider Keys
rival-blackOP•2mo ago
@Vapi Support Bot I'm just adding the OpenAI API key under "Provider Keys"
When you add your OpenAI API key under Provider Keys and select "GPT o4 Mini Cluster," Vapi will use your key for requests, and you will be billed directly by OpenAI. However, the latency shown in the model listing is based on Vapi's shared cluster metrics and does not update dynamically to reflect your individual key or connection—it remains the same for all users. If you want more control over latency, consider setting up a Custom LLM endpoint as described in the documentation.
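If you do switch to a Custom LLM endpoint, you can point an assistant at it via the Vapi API as well as the dashboard. A rough sketch, with field names following the Custom LLM guide (verify against the current API reference; the assistant ID, URL, and model below are placeholders):

```python
# Rough sketch: pointing an existing Vapi assistant at your own endpoint
# via the Vapi API. Field names follow the Custom LLM guide; verify them
# against the current API reference before relying on this.
import os
import requests

VAPI_API_KEY = os.environ["VAPI_API_KEY"]
ASSISTANT_ID = "your-assistant-id"  # placeholder

resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "model": {
            "provider": "custom-llm",
            "url": "https://your-subdomain.ngrok.app",  # your exposed server
            "model": "gpt-4o-mini",  # placeholder
        }
    },
)
resp.raise_for_status()
```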
See the Provider Keys documentation. Source: Provider Keys
Message marked as helpful by @LordGrim! 🎉
rival-blackOP•4w ago
@Vapi Support Bot is there a way to send the "metadata" fields of the assistant to OpenAI (via the API keys provided)?
Could you provide some more detail about passing the metadata over?
rival-blackOP•4w ago
I want to pass the metadata (e.g., assistantId, callId) from Vapi to OpenAI (via my own API key), so that I can filter on these fields and log the pricing per call in the OpenAI dashboard.
If you are using your own API key, the request logs on the OpenAI side should already show the cost of each call. Beyond that, unless Vapi exposes a webhook or setting that forwards those fields, there may not be a way to send that metadata to OpenAI through Provider Keys alone.
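One possible workaround, if you are open to routing through a Custom LLM endpoint instead of Provider Keys: OpenAI's Chat Completions API accepts store and metadata parameters, and stored completions can be filtered by metadata in the OpenAI dashboard logs. A sketch under those assumptions (the shape of Vapi's request payload, in particular the call object and its field names, is a guess; inspect what your server actually receives):

```python
# Sketch: a Custom LLM server attaching Vapi identifiers to each OpenAI
# request via stored-completions metadata, so calls can be filtered in the
# OpenAI dashboard logs. The "call" object and its field names are
# assumptions; inspect the request body your server actually receives.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def forward_with_metadata(vapi_body: dict):
    call = vapi_body.get("call", {})  # assumed payload shape
    return client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=vapi_body["messages"],
        store=True,  # metadata only shows in dashboard logs for stored completions
        metadata={   # string keys/values, up to 16 pairs
            "callId": str(call.get("id", "")),
            "assistantId": str(call.get("assistantId", "")),
        },
    )
```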