rival-black
rival-black2mo ago

Why do fast GPT models show high latency in Vapi assistant AI model listing?

For example, "GPT o4 Mini Cluster" shows 1640ms latency, but the model is quite fast according to OpenAI. I wanted to understand how you are using these models that makes their latency this high, and whether there's anything we can do on our end to lower the latencies (by providing our own clusters, API keys, etc.).
12 Replies
Vapi Support Bot
To reduce latency, you can bring your own API key and set up a custom LLM endpoint for Vapi. Here’s a quick example using Python and Flask:
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
# The modern openai>=1.0 client; the key can also come from the OPENAI_API_KEY env var.
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    # Vapi sends an OpenAI-style chat payload; forward the messages as-is.
    data = request.get_json()
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=data["messages"],
    )
    return jsonify({"text": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=5000)
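Once the server is running, you can sanity-check the endpoint locally before pointing Vapi at it (a minimal sketch; the single-message payload here just mimics the shape Vapi sends):

import requests

# Quick local test of the /chat/completions endpoint above.
resp = requests.post(
    "http://localhost:5000/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
)
print(resp.json()["text"])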
Expose this server with ngrok and add the public URL in Vapi’s dashboard under Custom LLM. This setup lets you control the connection and may lower latency compared to shared clusters.
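If you'd rather wire this up via the API than the dashboard, the assistant's model can be pointed at the custom LLM URL. A sketch, assuming the assistant update endpoint and the custom-llm provider shape from Vapi's docs (the key, assistant ID, and ngrok URL are placeholders):

import requests

VAPI_API_KEY = "YOUR_VAPI_API_KEY"  # placeholder: your private Vapi key
ASSISTANT_ID = "YOUR_ASSISTANT_ID"  # placeholder: an existing assistant

# Point the assistant's model at the ngrok URL exposing the Flask server.
resp = requests.patch(
    f"https://api.vapi.ai/assistant/{ASSISTANT_ID}",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={
        "model": {
            "provider": "custom-llm",
            "url": "https://your-subdomain.ngrok.app",
            "model": "gpt-4.1-mini",
        }
    },
)
resp.raise_for_status()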
See the full guide. Source: Connecting Your Custom LLM to Vapi: A Comprehensive Guide
⤵️ Mention @Vapi Support Bot for follow-ups 🐥
rival-black
rival-blackOP2mo ago
@Vapi Support Bot If I provide my own OpenAI API keys, then on selecting "GPT o4 Mini Cluster", will it send requests via my OpenAI key? If yes, why does it still show the same latency in the listing?
Vapi Support Bot
Thank you for your question! To clarify, are you adding your OpenAI API key under "Provider Keys" and selecting the default "GPT o4 Mini Cluster" from the model list, or are you configuring a "Custom LLM" endpoint as described in the documentation? This will help determine how requests are routed and why the latency display may not update.
Sources: Connecting Your Custom LLM to Vapi: A Comprehensive Guide; Provider Keys
rival-black
rival-blackOP2mo ago
@Vapi Support Bot I'm just adding the OpenAI API key in the "Provider Keys"
Vapi Support Bot
When you add your OpenAI API key under Provider Keys and select "GPT o4 Mini Cluster," Vapi will use your key for requests, and you will be billed directly by OpenAI. However, the latency shown in the model listing is based on Vapi's shared cluster metrics and does not update dynamically to reflect your individual key or connection—it remains the same for all users. If you want more control over latency, consider setting up a Custom LLM endpoint as described in the documentation.
See the Provider Keys documentation. Source: Provider Keys
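For reference, provider keys can also be registered programmatically rather than through the dashboard. A sketch, assuming the credential endpoint from the Provider Keys docs (all key values are placeholders):

import requests

# Register an OpenAI key as a provider key; equivalent to the dashboard's
# "Provider Keys" page.
resp = requests.post(
    "https://api.vapi.ai/credential",
    headers={"Authorization": "Bearer YOUR_VAPI_API_KEY"},
    json={"provider": "openai", "apiKey": "sk-..."},
)
resp.raise_for_status()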
Duckie
Duckie2mo ago
Message marked as helpful by @LordGrim! 🎉
rival-black
rival-blackOP4w ago
@Vapi Support Bot is there a way to send the "metadata" fields of the assistant to OpenAI (via the API keys provided)?
Sahil
Sahil4w ago
Could you provide some more detail about passing the metadata over?
rival-black
rival-blackOP4w ago
I want to pass the metadata (e.g., assistantId, callId) from Vapi to OpenAI (my own API key), so that I can filter on these fields and log the pricing per call in the OpenAI dashboard.
Sahil
Sahil4w ago
If you are using your own API key, the API logs on OpenAI should have the costs for each usage. Unless OpenAI has some webhook you can configure to send that data to your account, there may not be a way to send that metadata to OpenAI directly.
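One possible workaround, building on the custom LLM setup above (a sketch, not an official Vapi feature): route the model through your own /chat/completions proxy and have the proxy attach the identifiers before calling OpenAI. The Chat Completions API accepts a metadata map when store=true, and those fields become filterable on stored completions in the OpenAI dashboard; the exact keys Vapi includes in the request body (call and assistant objects) are an assumption here.

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    data = request.get_json()
    # Assumption: Vapi includes call/assistant info in the payload it sends
    # to custom LLM endpoints; adjust these keys to your actual payload.
    meta = {
        "callId": str(data.get("call", {}).get("id", "")),
        "assistantId": str(data.get("assistant", {}).get("id", "")),
    }
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=data["messages"],
        store=True,     # stored completions can carry metadata
        metadata=meta,  # filterable in the OpenAI dashboard
    )
    return jsonify({"text": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=5000)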
