Urgent! LLM Fallback/Latency Issue
Hello,

I am experiencing a critical LLM latency issue that appears to be caused by an unwanted automatic model fallback in the Vapi infrastructure.
My Assistant is currently configured to use Llama 3 70B with the Groq provider (previously GPT-4o Mini Cluster). However, across multiple calls, the system is consistently falling back to gemini-2.5-flash (Google provider).
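For reference, this is roughly how the assistant's model block is configured (field names as they appear in my saved assistant config; the exact Groq model id is from memory, so treat it as approximate):

    "model": {
      "provider": "groq",
      "model": "llama3-70b-8192"
    }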
This is resulting in unacceptably high LLM latency, frequently in the 3-4 second range, which makes real-time conversation unusable.
Evidence:
Configured Model: Llama 3 70B (Groq)
Observed Fallback (Log Detail): The latest call log shows the Google model being used instead of the configured one.
Call ID: 019a0b78-e67c-7118-8a9f-f0a9d26b60a4
Log Data: The log for this call explicitly shows "provider": "google" and "model": "gemini-2.5-flash".
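For clarity, here are those fields from that call's log, formatted as JSON (all other log fields omitted):

    {
      "provider": "google",
      "model": "gemini-2.5-flash"
    }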
Could you please investigate why the system is failing to route to my configured Groq model, and why it is instead defaulting to a high-latency Google model?

Thank you for your urgent attention to this issue.