dry-scarlet • 6mo ago

Use gemini-2.0-flash for Vapi custom LLM

💡 Question: I’m trying to integrate gemini-2.0-flash as a streaming model with Vapi, but I’m hitting a blocker.
Vapi’s custom LLM example builds its StreamingResponse from OpenAI’s ChatCompletionChunk objects, but Gemini streams google.genai.types.GenerateContentResponse objects instead.

To work around this, I’ve been rebuilding OpenAI-style chunks by hand and stuffing Gemini’s text output into delta.content, but that feels fragile and wrong.
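
For context, the workaround looks roughly like the sketch below. It assumes a FastAPI server exposing `/chat/completions` (the path is whatever you configure as your custom LLM URL in Vapi), the `google-genai` SDK, and a placeholder API key; the flattening of the OpenAI message list into a single prompt string is a simplification, since a proper adapter would map roles onto Gemini’s Content/Part structure.

```python
# Minimal sketch, not an official Vapi example: accept Vapi's OpenAI-style
# request and re-emit Gemini's stream as OpenAI ChatCompletionChunk SSE events.
import json
import time
import uuid

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from google import genai

app = FastAPI()
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder key


def openai_chunk(chunk_id: str, created: int, text: str | None, finish: str | None) -> str:
    """Wrap one piece of text in OpenAI's chat.completion.chunk SSE envelope."""
    payload = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": "gemini-2.0-flash",
        "choices": [{
            "index": 0,
            "delta": {"content": text} if text is not None else {},
            "finish_reason": finish,
        }],
    }
    return f"data: {json.dumps(payload)}\n\n"


@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    # Simplification: collapse the OpenAI message list into one prompt string.
    prompt = "\n".join(m["content"] for m in body.get("messages", []) if m.get("content"))

    def event_stream():
        chunk_id = f"chatcmpl-{uuid.uuid4().hex}"
        created = int(time.time())
        stream = client.models.generate_content_stream(
            model="gemini-2.0-flash", contents=prompt
        )
        for response in stream:  # each item is a GenerateContentResponse
            if response.text:
                yield openai_chunk(chunk_id, created, response.text, None)
        # Signal completion the way OpenAI does: a stop chunk, then [DONE].
        yield openai_chunk(chunk_id, created, None, "stop")
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```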

Does anyone have a clean example or guidance on how to properly stream gemini-2.0-flash as a StreamingResponse in a custom LLM using Vapi?