
How do you pick between the Mistral LLMs for your next build? Two options are already Vapi-native: Mistral Large and Mistral Small. Mistral Medium 3 just dropped, and it can already power your Vapi voice agent. You'll need to bring the model yourself, but it's worth the effort.
This guide walks through everything needed to build a Mistral Medium voice agent using Vapi's BYOM platform.
» Compare Mistral Small and Mistral Large custom agents first.
Mistral Medium 3, launched in May 2025, embodies Mistral AI's "medium is the new large" philosophy: near-flagship performance without bank-breaking pricing.
Mistral Small and Large have been around a bit longer (March 2025 and July 2024, respectively), which counts as eons in the rapidly advancing voice AI space. So why consider Medium over the other two? Here's a handy table illustrating the top-line differences:
| Model | Availability in Vapi | Pricing (per 1M tokens) | Best Use Cases | Context Window |
|---|---|---|---|---|
| Mistral Small | Native dropdown | Bundled in Vapi pricing | Simple conversations, FAQ bots | 32K tokens |
| Mistral Medium 3 | BYOM required | $0.40 input / $2.00 output | Complex reasoning, technical support | 128K tokens |
| Mistral Large | Native dropdown | Bundled in Vapi pricing | Enterprise applications, specialized domains | 128K tokens |
With Medium 3, you'll note the same context window as Large (hence Mistral's "medium is the new large" positioning). Compared to Small, you get a far larger context window, but that's only valuable if your conversations demand more complex reasoning; simple conversations and directories would be fine on Small.
Remember, too, that opting for Medium means BYOM for the time being. If you're just looking for a quick build, sticking with Large will be the least disruptive approach.
Mistral Medium 3 achieves impressive benchmark scores, and those scores translate directly to voice agent capabilities.
Medium 3's performance is on par with the top Anthropic and OpenAI competition. Your Mistral Medium voice agent will handle technical support calls, complex problem-solving, and document discussions as smoothly as the household names (both of which are already native to Vapi).
Compared to Small and Large, a few months is light-years in terms of model development: you're going to see better results with Medium 3.
Several characteristics make Medium 3 particularly well-suited for conversational AI applications, and they justify the extra BYOM effort.
Choosing Medium 3 over the native LLM options does make life slightly harder for your voice agent build: instead of picking from a dropdown, you'll manage your own API key, billing, and endpoint configuration.
Given the strengths in context and performance, Medium 3-powered digital voice assistants should excel at complex reasoning tasks and technical support conversations.
» Test an HSA Management voice assistant with Mistral Large.
Vapi's BYOM platform connects Mistral's API directly to Vapi's voice processing pipeline: Speech-to-Text → LLM Processing → Text-to-Speech.
Vapi handles audio processing and orchestration while your Mistral Medium 3 endpoint provides the conversational intelligence.
Prerequisites: You'll need an active Mistral API key with billing enabled, a Vapi account with BYOM access, and the model endpoint URL.
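Before wiring anything into Vapi, it's worth confirming that your key and endpoint respond. Here's a minimal connectivity check (Node 18+), assuming the `mistral-medium-3` identifier used throughout this guide; the exact model id Mistral exposes may differ, so check your account's model list if it's rejected:

```javascript
// Sanity check: call Mistral's chat completions endpoint directly.
// Assumes MISTRAL_API_KEY is set in your environment.
async function pingMistral() {
  const res = await fetch("https://api.mistral.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.MISTRAL_API_KEY}`,
    },
    body: JSON.stringify({
      model: "mistral-medium-3", // swap in the id your Mistral account lists
      max_tokens: 50,            // tiny response; this is only a connectivity test
      messages: [{ role: "user", content: "Say hello in one short sentence." }],
    }),
  });

  if (!res.ok) {
    throw new Error(`Mistral returned ${res.status}; check your key and billing`);
  }
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

pingMistral();
```

If this prints a greeting, your key and billing are in order and you can move on to the Vapi side.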
Configuration Example:
```javascript
const vapiAgent = {
  model: {
    provider: "custom-llm",
    url: "https://api.mistral.ai/v1/chat/completions",
    model: "mistral-medium-3",
    apiKey: process.env.MISTRAL_API_KEY,
    // Voice-optimized settings
    maxTokens: 150,      // Concise responses for voice
    temperature: 0.7,    // Balanced creativity/consistency
    stream: true,        // Enable real-time streaming
    contextLength: 4000, // Reasonable for voice conversations
    systemMessage: "You are a helpful voice assistant. Keep responses conversational and under 30 seconds when spoken aloud."
  },
  // Essential fallback configuration
  fallbackModel: {
    provider: "openai",
    model: "gpt-4"
  }
};
```
Voice-Specific Optimization: Unlike text applications, your Mistral Medium voice agent needs a 150-token response limit to keep answers concise, with streaming enabled to hit sub-300ms perceived latency. Context is capped at 4,000 tokens, despite the 128K capability, to maintain response speed.
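To stay inside that 4,000-token budget as calls get longer, you can trim the running transcript before each turn. A rough sketch, using a crude ~4-characters-per-token estimate rather than a real tokenizer:

```javascript
// Keep the system message plus as many recent turns as fit in the budget.
// The 4-chars-per-token estimate is deliberately rough; swap in a real
// tokenizer if you need accurate counts.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function trimHistory(systemMessage, turns, budget = 4000) {
  const kept = [];
  let used = estimateTokens(systemMessage.content);

  // Walk backwards from the newest turn so recent context survives.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(turns[i].content);
    if (used + cost > budget) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  return [systemMessage, ...kept];
}
```

Dropping the oldest turns first keeps the agent responsive while preserving the context that matters most in a live call.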
Production Essentials: Monitor response times (<300ms target), error rates (<1%), and token consumption for cost control. Configure reliable fallback models since voice applications require 99% uptime, and store API keys securely with regular rotation. For regulated industries, consider hybrid deployment options to meet data residency requirements.
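One lightweight way to cover those targets is to wrap every model call with timing, an error counter, and the fallback path (for example, in a proxy in front of your BYOM endpoint). A sketch, assuming a hypothetical `callModel(config, messages)` helper for whichever endpoint you hit:

```javascript
// Wrap the primary model with latency tracking and an automatic fallback.
// callModel is a hypothetical helper that posts to the configured endpoint.
const metrics = { calls: 0, errors: 0, totalMs: 0 };

async function callWithFallback(primary, fallback, messages) {
  const start = Date.now();
  metrics.calls++;
  try {
    return await callModel(primary, messages);
  } catch (err) {
    metrics.errors++; // feed this into your <1% error-rate alerting
    return await callModel(fallback, messages);
  } finally {
    metrics.totalMs += Date.now() - start; // average against the <300ms target
  }
}
```

Export `metrics` to whatever dashboard you already use; the point is that latency, error rate, and fallback frequency are all visible from day one.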
Building a Mistral Medium voice agent with Vapi's BYOM platform delivers enterprise-grade conversational AI at a fraction of traditional costs. Sure, it's a little extra work, but we wouldn't be surprised if you decide it's worth it.
The combination of Mistral Medium 3's roughly 90%-of-premium-model performance, 128K context window, and multimodal capabilities creates sophisticated voice experiences while maintaining cost efficiency at scale.
In your first week, secure Mistral API access and confirm BYOM availability in your Vapi account. Then configure your first assistant with conservative token limits for testing, and run some basic conversation tests to validate integration and response quality.
Then, start scaling. Deploy monitoring for latency, cost, and quality metrics. Optimize your prompts and parameters based on real feedback, and start migrating high-value use cases.
» Ready to get started? Build your voice agent backed by Mistral Medium 3.