Announcing Vapi Voices Beta: Lower Cost, Lower Latency for High-volume Voice AI

We built Vapi to give you flexibility. Pick the voice that fits your use case, swap providers without rewriting integrations, and optimize for whatever matters most to your application. ElevenLabs, PlayHT, Deepgram, and others: the choice is yours.

You launch with a few hundred calls a day. The voice sounds great. Latency is snappy. Your TTS bill is a rounding error. Then you scale to 50,000 calls. Then 200,000.

The math changes fast.

That premium voice you picked because it sounded the most natural? It's now 40% of your variable costs. But here's the thing: not every call needs a premium voice. Your appointment reminder or payment notification can be equally effective without a premium voice experience. Same with routing menus, identity verification, and status updates. High-volume, low-stakes flows where speed and cost matter more than maximum naturalness.

You wanted another option. So we built one.

What's new

Vapi Voices Beta is a new family of voice models built for high-volume, cost-sensitive use cases. Available now in the voice dropdown alongside every provider you already use.

Cost is $0.0025 per minute. Built for teams where unit economics matter.

Latency is optimized for conversational responsiveness. [P90 benchmark data coming soon.]

Selection works the same way it always has. Pick a Vapi Voice, configure your agent, run it through Evals, and compare against your current setup, all with zero integration changes.

Multiple voice styles available: professional, calm, neutral. More coming based on what beta users need.

What this enables

Before: You pay the same premium TTS rate for appointment reminders that you pay for complex sales calls. Latency spikes when your provider's infrastructure is under load. You absorb the cost because no alternative fits your integration.

After: You route high-volume, low-stakes calls to a voice optimized for speed and cost. Premium voices stay on premium conversations. Your unit economics improve without changing how you build.
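One way to picture the "after" state is a small routing rule that sends known low-stakes flows to a budget voice and everything else to a premium one. The sketch below is illustrative only: the flow labels, the `VoiceTier` enum, and `pick_voice_tier` are hypothetical names, not part of the Vapi API, and the actual voice selection still happens in your agent configuration.

```python
from enum import Enum


class VoiceTier(Enum):
    """Hypothetical tiers; map these to real voices in your own agent config."""
    BUDGET = "budget"    # e.g. a Vapi Voices Beta voice
    PREMIUM = "premium"  # e.g. the premium provider voice you already use


# Hypothetical flow labels drawn from the use cases named in this post.
# Replace them with whatever categories your own system tracks.
LOW_STAKES_FLOWS = {
    "appointment_reminder",
    "payment_notification",
    "routing_menu",
    "identity_verification",
    "status_update",
}


def pick_voice_tier(flow: str) -> VoiceTier:
    """Return BUDGET for known low-stakes flows, PREMIUM for anything else."""
    return VoiceTier.BUDGET if flow in LOW_STAKES_FLOWS else VoiceTier.PREMIUM
```

Defaulting unknown flows to the premium tier is a deliberately conservative choice: a new call type keeps the voice you already trust until you decide it belongs on the low-stakes list.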

Why we're releasing now

We're releasing this to learn three things:

Which use cases work. Appointment reminders, verification flows, routing menus, and status updates are our hypotheses. You have production traffic, and your data is better than our guesses.

How latency performs under real load. Internal testing only goes so far. We want to see performance in actual deployments with real conversational patterns.

Whether the savings matter. Does $0.0025/min move the needle for your use case? At what volume?
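To answer the volume question for your own traffic, a back-of-the-envelope calculation is usually enough. The sketch below uses the $0.0025 per minute rate from this post; the function names, the 30-day month, and any current rate you pass in are assumptions for illustration, so plug in your own numbers.

```python
from decimal import Decimal

# USD per minute, as stated in this announcement.
VAPI_VOICES_BETA_RATE = Decimal("0.0025")


def monthly_tts_cost(calls_per_day, avg_minutes_per_call, rate_per_minute, days=30):
    """Estimate TTS spend over `days` days at a flat per-minute rate."""
    minutes = Decimal(calls_per_day) * Decimal(str(avg_minutes_per_call)) * days
    return minutes * rate_per_minute


def monthly_savings(calls_per_day, avg_minutes_per_call, current_rate, days=30):
    """Difference between your current per-minute rate and the beta rate."""
    current = monthly_tts_cost(calls_per_day, avg_minutes_per_call, current_rate, days)
    beta = monthly_tts_cost(calls_per_day, avg_minutes_per_call, VAPI_VOICES_BETA_RATE, days)
    return current - beta


# Example inputs are assumptions: 200,000 calls a day, 2 minutes per call.
beta_cost = monthly_tts_cost(200_000, 2, VAPI_VOICES_BETA_RATE)
print(f"Beta-rate TTS cost over 30 days: ${beta_cost:,.2f}")
```

Under those assumed inputs the total is 12,000,000 minutes, which comes to $30,000 over 30 days at the beta rate. Pass your current provider's per-minute rate to `monthly_savings` to see the difference at your own volume.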

Your feedback directly shapes the production release. We're explicitly gating GA on what we learn from beta usage.