
When Deepgram released Nova-3, they didn't just iterate on Nova-2; they rebuilt the engine. The results speak for themselves: streaming word error rates dropped from 8.4% to 6.84%, roughly a 19% relative reduction, and batch processing hit an impressive 5.26% WER.
The real breakthrough is Nova-3's ability to handle multilingual conversations in real time. Nova-2 transcribes individual languages well, but Nova-3 can seamlessly follow conversations that jump between languages mid-sentence. Both models work through Vapi's platform with identical endpoints, so switching between them is as simple as changing a parameter.
This article breaks down the differences between Deepgram Nova-3 and Nova-2 so you can decide which speech-to-text model to use in your next build.
» Read more about using Deepgram models on Vapi.
| Feature | Nova-3 | Nova-2 |
| --- | --- | --- |
| Release Timeline | Newer generation | Previous generation |
| Streaming WER | 6.84% | 8.4% |
| Batch WER | 5.26% | Higher than Nova-3 |
| Latency | Low, with expanded concurrency and reliability | Low, proven for real-time use |
| Multilingual Support | Real-time code-switching within a conversation | Strong single-language transcription |
| Customization | Self-serve, up to 100 domain-specific terms, no retraining | Pre-built regional and domain variants |
| Advanced Features | Enhanced numeric recognition, redaction of up to 50 entities, improved timestamps, better noise handling | Solid domain-specific variants |
| Price Tier | Premium | Standard |
Nova-2 built its reputation on specialization: different models for meetings, finance calls, phone conversations, and automotive environments. Each one excelled in its lane, but switching contexts meant switching models entirely. It worked, but it was like having a different wrench for every bolt.
Nova-3 flips this approach. Instead of juggling multiple specialized models, it uses a single, more adaptable neural network that adjusts to different contexts on the fly. The technical upgrade centers on improved long-range attention and dynamic contextual adaptation. Now we have a model that remembers longer conversations and adapts more effectively to what it hears.
Practically, you get superior accuracy without the headache of managing multiple model variants. For developers building on Vapi's platform, this means more straightforward integration with better results.
Despite adding significant functionality, Nova-3 keeps the lightning-fast inference that made Nova-2 reliable for real-time applications. Through more efficient processing and improved parallelization, it delivers reduced latency, expanded concurrency, and improved reliability when handling multiple conversations simultaneously.
This is where Nova-3 truly shines. It's the first speech-to-text model that can process multilingual conversations in real time, something Nova-2 simply can't do. Unlike Gladia, which requires you to pick a language upfront, Nova-3 handles conversations that switch between languages without missing a beat.
The performance data backs this up. Nova-3 consistently outperformed Nova-2 and competing models, such as OpenAI's Whisper, across all tested languages, with a user preference advantage of up to 8-to-1 in some cases. For global applications like customer service, emergency response, and international collaboration, this changes everything.
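To make that concrete, here's a minimal TypeScript sketch of requesting code-switched transcription from Nova-3. It assumes Deepgram's `/v1/listen` endpoint and its `language=multi` hint, and the audio URL is a placeholder, so verify the exact parameters against Deepgram's current docs.

```typescript
// Hedged sketch: transcribe a bilingual recording with Nova-3.
// Assumes Deepgram's /v1/listen endpoint and the `language=multi` hint
// for code-switching; the audio URL below is a placeholder.
const DEEPGRAM_API_KEY = process.env.DEEPGRAM_API_KEY;

const params = new URLSearchParams({
  model: "nova-3",
  language: "multi", // follow speech that switches languages mid-sentence
  smart_format: "true",
});

const response = await fetch(`https://api.deepgram.com/v1/listen?${params}`, {
  method: "POST",
  headers: {
    Authorization: `Token ${DEEPGRAM_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/bilingual-call.wav" }),
});

const { results } = await response.json();
console.log(results.channels[0].alternatives[0].transcript);
```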
Nova-2 takes a different path with pre-built variants optimized for specific regions and domains. These work well for focused use cases but lack the dynamic flexibility that makes Nova-3 special.
Nova-3 offers real-time, self-serve customization, allowing you to add up to 100 domain-specific terms without requiring any retraining. Need it to recognize your company's product names or industry jargon? Add them instantly and watch the model adapt.
Beyond customization, Nova-3 includes features that solve real-world problems: enhanced numeric recognition for financial applications, real-time redaction for up to 50 entities to handle privacy compliance, improved timestamp precision for captioning workflows, and better performance in challenging audio environments with background noise or distant microphones.
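As a rough illustration, the sketch below layers key terms and redaction onto a Nova-3 request. The `keyterm` and `redact` parameter names are assumptions drawn from Deepgram's keyterm-prompting and redaction features, and the audio URL is a placeholder; check the current API reference before relying on them.

```typescript
// Hedged sketch: Nova-3 self-serve customization plus redaction.
// `keyterm` and `redact` parameter names are assumed; verify before shipping.
const params = new URLSearchParams({ model: "nova-3", smart_format: "true" });

// Add domain-specific vocabulary instantly, no retraining required.
for (const term of ["Vapi", "Nova-3", "acme-widget"]) {
  params.append("keyterm", term);
}

// Redact sensitive entities from the transcript for privacy compliance.
for (const entity of ["pci", "ssn"]) {
  params.append("redact", entity);
}

const response = await fetch(`https://api.deepgram.com/v1/listen?${params}`, {
  method: "POST",
  headers: {
    Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/support-call.wav" }),
});

console.log(await response.json());
```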
Nova-2 offers solid pre-built variants for specific domains, but it can't match the flexibility and feature depth of the newer model.
Both models are Vapi-native integrations, built directly into the platform without requiring external API management or authentication setup. You can switch between models through Vapi's interface by simply selecting your preferred Deepgram model in your voice agent configuration.
This native integration eliminates the complexity of managing separate API keys, endpoints, or infrastructure concerns. Both models are immediately available for testing and deployment, making it effortless to compare performance and determine which better suits your specific requirements. The ability to switch between models instantly allows for real-time A/B testing and optimization of your voice applications.
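For instance, here's a minimal TypeScript sketch of flipping a Vapi assistant between the two models. It assumes Vapi's REST assistant endpoint and a `transcriber` object with `provider` and `model` fields; the assistant ID is a placeholder, and field names may differ slightly across SDK versions.

```typescript
// Hedged sketch: switch a Vapi assistant's Deepgram model.
// Assumes PATCH https://api.vapi.ai/assistant/{id} and a `transcriber`
// object with provider/model fields; the assistant ID is a placeholder.
const VAPI_API_KEY = process.env.VAPI_API_KEY;

async function setDeepgramModel(assistantId: string, model: "nova-2" | "nova-3") {
  const response = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      transcriber: {
        provider: "deepgram",
        model, // the only value that changes between Nova-2 and Nova-3
      },
    }),
  });
  return response.json();
}

// Quick A/B test: run the same calls against each model and compare transcripts.
await setDeepgramModel("your-assistant-id", "nova-3");
```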
» Speak to a demo AI Voice Agent for Sales Follow-Ups with Nova-3.
Let's talk numbers. Nova-3 operates at Deepgram's premium pricing tier, while Nova-2 remains at standard rates. Since Vapi passes through Deepgram's pricing, you'll pay more for Nova-3's advanced capabilities.
Is the upgrade worth it? That depends on your needs. The enhanced accuracy, multilingual support, and customization features deliver a measurable return on investment (ROI) for businesses where transcription quality directly impacts outcomes. Fewer errors mean less manual correction, better user experiences, and more reliable voice applications.
Nova-2 remains an excellent choice for straightforward projects where budget matters more than cutting-edge features. The model's proven reliability and lower cost make it perfect for basic transcription needs without the premium frills.
Nova-3 consistently outperforms Nova-2 in terms of accuracy, multilingual capabilities, and advanced features: there’s a reason Deepgram upgraded. It's the superior choice for demanding applications where precision and flexibility are most crucial. The premium pricing reflects genuine technological advancement: this isn't just marketing fluff.
Nova-2 remains an excellent option for budget-conscious projects and straightforward transcription needs. Its proven track record and accessible pricing make it perfect for teams prioritizing cost-effectiveness over cutting-edge features.
Now, instead of doing more research, why don’t you start comparing them yourself?
» Start building with Nova-3 and Nova-2 on Vapi.