Clarification on Prompt Tokens Estimate for Voice Agent

Hi Vapi Team,

I’m working on estimating the pricing for a voice agent and would appreciate some clarification on how the calculator works.

Use Case:

A customer service voice agent that answers general queries from customers for a business.

It may also collect details (e.g., name, phone number, booking info) to create reservations.

Expected Usage:

~ 5,000 calls per month

~ 4 minutes per call

~ 20,000 minutes total per month
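
For reference, here is the arithmetic behind these figures as a quick Python sketch. The variable names are my own and are not calculator fields.

```python
# Expected usage figures from this post; names are illustrative only.
calls_per_month = 5_000
minutes_per_call = 4

minutes_per_month = calls_per_month * minutes_per_call
minutes_per_year = minutes_per_month * 12

print(minutes_per_month)  # 20000
print(minutes_per_year)   # 240000
```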

My questions are mainly about the “Prompt Tokens” field in the calculator:

  • What exactly do you mean by “Prompt Tokens” in the context of Vapi? Are they the same as OpenAI/GPT tokens, or are they calculated differently for voice conversations?
  • Should the value entered in the calculator be per call/session or the total prompt tokens for all calls (e.g., across 20,000 minutes)?
  • How do prompt tokens typically scale? Are they mainly affected by the system prompt/context I provide to the agent, or do they also increase as the conversation goes on?
  • For multilingual agents (e.g., English + Arabic), would token usage or pricing differ compared to a single-language agent?
  • Based on my use case (5,000 calls × 4 minutes each ≈ 20,000 minutes/month), what would you recommend I enter in the “Prompt Tokens” field to get a realistic estimate?
Getting clarity on this will help me enter the right values in the calculator, make a more accurate cost projection for my setup, and decide whether Vapi is the right choice. I'm considering the enterprise plan, since our yearly usage exceeds the 200,000-minute mark.
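
To make the scaling question concrete, this is the rough model I'm currently working from. It is only my assumption, not something I've confirmed for Vapi: I assume each LLM request resends the system prompt plus the full conversation history so far, as chat-style LLM APIs generally do. The function name and every number below are hypothetical placeholders.

```python
def estimate_prompt_tokens_per_call(system_prompt_tokens: int,
                                    turns: int,
                                    tokens_per_user_turn: int,
                                    tokens_per_agent_turn: int) -> int:
    """Sum the prompt tokens over all LLM requests in one call.

    Assumption (unconfirmed for Vapi): every request carries the system
    prompt plus all earlier user and agent messages.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_user_turn          # new user message joins the history
        total += system_prompt_tokens + history  # prompt size for this request
        history += tokens_per_agent_turn         # agent reply joins the history
    return total


# Placeholder inputs, not measured values.
per_call = estimate_prompt_tokens_per_call(
    system_prompt_tokens=500,
    turns=8,
    tokens_per_user_turn=20,
    tokens_per_agent_turn=40,
)
per_month = per_call * 5_000  # 5,000 calls per month

print(per_call)   # 5840
print(per_month)  # 29200000
```

If the calculator expects a different unit (for example per minute rather than per call), the same sketch would need to be adjusted accordingly.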
Thanks a lot for your guidance. I look forward to hearing from you!
Attachment: Vapi_Prompt_Tokens.png