nursing-limeN

Vapi X Groq NOT using prompt caching is burning $

I have been trying to use Groq with GPT-OSS-120B through the native integration, but there is a problem with pricing:

It says that model costs about $0.01/min, but when I look at the call logs, it is charging $0.04 or even $0.05 per call, while GPT-4o mini (with the same listed pricing) actually comes out at that $0.01/min.
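
For context, here is a rough back-of-envelope showing why a missing cached prefix inflates the cost of a multi-turn voice call. All numbers below are hypothetical placeholders (prompt size, turn count, per-token price, and the cached-token discount), so substitute the real values from Groq's pricing page:

```python
# Back-of-envelope: cost of the system-prompt prefix over one voice call
# when it is resent uncached on every turn vs. served from cache.
# Every number here is a hypothetical placeholder.
INPUT_PRICE = 0.15 / 1_000_000   # $/input token (placeholder, check Groq pricing)
CACHED_DISCOUNT = 0.5            # cached tokens billed at 50% (placeholder)

SYSTEM_TOKENS = 2_000            # stable prefix: system prompt + tool schemas
TURNS = 20                       # assistant turns in a ~5 min call

# Without caching, the full prefix is billed at full price on every turn.
uncached = TURNS * SYSTEM_TOKENS * INPUT_PRICE

# With caching, the first turn pays full price and later turns pay the
# discounted rate on the cached prefix.
cached = (SYSTEM_TOKENS * INPUT_PRICE
          + (TURNS - 1) * SYSTEM_TOKENS * INPUT_PRICE * CACHED_DISCOUNT)

print(f"prefix cost without caching: ${uncached:.4f}")
print(f"prefix cost with caching:    ${cached:.4f}")
```

The gap gets worse as calls run longer, since conversation history is also resent on every turn on top of the prefix.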

The issue must be that Vapi is not using Groq prompt caching (https://groq.com/blog/introducing-prompt-caching-on-groqcloud), or something is misconfigured.
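
One way to rule Vapi out is to call Groq directly with the same long system prompt twice and see whether the second call reports cached tokens. A minimal sketch, assuming Groq's OpenAI-compatible API and the `openai/gpt-oss-120b` model id; whether the `usage` object exposes a `prompt_tokens_details.cached_tokens` field is an assumption based on the OpenAI-compatible schema, so inspect the raw response to confirm what Groq actually returns:

```python
# Check whether Groq's automatic prompt caching kicks in when calling
# the API directly (bypassing Vapi). Usage field names are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

# Caching matches on repeated prefixes, so the long system prompt must
# be byte-identical across calls.
system_prompt = "You are a voice assistant for Acme Corp. " * 100

for i in range(2):
    resp = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Test call {i}: say hello."},
        ],
    )
    usage = resp.usage
    # On the second call the prefix should be served from cache.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    print(f"call {i}: prompt_tokens={usage.prompt_tokens}, cached_tokens={cached}")
```

If cached tokens show up here but the Vapi call logs never reflect the discount, that points at the integration rather than Groq.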

Please, could you take a look and tell me how to use these open-source models with the lowest latency and at the cost they should be?

At the same time, the native Cerebras integration is not showing this GPT-OSS-120B model either; it would be interesting to test...

Thanks.