Also, at my previous company I worked with both Bland and VAPI, and I had a similar problem with Bland where the agent would just fail to start speaking. On Bland, the cause was that they had deprecated the voice I was using. I don't think that's the case here, since I've tried a few different OpenAI voices and none of them work. It would be super helpful to have some sort of alarm or failure notification when this is happening consistently, so I know to change the model stack. Maybe an "Agent Started Speaking" metric that I can alarm on? Or maybe I have to do that myself by checking whether the agent got anything into the transcript.
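For the do-it-myself option, here is a rough sketch of the transcript check I have in mind. The transcript shape (a list of turns with "role" and "text" keys), the "assistant" role name, and both function names are my own assumptions for illustration, not VAPI's actual payload or API.

```python
from typing import Iterable, Mapping, Sequence

Turn = Mapping[str, str]


def agent_spoke(transcript: Iterable[Turn], agent_role: str = "assistant") -> bool:
    """Return True if the agent has at least one non-empty turn.

    Assumes (hypothetically) that each turn is a dict with "role" and "text".
    """
    return any(
        turn.get("role") == agent_role and bool(turn.get("text", "").strip())
        for turn in transcript
    )


def silent_call_rate(calls: Sequence[Iterable[Turn]]) -> float:
    """Fraction of calls in which the agent never said anything."""
    if not calls:
        return 0.0
    silent = sum(1 for transcript in calls if not agent_spoke(transcript))
    return silent / len(calls)
```

I could then alarm when `silent_call_rate` over the last N calls crosses some threshold, though a built-in metric would obviously be nicer.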
I'm also having the same problem when I try switching to the Sesame voice. Here is a call ID for that:
52ee9bcf-7953-4bad-a807-7af56d8ff00a
One more note that could be helpful in the investigation:
About a month ago I was trying to use the Realtime API with a VAPI-generated number. Whenever I called that number, the voice sounded really slow and deep. I think something was going wrong in the audio conversion. It reminded me of when I implemented audio sample rate/format conversion by hand for the Gemini Live API and screwed something up (I think it was converting Google's 16-bit PCM 24 kHz audio to the μ-law audio that Twilio requires). The problem went away when I switched to a Twilio number. Point being, it sounds like you guys are doing some sort of audio format conversion to match the model outputs to whatever phone provider is in use. It's possible that a bug got introduced at that level and is messing with the inputs to the models.
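To make it concrete, this is roughly the kind of conversion I mean. It's a minimal sketch, not what VAPI actually runs: I'm assuming little-endian 16-bit mono PCM at 24 kHz in and 8 kHz G.711 μ-law out, and the 3:1 downsampling here is a crude average of every three samples rather than a proper low-pass filter. The function names are mine.

```python
import struct

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # After the bias, magnitude is in [132, 32767], so its highest set bit
    # is bit 7..14, which maps to segment (exponent) 0..7.
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_24k_to_ulaw_8k(pcm: bytes) -> bytes:
    """Convert 24 kHz 16-bit LE mono PCM to 8 kHz mu-law (assumed formats)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    out = bytearray()
    # 24 kHz -> 8 kHz means keeping one output sample per three inputs.
    for i in range(0, count - count % 3, 3):
        averaged = sum(samples[i : i + 3]) // 3
        out.append(linear_to_ulaw(averaged))
    return bytes(out)
```

If the 3:1 step gets skipped and the 24 kHz samples are encoded one-for-one but played back at 8 kHz, the audio comes out three times slower and correspondingly lower in pitch, which would match the slow, deep voice I was hearing. That's just my guess at one way it could go wrong.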