Hey, I'm not with the team, but I'm a fairly experienced user.
I believe the first message gets sent directly to the voice model, which is why you still hear it. In my experience, you often get the behaviour you're describing when the LLM can't be reached: the LLM provider is having an outage, an API key doesn't work, or you're using a custom LLM server and it isn't connecting.
So the problem is likely that the LLM isn't being reached, or is responding with a really high delay for one reason or another. I haven't had this happen myself, but I suppose you could also see a similar issue if the transcriber was down or if your mic was blocked on the web interface. If you look at the call logs and there was an error, they will generally say whether there was a problem connecting.
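If you're using a custom LLM server, one quick way to rule out the connection is to hit the endpoint yourself and see whether it answers, and how fast. Here's a minimal sketch using only the Python standard library. The function name and the URL in the usage comment are placeholders I made up, not part of any platform's API, so swap in whatever endpoint and key your server actually expects.

```python
import time
import urllib.error
import urllib.request


def check_llm_endpoint(url, api_key=None, timeout=5.0):
    """Send one GET request and report reachability, HTTP status, and latency."""
    # Assumption: the server accepts a Bearer token; adjust if yours differs.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(url, headers=headers)
    start = time.monotonic()
    status, error = None, None
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error code
        # (401 or 403 often points at a bad API key).
        status = exc.code
        error = f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, or timeout: nothing answered.
        error = str(exc)
    return {
        "reachable": status is not None,
        "status": status,
        "seconds": round(time.monotonic() - start, 3),
        "error": error,
    }


# Hypothetical usage (placeholder URL and key):
# print(check_llm_endpoint("https://my-llm-server.example/v1/models", api_key="sk-..."))
```

If `reachable` comes back `False`, or `seconds` is very high, that would line up with the LLM being the piece that's failing. A 401 or 403 in `status` would suggest the key rather than the connection.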
Hope this helps you debug.