Inconsistent Transcript Times Compared to Audio Times

Hi, I am trying to align the transcript to the speech in audio (using duration, endTime and startTime), but all of the values returned by Get Call VAPI API are very off to the audio file's times.

For example, you can already see in the screenshot that when I say "Hello?", the audio starts on roughly 0.9s, but message #2 secondsFromStart returns 1.637s, off by 0.7 seconds. Even bigger difference, is message #4 for which audio starts on roughly 6.9s but secondsFromStart value is on 7.927s, resulting in 1s delay.

I tried calculating using just time too, offsetting from the very first time from the first system message, but delays are still present.

Note this happens to every single call I make, the call id for this example is 019a1b1c-c2bb-7338-b57c-a21dce505ff2.
image.png
Was this page helpful?