Is there a way to output word-level timestamps for the audio? I am trying to animate a talking head.

Ultimately, I need to convert each word into visemes that play on an avatar. For this I just need each word along with its start and end timestamps.
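To make the request concrete, here is a rough sketch of the data shape I'm after and what I'd do with it. Everything here is hypothetical: `WordTiming`, `spread_visemes`, the viseme labels, and the timing values are made up for illustration and don't come from any existing API. The even split across visemes is only a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One spoken word with its start/end time in seconds (hypothetical shape)."""
    word: str
    start: float
    end: float


def spread_visemes(timing, visemes):
    """Split a word's time span evenly across its visemes.

    Returns a list of (viseme, start, end) tuples. The even split is a
    placeholder assumption; real phoneme durations vary.
    """
    if not visemes:
        return []
    step = (timing.end - timing.start) / len(visemes)
    return [
        (v, timing.start + i * step, timing.start + (i + 1) * step)
        for i, v in enumerate(visemes)
    ]


# Made-up example values, not real output from any speech service.
hello = WordTiming(word="hello", start=0.0, end=0.4)
keyframes = spread_visemes(hello, ["HH", "EH", "L", "OW"])
```

So a list of word/start/end records per utterance would be enough; I can handle the word-to-viseme lookup on my side.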