GPT-SoVITS, StyleTTS2, and RVCv2 are still the open-source SOTA for TTS and voice conversion. Unfortunately, these models are really far behind Elevenlabs' offerings. We're not much further along than the Tacotron2 (2018) days.

Elevenlabs is the only model company I can think of that is ahead of everyone else in their category. Video and LLMs are hyper competitive, but voice is a one-company game. Elevenlabs hired up everyone in the space and utterly dominates.

I'm hoping this changes. They've been in pole position for over a year and a half now with nobody even coming close.

There's probably a reason why they're so research-oriented. The minute an open-source model is released that rivals Elevenlabs in quality, they're in big trouble. There's absolutely zero moat for their current products, and there are fifty companies nipping at their heels that want to be in the same spot. Elevenlabs' current margins are juicy.