What's the best solution right now for speech-to-text (STT) that supports speaker diarisation?

makaimc · 2025-09-22T19:12:56

AssemblyAI (YC S17) currently stands out in the word error rate (WER) and accuracy benchmarks it publishes (https://www.assemblyai.com/benchmarks), though its models are accessed through a web API rather than hosted locally. Speaker diarization is enabled through a parameter in the API call (https://www.assemblyai.com/docs/speech-to-text/pre-recorded-...).
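For reference, here is a minimal sketch of such a call using only the Python standard library. The endpoint (`/v2/transcript`), the `authorization` header, the `speaker_labels` flag, and the `utterances`/`speaker`/`text` response fields reflect my understanding of AssemblyAI's public REST API and should be checked against the docs linked above. The helper names are hypothetical, and the sketch assumes the audio is already at a publicly reachable URL. Local files would first need to be uploaded.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumed AssemblyAI v2 REST base URL; verify against the current docs.
API_BASE = "https://api.assemblyai.com/v2"


def build_transcript_request(audio_url: str) -> dict:
    """Request body asking for a transcript with speaker diarization enabled."""
    # "speaker_labels" is the (assumed) diarization switch in the API call.
    return {"audio_url": audio_url, "speaker_labels": True}


def format_utterances(transcript: dict) -> list:
    """Turn the 'utterances' list of a completed transcript into 'Speaker X: text' lines."""
    return [f"Speaker {u['speaker']}: {u['text']}" for u in transcript.get("utterances") or []]


def _call(method: str, url: str, api_key: str, body: Optional[dict] = None) -> dict:
    """Send one JSON request with the API key header and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"authorization": api_key, "content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def transcribe_with_diarization(audio_url: str, api_key: str, poll_seconds: float = 3.0) -> list:
    """Submit a job, poll until it finishes, and return speaker-labelled lines.

    Needs network access and a valid API key.
    """
    job = _call("POST", f"{API_BASE}/transcript", api_key, build_transcript_request(audio_url))
    while True:
        result = _call("GET", f"{API_BASE}/transcript/{job['id']}", api_key)
        if result["status"] == "completed":
            return format_utterances(result)
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)
```

Usage would look like `transcribe_with_diarization("https://example.com/call.mp3", "YOUR_KEY")`. The official `assemblyai` Python SDK wraps the same flow (`TranscriptionConfig(speaker_labels=True)`) if you'd rather not hand-roll the polling.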

xnx · 2025-09-22T19:36:48

I like this standalone build of Whisper, which has diarization built in: https://github.com/Purfview/whisper-standalone-win
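Tools that bolt diarization onto Whisper generally have to merge two outputs: timestamped transcript segments from Whisper and speaker turns from a separate diarization model (pyannote is a common choice). A minimal sketch of one common merging approach, under that assumption: give each segment the speaker whose turns overlap it the most in time. This is a generic illustration, not necessarily what the whisper-standalone-win build does internally. `Segment`, `Turn`, and `assign_speakers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A Whisper-style transcript segment (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarizer-style speaker turn (times in seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list, turns: list) -> list:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments that overlap no turn at all are labelled "UNKNOWN".
    """
    labelled = []
    for seg in segments:
        totals = {}  # speaker -> total overlapping seconds with this segment
        for turn in turns:
            o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labelled.append((speaker, seg.text))
    return labelled
```

Segment-level assignment mislabels segments that span a speaker change. Doing the same overlap test per word, using word timestamps, is the usual refinement.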