
What's the best solution right now for STT (speech-to-text) that supports speaker diarisation?


AssemblyAI (YC S17) currently stands out in the WER and accuracy benchmarks (https://www.assemblyai.com/benchmarks). Note that its models are accessed through a web API rather than hosted locally; speaker diarization is enabled through a parameter in the API call (https://www.assemblyai.com/docs/speech-to-text/pre-recorded-...).
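
For anyone wondering what "a parameter in the API call" looks like in practice, here is a rough sketch using only the Python standard library. It assumes the v2 REST endpoint, the `speaker_labels` request field, and an `utterances` list in the finished transcript, which is my understanding of the API, so check the linked docs before relying on it. The helper names `build_request` and `format_utterances`, the example audio URL, and the key are all made up for illustration.

```python
import json
import urllib.request

# Assumed transcript endpoint; confirm against the official docs.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a transcription request with diarization on."""
    body = json.dumps({
        "audio_url": audio_url,
        "speaker_labels": True,  # assumed name of the diarization parameter
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def format_utterances(transcript: dict) -> list[str]:
    """Turn the (assumed) 'utterances' field into 'Speaker X: text' lines."""
    utterances = transcript.get("utterances") or []
    return [f"Speaker {u['speaker']}: {u['text']}" for u in utterances]


if __name__ == "__main__":
    # Hypothetical inputs; sending requires a real key and audio URL:
    # urllib.request.urlopen(req) returns a job to poll until it completes.
    req = build_request("https://example.com/meeting.mp3", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

The request is only constructed here, not sent. In real use you would submit it, poll the returned transcript until it finishes, then pass the resulting JSON to `format_utterances`.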


I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win



