
Cool!

I am one of the top contributors to the tiny Mozilla Common Voice dataset for my language. That dataset is very small compared with those for more popular languages, and none of the other datasets mentioned contribute any data for that language to Whisper's training.

And even with so little data to train on, it still works surprisingly well.



Where do they mention what datasets they've used? I've tried looking at the paper but can't find it.


Never mind: I found it. It's on pages 19 and 20 of the paper, under Appendix A ("Evaluation Datasets").


[zalgo redacted]


Hey - can you please not zalgo on HN? It messes up the threads. I've redacted it from your posts now.



