I am one of the top contributors to the tiny Mozilla Common Voice dataset for my language. That dataset is very small compared to those for more popular languages, and none of the other datasets mentioned include any data in my language for training the Whisper model.
And even with so little data to train on, it still works surprisingly well.