
I tried running it in realtime with live audio input (kind of).

If you want to give it a shot, you can find the python script in this repo: https://github.com/tobiashuttinger/openai-whisper-realtime

A bit more context on how it works: the system's default audio input is captured with Python, split into small chunks, and fed to OpenAI's original transcription function. It tries (currently rather poorly) to detect word breaks and avoids splitting the audio buffer mid-word. Given how the model is designed, this isn't the most natural way to use it, but I figured it was worth trying. It works acceptably well.
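For illustration, the word-break idea could be approximated with a simple energy heuristic: split the buffer at its quietest frame, so a chunk handed to the transcriber is less likely to end mid-word. This is a hypothetical sketch (the actual script's logic may differ; the frame length is illustrative):

```python
import numpy as np

def best_split_point(buffer: np.ndarray, frame_len: int = 1600) -> int:
    """Return the start index of the quietest frame in the buffer.

    Cutting the audio at a low-energy frame approximates cutting at
    a pause between words, so the chunk sent for transcription is
    less likely to end in the middle of a word.
    """
    n_frames = len(buffer) // frame_len
    energies = [
        float(np.mean(buffer[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n_frames)
    ]
    quietest = int(np.argmin(energies))
    return quietest * frame_len
```

For example, a buffer of noise with a silent gap in the middle would be split at the start of the gap.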



Haven’t tried it yet but love the concept!

Have you thought of using VAD (voice activity detection) for breaks? Back in my day (a long time ago) the webrtc VAD stuff was considered decent:

https://github.com/wiseman/py-webrtcvad

Model isn’t optimized for this use but I like where you’re headed!


Interesting. I'll take a look at this, thanks!





impressive



