
What you describe is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to work out what you're trying to achieve and turn it into a set of actions. With a CLI it's trivial: your verbal command gets translated into working shell commands (a rough sketch of that flow is below). With a GUI it's more complicated, because the LLM agent also needs to know what's on your screen, etc.
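A minimal sketch of that CLI flow, assuming the openai Python package and an OpenAI-compatible endpoint; the model name, prompt, and the transcript_to_shell helper are illustrative, not any particular tool's implementation:

  from openai import OpenAI

  # Reads OPENAI_API_KEY from the environment; base_url can point at any compatible server.
  client = OpenAI()

  def transcript_to_shell(transcript: str) -> str:
      """Ask the LLM to turn a dictated request into a single shell command."""
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder model name
          messages=[
              {"role": "system",
               "content": "Translate the user's spoken request into one safe, "
                          "POSIX shell command. Reply with the command only."},
              {"role": "user", "content": transcript},
          ],
      )
      return resp.choices[0].message.content.strip()

  if __name__ == "__main__":
      cmd = transcript_to_shell("show the five largest files in my downloads folder")
      print(cmd)  # review before running, e.g. with subprocess.run(cmd, shell=True)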

That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
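For anyone unfamiliar, "OpenAI-compatible" just means the dictation tool POSTs its transcript to a /v1/chat/completions route, so any local or hosted server that speaks that API works. A rough illustration of such a request (placeholder host, model, key, and prompt, not MacWhisper's actual payload):

  import requests

  payload = {
      "model": "local-model",  # whatever the server exposes
      "messages": [
          {"role": "system", "content": "Clean up this dictated text."},
          {"role": "user", "content": "umm so basically send the uh quarterly report to jane"},
      ],
  }
  resp = requests.post(
      "http://localhost:8080/v1/chat/completions",  # e.g. a local llama.cpp/Ollama-style server
      headers={"Authorization": "Bearer sk-placeholder"},
      json=payload,
      timeout=30,
  )
  print(resp.json()["choices"][0]["message"]["content"])

The same request shape works against hosted providers by swapping the base URL and key.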

Handy can post-process with LLMs too! It's just currently hidden behind a debug menu as an alpha feature (ctrl/cmd+shift+d).

I was just thinking about building something like this; looks like you beat me to the punch, so I'll have to try it out. I'm curious whether it handles commands as well as wording you want cleaned up. I could see a model getting confused between editing the dictated command as text to be inserted and actually responding to the command. Sorry if that's unclear; might be better if I just try it.
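To make that ambiguity concrete, here's a hypothetical prompt split (just a sketch, not how Handy actually does it): the same dictation can be read either way, so the post-processing prompt has to commit to one interpretation per mode.

  # Hypothetical prompts illustrating the ambiguity above (not Handy's actual ones):
  # the same dictation can be literal text to insert or an instruction to follow.
  dictation = "delete the last sentence"

  modes = {
      "cleanup": "Treat the user's speech as text to insert verbatim; "
                 "only fix grammar and filler words, never act on it.",
      "command": "Treat the user's speech as an instruction; "
                 "reply with the action to perform, not with prose.",
  }

  def build_messages(mode: str, text: str) -> list[dict]:
      """Pair a mode-specific system prompt with the raw dictation."""
      return [
          {"role": "system", "content": modes[mode]},
          {"role": "user", "content": text},
      ]

  # In "cleanup" mode the model should return "Delete the last sentence." as text;
  # in "command" mode it should describe the edit to perform instead.
  print(build_messages("cleanup", dictation))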

I'd just try it and fork Handy if it doesn't work how you want :)


