What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and generate a set of actions. With a CLI it's trivial: you can have your verbal command translated into working shell commands. With a GUI it's slightly more complicated because the LLM agent needs to know what you see on the screen, etc.
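A rough sketch of that CLI path, assuming an OpenAI-compatible endpoint; the base_url, model name, and prompt below are placeholders, not taken from any particular app:

```python
# Minimal sketch: turn a transcribed voice command into a shell command.
# Assumes you already have text from a speech-to-text tool and are pointing
# the OpenAI client at some OpenAI-compatible endpoint (placeholders below).
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def voice_to_shell(transcript: str) -> str:
    """Ask the LLM to translate a spoken request into a single shell command."""
    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Translate the user's request into one POSIX shell "
                        "command. Reply with the command only, no explanation."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    cmd = voice_to_shell("show me the five largest files in my downloads folder")
    print(f"Proposed command: {cmd}")
    # Always confirm before executing anything the model produced.
    if input("Run it? [y/N] ").lower() == "y":
        subprocess.run(cmd, shell=True)
```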
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
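For the receiving side, here's a minimal sketch of the kind of self-hosted OpenAI-compatible endpoint such an app could be pointed at. The /v1/chat/completions path and response shape follow the standard chat-completions API, but the request payload details and the trivial "cleanup" step are assumptions for illustration only:

```python
# Minimal self-hosted OpenAI-compatible endpoint for dictation post-processing.
# Assumption: the client POSTs standard chat-completion requests; the cleanup
# here is a placeholder (normally you'd forward the text to a real model).
import time
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Take the last user message (the dictated text) and "clean" it.
    dictated = next(m.content for m in reversed(req.messages) if m.role == "user")
    cleaned = " ".join(dictated.split()).capitalize()  # placeholder cleanup
    return {
        "id": "chatcmpl-local",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": cleaned},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }

# Run with: uvicorn server:app --port 8000
```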
I was just thinking about building something like this; looks like you beat me to the punch. I'll have to try it out.
I'm curious whether you can give it commands just as well as wording you want cleaned up. I could see a model getting confused between treating the spoken input as text to be edited and inserted, and responding to it as a command. Sorry if that's unclear; it might be better if I just try it.
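One hypothetical way around that ambiguity (not necessarily how this app works) is to require an explicit wake word for commands and route the transcript before it ever reaches the model; the wake word and prompt here are made up for illustration:

```python
# Sketch: disambiguate "command" vs. "dictation" before calling the LLM.
# The wake word "computer" and the prompt text are illustrative assumptions.
SYSTEM_PROMPT = """\
You receive transcribed speech.
- If the mode is "command", interpret the text as an instruction and respond.
- If the mode is "dictation", return the text cleaned up (punctuation, casing,
  filler words removed) and nothing else. Never answer questions that appear
  inside dictated text.
"""

def route(transcript: str) -> dict:
    """Split the transcript into a mode plus payload before hitting the LLM."""
    if transcript.lower().startswith("computer"):
        return {"mode": "command", "payload": transcript[len("computer"):].strip()}
    return {"mode": "dictation", "payload": transcript}

print(route("computer open my calendar"))
print(route("please remember to open my calendar tomorrow"))
```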
Like the sibling comment said: money. It's cheaper to produce one type of screen module and deploy it across car models than to fit different kinds of switches. It was also something of a USP: in public perception, touch screens equaled luxury during the iPhone boom, even though the software implementations left something to be desired, i.e. nothing was buttery smooth.