It's possible to run a voice AI similar to this entirely locally on a normal gaming PC with a good GPU using open source AI models. I have a standalone demo I've been working on that you can try if you have a 12GB+ Nvidia GPU: https://apps.microsoft.com/detail/9NC624PBFGB7
It's still very much a demo and not as good as GPT-4, but it responds much faster. It's fun to play with and it shows the promise of the approach. Open models have been improving very quickly. It remains to be seen just how good they can get, but personally I believe that better-than-GPT-4 models are going to run on a $1k gaming PC in just a few years. You will be able to have a coherent spoken conversation with your GPU. It's a new type of computing experience.
It currently uses OpenHermes2-Mistral-7B (via exllama2), OpenAI Whisper (via faster_whisper), and StyleTTS2 (via HF Transformers). All PyTorch-based.
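In case it's useful, here's a rough sketch of how the pieces fit together. This is not my actual code, just the shape of the loop: the three model wrappers are injected as plain callables so any backend (faster_whisper, exllama2, StyleTTS2, or stubs) could be dropped in. All the names here are illustrative assumptions.

```python
# Sketch of the listen -> transcribe -> generate -> speak loop.
# The three stages are injected as callables; the class only handles
# wiring and the running chat history.

from typing import Callable, List, Dict


class VoicePipeline:
    def __init__(
        self,
        transcribe: Callable[[bytes], str],        # speech-to-text (e.g. Whisper)
        generate: Callable[[List[Dict]], str],     # LLM chat completion
        synthesize: Callable[[str], bytes],        # text-to-speech (e.g. StyleTTS2)
    ):
        self.transcribe = transcribe
        self.generate = generate
        self.synthesize = synthesize
        self.history: List[Dict] = []  # running chat transcript

    def turn(self, audio: bytes) -> bytes:
        """Run one conversational turn: user audio in, reply audio out."""
        user_text = self.transcribe(audio)
        self.history.append({"role": "user", "content": user_text})
        reply = self.generate(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return self.synthesize(reply)
```

A real implementation streams all three stages rather than running them turn-by-turn like this, which is a big part of why the latency can beat a cloud round-trip.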
I will probably update to the OpenHermes vision model when Nous Research releases it, so it'll be able to see with the webcam or even read your screen and chat about what you're working on! I also need to update to Whisper-v3 or Distil-Whisper, and to a newer StyleTTS2. I also plan to add a Mandarin TTS and the Qwen-7B bilingual LLM for a bilingual chatbot. The amount of movement in open AI (not to be confused with OpenAI) is difficult to keep up with.
Of course I need to add better attribution for all this stuff, and a whole lot of other things, like a basic UI. Very much an MVP at the moment.
This is why I think the personal Jarvis for everyday use won't end up in the cloud. It can already be done on local hardware, as you're demonstrating, and the cloud has big downsides around privacy, latency, and reliability.
Like you said, it's difficult to keep up with, and to me it feels very much like open source might win for inference.
And yet, of all things, word processing and spreadsheets are moving to the cloud, and even coding.
I'm not sure the big players won't push heavily (as in, not releasing their best models) for the fat subscriptions and data gathering in the cloud, even though I'd much rather see local (as in, cloud-at-home) computing.
I understand what you mean. It makes me wonder if there is room for a solution for those who want to own their own hardware and data. Much like other appliances and equipment that initially cost too much for household ownership, maybe having a "brain" in your house will become a luxury appliance. As a Crestron system owner, I would love to plug Jarvis into my smart home somehow.
Maybe it could be a hardware-with-markup model plus support and consulting. That way there could be many competitors in various regions or countries, all using the same collection of open source tools. That would be pretty neat. Unlikely, I guess, but still worth thinking about how it could work.
Great question! Right now both ChatGPT and my demo are doing very simple and basic stuff that definitely needs improvement.
ChatGPT is essentially push-to-talk with a little automation that tries to press the button for you at the right moments. Mine listens continuously and can be interrupted while speaking, but it isn't yet smart enough to delay its response if you pause mid-sentence, or to stop responding at the natural end of a conversation.
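The turn-taking part is basically a little state machine on top of a voice activity detector. A minimal sketch of the idea, with the frame size, silence threshold, and state names all being assumptions on my part rather than how either system actually does it:

```python
# Sketch of simple turn-taking: treat the user's turn as ended only after
# a long enough run of silent frames, and flag a barge-in when speech
# arrives while the bot is playing back its own TTS audio.

class TurnTaker:
    def __init__(self, silence_frames_to_end: int = 25):
        # e.g. 25 frames x 20 ms = 500 ms of trailing silence ends the turn
        self.silence_frames_to_end = silence_frames_to_end
        self.silence_run = 0        # consecutive silent frames seen
        self.heard_speech = False   # has the user said anything this turn?
        self.bot_speaking = False   # is TTS playback currently active?

    def feed(self, frame_is_speech: bool) -> str:
        """Process one VAD result; return 'interrupt', 'end_of_turn', or 'wait'."""
        if frame_is_speech:
            self.silence_run = 0
            self.heard_speech = True
            if self.bot_speaking:
                self.bot_speaking = False
                return "interrupt"  # user barged in; stop TTS playback
            return "wait"
        # silent frame
        self.silence_run += 1
        if self.heard_speech and self.silence_run >= self.silence_frames_to_end:
            self.heard_speech = False
            self.silence_run = 0
            return "end_of_turn"  # pause was long enough; send audio to the STT
        return "wait"
```

The "delay responding if you pause mid-sentence" problem is exactly why a fixed silence threshold isn't good enough: what you really want is the LLM (or a small classifier) judging whether the transcript so far looks like a complete thought.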