I think it will be a winning strategy. Lag is a real killer for LLMs.
I think they'll have another LLM on a server (maybe a deal with OpenAI or Gemini) that the one on the device can use the way ChatGPT uses plugins - rough sketch of that split below.
But on-device, Apple has a gigantic advantage. Rabbit and Humane are good ideas humbled by shitty hardware that runs out of battery, gets too hot, and has to connect to the internet to do literally anything.
Apple is in a brilliant position to solve all those things.
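Here's a minimal sketch of what that local-plus-hosted split could look like, assuming an Ollama-served llama3:8b as a stand-in for the on-device model and the OpenAI API as the hosted one; the routing heuristic and model names are invented purely for illustration:

```python
# Hypothetical sketch: a small local model handles simple requests and hands
# harder ones to a hosted model, similar to how ChatGPT calls plugins.
# Assumes Ollama is running locally and OPENAI_API_KEY is set in the env;
# the routing heuristic below is made up for illustration only.
import requests
from openai import OpenAI

OLLAMA_URL = "http://localhost:11434/api/generate"
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_local(prompt: str) -> str:
    """Run the prompt on the small on-device model via Ollama."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3:8b", "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def ask_cloud(prompt: str) -> str:
    """Escalate to a larger hosted model when the local one isn't enough."""
    out = cloud.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return out.choices[0].message.content

def answer(prompt: str) -> str:
    # Crude, purely illustrative routing: long or open-ended questions go to
    # the cloud, everything else stays on device.
    if len(prompt) > 400 or "search" in prompt.lower():
        return ask_cloud(prompt)
    return ask_local(prompt)

if __name__ == "__main__":
    print(answer("Summarise my day in one sentence."))
```

The interesting design question is where that routing decision lives - on the device, where it's free but dumb, or on the server, where it costs you a round trip before anything happens.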
I run a few models (e.g. Llama3:8b) on my 2023 MacBook Air, and there is still a fair bit of lag and delay compared to a hosted (and much larger) model like Gemini. A large source of the lag is the initial loading of the model into RAM, which an iPhone will surely suffer from too.
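You can see that load penalty directly with Ollama: the first request after the weights have been evicted pays the full load-into-RAM cost, while a follow-up request hits the already-resident model. A rough timing sketch, assuming Ollama is running locally and llama3:8b has been pulled (keep_alive is Ollama's knob for how long the weights stay resident):

```python
# Rough sketch: compare cold-start (model load + inference) vs warm inference
# against a local Ollama server. Assumes `ollama pull llama3:8b` has been run.
import time
import requests

URL = "http://localhost:11434/api/generate"

def timed_generate(prompt: str, keep_alive: str = "5m") -> float:
    start = time.perf_counter()
    r = requests.post(
        URL,
        json={
            "model": "llama3:8b",
            "prompt": prompt,
            "stream": False,
            "keep_alive": keep_alive,  # how long the weights stay in RAM afterwards
        },
        timeout=300,
    )
    r.raise_for_status()
    return time.perf_counter() - start

# First call pays the load-into-RAM cost (assuming the model isn't already resident)...
cold = timed_generate("Say hi in five words.")
# ...the second call hits the already-loaded model.
warm = timed_generate("Say hi in five words.")
print(f"cold: {cold:.2f}s  warm: {warm:.2f}s")
```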
Humane had lag, and they used voice chat, which is a bad UX paradigm. VUI is bad because it adds lag to the information within the medium: listening to preambles and lists is always slower than the human eye's ability to scan a page of text. Their lag is not due to LLMs, which can be much faster than whatever they did.
We should remind ourselves that an iPhone will likely suffer similar battery and heat issues - especially if it's running models locally.
Humane's lag feels like it comes down to bad software design too. It almost feels like a two-stage process: it sends your voice or transcription up to the cloud, figures out where it needs to go to get it done, tells the device to tell you it's about to do that, and then finally does it. E.g.:
User: "What is this thing?"
Pin: "I'll have a look what that is" (It feels this response has to come from a server)
Pin: "It's a <answer>" (The actual answer)
We're still a bit away from an iPhone running anything viable locally. Even with today's small models you can almost feel the chip creaking under the load, and the whole phone begins to choke.
I'm curious to hear more about this. My experience has been that inference speeds are the #1 cause of delay by orders of magnitude, and I'd assume those won't go down substantially on edge devices because the cloud will be getting faster at approximately the same rate.
Have people outside the US benchmarked OpenAI's response times and found network lag to be a substantial contributor to slowness?
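One way to answer that is to time the first streamed token (dominated by network latency, queueing and prefill) separately from the full completion (dominated by token generation). A rough sketch against the OpenAI API, assuming the official Python client and an API key in the environment; the model name is just a placeholder:

```python
# Rough benchmark sketch: time-to-first-token vs total completion time.
# If TTFT is small relative to the total, network lag isn't the bottleneck.
# Assumes OPENAI_API_KEY is set; the model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "List five uses for a paperclip."}],
    stream=True,
)

first_token = None
for chunk in stream:
    if first_token is None and chunk.choices and chunk.choices[0].delta.content:
        first_token = time.perf_counter() - start  # network + queueing + prefill
total = time.perf_counter() - start                # includes all generated tokens

print(f"time to first token: {first_token:.2f}s, total: {total:.2f}s")
```

Run from outside the US, the gap between those two numbers tells you how much of the wait is the network versus the model itself.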
Or at least, a good enough internet connection to send plaintext.
> * when you live in the USA
Even from Australia to the USA it's only ~300ms of latency to the first token, and then the whole thing can finish in ~1s. And making that faster doesn't require on-device deployment; it just requires a server in Australia, which is obviously coming for many providers if it isn't there already.
There really isn't enough emphasis on the downsides of server-side platforms.
So many of these are only deployed in the US, so if you're, say, in country Australia, not only does all your traffic go to the US, it goes via slow and intermittent cellular connections.
It makes using services like LLMs unusably slow.
I miss the 90s and having applications and data reside locally.
I hope they announce something good at WWDC