Do you have figures supporting that? Because so far everything I've seen points to current inference subscriptions being wildly unprofitable. My information may be a bit dated, but I haven't seen any new reports on unit economics coming close to breaking even. And forward projections for the data center side, which is the pure play of inference itself, say something like $40B of depreciation per year (assuming they take a full 10 years to depreciate) against maybe $15-20B of revenue to make up for that.
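For concreteness, here's the back-of-envelope arithmetic behind those figures. The $400B capex total is just implied by the $40B/year straight-line figure; everything here is the rough estimate above, not verified data:

```python
# Rough back-of-envelope on the data-center projection above.
# Assumes straight-line depreciation; the capex total is implied, not sourced.
capex_total = 400e9                       # implied by $40B/yr over 10 years
years = 10
annual_depreciation = capex_total / years # $40B/yr
annual_revenue_mid = 17.5e9               # midpoint of the $15-20B estimate
shortfall = annual_depreciation - annual_revenue_mid
print(f"annual shortfall vs depreciation alone: ${shortfall / 1e9:.1f}B")  # -> $22.5B
```

That's the gap before you even count power, staffing, or networking, which is why the unit economics look so grim on these numbers.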
Now, we're still talking about "some analyst" figures, and about the most undifferentiated pure play on the underlying economics of inference as a whole, but I think the latter, at least, remains relevant: if the underlying inference doesn't pencil out at current subscription prices, then nothing built on top of it will either, absent a significant additional charge.
It absolutely does matter. LLMs still have to consume context and process complexity. The more LoC, the more complexity, the more errors you get and the higher your LLM bills. That's true even in the AI-maximalist, vibe-code-only use case. The reality is that AI will have an easier time working in a well-designed, human-written codebase than in one generated by AI, and the problem of AI code outputs turning into AI coding inputs -- the AI choking on its own output and making more errors -- tends to get worse over time, with human oversight being the key tool to prevent this.
It really depends on how it scales. If it can scale to LLM sizes via this training method (which is a big if), then it could mean fundamentally overturning the transformer architecture and replacing it with RNNs in the most optimistic case.
But if not, it could mean as little as some LLM-adjacent tools like vec2text get reworked into RNNs. Or some interesting fine-tuning at least.
We're actually in a great place to reverse aging already with off-the-shelf stuff you can get from the grey market. We have countless interventions that improve aging-related metrics in humans and show verified longevity benefits in animal models. That's about as good as you can hope for without waiting a lifetime to actually confirm the human actuarial benefit, which is inherently a losing strategy.
Because I'm not sure exactly what you're looking for when you say 'compares to' -- whether accuracy, speed, or architecture -- I'll hit all 3, but sorry if it's a bit much.
1. Accuracy: For simple tasks (like sentiment analysis on straightforward examples), it won't be much more accurate than a classical linear classifier, if at all.
1a. Accuracy on more diverse or challenging tasks: Because a linear classifier is just so damned simplistic, it cannot handle anything even resembling a reasoning task. Meanwhile, when specifically trained, this architecture managed to get 8/10 on textual entailment tasks, which are generally considered the entry-level gold standard for reasoning ability.
2. Speed: It's slower than a classical classifier...in light of the ~1B params it's pushing. They're both still pretty much blazing fast, but the tiny classical classifier will definitely be faster.
3. Architecture:
Here's where it gets interesting.
The architecture of the core model here differs significantly from a classical linear classifier:
Classical Classifier:
Input: BGE embedding (in this hypothetical)
Output: Class labels through softmax
Internal Architecture: No nonlinearity, no hidden layers, direct projection
General Classifier:
Input: BGE Embedding
Output: Class labels through nearest neighbor cosine similarity search of vocabulary
Internal architecture: A sparse input-projection layer, a layer that combines the 3 inputs after their upward projection, and 14 hidden layers with nonlinearity (GELU), layernorms, and skip connections -- all of the standard stuff you'd expect in an LLM, but...not in an LLM.
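To make the contrast concrete, here's a toy numpy sketch of both. The dimensions, weights, and layer count are made up for illustration (the real model uses 1024-dim bge-large embeddings and 14 hidden layers); this shows the structural difference, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dim (bge-large is actually 1024)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Classical linear classifier: one direct projection, no hidden layers, no nonlinearity.
W = rng.normal(size=(3, d))   # 3 class logits straight from the embedding
x = rng.normal(size=d)        # stand-in for a BGE embedding
linear_probs = softmax(W @ x)

# DSRU-style hidden block (toy): GELU nonlinearity + layernorm + skip connection.
def gelu(z):
    return 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

def layernorm(z, eps=1e-5):
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def hidden_block(z, W1, W2):
    return layernorm(z + W2 @ gelu(W1 @ z))  # residual/skip connection

h = x
for _ in range(3):  # the real model reportedly stacks 14 of these
    W1 = rng.normal(size=(4 * d, d)) * 0.1
    W2 = rng.normal(size=(d, 4 * d)) * 0.1
    h = hidden_block(h, W1, W2)
# h is an output *embedding*, decoded later by nearest-neighbor search,
# not a softmax over class labels.
```

The key structural point: the linear classifier ends in class probabilities, while the DSRU-style stack ends in another vector of the same dimension as its input.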
I hope that clears up your questions! If not, I'm happy to tell you more.
I would say that ease of use and deployment is actually a good reason to have a single model.
We don't train 20 LLMs for different purposes - we train one (or, I guess 3-4 in practice, each with their own broad specialization), and then prompt it for different tasks.
This simplifies deployment, integration, upgrading, etc.
This model is basically the same idea - instead of being restricted to single-task classification, it takes the task itself as part of its input. This means a user can handle new tasks with a new prompt, not a new model.
While I agree with the general reasoning, isn't it harder for the user to prompt the model correctly as opposed to selecting a specialized model that they wish to use?
That's the feeling I have when I try to use LLMs for more general language processing.
Have you run into cases where the model "forgets" the task at hand and switches to another mid text stream?
Regardless of all of the above, it looks to me that your choice of reasoning and problem solving in the latent space is a great one, and where we should be collectively focusing our efforts. Keep up the good work.
Ah, that's the beauty of it! It's not an LLM. It's a new class of model:
A DSRU / Direct Semantic Reasoning Unit.
It's a vec2vec architecture - it takes in 3 bge-large embeddings of the task, the input data, and the vocabulary. It outputs 1 bge-large embedding of the answer.
That's the DSRU part.
What makes it a classifier is that later, outside of the model, we do a nearest neighbor search for our vocabulary items using our answer vector. So it will output something from the labels no matter what - the nearest neighbor search will always have something closest, even if the model went a little crazy internally.
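A minimal sketch of that decoding step, with toy 2-d vectors and made-up label embeddings purely to illustrate the mechanism (real label embeddings would come from bge-large):

```python
import numpy as np

def nearest_label(answer_vec, label_vecs, labels):
    # Cosine similarity: normalize everything, then take the best dot product.
    a = answer_vec / np.linalg.norm(answer_vec)
    L = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    return labels[int(np.argmax(L @ a))]

labels = ["entailment", "neutral", "contradiction"]
label_vecs = np.array([[1.0, 0.0],   # toy stand-ins for label embeddings
                       [0.0, 1.0],
                       [0.7, 0.7]])

answer = np.array([0.9, 0.1])  # model output drifted a bit, but still decodes
print(nearest_label(answer, label_vecs, labels))  # -> entailment
```

Note that argmax always returns *some* index, which is exactly the "forced label output" property: even a degenerate answer vector maps to a valid label.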
The prompts here tend to be very straightforward. Things like:
"Is this book review positive or negative?"
"Is this person sharing something happy or venting?"
"Determine the logical relationship between the premise and hypothesis. Answer with: entailment, neutral, or contradiction."
It has limited use cases, but where it's good, it should be very, very good - the insane speed, deterministic output, and forced label output makes it great for a lot of common, cheap tasks.
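To show the "one model, many prompts" usage pattern, here's a hypothetical `classify()` wrapper. The embedding and forward pass are hash-based placeholders (NOT the real model or bge-large), so the outputs aren't meaningful classifications; the point is the calling convention and the forced-label guarantee:

```python
import hashlib
import numpy as np

def toy_embed(text, dim=16):
    # Stand-in for bge-large: a deterministic hash-based vector (not a real embedding).
    h = hashlib.sha256(text.encode()).digest()
    v = np.frombuffer(h[:dim], dtype=np.uint8).astype(float)
    return v / np.linalg.norm(v)

def classify(task_prompt, text, labels):
    # One model, many tasks: the task prompt rides along as an input embedding.
    # Averaging here is a placeholder for the real DSRU forward pass.
    answer = (toy_embed(task_prompt) + toy_embed(text)) / 2
    label_vecs = np.stack([toy_embed(label) for label in labels])
    sims = label_vecs @ answer
    return labels[int(np.argmax(sims))]  # always returns one of the given labels

# Same "model", different tasks -- only the prompt and label set change.
r1 = classify("Is this book review positive or negative?",
              "Loved every page.", ["positive", "negative"])
r2 = classify("Is this person sharing something happy or venting?",
              "Ugh, what a day.", ["happy", "venting"])
```

Deterministic output plus a guaranteed in-vocabulary answer is what makes this shape attractive for cheap, high-volume tasks.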
I haven't seen an LLM stay on task anywhere near that long, like...ever. The only thing that works better left running overnight that has anything to do with ML, in my experience, is training.
Overall, I agree - it would take a far more sophisticated, deterministic, or 'logical' AI, one better capable of tracking constraints, knowing what to check and double-check, etc. Right now, AI is far too scattered to pull that off (or, for the stuff that isn't scattered, it's largely just incapable), but a lot of smart people are thinking about it.