Leaving aside the economic shitshow and other things.
I think you're right but for the wrong reasons wrt sustainable profit.
Specifically, you're overcounting how much it will cost to run AI in 5 years because you're extrapolating current high prices, while at the same time undercounting how demand will drive efficiency gains.
> But on a tangent, why do you believe in mixture of experts?
The fact that all big SoTA models use MoE is certainly a strong reason. They are more difficult to train, but the efficiency gains seem to be worth it.
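For anyone who hasn't looked inside one: the efficiency argument is just that a gate scores all experts but only runs the top-k of them per token, so compute scales with k rather than the expert count. A toy sketch (illustrative only, not any specific model's implementation):

```python
# Toy top-k MoE routing: score E experts, run only the best k per token,
# and mix their outputs with softmax weights over the chosen k.
import math

def moe_layer(x, gate_w, experts, k=2):
    """x: token vector; gate_w: one weight row per expert; experts: callables."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
    top = sorted(range(len(experts)), key=lambda i: logits[i])[-k:]
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    out = [0.0] * len(x)
    # only these k expert FFNs execute; the other E - k cost nothing this token
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + (w / total) * yi for o, yi in zip(out, y)]
    return out
```

The training difficulty mentioned above comes largely from that hard top-k choice: it is non-differentiable and experts can collapse without load-balancing tricks.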
> Every thing I know about them makes me believe they're a dead-end architecturally.
Something better will come around eventually, but I do not think that we need much change in architecture to achieve consumer-grade AI. Someone just has to come up with the right loss function for training, then one of the major research labs has to train a large model with it and we are set.
I just checked Google Scholar for a paper with a title like "Temporally Persistent Mixture of Experts" and could not find it yet, but the idea seems straightforward, so it will probably show up soon.
> But on a tangent, why do you believe in mixture of experts
With a hardware inference approach you can do tens of thousands of tokens per second and run your agents in a breadth-first style. It is all very simple conceptually, and not more than a few years away.
5 years is a bit optimistic. I have no desire to use anything dumber than Claude - but I doubt I'll need something much smarter either - or with so much niche knowledge baked in. The harness will take care of much. Faster would be nicer though.
That still requires a pretty large chip, and those will be selling at an insane premium for at least a few more years before a real consumer product can try their hand at it.
Yeah, post-Moore's Law anyway. But there could also be real breakthroughs in model architecture. Maybe something replaces the transformer with better-than-quadratic scaling, or MoE lets smaller models and agent farms compete, or, who knows....
Coding, via something like Claude or Codex, will likely always be something best done by hosted cloud models simply because the bar there can always be higher. But it's already entirely possible to run local models for chat and research and basic document creation that can compete perfectly fine with the cloud models from 6 months to a year ago. The limitation at this point is just the cost of RAM.
This week's release of the new smaller Qwen 3.5 models was interesting. I ran a 4-bit quant of the 122b model on my NVIDIA Spark, and it's... pretty damn smart. The smaller models can be run at 8 bits on such machines at very reasonable speeds. And they're not stupid. They're smarter than "ChatGPT" was a year or so ago.
AMD Strix Halo machines with 128GB of RAM can already be bought off the shelf for not-insane prices that can run these just fine. Same with M-series Macs.
Once the supply shocks make their way through the system, I could see every consumer Mac or Windows install coming with a 30B-param or even larger model onboard that is smart enough for basic conversation and assistance, and is equipped with good tool-use skills.
I just don't see a moat for OpenAI or Anthropic beyond specialized applications (like software development, CAD, etc). For long-tail consumer things? I don't see it.
Even for coding. I mean, there's what, maybe a few thousand common useful technologies, algorithms, and design patterns? A million uncommon ones? I think all that could fit in a local model at some point.
Especially if, for example, Amazon ever develops an AWS-specific model that only needs to know AWS tech and maybe even picks a single language to support, or maybe a different model for each language, etc. Maybe that could end up being tiny and super fast.
I mean, most of what we do is simple CRUD wrappers. Sometimes I think humans in the loop cause more problems than we solve: overindexing on clever abstractions that end up mismatching the next feature, painting ourselves into fragile designs we can't fix due to backward compatibility, using dozens of unnecessary AWS features just for the buzz, etc. Sometimes a single monolith with a few long functions with a million branches is really all you need.
Or, if there's ever a model architecture that allows some kind of plugin functionality (like LoRA but more composable; like Skills but better), that'd immediately take over. You get a generic coding skeleton LLM and add the plugins for whatever tech you have in your stack. I'm still holding out for that as the end game.
I was just sharing my experience; I'm not sure what you mean. Just an n=1 data point.
From first principles I 100% agree, and yes, I was using a CLI tool I made with typer that has super clear --help plus documentation that was supposed to guide multi-step workflows. I just got much better performance when I tried MCP. I asked Claude Code to explain the difference:
> why does our MCP onboarding get better performance than using objapi to make these pipelines? I can see the performance is better but it doesn't intuitively make sense to me why an MCP does better than an API for the "create a pipeline" workflow
It's not MCP-the-protocol vs API-the-protocol. They hit the same backend. The difference is who the interface was designed for.
The CLI is a human interface that Claude happens to use. Every objapi pb call means:
- Spawning a new Python process (imports, config load, HTTP setup)
- Constructing a shell command string (escaping SQL in shell args is brutal)
- Parsing Rich-formatted table output back into structured data
- Running 5-10 separate commands to piece together the current state (conn list, sync list, schema classes, etc.)
The MCP server is an LLM interface by design. The wins are specific:
1. onboard://workspace-state resource — one call gives Claude the full picture: connections, syncs, object classes, relations, what exists, what's missing. With the CLI, Claude runs a half-dozen commands and mentally joins the output.
2. Bundled operations — explore_connection returns tables AND their columns, PKs, FKs in one response. The CLI equivalent is conn tables → pick table → conn preview for each. Fewer round-trips = fewer places for the LLM to lose the thread.
3. Structured in, structured out — MCP tools take JSON params, return JSON. No shell escaping, no parsing human-formatted tables. When Claude needs to pass a SQL string with quotes and newlines through objapi pb node add sql --sql "...", things break in creative ways.
4. Tool descriptions as documentation — the MCP tool descriptions are written to teach an LLM the workflow. The CLI --help is written for humans who already know the concepts.
5. Persistent connection — the MCP server keeps one ObjectsClient alive across all calls. The CLI boots a new Python process per command.
So the answer is: same API underneath, but the MCP server eliminates the shell-string-parsing impedance mismatch and gives Claude the right abstractions (fewer, chunkier operations with full context) instead of making it pretend to be a human at a terminal.
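To make the "bundled, structured" point concrete, here is a toy sketch. FAKE_SCHEMA and this explore_connection are invented stand-ins, not objapi's actual API: one call returns everything as JSON, where the CLI route needs several spawned processes plus parsing of Rich-formatted tables.

```python
# Invented stand-in for a "bundled" MCP-style tool: tables, columns,
# PKs, and FKs come back together in one structured JSON reply.
import json

FAKE_SCHEMA = {
    "users":  {"columns": ["id", "email"], "pk": ["id"], "fks": []},
    "orders": {"columns": ["id", "user_id", "total"], "pk": ["id"],
               "fks": [["user_id", "users.id"]]},
}

def explore_connection(connection_id):
    """Everything the model needs about a connection, in one response."""
    payload = {
        "connection": connection_id,
        "tables": [{"name": n, **meta} for n, meta in FAKE_SCHEMA.items()],
    }
    return json.dumps(payload)  # structured out: nothing to re-parse
```

The equivalent CLI session would be one process per table plus a parse step per command, and every one of those parses is a chance for the model to lose the thread.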
For context, I was working on a visual data pipeline builder and was giving it the same API that is used in the frontend; it was doing very poorly with the API.
Exactly. And I kind of believe that anyone citing that comment in 2026 has either been asleep, or is doing it more to take part in the cool HN in-group than for its substance.
Why not rsync, rahrah, remember guys? You know the one, right guys? rahrah
Every time I see some new orchestrator framework worth more than a few hundred LOC I cringe so hard. Reddit is flooded with them daily and HN has them on the front page occasionally.
My current setup is this:
- `tmux-bash` / `tmux-coding-agent`
- `tmux-send` / `tmux-capture`
- `semaphore_wait`
The other tools all create lockfiles and semaphore_wait is a small inotify wrapper.
They're all you need for 3 levels of orchestration. My recent discovery was that it's best to have 1 dedicated supervisor that just semaphore_wait's on the 'main' agent spawning subagents. Basically a smart Ralph Wiggum.
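For anyone curious what the semaphore piece could look like: a minimal polling sketch (the real tool described above is an inotify wrapper; `semaphore_signal` and the exact semantics here are my guesses, not the commenter's actual code).

```python
# Hypothetical polling version of semaphore_wait: block until the agent
# that created a lockfile removes it (inotify swapped for portable polling).
import os
import time

def semaphore_wait(lockfile, poll=0.1, timeout=None):
    """Return once `lockfile` no longer exists; TimeoutError otherwise."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(lockfile):
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("still locked: " + lockfile)
        time.sleep(poll)

def semaphore_signal(lockfile):
    """Create the lockfile an agent holds while it is working."""
    open(lockfile, "w").close()
```

The supervisor just loops on semaphore_wait for the main agent's lockfile; no message bus, no RPC, the filesystem is the coordination layer.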
The tmux + lockfile approach is underrated. We went through a whole phase of building proper orchestration infra and ended up ripping most of it out. The overhead of coordinating agents through a framework is often worse than just letting them talk through the filesystem. The dirty secret of multi-agent systems is that the coordination layer is usually where the bugs live, not the agents themselves.
The whole point is that people don't throw away their original device.
Your situation seems rather niche, and it sounds like you might be going out of 'business' while the rules at the same time allow 1000x the number of people to want to do dummy self-repairs (i.e. replace their batteries), even if with a bit more theater about who is licensed.
The total number of people means much more demand, even for what you cook up manually as not-a-business.
Like I said, I don't have a business. I don't even do repairs for other people (except for friends, for free). Being accused of being a business was really annoying; I'm just a tinkerer, so yes, I have a lot of stuff I play with. I'm a member of a makerspace, so used electronics are really nice.
I'm just worried they will start tracking individual components of devices too like they do with car batteries now and cause a lot of hassle if you do something that doesn't fit the standard flow. When it comes to EVs I don't give a shit because I hate cars, but once I can't repurpose other electronics anymore as I see fit, it will be a problem. I view this as a sneaky way of introducing a subscription model to electronics, like you don't really own the stuff you buy anymore. Like that evil WEF slogan: "You will own nothing and you will be happy".
I've seen this play out 3 times with non-devs I know personally. Somebody has an idea, starts vibing and feeling like they're making insane progress and cool stuff, but the result can most generously be summarized as: a big Meh.
> Most of all, there is now an illusion of a lower barrier to entry.
Arguably, there has never been a higher barrier to entry.
The benefits accrue to the skilled. We all got X% more powerful, and those who were already skilled to begin with get a proportionally better outcome.
This coding agent is minimal, and it completely changed how I use models; Claude's CLI now feels like extremely slow bloat.
I'd not be surprised if you're right that companies / management will prefer the "pay for a complete package" approach for a long while, but power users shouldn't care about the model providers' packaging.
I have like 100 lines of code that give me tmux controls & a semaphore_wait extension in the pi harness. That gave me a better orchestration scheme a month ago, when I adopted it, than Claude has right now.
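As a rough illustration of how little code that takes, here is a sketch of the tmux half. The function names and argv-list approach are my assumptions, not the commenter's actual extension; passing argv lists to subprocess sidesteps the shell-escaping problems discussed upthread.

```python
# Hypothetical tmux control helpers: build tmux commands as argv lists
# (no shell string quoting) and drive panes via subprocess.
import subprocess

def tmux_cmd(*args):
    """Build a tmux argv; separated out so it can be inspected or tested."""
    return ["tmux", *args]

def tmux_send(pane, keys):
    """Type `keys` into the target pane and press Enter."""
    subprocess.run(tmux_cmd("send-keys", "-t", pane, keys, "Enter"),
                   check=True)

def tmux_capture(pane):
    """Return the visible contents of the target pane as plain text."""
    return subprocess.run(tmux_cmd("capture-pane", "-p", "-t", pane),
                          check=True, capture_output=True, text=True).stdout
```

A supervisor built on these just sends a prompt with tmux_send, waits on a lockfile, then reads the result with tmux_capture.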
As far as I can tell, the more you try to train your model on your harness, the worse they get. Bitter lesson #2932.
OpenAI, Anthropic, Google, and Microsoft certainly desire path dependence, but the very nature of LLMs and intelligence itself might make that hard unless they can develop models which truly are differentiated from (and better than) the rest. The Chinese open source models catching up makes me suspect that won't happen. The models will just be a commodity. There is a countdown clock for when we can get Opus 4.6+ level models, and it's measured in months.
The reason these LLM tools are good is that they can "just do stuff." Anthropic bans third-party subscription auth? I'll just have my other tool drive Claude Code in tmux. If third-party agents can be banned from doing stuff (by some advanced always-on spyware or whatever), then a large chunk of the promise of AI is dead.
Amp just announced today that they are dumping IDE integration. Models seem to run better on bare-bones software like Pi, and you can add or remove stuff on the fly because the whole thing is open source. The software writes itself. Is Microsoft just trying to cram a whole new paradigm into an old package? Kind of like a computer printer: it will be a big business, but it isn't the future.
At scale, the end provider ultimately has to serve the inference: they need the hardware, the data centers, and the electricity to power those data centers. Someone like Microsoft can also provide an SLA and price it appropriately. I'll avoid a $200/month customer-acquisition-cost rant, but one user running a bunch of sub-agents can spend a ton of money. If you don't have a business or a funding source, the way state-of-the-art LLMs are being used today is totally uneconomical (easily $200+ an hour at API prices).
36+ months out, if they overbuild the data centers and the revenue doesn't come in like OpenAI & Anthropic are forecasting, there will be a glut of hardware. In that case, I'd expect local model usage to scale up too, and it will get more difficult for enterprise providers.
(Nothing is certain but some things have become a bit more obvious than they were 6 months ago.)
Thinking about this a little more -> "nature of LLMs and intelligence"
Bloated apps are a material disadvantage. If I'm in a competitive industry, that slowdown alone can mean failure. The only thing Claude Code has going for it now is the loss-making $200/month subsidy. Is there any conceivable GUI overlay that Anthropic or OpenAI can add to make their software better than the current terminal apps? Sure, for certain edge cases, but then why isn't the user building those themselves? 24 months ago we could have said that was too hard, but that isn't the case in 2026.
Microsoft added all of this stuff into Windows, and it's a 5-alarm fire. Stuff that used to be usable is a mess and really slow. Running Linux with Claude Code, Codex, or Pi is clearly superior to a Windows device with none of them (if it weren't possible to run these in Windows; just a hypothetical).
From the business/enterprise perspective, there is no single most important thing, but having an environment that is reliable and predictable is high up there. Monday morning, and the Anthropic API endpoint is down: uh oh! In the longer term, businesses will really want to control both the model and the software that interfaces with it.
If the end game is just the same as talking to the Star Trek computer, and competitors are narrowing gaps rather than widening them (e.g. Anthropic and OpenAI release models minutes apart from each other now, and Chinese frontier models are getting closer in capability, not further), then it is really hard to see how either company achieves a vertical lockdown.
We could actually move down the stack, and then the real problem for OpenAI and Anthropic is Nvidia. In 2030, the data center expansion goes bust, Nvidia starts selling all of these cards to consumers directly, and it has a huge financial incentive to make sure performant local models exist. Everyone in the semiconductor supply chain below Nvidia only cares about keeping sales going, so it stops with them.
Maybe nvidia is the real winner?
Also is it just me or does it now feel like hn comments are just talking to a future LLM?