I'm very much looking forward to this shift. It is SO MUCH more pro-consumer than the existing SaaS model. Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation. It feels like every time I ask for programmatic access to SaaS tools in order to simplify a workflow, I get stuck in endless meetings with product managers trying to "understand my use case", even for products explicitly marketed to programmers.
Using agents that interact with APIs represents people being able to own their user experience more. Why not craft a frontend that behaves exactly the way YOU want it to, tailor-made for YOUR work, abstracting the set of products you are using and focusing only on the actually relevant bits of the work you are doing? A downside might be more explicit metering of use in these products instead of the per-user licensing that is common today. But the upside is there is so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.
> Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation
OK, but: that's an economic situation.
> so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.
Right, so there's less profit in it.
To me it seems this will make the market more adversarial, not less. Increasing amounts of effort will be expended to prevent LLMs interacting with your software or web pages. Or in some cases exploit the user's agentic LLM to make a bad decision on their behalf.
the "exploit the user's agentic LLM" angle is underappreciated imo. we already see prompt injection attacks in the wild -- hidden text on web pages that tells the agent to do things the user didn't ask for. now scale that to every e-commerce site, every SaaS onboarding flow, every comparison page.
it's basically SEO all over again but worse, because the attack surface is the user's own decision-making proxy. at least with google you could see the search results and decide yourself. when your agent just picks a vendor for you based on what it "found," the incentive to manipulate that process is enormous.
we're going to need something like a trust layer between agents and the services they interact with. otherwise it's just an arms race between agent-facing dark patterns and whatever defenses the model providers build in.
Maybe. Or maybe services will switch to charging per API call or whatever instead of monthly or per-seat. Who can predict the future?
I mean, services _could_ make it harder to use LLMs to interact with them, but if agents are popular enough they might see customers start to revolt over it.
> What is in the nature of bike-riding that cannot be reduced to text?
You're asking someone to answer this question in a text forum. This is not quite the gotcha you think it is.
The distinction between "knowing" and "putting into language" is a rich source of epistemological debate going back to Plato and is still widely regarded to represent a particularly difficult philosophical conundrum. I don't see how you can make this claim with so much certainty.
"A human can't learn to ride a bike from a book, but an LLM could" is a take so unhinged you could only find it on HN.
Riding a bike is, broadly, learning to co-ordinate your muscles in response to visual data from your surroundings and signals from your vestibular and tactile systems that give you data about your movement, orientation, speed, and control. As LLMs only output tokens that represent text, by definition they can NEVER learn to ride a bike.
Even ignoring that glaring definitional issue, an LLM also can't learn to ride a bike from books written by humans for humans, because an LLM could only operate through a machine using e.g. pistons and gears to manipulate the pedals. That system would be governed by physics and mechanisms different from a human's, and would not have the same sensory information, so almost no human-written information about (human) bike-riding would be useful or relevant for this machine to learn how to ride. It'd just have to do reinforcement learning with appropriate rewards and punishments for balance, speed, and falling.
And if we could embody AI in a sensory system so similar to the human sensory system that it becomes plausible text on bike-riding might actually be useful to the AI, it might also be that, for exactly the same reasons, the AI learns just as well to ride just by hopping on the thing, and that the textual content is as useless to it as it is for us.
Thinking this is an obvious gotcha (or the later comment that anyone thinking otherwise is going to have egg on their face) is just embarrassing. Much more of a wordcel problem than I would have expected on HN.
I don't get from your message why an LLM can't do it.
Related: have you seen Nvidia's simulated 3D environments? Those might not be called LLMs, but they're not very far from what our LLMs actually do right now. It's mostly a naming difference.
This argument was specifically about LLMs, not about other techniques (RL, multi-armed bandit, etc) that might be better leveraged to accomplish this type of goal.
An LLM which makes a tool call to a function called `ride_bike`, where that function is a different sort of model with a different set of feedback mechanisms than those available to the LLM, is NOT the same thing at all. The LLM hasn't "learned" to ride the bike. The best you can say is that the LLM has learned that the bike can be ridden, and that it has a way of asking some other entity to ride on its behalf.
Now, could you develop such a model and make it available to an LLM? Sure, probably. But that's not an LLM. Moreover, it involves you, a human, making novel inroads on a different sort of AI/robotics problem. It simply is not possible to accomplish with an LLM.
Theoretical, infinite-width, single-layer MLPs are universal approximators: modern models that actually exist are not.
And modern transformers definitely underperform models with built-in priors (e.g. CNNs) when they don't have massive amounts of data. Never mind that LLMs simply can't handle all sorts of data types at all https://news.ycombinator.com/item?id=46948612.
Just another example of an HN commentator making statements about something they don't have any actual basic understanding of. Try reading some actual papers instead of the usual blog posts and marketing spam from frontier AI companies, you might learn something important.
You have resorted to, "You don't want to end up being wrong, do you?" To paraphrase Asimov, this kind of fallacious appeal is the last resort of those in over their heads.
Lot of people in this thread being caught with their pants down. Dunno what it is about LLM and AI discourse that causes people to lie or so freely offer opinions on things they clearly have no understanding about whatsoever. AI discourse truly is a great Dunning-Kruger filter.
I'm not sure I understand your complaint. Is it that he misuses the term Pascal's Wager? Or more generally that he doesn't extend enough credibility to the ideas in AI 2027?
More the former. Re the latter, it's not so much that I'm annoyed he doesn't agree with the AI 2027 people; it's that, despite spending a few paragraphs talking about them, he doesn't appear to have bothered trying to understand them.
Insurance companies absolutely benefit from higher, more opaque prices, because they negotiate rebates with providers. This lets them maximize patient copays and ensure patients hit their deductibles, i.e. pay as much as possible under their respective insurance plans. Contrast this with a no-rebate world with cheaper, more transparent pricing: fewer patients would hit their out-of-pocket maximums.
They can use the rebates they get from the providers to subsidize the insured, allowing them to offer lower premiums and gain market share. This is what people mean when they say "In America, the sick people pay to subsidize the health care of the healthy people".
Of course, that above only applies if there is competitive pressure. If there is no competitive pressure (e.g. in states with only one or two insurers), they can keep premiums high and book as profit the difference between what the patient paid out and what the patient would have paid out in a lower-cost no-rebate world.
> Contrast this with a no-rebate world with cheaper/more transparent pricing. Fewer patients would hit their out of pocket maximum.
And premiums would go up. Every insurer has to get their premium approved by every state’s insurance regulator, and every state’s insurance regulator is not going to allow them to have more than a few percent of profit.
> They can use the rebates they get from the providers to subsidize the insured, allowing them to offer lower premiums and gain market share. This is what people mean when they say "In America, the sick people pay to subsidize the health care of the healthy people".
I’ve never heard of this, and it’s legally not allowed. The ACA mandates insurers price plans so that old people only pay at most 3x what young people pay. And the ACA does not allow insurers to charge more to people likelier to need healthcare. Mathematically, that means younger and healthier people pay higher premiums so that older and sicker people can have lower premiums.
NY state goes even further and says all ages pay the same premium, so young subsidizes old even more. MA has a 2x cap, I believe. And then of course, FICA taxes mean the young and working are paying for the healthcare for the old and non working, the vast majority of all healthcare spend in the US (Medicare).
Yes. As I wrote above, insurers compete on premiums, and they do so by using rebates to subsidize those premiums, spreading patients' deductibles across the insured population. As far as profits go, I can't speak to regulatory issues since they vary by state, but in any case the same critique would apply if insurers are pocketing a fixed percentage of a larger amount.
Re your second point, it completely twists my point and is largely irrelevant. Yes, older people paying the same premiums as younger people is a counter-argument in that older people are more likely to need healthcare, but the central point is that people who have to USE their insurance (i.e. sick people) subsidize the premiums of people who don't (healthy people), and this critique applies regardless of age. Now, one could argue that the structural factors that control costs across age cohorts counterbalance this phenomenon. And I'd agree with you! But that doesn't negate the original point that insurance companies benefit from, and advocate for, high sticker prices.
> but the central point is that people who have to USE their insurance (i.e. sick people) subsidize the premiums of people who don't (healthy people), and this critique applies regardless of age.
You’re losing me here. This claim is categorically false. You cannot consider only the deductible when calculating who subsidizes who.
The only way to calculate it is premiums + deductible + out of pocket maximum = total healthcare costs. And the subsidy via premium is so large that it negates effects of a deductible and out of pocket maximum.
Note that all plans have to be actuarially equivalent, regardless of what deductible you choose. The actuaries have to account for rebates and other pricing strategies when ensuring actuarial equivalence, so that the ratio of what the plan pays versus what you pay meets the required ratio for that metal level.
Since your health is not a factor in pricing your insurance, it has to be that people less likely to need healthcare pay for the people likely to need healthcare.
It is the same as if the government forbade auto insurers from using moving violations history, or life insurers from using health measures, or home insurers from using flood maps.
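The "premiums + deductible + out of pocket maximum" arithmetic is easier to see with a toy calculation. All figures below are made up for illustration, and the model deliberately ignores deductible tiers and coinsurance structure:

```python
# Hypothetical numbers only, not real plan figures.
# Two enrollees on the same plan: since health status can't be priced in,
# both pay the same premium regardless of expected healthcare use.
premium = 6_000        # annual premium, identical for both enrollees
oop_max = 9_000        # out-of-pocket maximum

healthy_claims = 500    # annual cost of care the healthy enrollee uses
sick_claims = 50_000    # annual cost of care the sick enrollee uses

def total_paid(claims):
    # Simplification: the enrollee pays care costs up to the OOP max,
    # plus the premium; the insurer covers everything above the OOP max.
    return premium + min(claims, oop_max)

healthy_pays = total_paid(healthy_claims)   # 6_500 paid for 500 of care
sick_pays = total_paid(sick_claims)         # 15_000 paid for 50_000 of care

# What the insurer covers for the sick enrollee out of the premium pool:
insurer_covers_for_sick = sick_claims - min(sick_claims, oop_max)  # 41_000
```

Under these assumptions the healthy enrollee pays 6,500 for 500 of care while the sick enrollee has 41,000 of care covered: the subsidy flows through the shared premium pool, and no choice of deductible changes its direction.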
The claim about who subsidizes who was always hyperbole, I'll grant you that. I included the statement to make the point that this is the phenomenon people are referring to when they make that statement.
I happen to think there is validity to the statement if you control for other actuarial factors. But if you don't think that makes sense as a lens through which to look at the problem, I won't quibble, even though I disagree. We're also only talking about drug prices here, which is a small portion of overall healthcare spending.
In any case, the central point, that insurers benefit from higher prices, still stands.
> In any case, the central point, that insurers benefit from higher prices, still stands.
All sellers benefit from higher prices. No one limits the price they ask for out of the goodness of their hearts. Lower prices are because a competitor offers a lower price, and because buyers can’t pay a higher price.
I have found MCPs to be very useful (albeit with some severe and problematic limitations in the protocol's design). You can bundle them and configure them with a desktop LLM client and distribute them to an organization via something like Jamf. In the context I work in (biotech) I've found it a pretty high-ROI way to give lots of different types of researchers access to a variety of tools and data very cheaply.
I believe you, but can you elaborate? What exactly does MCP give you in this context? How do you use it? I always get high-level answers and I'm yet to be convinced, but I would love this to be one of those experiences where I walk away being wrong and learning something new.
Sure, absolutely. Before I do, let me just say, this tooling took a lot of work and problem solving to establish in the enterprise, and it's still far from perfect. MCPs are extremely useful IMO, but there are a lot of bad MCP servers out there and even good ones are NOT easy to integrate into a corporate context. So I'm certainly not surprised when I hear about frustrations. I'm far from an LLM hype man myself.
Anyway: a lot of earlier stages of drug discovery involve pulling in lots of public datasets, scouring scientific literature for information related to a molecule, a protein, a disease, etc. You join that with your own data and laboratory capabilities and commercial strategy in order to spot opportunities for new drugs that you could maybe, one day, take into the clinic. This is traditionally an extremely time consuming and bias prone activity, and whole startups have gone up around trying to make it easier.
A lot of the public datasets have MCPs someone has put together around their REST APIs. (For example, a while ago Anthropic released "Claude for Life Sciences", which was just a collection of MCPs they had developed over some popular public resources like PubMed.)
For those datasets that don't have open source MCPs, and for our proprietary datasets, we stand up our own MCPs which function as gateways for e.g. running SQL queries or Spark jobs against those datasets. We also include MCPs for writing and running Python scripts using popular bioinformatics libraries, etc. We bundle them with `mcpb` so they can be made into a fully configured one-click installer you can load into desktop LLM clients like Claude Desktop or LibreChat. Then our IT team can provision these fully configured tools for everyone in our organization using MDM tools like Jamf.
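For anyone unfamiliar with the plumbing: on the desktop-client side this is ultimately just declarative config. A minimal sketch of what a Claude Desktop `claude_desktop_config.json` entry looks like (the server name and package here are placeholders, not the actual servers we run):

```json
{
  "mcpServers": {
    "pubmed": {
      "command": "uvx",
      "args": ["pubmed-mcp-server"]
    }
  }
}
```

The `mcpb` bundling step essentially wraps this kind of configuration plus the server code into a single package, which is what makes the one-click install and MDM distribution possible.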
We manage the underlying data with classical data engineering patterns, ETL jobs, data definition catalogs, etc., and give MCP-enabled tools to our researchers as front-end concierge-type tools. And once they find something they like, we also have MCPs which can help transform those queries into new views, ETL scripts, etc. and serve them using our non-LLM infra, or save tables, protein renderings, graphs, etc. and upload them into docs or spreadsheets to be shared with their peers. Part of the reason we have set it up this way is to work around the limitations of MCPs (e.g. all responses have to go through the context window, so you can't pass large files around or trust that it's not mangling the responses). But we also do this so as to end up with repeatable/predictable data assets instead of LLM-only workflows. After the exploration is done, the idea is that you use the artifact, not the LLM, to interact with the data (though of course you can interact with the artifact in an LLM-assisted workflow as you iterate once again on developing yet another derivative artifact).
Some of why this works for us is perhaps unique to the research context where the process of deciding what to do and evaluating what has already been done is a big part of daily work. But I also think there are opportunities in other areas, e.g. SRE workflows pulling logs from Kubernetes pods and comparing to Grafana metrics, saving the result as a new dashboard, and so on.
What these workflows all have in common, IMO, is that there are humans using the LLM as an aid to understanding, and then translating that understanding into more traditional, reliable tools. For this reason, I tend to think that the concept of autonomous "agents" is stupid outside of a few very narrow contexts. That is to say, once you know what you want, you are generally better off with a reliable, predictable, LLM-free application, but LLMs are very useful in the process of figuring out what you want. And MCPs are helpful there.
This is fascinating. I really appreciate the lengthy reply.
How do you handle versioning/updates when datasets change? Do the MCPs break or do you have some abstraction layer?
What's your hit rate on researchers actually converting LLM explorations into permanent artifacts vs just using it as a one-off?
Makes sense for research workflows. Do you think this pattern (LLM exploration > traditional tools) generalizes outside domains with high uncertainty? Or is it specifically valuable where 'deciding what to do' is the hard part?
Someone else mentioned using Chrome dev tools + Cursor; I'm going to try that one out as a way to convince myself here. I want to make this work but I just feel like I'm missing something. The problem is clearly me, so I guess I need to put in some time here.
I'll give you a short reply, as another person who finds MCP very useful. I think a big gap is that MCP's are often marketed as "taking actions" for you, because that's flashy and looks cool in the eyes of laymen. While most of their actual value is the opposite, in using them to gather information to take better non-MCP actions. Connecting them to logs, read-only to (e.g. mock) databases, knowledge bases, and so on. All for querying, not for create/update/delete.
> How do you handle versioning/updates when datasets change?
For data MCPs, we use remote MCPs that are served over an stdio bridge. So our configuration is just mcp-proxy[0] pointed at a fixed URL we control. The server has an /mcp endpoint that provides tools, and that endpoint is hit whenever the desktop LLM client starts up. So adding/removing/altering tools is simply a matter of changing that service and redeploying that API. (Note: there are sometimes complications. E.g. if I change an endpoint that used to return data directly so that it now writes a file to cloud storage and returns a URL (because the result is too large, i.e. to work around the aforementioned limitation of MCP), we have to sync with our IT team to deploy a configuration change to everyone's machine.)
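Concretely, the client-side entry for one of these remote servers can be as small as pointing the bridge at the URL. A sketch, where the server name and URL are placeholders and the exact mcp-proxy invocation depends on the version you pin:

```json
{
  "mcpServers": {
    "datasets": {
      "command": "mcp-proxy",
      "args": ["https://mcp.internal.example.com/mcp"]
    }
  }
}
```

Because the tool list lives behind that URL, redeploying the service updates every client's tools the next time their desktop app starts, with no per-machine config change needed in the common case.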
I have seen nicer implementations that use a full MCP gateway that does another proxy step to the upstream MCP servers, which I haven't used myself (though I want to). The added benefit is that you can log/track which MCPs your users are using most often and how they are doing, and you can abstract away a lot of the details of auth, monitor for security issues, etc. One of the projects I've looked at in that space is Mint MCP, but I haven't used it myself.
> What's your hit rate on researchers actually converting LLM explorations into permanent artifacts vs just using it as a one-off?
Low. Which in our case is ideal, since most research ideas can be quickly discarded, saving us a ton of time and money that would otherwise be spent running doomed lab experiments, etc. As you get later in the drug discovery pipeline you have a larger team built around the program, and then the artifacts are more helpful. There still isn't much of a norm in the biotech industry of having an engineering team support an advanced drug program (a mistake, IMO), so these artifacts go a long way given these teams don't have dedicated resources.
> Do you think this pattern (LLM exploration > traditional tools) generalizes outside domains with high uncertainty?
I don't know for sure, as I don't live in that world. My instinct is: I wouldn't necessarily roll something like this out to external customers if you have a well-defined product. (IMO there just isn't that much of a market for uncertain outputs of such products, which is why all of the SaaS companies that have launched their integrated AI tools haven't seen much success with them.) But even within a domain like that, it can be useful to e.g. your customer support team, your engineers, etc. For example, one of the ideas on my "cool projects" list is an SRE toolkit that can query across K8s, Loki/Prometheus, your cloud provider, your git provider and help quickly diagnose production issues. I imagine the result of such an exploration would almost always be a new dashboard/alert/etc.
If you had developed novel techniques of sfumato and chiaroscuro, spun new theories of perspective and human anatomy, invented new pigments, and then explained all of that to a journeyman painter, with enough coaching, detail, and oversight to ensure the final product was what you envisioned, I would argue that 100% makes you Da Vinci.
Da Vinci himself likely had dozens of nameless assistants laboring in his studio on new experiments with light and color, new chemistry, etc. Da Vinci was Da Vinci because of his vision and genius, not because of his dexterity with his hands.
I'm no fan of tariffs, but oh please. Last time their prices went up because of COVID and because of supply chain disruptions. Now they are going up because of tariffs. All of their earnings calls are filled with analyses of their "pricing power" i.e. the degree to which they can pass on these costs to customers. But when the costs decline, they are happy to keep the prices inflated and pocket the profits.
low net profit margins are simply how retail works; it's not news that retail requires scale. and in fact both target and best buy did see record profits shortly after the pandemic started.
Consumer spending was shattering records during the pandemic.
If consumer spending was dropping as prices were going up, then sure, greed. But prices were rising and consumers were relentless. Which is totally logical. Even more logical when the "spending class" was getting massive raises/offers and rock bottom credit.
This does not at all resonate with my experience with human researchers, even highly paid ones. You still have to do a lot of work to verify their claims.
The article doesn’t rule this out. Most of these emails are templated out in some 3rd party email service. It is extremely plausible that the author is unaware of the text email content.
If someone had a rejection email then we could check this. But
Reading the article is most improper on this here orange website. You’re supposed to read the headline, and imagine what the content of the article might be.
Yes, I did. My point is that the author might be jumping to conclusions. It is far more likely that they introduced a bug in their content than it is that a bunch of email providers who haven't changed in a decade suddenly released the same buggy AI product without fanfare.