The best marketing is indistinguishable from non–marketing, like the label on the side of my Contoso® Widget-like Electrical Machine™ — it feels like a list of ingredients and system requirements but every brand name there was sponsored.
> Huh? No, that's been established since Karpathy coined the term; you don't review the code, only use the agent and don't care about how it was done, just about the results.
However, nowadays it is used as a synonym for everything that is somehow generated by an LLM, regardless of whether it is a spec-driven, carefully reviewed, iterative piece of software or some yolo-style one-prompter where nobody knows how it was done.
Yes, by people who don't actually understand what they're talking about. That doesn't mean we need to sink to the lowest common denominator here on HN too.
Most people understand "hacking" differently than we do, but we've made that work: we can talk about hacking here without other HN users believing we're cracking passwords. Why not do the same for other terms?
They don't want to guarantee an interview to everyone who sends them an improved solution, either.
If three people send them improvements, they'll probably get interviews. If three thousand do, then either the problem is easier than they thought, or it's amenable to an LLM, or one bright person figured out a trick and shared it with all their classmates, colleagues, or all of GitHub.
The closest I come to working with part-time, minimum-wage workers is working with student employees. Even then, they earn more and usually work more than five hours a week.
Most of the time, I end up putting in more work than I get out of it. Onboarding, reviewing, and mentoring all take significant time.
Even with the best students we had, whom we paid around 400 euros a month, I would not say that I saved five hours a week.
And even when they reach the point of being truly productive, they are usually already finished with their studies. If we then hire them full-time, they cost significantly more.
Factorio 2.0 seemed to pull it off. I think that as long as users don’t feel misled by a DLC that only adds a few skins, they generally appreciate larger updates to a game.
Exactly this. I thought about getting a T7, but the price is just ridiculous. And it's not even like you're paying for quality; there are so many complaints about both minor and major issues.
People being prevented from doing their job because of code formatting? In my nearly 20 years of development, that statement was indeed true, but only before the age of formatters. Back then, endless hours were spent on recurring discussions and nitpicky stylistic reviews. The supposed gains were minimal, maybe saving a few seconds parsing a line faster. And if something is really hard to read, adding a prettier-ignore comment above the lines works wonders. The number of times I’ve actually needed it since? Just a handful.
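For anyone who hasn't used it, the escape hatch is literally one comment. A minimal sketch in TypeScript, using Prettier's standard prettier-ignore comment (the matrix variable is just an example):

    // keep the matrix rows hand-aligned for readability
    // prettier-ignore
    const transform = [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1],
    ];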
Code style is a Pareto-optimal problem space: what one person finds readable may look like complete chaos to someone else. There's no objective truth, and that's why I believe that in a project involving multiple people, arguing about it is largely wasted effort.
> My experience is it often generates code that is subtlety incorrect. And I'll waste time debugging it.
> […]
> Or it'll help me debug my code and point out things I've missed.
I made both of these statements myself and later wondered why I had never connected them.
In the beginning, I used AI a lot to help me debug my own code, mostly through ChatGPT.
Later, I started using an AI agent that generated code, but it often didn’t work perfectly. I spent a lot of time trying to steer the AI to improve the output. Sometimes it worked, but other times it was just frustrating and felt like a waste of time.
At some point, I combined these two approaches: I cleared the context, told the AI that there was some code that wasn’t working as expected, and asked it to perform a root cause analysis, starting by trying to reproduce the issue. I was very surprised by how much better the agent became at finding and eventually fixing problems when I framed the task from this different perspective.
Now, I have commands in Claude Code for this and other due diligence tasks, and it’s been a long time since I last felt like I was wasting my time.
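For context, a Claude Code command is just a Markdown prompt file checked into the repo (project commands live under .claude/commands/, so a file like .claude/commands/rca.md becomes a /rca slash command). A stripped-down sketch of what the contents can look like; the filename and wording here are illustrative, not my actual command:

    Some code in this project is not behaving as expected.

    1. Ask me for the observed and expected behavior if I haven't
       provided them yet.
    2. Before changing anything, try to reproduce the issue (run the
       relevant tests or a minimal script).
    3. Perform a root cause analysis: trace the failing path through
       the code and state the cause explicitly.
    4. Only then propose a fix, and re-run the reproduction to confirm.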
TBF, they all believed that scaling reinforcement learning would achieve the next level. They had planned to "war-dial" reasoning "solutions" to generate synthetic datasets which achieved "success" on complex reasoning tasks. This only really produced incremental improvements at the cost of test-time compute.
Now Grok is publicly boasting PhD-level reasoning, while Surge AI and Scale AI are focusing on high-quality datasets curated by actual PhD humans.
In my opinion, the major advancements of 2025 have been more efficient models. The labs have made smaller models much, much better (including MoE models) but have failed to meaningfully push the SoTA on huge models, at least among the US companies.
You can try to build a monster the size of GPT-4.5, but even if you could actually make the training stable and efficient at that scale, you would still struggle to serve it to users.
The next generation of AI hardware should put such models within reach, and I expect model scale will grow in lockstep with new hardware becoming available.
> The agent follows references like a human analyst would. No chunks. No embeddings. No reranking. Just intelligent navigation.
I think this sums it up well. Working with LLMs is already confusing and unpredictable. Adding a convoluted RAG pipeline (unless it is truly necessary because of context size limitations) only makes things worse compared to simply emulating what we would normally do.
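As a toy illustration of what "intelligent navigation" can mean on the retrieval side: instead of chunking and embedding a corpus up front, just let the tooling read a document and follow the references it contains, then hand the gathered text to the model as-is. A minimal sketch in TypeScript; the path-style reference convention and the gatherContext helper are made up for illustration, not any particular product's API:

    import { readFileSync } from "node:fs";
    import { dirname, resolve } from "node:path";

    // References are assumed to look like relative paths in the text,
    // e.g. "see ./contracts/terms.md" (purely an illustrative convention).
    const REF_PATTERN = /\.{1,2}\/[\w./-]+\.(md|txt)/g;

    // Depth-first: read the entry document, then everything it points to,
    // up to a small budget, skipping anything already visited or missing.
    function gatherContext(entry: string, budget = 20, seen = new Set<string>()): string[] {
      const path = resolve(entry);
      if (seen.has(path) || seen.size >= budget) return [];
      seen.add(path);

      let text: string;
      try {
        text = readFileSync(path, "utf8");
      } catch {
        return []; // broken reference: skip it, like a human reader would
      }

      const parts = [`--- ${path} ---\n${text}`];
      for (const ref of text.match(REF_PATTERN) ?? []) {
        parts.push(...gatherContext(resolve(dirname(path), ref), budget, seen));
      }
      return parts;
    }

    // The collected documents go straight into the prompt; the model,
    // not an embedding index, decides which of them actually matter.
    console.log(gatherContext("docs/overview.md").join("\n\n"));

With today's context sizes, that plus a grep-style search tool already covers a lot of what RAG pipelines are typically built for.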
> "A typo or two also helps to show it’s not AI (one of the biggest issues right now)."