The best marketing is indistinguishable from non–marketing, like the label on the side of my Contoso® Widget-like Electrical Machine™ — it feels like a list of ingredients and system requirements but every brand name there was sponsored.
> Huh? No, that's been established since Karpathy coined the term; you don't review the code, only use the agent and don't care about how it was done, just about the results.
However, nowadays it is used as a synonym for everything that is somehow generated by an LLM, regardless of whether it is a spec-driven, carefully reviewed, iterative piece of software or some yolo-style one-prompter where nobody knows how it was done.
Yes, by people who don't actually understand what they're talking about. That doesn't mean we need to sink to the lowest common denominator here on HN too.
Most people understand "hacking" differently than we do, but we've made that work: we can talk about hacking here without other HN users believing we're cracking passwords. Why not do the same for other terms?
They don't want to guarantee an interview to everyone who sends them an improved solution, either.
If three people send them improvements, they'll probably get interviews. If three thousand do, then either the problem is easier than they thought, or it's amenable to an LLM, or one bright person figured out a trick and shared it with all their classmates, colleagues, or all of GitHub.
The closest I come to working with part-time, minimum-wage workers is working with student employees. Even then, they earn more and usually work more than five hours a week.
Most of the time, I end up putting in more work than I get out of it. Onboarding, reviewing, and mentoring all take significant time.
Even with the best students we had, whom we paid around 400 euros a month, I would not say that I saved five hours a week.
And even when they reach the point of being truly productive, they are usually already finished with their studies. If we then hire them full-time, they cost significantly more.
Factorio 2.0 seemed to pull it off. I think that as long as users don’t feel misled by a DLC that only adds a few skins, they generally appreciate larger updates to a game.
Exactly this. I thought about getting a T7, but the price is just ridiculous. And it's not even like you're paying for quality; there are so many complaints about both minor and major issues.
People being prevented from doing their job because of code formatting? In my nearly 20 years of development, that statement was indeed true, but only before the age of formatters. Back then, endless hours were spent on recurring discussions and nitpicky stylistic reviews. The supposed gains were minimal, maybe saving a few seconds parsing a line faster. And if something is really hard to read, adding a prettier-ignore comment above the lines works wonders. The number of times I’ve actually needed it since? Just a handful.
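For anyone who hasn't used it, the escape hatch is literally one comment. A minimal sketch in TypeScript, using Prettier's standard prettier-ignore comment (the matrix variable is just an example):

    // keep the matrix rows hand-aligned for readability
    // prettier-ignore
    const transform = [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1],
    ];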
Code style is a Pareto-optimal problem space: what one person finds readable may look like complete chaos to someone else. There's no objective truth, and that's why I believe that in a project involving multiple people, arguing about it is largely wasted effort.
> My experience is it often generates code that is subtlety incorrect. And I'll waste time debugging it.
> […]
> Or it'll help me debug my code and point out things I've missed.
I made both of these statements myself and later wondered why I had never connected them.
In the beginning, I used AI a lot to help me debug my own code, mostly through ChatGPT.
Later, I started using an AI agent that generated code, but it often didn’t work perfectly. I spent a lot of time trying to steer the AI to improve the output. Sometimes it worked, but other times it was just frustrating and felt like a waste of time.
At some point, I combined these two approaches: I cleared the context, told the AI that there was some code that wasn’t working as expected, and asked it to perform a root cause analysis, starting by trying to reproduce the issue. I was very surprised by how much better the agent became at finding and eventually fixing problems when I framed the task from this different perspective.
Now, I have commands in Claude Code for this and other due diligence tasks, and it’s been a long time since I last felt like I was wasting my time.
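For context, a Claude Code command is just a Markdown prompt file checked into the repo (project commands live under .claude/commands/, so a file like .claude/commands/rca.md becomes a /rca slash command). A stripped-down sketch of what the contents can look like; the filename and wording here are illustrative, not my actual command:

    Some code in this project is not behaving as expected.

    1. Ask me for the observed and expected behavior if I haven't
       provided them yet.
    2. Before changing anything, try to reproduce the issue (run the
       relevant tests or a minimal script).
    3. Perform a root cause analysis: trace the failing path through
       the code and state the cause explicitly.
    4. Only then propose a fix, and re-run the reproduction to confirm.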
TBF, they all believed that scaling reinforcement learning would achieve the next level. They had planned to "war-dial" reasoning "solutions" to generate synthetic datasets which achieved "success" on complex reasoning tasks. This only really produced incremental improvements at the cost of test-time compute.
Now Grok is publicly boasting PhD-level reasoning, while Surge AI and Scale AI are focusing on high-quality datasets curated by actual PhD humans.
In my opinion, the major advancements of 2025 have been more efficient models. The labs have made smaller models much, much better (including MoE models) but have failed to meaningfully push the SoTA on huge models, at least among the US companies.
You can try to build a monster the size of GPT-4.5, but even if you could actually make the training stable and efficient at that scale, you would still struggle to serve it to users.
The next generation of AI hardware should put such models within reach, and I expect model scale will grow in lockstep with new hardware becoming available.
> The agent follows references like a human analyst would. No chunks. No embeddings. No reranking. Just intelligent navigation.
I think this sums it up well. Working with LLMs is already confusing and unpredictable. Adding a convoluted RAG pipeline (unless it is truly necessary because of context size limitations) only makes things worse compared to simply emulating what we would normally do.
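As a toy illustration of what "intelligent navigation" can mean on the retrieval side: instead of chunking and embedding a corpus up front, just let the tooling read a document and follow the references it contains, then hand the gathered text to the model as-is. A minimal sketch in TypeScript; the path-style reference convention and the gatherContext helper are made up for illustration, not any particular product's API:

    import { readFileSync } from "node:fs";
    import { dirname, resolve } from "node:path";

    // References are assumed to look like relative paths in the text,
    // e.g. "see ./contracts/terms.md" (purely an illustrative convention).
    const REF_PATTERN = /\.{1,2}\/[\w./-]+\.(md|txt)/g;

    // Depth-first: read the entry document, then everything it points to,
    // up to a small budget, skipping anything already visited or missing.
    function gatherContext(entry: string, budget = 20, seen = new Set<string>()): string[] {
      const path = resolve(entry);
      if (seen.has(path) || seen.size >= budget) return [];
      seen.add(path);

      let text: string;
      try {
        text = readFileSync(path, "utf8");
      } catch {
        return []; // broken reference: skip it, like a human reader would
      }

      const parts = [`--- ${path} ---\n${text}`];
      for (const ref of text.match(REF_PATTERN) ?? []) {
        parts.push(...gatherContext(resolve(dirname(path), ref), budget, seen));
      }
      return parts;
    }

    // The collected documents go straight into the prompt; the model,
    // not an embedding index, decides which of them actually matter.
    console.log(gatherContext("docs/overview.md").join("\n\n"));

With today's context sizes, that plus a grep-style search tool already covers a lot of what RAG pipelines are typically built for.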
> "A typo or two also helps to show it’s not AI (one of the biggest issues right now)."