I'd trust that dude over professional leetcoders any day.
But you're right that trust is a complicated thing and often misplaced. I think as an industry we're always reevaluating our relationship with OSS, and I'm sure LLMs will affect this relationship in some way. It's too early to tell.
I find this relationship fascinating. The vast majority of developers will not hesitate to pull in OSS library X or framework Y while knowing really nothing about it: who the developers are, what its quality is, what their release process and QA look like, etc. The first thing I do now, as a "senior" for decades, when I get approached with "we should consider using ____" is to send them to the project's issues page (e.g. https://github.com/oven-sh/bun/issues) and say "spend 60-90 minutes minimum here reviewing the issues, then come back and tell me whether or not the inclusion of this is something we should consider." And yet, now with LLMs, there are sooooooooo many comments on HN like "oh, they must be supervised, who knows what they will be doing" - gotta supervise the agents, but some mate in Boise is all good; hopefully someone else will review his stuff that's going into your next release...
But do you have any code that has been vetted and verified to show this approach works? This whole agentic code-quality claim is an assertion, but where is the literal proof?
It’s agents all the way down - until you have liability. At some point, it’s going to be someone’s neck on the line, and saying “the agents know” isn’t going to satisfy customers (or in a worst case, courts).
> It's not like humans aren't already deflecting liability
They attempt to, sure, but it rarely works. Now, with AI, maybe it might, but that's sort of a worse outcome for the specific human involved - "If you're just an intermediary between the AI and me, WTF do I need you for?"
> or moving it to insurance agencies.
They aren't "moving" it to insurance companies, they are amortising the cost of the liability at a small extra cost.
Just today I had an agent add a fourth "special case" to a codebase, and I went back and DRY'd three of them.
Now, I used the agent to do a lot of the grunt work in that refactor, but it was still a design decision initiated by me. The chatbot, left unattended, would not have seen that it needed to be done. (And when, during my refactor, it tried to fold in the fourth case, I had to stop it.)
(And for a lot of code, that's ok - my static site generator is an unholy mess at this point, and I don't much care. But for paid work...)
The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by interactions between pieces of code, and there's now more code to interact with.

If you plot this out, you'll notice it eventually crosses 100%, and from there the total number of defects compounds, as each bugfix introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry. How quickly new bugs outpace bugfixes varies by organization and by the skill of your software architects - good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen some startups hit this point before bringing the product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy: the end result of technical debt.
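The dynamic above can be sketched with a toy model (every constant here is made up purely for illustration, not measured from any real project): each fix adds a little code, and the expected number of regressions per fix grows linearly with codebase size. Once the per-fix regression rate crosses 1, the backlog stops shrinking and starts compounding.

```python
# Toy model of "technical bankruptcy": the probability that a bugfix
# introduces a new defect grows linearly with codebase size.
# All constants are illustrative assumptions.

def simulate(fixes: int, loc: float = 10_000.0, growth: float = 50.0,
             regress_per_loc: float = 2e-6) -> tuple[float, float]:
    """Return (final LOC, open defect count) after `fixes` bugfixes."""
    defects = 100.0  # arbitrary starting backlog
    for _ in range(fixes):
        defects = max(0.0, defects - 1.0)  # fix one defect, if any remain
        loc += growth                      # the fix adds code
        defects += regress_per_loc * loc   # expected regressions scale with size
    return loc, defects

# Break-even is at regress_per_loc * loc == 1 (loc == 500_000 here):
# below it the backlog stays near zero, above it every fix spawns more
# than one new defect on average and the backlog grows without bound.
for n in (1_000, 5_000, 10_000, 20_000):
    loc, defects = simulate(n)
    print(f"after {n:>6} fixes: {loc:>9,.0f} LOC, {defects:>8.0f} open defects")
```

The "skill of your architects" knob corresponds to `regress_per_loc`: keeping coupling down shrinks that coefficient and pushes the break-even point out, which is exactly the difference between a codebase that stays workable and one that goes bankrupt.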
Except they don't converge. You see that if you use agents to evolve a codebase. We also saw exactly that in the failed Anthropic experiment to create a C compiler.
For all the whinging about bugs and errors around here, the software industry in general (some niche sub-fields excepted) decided long ago that 80% is good enough to ship and we'll figure the rest out later. This entire site is based on startup culture, which largely prided itself on MVP moonshots.
Plus, plenty of places are perfectly fine with tech debt, and the AI fire hose is effectively tech debt on steroids - but while it creates debt at scale, it can also help with understanding it.

It is its own panacea, in a way.
I think it's gonna be a while before the industry figures out how to handle this better, so might as well just ride the wave and not worry too much about it in software.

Still, software is not medicine, even if software is required in basically every industry now. Medicine should be more conservative and wait till things settle down before jumping in.
> long ago decided 80% is good enough to ship and we will figure the rest out later
Sure, we can eat that 80% if we made the debt ourselves and know where the bodies are buried. When AI does it, it's more like "50% is good enough and hopefully someone smart enough can fix it up to 60% when it breaks", which inevitably means we get it to 55%.
> while it creates it at scale it can also help in understanding it.
This feels like cope, and I’m not trying to be snarky. I also know that I had to train my brain to skip google’s “AI Summary” on searches because I’ve had a handful of wrong answers from it - not technically correct, not correct with caveats, just flat out wrong. So sure, AI could make a bigger mess and then we could trust it to help sort it out, but even if it finds three real problems and one non-existent one, it has still made even more work to sort out.
> Still software is not medicine even if software is required in basically every industry now. It should more conservative and wait till things settle down before jumping in.
Agree. We had a big company meeting about becoming an AI-focused company (we do healthcare-related software) and I'm honestly a little worried. I don't use any AI in my work, and when I asked the colleague who implemented our new build process for help with migrating my own repo, they said "I don't know, I asked GPT to do it". And that's why the pipeline name has a mega-long ternary which doesn't resolve, so all of our runs are titled "if [[ $matrixType === …". But they're using AI, and they're going to be celebrated for it.
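For the curious, that failure mode looks something like this (a hypothetical sketch - the post doesn't say which CI system is in use, so this assumes a GitHub Actions-style workflow, where only `${{ ... }}` expressions are evaluated and everything else in a field is a literal string):

```yaml
# Broken (hypothetical reconstruction): a shell-style conditional pasted
# into a YAML field is never evaluated, so every run is titled with the
# literal text "if [[ $matrixType == ...".
#
# run-name: if [[ $matrixType == "release" ]]; then echo "Release build"; fi

# Working: GitHub Actions only evaluates ${{ ... }} expressions;
# `&&` / `||` act as the ternary here.
run-name: ${{ inputs.matrixType == 'release' && 'Release build' || 'Dev build' }}
```

The `matrixType` input is a guess based on the garbled title quoted above; the point is just that the expression has to live inside the template syntax the CI system actually expands.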
That's because the AI labs keep stamping out the widely known failures. I assume without actually retraining the main model, but with some small classifier that detects the known meme questions and injects the correct answer into the context.
But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.
It's not unlikely that you're talking to a lot of AI-based AI boosters. It's easier to create astroturfed comments with chatbots than to fix the inherent problems.
Nice. My test was always a blond bald guy. It always adds hair. If you ask for bald, you get a dark-haired bald guy; if you add blond, you can't get bald, because I guess specifying a hair color implies hair (on the head), while you may just want blond eyebrows and/or blond stubble.