Hacker News | batshit_beaver's comments

You mean the company where engineers ask chat bots to write chess games in their spare time in order to hit their AI usage requirements? That Meta?

idk why you bring this up. this is irrelevant to whether CC actually works at big corps

I missed that, source?

That might require thinking instead of feeling.

Adding this to my #owned compilation.

- Reddit Ralph


Because you trust that your dependencies are not vibe coded and have been reviewed by humans.

Stop trusting any dependency now.

except they are vibe-or-not coded by some dude in Reno NV who wouldn’t pass a phone screen where you work

I'd trust that dude over professional leetcoders any day.

But you're right that trust is a complicated thing and often misplaced. I think as an industry we're always reevaluating our relationship with OSS, and I'm sure LLMs will affect this relationship in some way. It's too early to tell.


I find this relationship fascinating. The vast majority of developers will not hesitate to pull in OSS library X or framework Y knowing really nothing about it: who the developers are, what its quality is, what their release process and QA look like, etc. The first thing I do now, as a "senior" of decades, when I get approached with "we should consider using ____" is to send them to the project's issues page (e.g. https://github.com/oven-sh/bun/issues) and say "spend 60-90 minutes minimum here reviewing the issues, then come back and tell me whether the inclusion of this is something we should consider." And yet, now with LLMs, there are soooo many comments on HN like "oh, they must be supervised, who knows what they will be doing"... gotta supervise the LLM, but some mate in Boise is all good, and hopefully someone else will review his stuff before it goes into your next release...

You literally cannot, since ANY changes to code tend to introduce unintended (or at least not explicitly requested) new behaviors.

Eventual convergence? Assuming each defect fix has a 30% chance of introducing a new defect, we keep cycling until done?

Assuming you can catch every new bug it introduces.

Both assumptions being unlikely.

You also end up with a code base you let an AI agent trample until it is satisfied: ballooned in complexity and full of redundant, brittle code.


You can have an AI agent refactor and improve code quality.

But has any code been vetted and verified to show this approach works? The whole agentic code-quality claim is an assertion, but where is the literal proof?

If it can be trained with reinforcement learning then it will happen

Did we have code quality before LLMs?

Funnily enough I've literally never seen anyone demo this, despite all the other AI hype. It's the one thing that convinces me they're still behind.

It’s agents all the way down - until you have liability. At some point, it’s going to be someone’s neck on the line, and saying “the agents know” isn’t going to satisfy customers (or in a worst case, courts).

> until you have liability

And are you thinking this is going to start happening at some point, or what?

The letters I get every other month telling me I now have free credit monitoring because of a personal-info breach seem to suggest otherwise.


A firm has very different amounts of time, ability, and money to spend on following up on broken contracts than an individual does.

Sure it can. It's not like humans aren't already deflecting liability or moving it to insurance agencies.

> It's not like humans aren't already deflecting liability

They attempt to, sure, but it rarely works. Now, with AI, maybe it might, but that's sort of a worse outcome for the specific human involved - "If you're just an intermediary between the AI and me, WTF do I need you for?"

> or moving it to insurance agencies.

They aren't "moving" it to insurance companies, they are amortising the cost of the liability at a small extra cost.

That's a big difference.


At some point, the risk/return calculus becomes too expensive for insurance companies.

Usually that's after the premiums become too high for most people to pay.


Just today I had an agent add a fourth "special case" to a codebase, and I went back and DRY'd three of them.

Now I used the agent to do a lot of the grunt work in that refactor, but it was still a design decision initiated by me. The chatbot, left unattended, would not have seen that needed to be done. (And when, during my refactor, it tried to fold in the fourth case I had to stop it.)

(And for a lot of code, that's ok - my static site generator is an unholy mess at this point, and I don't much care. But for paid work...)


That's assuming that each fix can only introduce at most one additional defect, which is obviously untrue.

Why would it converge?

The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by interactions between pieces of code, and there's now more code to interact with.

Play this forward and the expected number of new defects per fix eventually exceeds one, at which point the total number of defects grows rather than shrinks, because each bugfix introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry.

The point at which new bugs arrive faster than bugfixes varies by organization and by the skill of your software architects; good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen some startups reach that point before bringing the product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy: the end result of technical debt.
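The convergence question being debated here can be sketched as a simple branching process. This is a toy model, not a claim about real codebases: assume each fix independently spawns one new defect with probability p. For p < 1 the expected number of fixes per original defect is 1/(1-p) (about 1.43 at the 30% figure above), while at p >= 1 the defect queue never drains.

```python
import random

def total_fixes(initial_defects: int, p: float, cap: int = 100_000) -> int:
    """Fix defects one at a time; each fix spawns a new defect
    with probability p. Returns the number of fixes performed,
    stopping at `cap` if the process never terminates."""
    pending = initial_defects
    fixes = 0
    while pending > 0 and fixes < cap:
        pending -= 1
        fixes += 1
        if random.random() < p:  # this fix introduced a new defect
            pending += 1
    return fixes

random.seed(42)
print(total_fixes(100, 0.30))  # converges: roughly 100 / (1 - 0.3), ~143 fixes
print(total_fixes(100, 1.00))  # never converges: hits the 100_000-fix cap
```

The model ignores the linear-growth effect described above: if p itself rises as the codebase grows, a run that starts in the convergent regime can drift past p = 1 and stop converging.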


As long as we're inventing numbers, what if it's a 90% chance?

What if it's a 200% chance, and every fix introduces multiple defects?


Except they don't converge. You see that if you use agents to evolve a codebase. We also saw exactly that in the failed Anthropic experiment to create a C compiler.

I’ve had mine on a Ralph loop no problem. Just review the PR..

Which still means a single person with Claude can clear a queue in a day versus a month with a traditional team.

Your example must have incredible users or really trivial software.

> I'm all for AI and it's great for things like copywriting, brainstorming and code generation

It's funny how the assumption is always that LLMs are very useful in an industry other than your own.


I mean they are not wrong.

For all the whinging about bugs and errors around here, the software industry in general (some niche sub-fields excepted) long ago decided 80% is good enough to ship and we will figure the rest out later. This entire site is based on startup culture, which largely prided itself on MVP moonshots.

Plus plenty of places are perfectly fine with tech debt, and the AI fire hose is effectively tech debt on steroids; while it creates it at scale, it can also help in understanding it.

It is its own panacea, in a way.

I think it is gonna be a while before the industry figures out how to handle this better so might as well just ride the wave and not worry too much about it in software.

Still, software is not medicine, even if software is required in basically every industry now. Medicine should be more conservative and wait till things settle down before jumping in.


> long ago decided 80% is good enough to ship and we will figure the rest out later

Sure, we can eat that 80% if we made the debt and know where the bodies are buried. When AI does it, it’s more like “50% is good enough and hopefully someone smart enough can fix it up to 60% when it breaks”, which inevitably means we get it to 55%.

> while it creates it at scale it can also help in understanding it.

This feels like cope, and I’m not trying to be snarky. I also know that I had to train my brain to skip google’s “AI Summary” on searches because I’ve had a handful of wrong answers from it - not technically correct, not correct with caveats, just flat out wrong. So sure, AI could make a bigger mess and then we could trust it to help sort it out, but even if it finds three real problems and one non-existent one, it has still made even more work to sort out.

> Still software is not medicine even if software is required in basically every industry now. It should more conservative and wait till things settle down before jumping in.

Agree. We had a big company meeting about becoming an AI-focused company (we do healthcare-related software) and I'm honestly a little worried. I don't use any AI in my work, and when I asked the colleague who implemented our new build process for help with migrating my own repo, they said "I don't know, I asked GPT to do it". And that's why the pipeline has a mega-long ternary for the name which doesn't resolve, so all of our runs are titled "if [[ $matrixType === …". But they're using AI, and they're going to be celebrated for it.


That's because AI labs keep stamping out the widely known failures. I assume they do it without actually retraining the main model, but with some small classifier that detects the known meme questions and injects the correct answer into the context.

But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.



Are you also an LLM? Do objects often begin rotating when you're only holding them with one hand?


It's not unlikely that you're talking to a lot of AI-based AI boosters. It's easier to create astroturfed comments with chatbots than to fix the inherent problems.


I always like to ask AI to generate a middle-aged blond man with gray hair. Turns out that every model gives the gray hair black roots.

https://chatgpt.com/share/69bcd01a-a750-800d-95f5-3b840b9ee2...

https://gemini.google.com/share/edc223bb6291 (the try again gave a woman, oops)

Even Midjourney couldn't do it.


Nice. My test was always a blond bald guy. It always adds hair. If you ask for bald you get a dark haired bald guy, if you add blond, you can't get bald because I guess saying the hair color implies hair (on the head), while you may just want blonde eyebrows and/or blond stubble.


He didn't say he won.


Did they even enjoy it though beyond the novelty!?


Everyone wins when you can wololo with others on the internet at 30,000ft.


This


Legalese already exists in software engineering. Several dialects of it, in fact. We call them programming languages.



In theory, there's no difference between theory and practice.

In practice...


