I understand how developers can come to this conclusion if they're only using local models that can run on consumer GPUs, since there's a time cost to prompting and the output is fairly low quality, with a higher probability of errors and hallucinations.
But I don't understand how you can come to this conclusion when using SOTA models like Claude Sonnet 3.7. Its responses have always been useful, and when it doesn't get it right the first time you can keep prompting it with clarifications and error output. On the rare occasion it can't get it right, I'm still left with a bulk of useful code that I can manually fix and refactor.
Either way, my interactions with Sonnet are always beneficial. Maybe it's a prompt issue? I only ask it to perform small, specific, deterministic tasks and provide the necessary context (with examples when possible) to achieve them.
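To make that concrete, here's a minimal sketch of the kind of small, scoped request I mean, assuming access through the Anthropic Python SDK (the task, the parse_row function, and the model id are purely illustrative, not my actual setup):

```python
# Illustrative only: a small, tightly-scoped task with the relevant
# context pasted in and an example of the expected output style.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Existing, working code included as context (hypothetical example).
context = '''\
def parse_row(row: dict) -> dict:
    return {"id": int(row["id"]), "name": row["name"].strip()}
'''

task = (
    "Write a pytest test for parse_row covering a normal row and a row "
    "with surrounding whitespace in 'name'. Do not modify parse_row itself.\n\n"
    "Example of the test style we use:\n"
    "def test_example():\n"
    "    assert parse_row({'id': '1', 'name': ' a '}) == {'id': 1, 'name': 'a'}\n"
)

message = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model id
    max_tokens=1024,
    messages=[{"role": "user", "content": f"{context}\n\n{task}"}],
)
print(message.content[0].text)
```

If the result isn't right, the follow-up is just another message in the same conversation with the clarification or the error output pasted in.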
I don't vibe code or unleash an LLM on an entire codebase, since the context window isn't large enough and I don't want it to refactor or break working code.