> Strong disagree. Yes, neural nets are blackboxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.
I think you missed my point.
If I'm inspecting code from another human, I'm going to make assumptions about the kinds of errors they're likely to make. There's probably obvious dumb stuff I won't look for, because a human typically never makes certain classes of mistake. Those mistakes are the self-driving-car equivalent of driving into the back of a stopped semi truck because it was mistaken for a billboard: an error no human of sound mind and body would make.
So if I'm inspecting code written by a computer, I'll either 1) make those same assumptions and run the risk of missing unexpected problems in the code, or 2) be overly cautious (because I don't trust the machine) and examine the code with a fine-tooth comb, which takes a great deal more time.
Based on my experience with Autopilot and Copilot, I think this is way less of a problem in code.
You can put code mistakes on a gradient from subtle to obvious. Obvious bugs are things like the LLM latching onto a pattern and repeating it for 100 lines. Subtle mistakes are things like getting a variable name wrong so the code uses one left over from earlier instead of the correct one (see the sketch below).
Obvious mistakes are easy to catch precisely because they're obvious, and the LLM makes more of those. Maybe it's because of the way LLMs work, but I have never seen Copilot make a subtle mistake that I wouldn't also expect from a person. People are so good at producing surprising bugs that it's really hard for Copilot to beat us.
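To make the "subtle" end concrete, here's a minimal made-up Python sketch of the leftover-variable kind of mistake I mean (function and names are invented for illustration):

```python
def net_revenue(orders):
    total = 0
    for order in orders:
        amount = order["amount"]
        total += amount
    discounts = [order.get("discount", 0) for order in orders]
    # Subtle mistake: `amount` is left over from the loop above, so this
    # still runs fine, it just quietly returns the wrong number.
    return total - amount   # intended: total - sum(discounts)
```

The obvious end of the gradient (a pattern mindlessly repeated for 100 lines) jumps out in review; this kind doesn't.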
Me too. It's an empirical question to be answered by those who will dare to try.
> It's kinda like self-driving cars
Strong disagree. Yes, neural nets are blackboxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.
> more formally specify the problem that an LLM is being asked to solve.
That would be a great direction to explore.
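As a rough sketch of what that could look like (hypothetical example, using Python's hypothesis library; the function and test names are made up): hand the LLM an executable specification, e.g. a property-based test its implementation has to pass, rather than only a prose prompt.

```python
from hypothesis import given, strategies as st

# Hypothetical target: the function the LLM is asked to implement.
def dedupe_preserving_order(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

# The "formal spec": properties any acceptable implementation must satisfy.
@given(st.lists(st.integers()))
def test_dedupe_spec(items):
    result = dedupe_preserving_order(items)
    assert set(result) == set(items)        # nothing lost, nothing invented
    assert len(result) == len(set(items))   # no duplicates remain
    # first occurrences keep their original relative order
    assert all(items.index(a) < items.index(b)
               for a, b in zip(result, result[1:]))
```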