Note that this isn't improving the LLM itself, but the software glue around it (i.e. agentic loops, tools, etc.). The fact that the same LLM got a ~20% increase on the aider leaderboard says more about aider as a collection of software glue than it does about the model.
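
For concreteness, here's a rough sketch of what I mean by "software glue": a fixed model wrapped in a loop that parses its output, runs tools, and feeds results back in. All the names here (`call_llm`, `run_tests`, the JSON tool-call format) are hypothetical, not aider's actual code; the point is only that improving this loop can move benchmark numbers without touching the model.

```python
import json

def run_tests(args):
    # Stand-in tool: pretend to run the project's test suite.
    return "2 passed, 1 failed: test_parse_edge_case"

TOOLS = {"run_tests": run_tests}

def call_llm(messages):
    # Placeholder for a real model call (API or local). Returns a canned
    # tool call first, then a plain-text answer once it has seen a tool result.
    if any("tool result" in m["content"] for m in messages):
        return "test_parse_edge_case fails on empty input; add a guard in parse()."
    return json.dumps({"tool": "run_tests", "args": {}})

def agent_loop(task, max_steps=5):
    # The "glue": keep calling the same model, executing any tool it asks for,
    # and appending the tool output back into the conversation.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        try:
            action = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain-text answer, loop ends
        tool = TOOLS.get(action.get("tool"))
        if tool is None:
            return reply
        result = tool(action.get("args", {}))
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"tool result: {result}"})
    return "step budget exhausted"

if __name__ == "__main__":
    print(agent_loop("Fix the failing parser test."))
```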
I do wonder, though, whether big labs are running this with model-training episodes as well.