I think this originated from old arguments that say that the total _cumulative_ time spent reading code will be higher than the time spent writing it. But then people just warped it in their heads that it takes more time to read and understand code than it takes to write it in general, which is obviously false.
I think people want to believe this because it is a lot of effort to read and truly understand some pieces of code. They would just rather write the code themselves, so this is convenient to believe.
There are a lot of knobs they could tweak. Newer hardware and traffic prioritisation would both make a lot of sense. But they could also lower batching windows to decrease queueing time at the cost of lower throughput, or keep the KV cache in GPU memory at the expense of reducing the number of users they can serve from each GPU node.
2.4x faster memory - which is exactly what they are saying the speedup is. I suspect they are just routing to GB200 (or TPU etc equivalents).
FWIW I did notice _sometimes_ recently Opus was very fast. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.
Dylan Patel did analysis that suggests lower batch size and more speculative decoding leads to 2.5x more per-user throughput for 6x the cost for open models [0]. Seems plausible this could be what they are doing. We probably won't get to know for sure any time soon.
Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.
Could the proxy place further restrictions like only replacing the placeholder with the real API key in approved HTTP headers? Then an API server is much less likely to reflect it back.
It’s like in Anthropic’s own experiment. People who used AI to do their work for them did worse than the control group. But people who used AI to help them understand the problem, brainstorm ideas, and work on their solution did better.
The way you approach using AI matters a lot, and it is a skill that can be learned.
reply