Hacker News | sothatsit's comments

Claude Code captures this locally, not in version control alongside commits.

I wonder how difficult it would be for Claude Code to have such a feature in a future release.
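In the meantime, a crude workaround is a script (or pre-commit hook) that copies the latest transcript into the repo yourself. A rough sketch, assuming the transcripts live as JSONL files under ~/.claude/projects (that path, and the docs/ai-sessions destination, are assumptions to adjust for your own setup):

    # Copy the most recent local Claude Code transcript into the repo so
    # it can be committed alongside the change. The ~/.claude/projects
    # location is an assumption; point this at wherever yours live.
    import shutil
    from pathlib import Path

    transcripts = sorted(
        Path.home().glob(".claude/projects/*/*.jsonl"),
        key=lambda p: p.stat().st_mtime,
    )
    if transcripts:
        dest = Path("docs/ai-sessions")
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(transcripts[-1], dest / transcripts[-1].name)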

I think this originated from old arguments that the total _cumulative_ time spent reading code will be higher than the time spent writing it. But then people warped that in their heads into the claim that reading and understanding code takes longer than writing it in general, which is obviously false.

I think people want to believe this because it is a lot of effort to read and truly understand some pieces of code. They would just rather write the code themselves, so this is convenient to believe.


I also like asking the agent how we can update the AGENTS.md to avoid similar mistakes going forward, before starting again.

I think this is just worded in a misleading way. It’s available to all users, but it’s not included as part of the plan.

At 6x the cost, and with it requiring you to pay full API pricing, I don’t think this is going to be a concern.

If it could help you avoid needing to context-switch between multiple agents, that could be a big mental-load win.

Roon said as much here [0]:

> codex-5.2 is really amazing but using it from my personal and not work account over the weekend taught me some user empathy lol it’s a bit slow

[0] https://nitter.net/tszzl/status/2016338961040548123


I see Anthropic says so here as well: https://x.com/claudeai/status/2020207322124132504

There are a lot of knobs they could tweak. Newer hardware and traffic prioritisation would both make a lot of sense. But they could also shorten batching windows to reduce queueing time at the cost of throughput, or keep the KV cache resident in GPU memory at the expense of how many users they can serve from each GPU node.
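To make the batching-window knob concrete, here is a toy sketch (purely illustrative; it has nothing to do with Anthropic's actual serving stack, and all names and numbers are made up). Requests queue up until the batch is full or the window expires, so shrinking the window cuts per-request queueing delay but produces smaller, less efficient batches per forward pass:

    import queue, threading, time

    BATCH_WINDOW_S = 0.010   # shrink this: lower latency, lower throughput
    MAX_BATCH_SIZE = 32

    incoming = queue.Queue()

    def run_forward_pass(batch):
        print(f"running batch of {len(batch)}")

    def batcher():
        while True:
            batch = [incoming.get()]              # block for the first request
            deadline = time.monotonic() + BATCH_WINDOW_S
            while len(batch) < MAX_BATCH_SIZE:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(incoming.get(timeout=remaining))
                except queue.Empty:
                    break
            run_forward_pass(batch)               # one GPU forward pass per batch

    threading.Thread(target=batcher, daemon=True).start()
    for i in range(100):
        incoming.put(i)
        time.sleep(0.001)                         # ~1 request per millisecond
    time.sleep(0.1)                               # let the last batch flush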

I think it's just routing to faster hardware:

H100 SXM: 3.35 TB/s HBM3

GB200: 8 TB/s HBM3e

That's ~2.4x more memory bandwidth, which is exactly the speedup they are claiming. I suspect they are just routing these requests to GB200s (or TPU equivalents, etc.).
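Back-of-envelope, assuming single-request decode is memory-bandwidth-bound so per-user token speed scales roughly with HBM bandwidth (a simplification that ignores batching, interconnect, and compute limits):

    h100_bw = 3.35    # TB/s, H100 SXM HBM3
    gb200_bw = 8.0    # TB/s, HBM3e
    print(f"{gb200_bw / h100_bw:.2f}x")   # ~2.39x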

FWIW, I did notice that Opus was _sometimes_ very fast recently. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.


Dylan Patel did an analysis suggesting that lower batch sizes and more speculative decoding lead to 2.5x more per-user throughput at 6x the cost for open models [0]. It seems plausible this is what they are doing. We probably won't get to know for sure any time soon.

Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.

[0] https://nitter.net/dylan522p/status/2020302299827171430


Could the proxy place further restrictions, like only replacing the placeholder with the real API key in approved HTTP headers? Then an API server would be much less likely to reflect it back.
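Something like the sketch below is what I have in mind; the allow-list, placeholder, and function shape are all made up for illustration rather than being any particular proxy's API:

    # The placeholder is only substituted in allow-listed request headers;
    # the body is forwarded untouched, so even if the upstream API echoes
    # the request back, the real key never appears in the response.
    ALLOWED_HEADERS = {"authorization", "x-api-key"}
    PLACEHOLDER = "__API_KEY_PLACEHOLDER__"
    REAL_KEY = "sk-live-example"              # held only by the proxy

    def rewrite_request(headers: dict, body: bytes):
        rewritten = {}
        for name, value in headers.items():
            if name.lower() in ALLOWED_HEADERS:
                value = value.replace(PLACEHOLDER, REAL_KEY)
            rewritten[name] = value
        return rewritten, body                # body is deliberately never touched

    hdrs, _ = rewrite_request(
        {"Authorization": f"Bearer {PLACEHOLDER}", "X-Debug": PLACEHOLDER},
        b"{}",
    )
    print(hdrs)   # only the Authorization header gets the real key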

It can, yes. (I don't know how Deno's works, but that's how ours works.)

It’s like in Anthropic’s own experiment. People who used AI to do their work for them did worse than the control group. But people who used AI to help them understand the problem, brainstorm ideas, and work on their solution did better.

The way you approach using AI matters a lot, and it is a skill that can be learned.

