What’s 300k tokens in terms of lines of code? Most codebases I’ve worked on prof...

simonw · 2025-03-28T13:35:06 1743168906

We hit "can actually be useful answering questions about it" within the last ~6 months with the introduction of "reasoning" models with 100,000+ token contest limits (and the aforementioned Gemini 1 million/2 million models).

The "reasoning" thing is important because it gives models the ability to follow execution flow and answer complex questions that down many different files and classes. I'm finding it incredible for debugging, eg: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8...

I built a files-to-prompt tool to help dump entire codebases into the larger models and I use it to answer complex questions about code (including other people's projects written in languages I don't know) several times a week. There's a bunch of examples of that here: https://simonwillison.net/search/?q=Files-to-prompt&sort=dat...

jjani · 2025-03-28T16:07:52 1743178072

How much lines of context and understanding can we, human developers, keep in our heads, taken into account and refer to when implementing something?

Whatever the amount may be, it definitely fits into 300k tokens.

mplanchard · 2025-03-29T02:04:54 1743213894

After more than a few years working on a codebase? Quite a lot. I know which interfaces I need and from where, what the general areas of the codebase are, and how they fit together, even if I don’t remember every detail of every file.