You know, some people grow up in untrustworthy environments and autodidact their way to something like first-principles thinking, and depending on how things shake out, you might end up only believing what you've personally seen with your own eyes. And well, Earth looks pretty flat in daily life.
Forks are easy for GitHub to shut down all at once. What you really want is to upload the code as a new repo (ideally under a different name from the original). Though in practice it shouldn't be too hard for GitHub to detect that an uploaded codebase matches one that was taken down, if they wanted to.
It's just not binary. Today's world is dominated by capitalist competition, and a lot of people earn a living by competing with their labor. If AI + robots can do that labor better, cheaper, and faster, most (90%+) of today's jobs are gone with no obvious replacement.
"In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh)."
I find that very surprising. This problem seemed out of reach 3 months ago, but now all three frontier models are able to solve it.
Is everybody distilling each other's models? Are companies selling the same data and RL environments to all the big labs? Can anybody more involved share some rumors? :P
I do believe that AI can solve hard problems, but the fact that progress is so evenly distributed across labs in such a narrow domain makes me a bit suspicious that there's a hidden factor. Like, did some "data worker" solve a problem like this, and is it now in the training data?
Yes there's a whole ecosystem of companies that create and sell RL gyms to AI labs and of course they develop their own internally too. You don't hear much about this ecosystem because RL at scale is all private. Nearly no academic research on it.
A lot of this is probably just throwing roughly equal amounts of compute at continuous RLVR training. I'm not convinced there's any big research breakthrough separating GPT 5.4 from 5.2. The difference is probably more than just new checkpoints but less than architectural changes, and closer to the former than the latter.
I think it's just easy to underestimate how much impact continuous training+scaling can have on the underlying capabilities.
Is it possible the AI labs are seeding their models with these solved problems? Like, if I were Sam Altman with a bazillion dollars of investment, I would pay some mathematicians to solve some of these problems so the models could "solve" them later on. Not that I think that's what's happening here, of course...
But it is pretty funny how 5.4 miscounted the number of 1's in 18475838184729 on the same day it solved this.
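For reference, counting digits is the kind of thing that's trivial to verify programmatically (a one-liner sketch, just to make the contrast concrete):

```python
# Count occurrences of the digit '1' in the number from the comment above.
n = "18475838184729"
count = n.count("1")
print(count)  # → 2
```

Any model with tool access could run exactly this instead of "eyeballing" the digits, which is what makes the miscount funny.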
As a budget user who only uses the $20/month subscriptions, I started with Claude Code as my main tool and Codex as backup for when the 5-hour quota was exhausted.
Then I saw that Codex worked better for me and cancelled my Claude Code subscription. Now, for my moderate use (4-5 hours a day, no parallel agents), the $20 Codex plan plus free AMP is enough if I want to save some weekly quota.
But honestly I usually have enough usage to last the full week without using AMP.
I liked how it read. Not as a perfectly thought-out post, but more as an ongoing conversation.
These are confusing times for engineers as the automators can now automate themselves away at even greater speed. Reminding ourselves to play positive sum games seems relevant.
The cake is too small to divide with humans and AI. We all feel that. Time to make more cakes :)
tl;dr: the author argues it's closer to $500 USD per month IF a user hits their weekly rate limits every week.
Which is probably a lot more accurate than other claims. However, it's also true that anybody who has to use the API might pay that much, creating a real cost-per-token moat for Anthropic's Claude Code vs. other models, as long as it stays so far ahead in productivity.