No, they are able to detect errors when pointed at them, but they produce so many false positives that they are functionally useless on a large unknown codebase. They also can't build and run an exploit after identification. Mythos can (purportedly) find vulnerabilities and actually validate them by building and running exploits. That makes it functional and usable for hacking.
The only significant difference between Mythos and the older open-weights models was that Mythos found all the bugs on its own, whereas with the older models you had to run several of them to find all the bugs, because each one found only a subset.
For the open-weights models, we know the exact prompts that were used to find the bugs. While the prompts had to be rather specific, a good bug-finding harness should be able to generate such prompts automatically, e.g. by repeatedly running a model and asking it to find various classes of bugs.
For Mythos, we do not know what prompts were used, but Anthropic has admitted that the process was nothing like asking "find the bugs in this project". They also ran Mythos many times on each source file: first with more generic prompts to gauge whether the file was likely to contain bugs, then with increasingly specific prompts. Once a certain kind of bug became likely, Mythos was run one last time with a prompt that required confirming the bug and, where possible, generating an exploit or a patch.
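The staged process described above could be sketched roughly like this. Everything here is an illustrative assumption: the prompt texts, the triage threshold, and the `query_model` stub are placeholders, since Anthropic has not published the actual pipeline.

```python
BUG_CLASSES = ["memory corruption", "integer overflow", "injection"]

def query_model(prompt: str, source: str) -> float:
    """Toy stand-in for an LLM call: returns an estimated likelihood
    (0..1) that what the prompt asks about exists in `source`."""
    if "unchecked_add" in source:
        return 0.9 if ("overflow" in prompt or "bug-prone" in prompt) else 0.1
    return 0.05

def triage_file(source: str, threshold: float = 0.5):
    """Generic pass first, then increasingly specific prompts, ending
    with a confirmation pass for any bug class that looks likely."""
    # Stage 1: generic triage -- skip files unlikely to contain bugs.
    if query_model("Does this file look bug-prone?", source) < 0.2:
        return []
    # Stage 2: one specific prompt per bug class.
    findings = []
    for bug_class in BUG_CLASSES:
        if query_model(f"Look specifically for {bug_class}.", source) >= threshold:
            # Stage 3: final prompt demanding confirmation plus an
            # exploit or patch before the finding is reported.
            findings.append((bug_class,
                             f"Confirm the {bug_class} and produce an exploit or patch."))
    return findings

findings = triage_file("int x = unchecked_add(a, b);")
```

The point of the staging is cost: cheap generic prompts filter most files, so the expensive confirmation-plus-exploit pass only runs where a specific bug class already looks likely.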
So Mythos must also be pointed at an error. Using it naively will not produce results like those reported.
There is no doubt that both Mythos and GPT 5.5 are superior to older models, because you can use a single model and hope to get adequate bug coverage. But the difference between them and older models has been exaggerated. If you run older models on your own hardware, you can afford to run many models many times on each file. A serious bug search with Mythos or GPT 5.5 is likely to be very expensive while producing the same results in most cases.
They are bringing in $30B in revenue with 3X YoY growth. Why do you think it is a "jig"? I do think the US economy could implode, but that's because of war and wealth inequality in the midst of hyper-inflation. AI models aren't very useful when you have penniless consumers who can't buy the products they help build. All this is to say: the models are valuable, and the companies building and providing them are very valuable.
The biggest risk to AI companies IMO is further optimization and distillation of the capabilities into smaller and more efficient models. The moat these companies have right now is that higher intelligence requires more specialized and expensive compute. If you can do that for cheap, it kind of negates their business model. Everything is moving fast, and we have also yet to see world models/embodied AI and how they impact things. I think we've reached the peak with regards to the capabilities of pure text-trained LLMs.
I don't find this paper very compelling. Obviously it would be fraud if the generated code simply escaped the harness instead of solving the actual problem. I agree that models could theoretically learn to do that, and it is important to highlight, but my sense is that the entities reporting the benchmark scores have an obligation to watch for this behavior and reconsider the metrics they report. It is a bit like saying it's possible to cheat in football because the balls are deflatable. It matters, and some have done it, but it doesn't mean widespread cheating is taking place. The paper takes the tone that a lot of cheating is already happening, which I do not think is the case.
I think there are a lot of good answers here, but it really comes down to the type of content being stored and access patterns.
A database is a data structure with (generally) many small items that need to be precisely updated, read and manipulated.
A lot of files don't necessarily have this access pattern (for instance, streaming a large video file)... a filesystem has a generic access pattern and is a lower-level primitive than a database.
For this same reason you even have different kinds of database for different access patterns and data types (e.g. Elasticsearch for full-text search, MongoDB for JSON, Postgres for SQL).
A filesystem is generic and low-level; a database is a higher-order abstraction.
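The contrast in access patterns can be made concrete with a small sketch (illustrative only, not a benchmark): a database does many precise updates on small items, while a filesystem streams one large blob sequentially.

```python
import os
import sqlite3
import tempfile

# Database-style access: point reads and in-place updates on many small items.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
db.executemany("INSERT INTO kv VALUES (?, ?)",
               [(f"item{i}", i) for i in range(100)])
db.execute("UPDATE kv SET v = v + 1 WHERE k = ?", ("item42",))  # precise update
(val,) = db.execute("SELECT v FROM kv WHERE k = ?", ("item42",)).fetchone()

# Filesystem-style access: sequential streaming of one large object.
path = os.path.join(tempfile.mkdtemp(), "video.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB stand-in for a large video file
total = 0
with open(path, "rb") as f:
    while chunk := f.read(64 * 1024):  # read in 64 KiB chunks
        total += len(chunk)
```

The database side relies on an index to touch one row out of a hundred; the file side never needs that machinery because the consumer just wants the bytes in order.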
What's your source for this? There isn't really a lot of credible, publicly available information on what you're saying... just anecdotes. In the recent India v. Pakistan conflict, a French-produced Indian Rafale was downed by a Chinese long-range air-to-air missile (PL-15) fired from a Chinese-produced J-10 jet. Even if they don't have the same hit rate, you can buy 10x as many for the same price.
No one wants to liberate Iran. Israel just wants to continue committing genocide and apartheid without any opposition. Iran arms Hezbollah and Hamas, the main forms of Palestinian resistance. The whole point of this operation is to decimate those groups so ethnic cleansing can continue without any resistance. Israel couldn't care less about the Iranian people.
You are very naive if you think the IRGC truly killed tens of thousands of its own people. Israel openly talks about Mossad organizing and supporting the coup, and good old Donny has admitted they have given weapons to organized resistance.
I estimate that many of the reported deaths were armed resistance fighters killed by the IRGC, not ordinary peaceful protestors. I also think armed resistance killed many Iranian citizens. There is obviously fog of war here. The death tolls in the thousands were likely inflated and obfuscated.
Look at the coups we have backed in the Middle East (including, formerly, in Iran, which is what originally led to the Islamic revolution) and you will see a pattern. Both the US and Israel provide material support to groups like ISIS or actors like Bin Laden. An Al-Qaeda fighter is literally the head of Syria now thanks to Israel.
I don't love Hamas, the IRGC, or Hezbollah; I don't like their ideology. But it is myopic to think they exist in a vacuum.
I did a similar project, but using 3D fractals I found on Shadertoy, fed into ViTs. They are extremely simple iterative functions that produce a ton of scene-like complexity.
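To make "extremely simple iterative function" concrete, here is about the smallest 2D stand-in: the escape-time Mandelbrot recurrence z → z² + c, which already yields image-like structural variety. This is just an illustration; the Shadertoy fractals I used were richer 3D functions.

```python
def fractal_image(width=64, height=64, max_iter=30):
    """Return a 2D grid of escape times over a window of the complex plane."""
    img = []
    for j in range(height):
        row = []
        for i in range(width):
            # Map pixel (i, j) to c in roughly [-2, 1] x [-1.5, 1.5].
            c = complex(-2.0 + 3.0 * i / width, -1.5 + 3.0 * j / height)
            z = 0j
            n = 0
            while abs(z) <= 2 and n < max_iter:
                z = z * z + c  # the entire "scene generator" is this one line
                n += 1
            row.append(n)
        img.append(row)
    return img

img = fractal_image()
```

Grids like this, rendered at higher resolution with randomized parameters, can serve as label-free pretraining inputs for a ViT before it ever sees natural images.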
I have a pet theory that the developing visual cortex is linked to some kind of mechanism like this. You just need proteins that create some sort of resonating signal that feeds into the neurons as they grow (obviously this is hand-wavy), but similar feedback loops guide nervous-system growth in zebrafish, for example.
It's better than using randomly initialized weights. It's more of a theoretical exercise to explore the biology. When an infant is born, maybe the visual cortex already has some notion of edge detectors etc. through a system such as this one, despite never having really opened its eyes.