What all the top models do is recombine, at test time, the knowledge they already have. So they all possess Core Knowledge priors. Techniques to acquire those priors vary:
* Use a pretrained LLM and hope that the relevant programs have been memorized via exposure to text data (this doesn't work that well)
* Pretrain an LLM on ARC-AGI-like data
* Hardcode the priors into a DSL (see the sketch below)
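To make the last option concrete: a prior-encoding DSL is just a fixed vocabulary of grid primitives that a solver composes per task. Here is a rough sketch of what that can look like, with invented primitive names (not taken from any published ARC DSL):

```python
# Illustrative only: a few grid primitives encoding Core-Knowledge-style
# concepts (geometry, objectness, elementary counting). The names and the
# choice of primitives are assumptions made for this sketch.
import numpy as np
from scipy.ndimage import label

def rotate90(grid: np.ndarray) -> np.ndarray:
    """Geometry prior: rigid 90-degree rotation."""
    return np.rot90(grid)

def mirror_lr(grid: np.ndarray) -> np.ndarray:
    """Geometry prior: left-right reflection."""
    return np.fliplr(grid)

def objects(grid: np.ndarray) -> list[np.ndarray]:
    """Objectness prior: connected non-background components."""
    labeled, n = label(grid != 0)
    return [np.where(labeled == i, grid, 0) for i in range(1, n + 1)]

def count_colors(grid: np.ndarray) -> int:
    """Number prior: how many distinct non-background colors appear."""
    return len(set(grid.ravel().tolist()) - {0})

# A DSL-based solver then searches for a composition of such primitives
# that is consistent with a task's demonstration pairs, e.g.
# candidate = lambda g: mirror_lr(rotate90(g))
```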
> Which is to say, a data augmentation approach
The key bit isn't the data augmentation but the TTT. TTT (test-time training) is a way to address the #1 issue with DL models: that they cannot recombine their knowledge at test time to adapt to something they haven't seen before (i.e., strong generalization). You can argue about whether TTT is the right way to achieve this, but there is no doubt that TTT is a major advance in this direction.
The top ARC-AGI models perform well not because they're trained on tons of data, but because they can adapt to novelty at test time (usually via TTT). For instance, if you drop the TTT component you will see that these large models trained on millions of synthetic ARC-AGI tasks drop to <10% accuracy. This demonstrates empirically that ARC-AGI cannot be solved purely via memorization and interpolation.
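For concreteness, here is a minimal sketch of the per-task TTT loop, assuming a PyTorch-style model and a per-cell loss function; `test_time_train`, `augment`, and `loss_fn` are illustrative names for this sketch, not the code of any actual submission:

```python
# Minimal sketch of test-time training on a single ARC-style task.
# Assumptions: `model` is a torch.nn.Module mapping a 2D input grid tensor
# to a predicted output grid, and `loss_fn` compares prediction and target.
import copy
import torch

def augment(inp, out):
    """Augmented copies of a demo pair: the same rotation/flip applied to both grids."""
    pairs = []
    for k in range(4):                                   # 0/90/180/270-degree rotations
        ri, ro = torch.rot90(inp, k), torch.rot90(out, k)
        pairs.append((ri, ro))
        pairs.append((torch.flip(ri, dims=[1]), torch.flip(ro, dims=[1])))
    return pairs

def test_time_train(model, demo_pairs, loss_fn, steps=20, lr=1e-4):
    """Fine-tune a throwaway copy of the model on this one task's demo pairs."""
    tuned = copy.deepcopy(model)                         # base weights stay untouched
    opt = torch.optim.AdamW(tuned.parameters(), lr=lr)
    tuned.train()
    for _ in range(steps):
        for inp, out in demo_pairs:                      # the task's few demonstration pairs
            for aug_inp, aug_out in augment(inp, out):
                opt.zero_grad()
                loss = loss_fn(tuned(aug_inp), aug_out)
                loss.backward()
                opt.step()
    tuned.eval()
    return tuned      # used only to predict this one task's test output, then discarded
```

The ablation mentioned above (dropping the TTT component) amounts to skipping this step and predicting with the base model directly.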
Do you mean the ones from your white paper? The same ones that humans possess? How do you know this?
>> The key bit isn't the data augmentation but the TTT.
I haven't had the chance to read the papers carefully. Have they done ablation studies? For instance, is the following a guess or is it an empirical result?
>> For instance, if you drop the TTT component you will see that these large models trained on millions of synthetic ARC-AGI tasks drop to <10% accuracy.
> This demonstrates empirically that ARC-AGI cannot be solved purely via memorization and interpolation.
Now that the current challenge is over, and a successor dataset is in the works, can we see how well the leading LLMs perform against the private test set?
For example, Claude 3.5 gets 14% in semi-private eval vs 21% in public eval. I remember reading an explanation of "semi-private" earlier but cannot find it now.