
>> Now that the AI research field is coming around to the idea that something beyond deep learning is needed, the story matters less, and the benchmark, and future versions, can stand on their utility as a compass towards AGI.

How so? All three of the top systems are deep neural net systems. First place went to a system that, quoting from the "contributions" section of the paper, employed:

>> An automated data generation methodology that starts with 100-160 program solutions for ARC training tasks, and expands them to make 400k new problems paired with Python solutions
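
To make concrete what that sort of expansion amounts to, here is a minimal sketch in Python (the names and the trivial solver are mine and hypothetical, not the paper's actual pipeline): take a known program solution for a training task, sample fresh random input grids, and run the program on them to mint new input/output pairs.

    import random

    def expand_task(solver, n_new=1000, size_range=(3, 10)):
        # Toy synthetic expansion: apply a known program `solver`
        # (input grid -> output grid) to freshly sampled random grids,
        # yielding new input/output pairs. Illustration only, not the
        # methodology described in the paper.
        pairs = []
        while len(pairs) < n_new:
            h = random.randint(*size_range)
            w = random.randint(*size_range)
            grid = [[random.randint(0, 9) for _ in range(w)] for _ in range(h)]
            try:
                pairs.append((grid, solver(grid)))
            except Exception:
                continue  # solver may assume structure the random grid lacks
        return pairs

    # e.g. a trivial "recolour 1 -> 2" program as the solver
    recolour = lambda g: [[2 if c == 1 else c for c in row] for row in g]
    new_pairs = expand_task(recolour, n_new=5)

Do that for a hundred-odd solved tasks and you get to 400k problems without the model ever having to acquire a skill from a handful of examples.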

As I pointed out in another comment, the top results on ARC have been achieved by ordinary, deep-learning, big-data, memorisation-based approaches. You and fchollet (in these comments) try to claim otherwise, but I don't understand why.

In fact, no, I understand why. I think fchollet wanted to position ARC as "not just a benchmark", the opposite of what tbalsam is asking for above. The motivation is solid: if we've learned anything in the last twenty or thirty years, it's that deep neural nets are very capable at beating benchmarks. For any deep neural net model that beats a benchmark, though, the question remains whether it can do anything else besides. Unfortunately, that is not a question that can be answered by beating yet another benchmark.

And here we are now: first place in the current ARC challenge goes to a deep neural net system trained on a synthetically augmented dataset. The right thing to do now would be to scale back the claims about the magickal AGI-IQ test with unicorns, and accept that your benchmark is no different from any previous AI benchmark, that it is no more informative than any other benchmark, and that a completely different kind of test of artificial intelligence is needed.

There is, after all, such a thing as scientific integrity. You make a big conjecture, you look at the data, realise that you're wrong, accept it, and move on. For example, the authors of GLUE did that (hence SuperGLUE). The authors of the Winograd Schema Challenge did that. You should follow their example.



> realise that you're wrong, accept it, and move on

What do you think about limiting the submission size? Kaggle does this sometimes.

With a limit like 0.1-1MB (compressed), you are basically saying: "Give me sample-efficient learning algorithms, not pretrained models."
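
Enforcement would be simple enough; something along these lines, assuming a single submission file and a limit picked purely for illustration (not an actual Kaggle rule):

    import gzip
    import sys

    LIMIT_BYTES = 1024 * 1024  # 1 MB compressed; illustrative value

    def compressed_size(path):
        # gzip-compressed size of the submission file, in bytes
        with open(path, "rb") as f:
            return len(gzip.compress(f.read()))

    if __name__ == "__main__":
        size = compressed_size(sys.argv[1])
        verdict = "rejected" if size > LIMIT_BYTES else "accepted"
        print(f"{size} bytes compressed: {verdict}")

A limit that tight rules out shipping pretrained weights, so whatever scores well has to learn from the task's own examples.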


That's fine if you want to measure sample efficiency, but ARC-AGI is supposed to measure progress towards AGI.


> That's fine if you want to measure sample efficiency, but ARC-AGI is supposed to measure progress towards AGI.

"On the Measure of Intelligence" defines intelligence as skill-acquisition efficiency, I believe, where efficiency is with respect to whatever the limiting factor is. For each ARC task, the primary limiting factor is the number of samples in it, and the skill is your ability to convert inputs into the correct outputs. In other words, in this context, intelligence is sample efficiency, as I see it.
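
To be clear about what I mean, here is a toy score of my own (not the formal measure in the paper, which also folds in priors and experience in information-theoretic terms): something like accuracy per demonstration pair.

    def sample_efficiency(correct, attempted, n_demo_pairs):
        # Toy "skill per sample" score: task accuracy discounted by the
        # number of demonstration pairs the solver was given. My own
        # simplification, not Chollet's formal definition.
        return (correct / attempted) / n_demo_pairs

    print(sample_efficiency(1, 1, 3))        # solved from 3 demos
    print(sample_efficiency(1, 1, 400_000))  # solved after 400k synthetic examples

A solver that nails a task from three demonstrations scores far higher than one that needs 400k synthetic examples to reach the same accuracy.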


Is that what fchollet is claiming?


Not sure. But I think this follows logically from the definition of intelligence he is using. Also, see II.2.2 in the paper.



