If, as in this paper, we allow ourselves to choose the kernel after seeing the data, then the statement in the title is trivial: if my learning algorithm outputs a function f, I can simply take the kernel K(x, x') = f(x)·f(x').
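To make the triviality concrete, here is a minimal sketch of that construction. Everything here is illustrative: the function f and the evaluation points are hypothetical, not from the paper. A kernel machine with a single "support point" x1 and one coefficient reproduces f exactly, because the kernel was built from f in the first place.

```python
def make_trivial_kernel(f):
    # the "cheating" kernel from the comment above: K(x, x') = f(x) * f(x')
    return lambda x, xp: f(x) * f(xp)

# stand-in for "whatever function my learning algorithm output"
f = lambda x: 3 * x + 1

K = make_trivial_kernel(f)

# a one-term kernel machine g(x) = a * K(x, x1) recovers f exactly,
# provided we pick any point x1 with f(x1) != 0
x1 = 2.0
a = 1.0 / f(x1)
g = lambda x: a * K(x, x1)

print(g(5.0), f(5.0))  # identical by construction
```

The point is that without constraints on how the kernel may depend on the data, "every model is a kernel machine" is vacuously true.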
The result is interesting insofar as the path kernel is interesting, which requires some more thought.
If I'm understanding correctly, the kernel isn't just set after seeing the data, but after training the entire model, because the path kernel can't even be defined without the optimization trajectory that traces out the path.
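To illustrate that dependence, here is a toy sketch. I'm assuming (as a paraphrase, not a quote of the paper) that the path kernel is roughly the tangent kernel K_w(x, x') = ∇_w f(x; w)·∇_w f(x'; w) accumulated along the gradient-descent trajectory; the average over recorded steps below is a crude stand-in for the path integral, and the model, data, and hyperparameters are all made up. The kernel literally cannot be evaluated until the weight path has been recorded.

```python
import math

def f(w, x):
    # toy 1-D model: scalar weight, scalar input
    return math.tanh(w * x)

def grad_w(w, x):
    # df/dw = x * (1 - tanh(w x)^2)
    return x * (1.0 - math.tanh(w * x) ** 2)

# train on a single example (x0, y0) with gradient descent on squared loss,
# recording the weight path as we go
x0, y0 = 1.0, 0.5
w, lr, path = 0.1, 0.1, []
for _ in range(200):
    path.append(w)
    w -= lr * 2.0 * (f(w, x0) - y0) * grad_w(w, x0)

def path_kernel(x, xp):
    # average tangent kernel along the recorded optimization path --
    # undefined until training has produced `path`
    return sum(grad_w(wt, x) * grad_w(wt, xp) for wt in path) / len(path)

print(path_kernel(1.0, 2.0))
```

Change the learning rate, the initialization, or the training data, and you get a different path and hence a different kernel, which is exactly the circularity the comment above is pointing at.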
I can't tell whether this paper offers a useful insight or not.