If, as in this paper, we allow ourselves to choose the kernel after seeing the data, then the statement in the title is trivial: if my learning algorithm outputs a function f, I can simply take the kernel K(x, x') = f(x)·f(x').
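To make the triviality concrete, here is a minimal sketch of that construction. Everything here is illustrative: the function f and the evaluation points are hypothetical, not from the paper. A kernel machine with a single "support point" x1 and one coefficient reproduces f exactly, because the kernel was built from f in the first place.

```python
def make_trivial_kernel(f):
    # the "cheating" kernel from the comment above: K(x, x') = f(x) * f(x')
    return lambda x, xp: f(x) * f(xp)

# stand-in for "whatever function my learning algorithm output"
f = lambda x: 3 * x + 1

K = make_trivial_kernel(f)

# a one-term kernel machine g(x) = a * K(x, x1) recovers f exactly,
# provided we pick any point x1 with f(x1) != 0
x1 = 2.0
a = 1.0 / f(x1)
g = lambda x: a * K(x, x1)

print(g(5.0), f(5.0))  # identical by construction
```

The point is that without constraints on how the kernel may depend on the data, "every model is a kernel machine" is vacuously true.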
The result is interesting insofar as the path kernel is interesting, which requires some more thought.
If I'm understanding correctly, the kernel isn't just set after seeing the data, but after training the entire model, because the path kernel can't even be defined without the optimization trajectory that traces out the path.
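To illustrate that dependence, here is a toy sketch. I'm assuming (as a paraphrase, not a quote of the paper) that the path kernel is roughly the tangent kernel K_w(x, x') = ∇_w f(x; w)·∇_w f(x'; w) accumulated along the gradient-descent trajectory; the average over recorded steps below is a crude stand-in for the path integral, and the model, data, and hyperparameters are all made up. The kernel literally cannot be evaluated until the weight path has been recorded.

```python
import math

def f(w, x):
    # toy 1-D model: scalar weight, scalar input
    return math.tanh(w * x)

def grad_w(w, x):
    # df/dw = x * (1 - tanh(w x)^2)
    return x * (1.0 - math.tanh(w * x) ** 2)

# train on a single example (x0, y0) with gradient descent on squared loss,
# recording the weight path as we go
x0, y0 = 1.0, 0.5
w, lr, path = 0.1, 0.1, []
for _ in range(200):
    path.append(w)
    w -= lr * 2.0 * (f(w, x0) - y0) * grad_w(w, x0)

def path_kernel(x, xp):
    # average tangent kernel along the recorded optimization path --
    # undefined until training has produced `path`
    return sum(grad_w(wt, x) * grad_w(wt, xp) for wt in path) / len(path)

print(path_kernel(1.0, 2.0))
```

Change the learning rate, the initialization, or the training data, and you get a different path and hence a different kernel, which is exactly the circularity the comment above is pointing at.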
I can't tell whether this paper offers a useful insight or not.