I don't think the output of the conversion is guaranteed to be equivalent to the hyperplane an SVM would learn.
I haven't had time to read the paper, but from the abstract I don't see a claim that the kernel machine approximating a gradient-descent-trained model is equivalent to the optimal fit an SVM obtains via maximum-margin hyperplane fitting.
My assumption is that converting a NN/gradient-descent-learned model to a kernel machine generally yields a different hyperplane than learning a kernel machine directly via SVM training (rough sketch below).
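To make the point concrete, here is a minimal sketch (not the paper's construction, just an assumed toy setup with scikit-learn's `SGDClassifier` and `LinearSVC` on synthetic blobs): fit the same linearly separable data with gradient descent on a logistic loss and with a max-margin linear SVM, then compare the resulting hyperplanes. The two normals generally differ, which is the sense in which I'd expect the fits not to coincide.

```python
# Sketch only: compare a gradient-descent-fit linear boundary to an SVM
# max-margin boundary on the same data. Dataset and hyperparameters are
# arbitrary choices for illustration, not anything from the paper.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# (a) hyperplane found by gradient descent on a logistic loss
gd = SGDClassifier(loss="log_loss", max_iter=10_000, tol=1e-6,
                   random_state=0).fit(X, y)

# (b) maximum-margin hyperplane found by a linear SVM
svm = LinearSVC(C=1.0, max_iter=10_000).fit(X, y)

def unit_normal(w):
    # Normalize so the two hyperplane normals are directly comparable.
    w = np.ravel(w)
    return w / np.linalg.norm(w)

# The normals typically point in (slightly) different directions,
# i.e. the two procedures do not produce the same hyperplane.
print("gradient-descent normal:", unit_normal(gd.coef_))
print("SVM max-margin normal:  ", unit_normal(svm.coef_))
```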