I found it very weird that the SLIDE algorithm from early 2019 isn’t mentioned. Maybe I missed it, or maybe it is compared somewhere deeper in the referenced publications?
SLIDE seems way, way superior to any of the listed solutions or approaches, as far as I could tell on a first read through.
But there’s also been a lot of research suggesting most SOTA dense networks can be replicated with sparse networks, and the sparse versions may even generalize better (less overfitting). Perhaps things like GPT are still an exception, but for most applications SLIDE should be able to train networks just as effective as naively specified dense architectures.
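For what it's worth, here is a rough toy sketch (my own, not taken from any of the papers) of the kind of dense-to-sparse substitution I mean: magnitude-prune a dense weight matrix and compare the layer outputs. The sizes and the 10% density are arbitrary.

```python
import numpy as np

# Toy illustration: replace a dense weight matrix with a sparse one by
# keeping only the largest-magnitude 10% of entries, then compare the
# layer outputs on a random input. (Arbitrary sizes/density, just a sketch.)
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))        # dense layer weights
x = rng.standard_normal(256)               # one input vector

threshold = np.quantile(np.abs(W), 0.90)   # keep the top 10% by magnitude
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

dense_out = W @ x
sparse_out = W_sparse @ x
print("kept fraction:", np.mean(W_sparse != 0))
print("relative output error:",
      np.linalg.norm(dense_out - sparse_out) / np.linalg.norm(dense_out))
```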
> But there’s also been a lot of research suggesting most SOTA dense networks can be replicated with sparse networks
I'm not sure if it's related, but would this work sort of like how Armadillo can do a singular value decomposition [0] of a matrix by embedding an arbitrary n-by-m matrix X in a higher-dimensional (n+m)-by-(n+m) null matrix M?
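For concreteness, here is a small numpy sketch of that embedding as I understand it (sizes and the tolerance are just for illustration): form the symmetric matrix M = [[0, X], [Xᵀ, 0]]; its positive eigenvalues are the singular values of X, so a symmetric eigensolver recovers them.

```python
import numpy as np

# Embed an arbitrary n x m matrix X into an (n+m) x (n+m) zero matrix:
#   M = [[0, X], [X^T, 0]]
# M is symmetric, and its nonzero eigenvalues come in +/- sigma_i pairs,
# where sigma_i are the singular values of X.
rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, m))

M = np.zeros((n + m, n + m))
M[:n, n:] = X
M[n:, :n] = X.T

eigvals = np.linalg.eigvalsh(M)                  # ascending, includes +/- pairs
sing_vals = np.linalg.svd(X, compute_uv=False)   # descending singular values

print(np.sort(eigvals[eigvals > 1e-10])[::-1])   # positive eigenvalues of M
print(sing_vals)                                 # should match the line above
```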
Yeah. I think part of the problem is just that SLIDE represents a Kuhnian paradigm shift, and these things take time. I really want to play with SLIDE myself but just haven't had a chance.
> SLIDE seems way, way superior to any of the listed solutions or approaches, as far as I could tell on a first read through.
https://arxiv.org/abs/1903.03129
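For anyone who hasn't read the paper, here is a toy sketch of the core idea as I read it (my own simplification using SimHash and a single table; the paper uses other LSH families, multiple tables, and applies the same trick to backprop): hash each neuron's weight vector, and per input only compute the neurons that land in the input's bucket instead of the full dense matrix product.

```python
import numpy as np

# Toy SLIDE-like sparse forward pass (a simplification, not the authors' code):
# bucket neurons by a SimHash of their weight vectors, then for a given input
# compute activations only for neurons in the input's bucket.
rng = np.random.default_rng(0)
d, n_neurons, n_bits = 128, 4096, 6

W = rng.standard_normal((n_neurons, d))     # one layer's weight vectors
planes = rng.standard_normal((n_bits, d))   # random hyperplanes for SimHash

def simhash(v):
    """Signed-random-projection hash: one bit per hyperplane."""
    bits = (planes @ v) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Hash table: bucket id -> neuron indices (built once, reused per input).
buckets = {}
for i, w in enumerate(W):
    buckets.setdefault(simhash(w), []).append(i)

x = rng.standard_normal(d)
active = buckets.get(simhash(x), [])        # candidate "active" neurons only
out = np.zeros(n_neurons)
out[active] = W[active] @ x                 # sparse forward pass

print(f"computed {len(active)} of {n_neurons} neurons")
```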