Thank you for your answer. It appears to me that we are trying to achieve an alg...

duvenaud · on Dec 20, 2020

Thanks for your question! But as I said, no one is really worried about the asymptotic time complexity of reverse-mode differentiation (although there is scope for improving constants). The main scope for improvement is in the space complexity.

There is a lot of work on trying to speed up optimization, for example the K-FAC optimizer by Roger Grosse that uses second-order gradient information in a scalable way.

The lottery ticket pruning strategies do reduce space complexity, but I think the main reason people are interested in them is to reduce training time complexity or deployment memory requirements, not so much training memory requirements.

As for whether memory-saving and time-saving approaches are disjoint: no. Many methods (like checkpointing) introduce a tradeoff between time and space complexity.
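
To make the checkpointing tradeoff concrete, here is a minimal sketch in plain Python. It is a toy, not how any autodiff library implements it: the "network" is assumed to be a chain of `n` scalar `tanh` layers, and the names `grad_store_all`, `grad_checkpointed`, and the segment length `k` are made up for illustration. Both functions compute the same derivative of the output with respect to the input; they differ only in how many layer inputs they keep and how many forward evaluations they spend.

```python
import math


def layer(x):
    # One toy "layer": a scalar tanh.
    return math.tanh(x)


def layer_grad(x):
    # Derivative of the layer with respect to its input x.
    return 1.0 - math.tanh(x) ** 2


def grad_store_all(x0, n):
    """Plain reverse mode: keep every layer input from the forward pass."""
    inputs = []
    x = x0
    for _ in range(n):
        inputs.append(x)
        x = layer(x)
    g = 1.0
    for xi in reversed(inputs):
        g *= layer_grad(xi)
    # Returns (gradient, stored layer inputs, forward layer evaluations).
    return g, len(inputs), n


def grad_checkpointed(x0, n, k):
    """Keep only every k-th layer input; recompute the rest per segment."""
    checkpoints = {}
    x = x0
    evals = 0
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = x
        x = layer(x)
        evals += 1

    g = 1.0
    peak = 0
    # Walk the segments from last to first, as the backward pass would.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # Recompute the inputs inside this segment from its checkpoint.
        seg = [checkpoints[start]]
        for _ in range(start + 1, end):
            seg.append(layer(seg[-1]))
            evals += 1
        # Checkpoints plus the recomputed values (the checkpoint itself
        # is already counted once, hence the -1).
        peak = max(peak, len(checkpoints) + len(seg) - 1)
        for xi in reversed(seg):
            g *= layer_grad(xi)
    return g, peak, evals


if __name__ == "__main__":
    print(grad_store_all(0.5, 100))
    print(grad_checkpointed(0.5, 100, 10))
```

With `n = 100` and `k = 10`, the checkpointed version holds at most 19 layer inputs instead of 100, but spends 190 forward layer evaluations instead of 100. Choosing `k` near `sqrt(n)` is the usual way to balance the two costs in this simple scheme.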

pretty_dumm_guy · on Dec 23, 2020

Thank you again for the clarifications. You have given me something to chew on over the holidays.

I wish you and your family a happy Christmas :)

sdenton4 · on Dec 20, 2020

(Lottery Ticket, to date, produces small networks ex post facto... You still have to train the giant network. There's also some indication that it's chancy on 'large' datasets and problems: https://arxiv.org/abs/1902.09574)