It depends on what you're doing. If you need to write your own kernels, or have small networks where the framework overhead is significant, then it's way faster than TensorFlow (unless you implement your own TensorFlow op in C++/CUDA, but that's way more painful than just implementing it directly in Flux/Julia). It's hence quite nice for research on new architectures. Flux's autodiff also handles more language features than TensorFlow's or PyTorch's.
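As a rough sketch of what that looks like (the soft-thresholding function and the layer sizes here are made up for illustration), a custom "kernel" in Flux is just ordinary Julia code, and Zygote differentiates it with no op registration step:

    using Flux

    # Hypothetical custom activation: soft-thresholding, written as
    # plain Julia -- no C++/CUDA op registration needed.
    softthresh(x, λ) = sign(x) * max(abs(x) - λ, zero(x))

    # Drop it into an ordinary Flux model; the broadcast applies it elementwise.
    model = Chain(Dense(4, 8), x -> softthresh.(x, 0.1f0), Dense(8, 1))

    x = randn(Float32, 4)
    loss(m) = sum(abs2, m(x))

    # Gradients w.r.t. all model parameters, via Zygote.
    grads = Flux.gradient(loss, model)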
> unless you implement your own Tensorflow OP in C++/Cuda
I don't know exactly what 'OP' means, but there are other ways to do ML in, for example, C++. I've had good experience with dlib; PyTorch's libtorch is on my todo list.
>I don't know exactly what 'OP' means, but there are other ways to do ML in for example C++.
(An 'op' is a custom operation you register with TensorFlow, typically implemented in C++/CUDA.) I think Flux really excels when you're trying to do ML as "differentiable programming" (e.g. model-based reinforcement learning), because it can differentiate so much control flow via Zygote, which hooks into the Julia compiler and performs source-to-source differentiation of the intermediate representation: https://github.com/FluxML/Zygote.jl
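For instance (a toy function, not from the thread), Zygote can take gradients straight through a data-dependent loop, with no tracing or graph building:

    using Zygote

    # Data-dependent control flow: the number of iterations
    # depends on the input value.
    function grow(x)
        while x < 100
            x = 3x + 1
        end
        return x
    end

    gradient(grow, 2.0)  # (81.0,): four iterations, each contributing a factor of 3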