Well, at all but a finite number of points (specifically, all but one point), there is a neighborhood of that point on which ReLU matches a linear function...
In one sense, that seems rather close to being linear. If you take a random point (according to a continuous probability distribution), then with probability 1, if you look in a small enough neighborhood of the selected point, the function will be indistinguishable from linear within that neighborhood.
And, for a network made of ReLU gates and affine maps, you still get that it looks indistinguishable from affine on any small enough region around any point outside of a set of measure zero.
So...
It depends what we mean by “almost linear”, I think. One can make a reasonable case for saying that, in a sense, it is “almost linear”.
But yes, of course I agree that in another important sense, it is far from linear. (E.g. it is not well approximated by any linear function)
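The "locally indistinguishable from affine" claim can be checked numerically. Below is a minimal sketch, assuming a small one-hidden-layer ReLU network with random weights; the names `mlp` and `local_affine` and the layer sizes are made up for illustration. It builds the affine map determined by the activation pattern at a random point and compares it with the network on a small ball around that point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU network with random weights (assumption).
W1 = rng.normal(size=(8, 3)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(2, 8)); b2 = rng.normal(size=2)

def mlp(x):
    """Affine map, ReLU, affine map."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_affine(x):
    """Return (A, c) such that mlp(y) == A @ y + c wherever y has x's activation pattern."""
    mask = (W1 @ x + b1 > 0).astype(float)  # which hidden units are active at x
    A = W2 @ (mask[:, None] * W1)
    c = W2 @ (mask * b1) + b2
    return A, c

x = rng.normal(size=3)
A, c = local_affine(x)

# Half the distance from x to the nearest hyperplane where some unit switches on/off.
pre = W1 @ x + b1
radius = 0.5 * np.min(np.abs(pre) / np.linalg.norm(W1, axis=1))

# Sample points inside the ball of that radius and compare network vs. affine map.
max_err = 0.0
for _ in range(1000):
    d = rng.normal(size=3)
    d *= radius * rng.uniform() / np.linalg.norm(d)
    y = x + d
    max_err = max(max_err, float(np.max(np.abs(mlp(y) - (A @ y + c)))))

print("radius:", radius, "max deviation from affine:", max_err)
```

Inside the ball the deviation should be at floating-point rounding level. This says nothing about how well any single affine map fits the network globally, which is the other sense discussed above.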
Yeah, and we have more than measure zero -- the subsets of the input space on which a fully ReLU MLP is linear are Boolean combinations of half-spaces. I was coming at it from the heuristic that if you can triangulate a space into a finite number of easily computable convex sets such that the interior of each one has some trait, then it's as good as saying that the space has this trait. But of course this heuristic doesn't always have to be true, or useful.
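To make the half-space picture concrete, here is a rough sketch, assuming a toy network with two hidden ReLU layers and random weights; `pattern` and `region_halfspaces` are hypothetical helper names. It writes the activation region around a point as an intersection of half-spaces `{y : G @ y + h > 0}` and checks on random samples that membership in that polytope coincides with sharing the point's activation pattern.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical network with two hidden ReLU layers on 2-D inputs (assumption).
W1 = rng.normal(size=(4, 2)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(4, 4)); b2 = rng.normal(size=4)

def pattern(x):
    """On/off state of every hidden unit at input x."""
    z1 = W1 @ x + b1
    z2 = W2 @ np.maximum(z1, 0.0) + b2
    return np.concatenate([z1 > 0, z2 > 0])

def region_halfspaces(x):
    """Return (G, h) so that the activation region of x is {y : G @ y + h > 0}."""
    z1 = W1 @ x + b1
    m1 = (z1 > 0).astype(float)
    # With the layer-1 pattern fixed, layer-2 preactivations are affine in the input.
    A2 = W2 @ (m1[:, None] * W1)
    c2 = W2 @ (m1 * b1) + b2
    z2 = A2 @ x + c2
    s1 = np.where(z1 > 0, 1.0, -1.0)  # orient each half-space to contain x
    s2 = np.where(z2 > 0, 1.0, -1.0)
    G = np.vstack([s1[:, None] * W1, s2[:, None] * A2])
    h = np.concatenate([s1 * b1, s2 * c2])
    return G, h

x = rng.normal(size=2)
G, h = region_halfspaces(x)
p = pattern(x)

# Compare polytope membership with activation-pattern equality on random samples.
mismatches = 0
for _ in range(2000):
    y = x + rng.normal(size=2)
    in_polytope = bool(np.all(G @ y + h > 0))
    same_pattern = bool(np.array_equal(pattern(y), p))
    mismatches += in_polytope != same_pattern

print("mismatches:", mismatches)
```

Each such region is convex, and on its interior the network agrees with a single affine map; the two descriptions can only disagree on the boundaries, which have measure zero.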