It's probably easiest to just calculate the inverse. For each pixel you render, you need to find the source pixel that the rotation would have moved there, and then look that one up in the sampled texture. That should just be the same sin/cos rotation with the angle negated.
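To make the inverse-mapping idea concrete, here is a minimal CPU sketch in Python. It is not anyone's actual shader: the function name `rotate_nearest`, the list-of-rows image layout, rotation about the image centre, and nearest-neighbour lookup are all assumptions chosen for illustration.

```python
import math

def rotate_nearest(src, angle_rad, fill=0):
    """Rotate a 2D image (list of rows) about its centre via inverse mapping.

    Hypothetical helper for illustration: loops over DESTINATION pixels and
    looks up the source pixel that would have been rotated onto each one.
    """
    h, w = len(src), len(src[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Inverse of a rotation by angle_rad is a rotation by -angle_rad.
    c, s = math.cos(-angle_rad), math.sin(-angle_rad)
    dst = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Apply the inverse rotation to the destination coordinate.
            sx = c * dx - s * dy + cx
            sy = s * dx + c * dy + cy
            ix, iy = round(sx), round(sy)
            # Pixels that map outside the source keep the fill value.
            if 0 <= ix < w and 0 <= iy < h:
                dst[y][x] = src[iy][ix]
    return dst
```

On a GPU the two loops disappear (each fragment or thread handles one destination pixel) and the `src[iy][ix]` lookup becomes a texture sample.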
Yes, you can scan e.g. horizontally (x axis first) through the target image, but for the source image you will have to do a memory lookup with bad locality.
The texture cache tries to guarantee that every pixel is only read from DRAM once, and the texture sampler does bilinear sampling "for free". So a sequential copy and a "rotation copy" should take about the same time.
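For reference, this is roughly what the sampler's bilinear filtering computes, written out as a plain Python sketch. The function name `sample_bilinear`, the clamp-to-edge behaviour, and the convention that integer coordinates sit on texel centres are assumptions for illustration; real hardware uses fixed-point weights and its own coordinate conventions.

```python
import math

def sample_bilinear(img, x, y):
    """Bilinearly interpolate a 2D image (list of rows) at a fractional (x, y).

    Illustrative only: coordinates are clamped to the image edge.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then blend those results vertically.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

The four texels it reads are always a 2x2 neighbourhood, which is part of why a cache organised around 2D locality suits it well.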
Well, you don't need to scan: each SIMD lane processes a pixel in parallel and they spit out the whole thing simultaneously at the end, at least according to my understanding. You'd also presumably need to cache the entire texture that's being rendered anyway, so cache misses due to locality problems likely aren't huge.
Textures usually reside entirely in fast GPU-accessible memory, but cache misses during sampling can still be a problem (so entirely random access across the whole texture is still worse than accessing nearby pixels). The texture data is 'scrambled' in memory for better 2D spatial locality though, so once in cache, access to groups of pixels that are close together either horizontally or vertically is fast.
(Disclaimer: I'm not a hardware guy, and I also don't know if this applies to all pixel formats and different GPU architectures. Texture cache details are also notoriously poorly documented by GPU vendors.)
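As a rough illustration of why a 'scrambled' layout helps, here is a Python sketch comparing plain row-major addressing with Morton (Z-order) addressing. Morton order is only a stand-in: actual GPU tiling schemes are vendor-specific and may look quite different. The helper names, the 1024-texel width, and the 64-texel "cache line" size are assumptions for the example.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each of them."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(x, y):
    """Interleave the bits of x and y (x in the even bits, y in the odd bits)."""
    return part1by1(x) | (part1by1(y) << 1)

def linear_index(x, y, width):
    """Plain row-major index."""
    return y * width + x

def lines_touched(coords, index_fn, texels_per_line=64):
    """Count distinct hypothetical cache lines hit by a list of (x, y) reads."""
    return len({index_fn(x, y) // texels_per_line for x, y in coords})

# Walk straight down a column of 8 texels in a 1024-wide texture.
column = [(0, y) for y in range(8)]
row_major_lines = lines_touched(column, lambda x, y: linear_index(x, y, 1024))
morton_lines = lines_touched(column, morton_index)
```

With these assumed sizes, the vertical walk touches a separate line for every texel in the row-major layout, but stays inside a single line in the Morton layout, which is the kind of 2D locality a rotated read pattern benefits from.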