Probably easiest to just calculate the inverse? You're rendering the pixel that ...

amelius · on Jan 23, 2023

Yes, you can scan e.g. horizontally (x axis first) through the target image, but for the source image you will have to do a memory lookup with bad locality.
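A minimal sketch of that inverse-mapping scan, in plain Python rather than anything GPU-specific. The function name `rotate_nearest`, the list-of-rows image layout, rotation about the image centre, and nearest-neighbour lookup are all assumptions made for illustration. The writes to `dst` are sequential, while the reads from `src` jump around depending on the angle, which is the locality problem described above.

```python
import math

def rotate_nearest(src, angle_rad, fill=0):
    """Rotate a 2D image (list of rows) about its centre via inverse mapping."""
    h, w = len(src), len(src[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    dst = [[fill] * w for _ in range(h)]
    for y in range(h):              # scan the target image row by row
        for x in range(w):          # x axis first: sequential writes
            dx, dy = x - cx, y - cy
            # Apply the inverse rotation to find the source coordinate.
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(sx), round(sy)
            # The source read may land anywhere: this is the scattered lookup.
            if 0 <= ix < w and 0 <= iy < h:
                dst[y][x] = src[iy][ix]
    return dst
```

With an angle of 0 the source reads are as sequential as the writes; near 90 degrees each consecutive target pixel reads from a different source row.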

WithinReason · on Jan 23, 2023

Image lookups with bad locality are exactly what a GPU's texture pipeline is for.

amelius · on Jan 23, 2023

How does this pipeline deal with the locality issue, assuming the texture doesn't fit in the cache?

WithinReason · on Jan 24, 2023

The texture cache tries to guarantee that every pixel is only read from DRAM once, and the texture sampler does bilinear sampling "for free". So a sequential copy and a "rotation copy" should take about the same time.
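For reference, here is a rough software sketch of what bilinear sampling computes. The hardware sampler does this in fixed-function units, so this is only an illustration of the arithmetic, not of how any GPU implements it. The name `sample_bilinear`, the clamp-to-edge behaviour, and treating integer coordinates as pixel centres are assumptions.

```python
import math

def sample_bilinear(img, x, y):
    """Sample a 2D image (list of rows) at fractional (x, y), clamped to the edge."""
    h, w = len(img), len(img[0])
    # Clamp the coordinate into the valid range (assumed addressing mode).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighbouring rows, then vertically.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

Each sample touches a 2x2 block of neighbouring pixels, which is one reason a layout with good 2D locality helps.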

moffkalast · on Jan 23, 2023

Well, you don't need to scan: each SIMD processes a pixel in parallel and they spit out the whole thing simultaneously in the end, at least according to my understanding. You'd also presumably need to cache the entire texture that's being rendered anyway, so cache misses due to locality problems likely aren't huge.

flohofwoe · on Jan 23, 2023

Textures usually reside entirely in fast GPU-accessible memory, but cache misses during sampling can still be a problem (so entirely random access across the whole texture is still worse than accessing nearby pixels). The texture data is 'scrambled' in memory for better 2D spatial locality though, so once in cache, access to groups of pixels that are close together, whether horizontally or vertically, is fast.

(disclaimer: I'm not a hardware guy, and I also don't know if this applies to all pixel formats and different GPU architectures - texture cache details are also notoriously poorly documented by GPU vendors).
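To illustrate the kind of 'scrambling' meant here: one well-known scheme is Morton (Z-order) addressing, which interleaves the bits of x and y. Actual GPU layouts are vendor-specific and may differ, so treat this as an assumed stand-in, not a description of any particular hardware. The helper names `part1by1`, `morton_index` and `row_major_index` are made up for the sketch.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each original bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(x, y):
    """Interleave x and y bits: x takes the even bit positions, y the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)

def row_major_index(x, y, width):
    """Conventional linear layout, for comparison."""
    return y * width + x
```

In a 1024-wide row-major image, the pixel directly below (0, 0) is 1024 elements away; in Morton order it is 2 elements away, so a 2x2 block occupies four consecutive slots. Vertical neighbours elsewhere can still be further apart, but small 2D neighbourhoods mostly stay close in memory.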

amelius · on Jan 23, 2023

Ok, I'm guessing from this that it only works for relatively small images (?)