
Yes, if you're doing something that is just running the same calculation on a whole lot of different inputs, then GPUs absolutely eat that for breakfast.

In my game, I have a complex procedural generation process that occurs while loading a game. It's not a graphics process, so I originally did it on the CPU. It took about three seconds to build the data in parallel across seven background CPU threads on my quad-core processor. But testers who were using low-end dual-core i5s only had one background thread to do that same calculation, and typically reported that the procgen took multiple minutes to complete.

After spending a week refactoring the algorithm to do the same calculation on the GPU instead of the CPU (basically by pretending it was a rendering calculation and writing results out into a "texture" that we could read the results from), calculation times dropped from seconds or even minutes to just fractions of a millisecond, even on low-spec machines.
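
Roughly, the GPU version of that pattern has this shape. This is a minimal CUDA sketch, not the game's actual code: the comment describes going through a render-to-texture path, while this writes into a plain device buffer, and procgenCell is just a placeholder hash standing in for the real per-cell work.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Placeholder for "some deterministic work per cell"; the real game's
    // calculation is not shown in the comment.
    __device__ float procgenCell(int x, int y, unsigned int seed)
    {
        unsigned int h = ((unsigned int)x * 374761393u + (unsigned int)y * 668265263u) ^ seed;
        h = (h ^ (h >> 13)) * 1274126177u;
        return (h & 0xFFFF) / 65535.0f;
    }

    // One thread per output cell: the whole grid is computed in parallel,
    // which is the structure that lets the GPU eat this for breakfast.
    __global__ void procgenKernel(float* out, int width, int height, unsigned int seed)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;
        out[y * width + x] = procgenCell(x, y, seed);
    }

    int main()
    {
        const int width = 2048, height = 2048;
        float* dOut = nullptr;
        cudaMalloc(&dOut, width * height * sizeof(float));

        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        procgenKernel<<<grid, block>>>(dOut, width, height, 12345u);

        // Read the results back to the CPU, the rough equivalent of reading
        // back the "texture" described in the comment.
        float* hOut = (float*)malloc(width * height * sizeof(float));
        cudaMemcpy(hOut, dOut, width * height * sizeof(float), cudaMemcpyDeviceToHost);
        printf("first cell = %f\n", hOut[0]);

        free(hOut);
        cudaFree(dOut);
        return 0;
    }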

The calculation that I previously had to hide behind a loading screen was now quick enough that I could freely do it at runtime without even causing a blip in the frame rate. If you've got a problem that they can handle, GPUs are kind of astonishingly fast, even the (by modern standards) low-end ones.



Very true!

If you look at what most graphics pipelines do for one pixel, the amount of calculation for that one pixel is not very much, maybe a few dozen instructions. But at 1080p that adds up fast (1920 x 1080 is about 2 million pixels, so about 2 million times more work per frame). GPUs are exceedingly good at running semi-small programs over and over across 2000+ compute units. At best on a CPU you may get 64 cores if you have a super nice top-of-the-line part (the reality is usually 2 or 4). Where that crossover happens is going to vary considerably across workloads and the instructions used. In most cases it currently heavily favors the GPU. Throw in branching or something like that and the CPU might become more favorable. But you still have to try it out.
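
On the branching point, here is a minimal CUDA sketch (the kernel and both "paths" are placeholders, not anything from the article): threads in the same warp that disagree on the condition execute both sides one after the other, which gives back some of the GPU's throughput advantage.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Minimal sketch of the branching caveat: threads in the same warp that
    // take different sides of this if/else get serialized, so a branchy
    // workload loses some of the GPU's throughput edge.
    __global__ void branchyKernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float v = in[i];
        if (v > 0.5f)
            out[i] = sqrtf(v) * 3.0f;   // "expensive" placeholder path
        else
            out[i] = v * 2.0f;          // "cheap" placeholder path
    }

    int main()
    {
        const int n = 1 << 20;
        float *dIn, *dOut;
        cudaMalloc(&dIn, n * sizeof(float));
        cudaMalloc(&dOut, n * sizeof(float));
        cudaMemset(dIn, 0, n * sizeof(float));  // all zeros: every thread takes the cheap path

        branchyKernel<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
        cudaDeviceSynchronize();
        printf("done\n");

        cudaFree(dIn);
        cudaFree(dOut);
        return 0;
    }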

In the case of this article, they are using hashing/caching, which, yeah, should produce a fairly nice speedup. Basically the old speedup trick of doing the work once and keeping the result. But that might not translate very nicely to the GPU. Oh, you could get it to run, but it may not be as performant. In the game world it would be like what we used to do with sin/cos: instead of calling the instruction, we precalculated it and kept a copy lying around in an array for the most common cases. So it was just a memory lookup and very little compute, with the cached result kept around. But that does come at a cost if you have to branch on a miss.
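
The sin/cos table trick looks roughly like this (a minimal sketch; the table size, indexing scheme, and accuracy trade-off are illustrative choices, not from the article):

    #include <cmath>
    #include <cstdio>

    // The old game-dev trick: precompute sin() once into a table, then
    // replace the (relatively) slow call with an array lookup.
    const int TABLE_SIZE = 4096;
    static float sinTable[TABLE_SIZE];

    void initSinTable()
    {
        for (int i = 0; i < TABLE_SIZE; ++i)
            sinTable[i] = sinf((float)i / TABLE_SIZE * 2.0f * 3.14159265f);
    }

    // Fast approximate sin for angles in [0, 2*pi): just a memory lookup.
    float fastSin(float angle)
    {
        int index = (int)(angle / (2.0f * 3.14159265f) * TABLE_SIZE) & (TABLE_SIZE - 1);
        return sinTable[index];
    }

    int main()
    {
        initSinTable();
        printf("fastSin(1.0) = %f, sinf(1.0) = %f\n", fastSin(1.0f), sinf(1.0f));
        return 0;
    }

You trade a little accuracy and some memory for skipping the compute, which is exactly the kind of deal caching makes.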

Now if you could combine the two ideas, maybe with some sort of mask telling the GPU "do not do any work here, it is already done, work on something else", and pre-fill that data, it could be an even more interesting idea.
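
A rough sketch of what that mask could look like as a CUDA kernel (the doneMask layout and the placeholder per-cell "work" are assumptions, nothing from the article):

    #include <cuda_runtime.h>
    #include <cstdio>

    // A per-cell flag marks work that is already done (e.g. pulled from a
    // cache), and those threads bail out immediately so the GPU spends its
    // time on the remaining cells.
    __global__ void procgenWithMask(const unsigned char* doneMask,
                                    float* out, int n, unsigned int seed)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Cell already filled in by the cache pre-pass: do nothing.
        if (doneMask[i]) return;

        // Otherwise compute it from scratch (placeholder hash as the "work").
        unsigned int h = ((unsigned int)i * 2654435761u) ^ seed;
        h = (h ^ (h >> 13)) * 1274126177u;
        out[i] = (h & 0xFFFF) / 65535.0f;
    }

    int main()
    {
        const int n = 1 << 20;
        unsigned char* dMask;
        float* dOut;
        cudaMalloc(&dMask, n);
        cudaMalloc(&dOut, n * sizeof(float));
        cudaMemset(dMask, 0, n);               // pretend nothing is cached yet
        cudaMemset(dOut, 0, n * sizeof(float));

        procgenWithMask<<<(n + 255) / 256, 256>>>(dMask, dOut, n, 12345u);
        cudaDeviceSynchronize();
        printf("done\n");

        cudaFree(dMask);
        cudaFree(dOut);
        return 0;
    }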



