
Yes, if you're doing something that is just running the same calculation on a whole lot of different inputs, then GPUs absolutely eat that for breakfast.

In my game, I have a complex procedural generation process that occurs while loading a game. It's not a graphics process, so I originally did it on the CPU. It took about three seconds to build the data in parallel across seven background CPU threads on my quad-core processor. But testers who were using low-end dual-core i5s only had one background thread to do that same calculation, and typically reported that the procgen took multiple minutes to complete.

After spending a week refactoring the algorithm to do the same calculation on the GPU instead of the CPU (basically by pretending it was a rendering calculation and writing results out into a "texture" that we could read the results from), calculation times dropped from seconds or even minutes to just fractions of a millisecond, even on low-spec machines.
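
Roughly, the GPU version of that pattern has this shape. This is a minimal CUDA sketch, not the game's actual code: the comment describes going through a render-to-texture path, while this writes into a plain device buffer, and procgenCell is just a placeholder hash standing in for the real per-cell work.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Placeholder for "some deterministic work per cell"; the real game's
    // calculation is not shown in the comment.
    __device__ float procgenCell(int x, int y, unsigned int seed)
    {
        unsigned int h = ((unsigned int)x * 374761393u + (unsigned int)y * 668265263u) ^ seed;
        h = (h ^ (h >> 13)) * 1274126177u;
        return (h & 0xFFFF) / 65535.0f;
    }

    // One thread per output cell: the whole grid is computed in parallel,
    // which is the structure that lets the GPU eat this for breakfast.
    __global__ void procgenKernel(float* out, int width, int height, unsigned int seed)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;
        out[y * width + x] = procgenCell(x, y, seed);
    }

    int main()
    {
        const int width = 2048, height = 2048;
        float* dOut = nullptr;
        cudaMalloc(&dOut, width * height * sizeof(float));

        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        procgenKernel<<<grid, block>>>(dOut, width, height, 12345u);

        // Read the results back to the CPU, the rough equivalent of reading
        // back the "texture" described in the comment.
        float* hOut = (float*)malloc(width * height * sizeof(float));
        cudaMemcpy(hOut, dOut, width * height * sizeof(float), cudaMemcpyDeviceToHost);
        printf("first cell = %f\n", hOut[0]);

        free(hOut);
        cudaFree(dOut);
        return 0;
    }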

The calculation that I previously had to hide behind a loading screen was now quick enough that I could freely do it at runtime without even causing a blip in the frame rate. If you've got a problem that they can handle, GPUs are kind of astonishingly fast, even the (by modern standards) low-end ones.



Very true!

If you look at what most graphics pipelines do for one pixel, the amount of calculation for that one pixel is not very much, maybe a few dozen instructions. But at 1080p that adds up fast (1920 x 1080 is about 2 million pixels, so about 2 million times more work per frame). GPUs are exceedingly good at running semi-small programs over and over across 2000+ compute units. At best on a CPU you may get 64 cores if you have a super nice top-of-the-line part (the reality is usually 2 or 4). Where that crossover happens is going to vary considerably across workloads and the instructions used. In most cases it currently heavily favors the GPU. Throw in branching or something like that and the CPU might become more favorable. But you still have to try it out.
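
On the branching point, here is a minimal CUDA sketch (the kernel and both "paths" are placeholders, not anything from the article): threads in the same warp that disagree on the condition execute both sides one after the other, which gives back some of the GPU's throughput advantage.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Minimal sketch of the branching caveat: threads in the same warp that
    // take different sides of this if/else get serialized, so a branchy
    // workload loses some of the GPU's throughput edge.
    __global__ void branchyKernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float v = in[i];
        if (v > 0.5f)
            out[i] = sqrtf(v) * 3.0f;   // "expensive" placeholder path
        else
            out[i] = v * 2.0f;          // "cheap" placeholder path
    }

    int main()
    {
        const int n = 1 << 20;
        float *dIn, *dOut;
        cudaMalloc(&dIn, n * sizeof(float));
        cudaMalloc(&dOut, n * sizeof(float));
        cudaMemset(dIn, 0, n * sizeof(float));  // all zeros: every thread takes the cheap path

        branchyKernel<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
        cudaDeviceSynchronize();
        printf("done\n");

        cudaFree(dIn);
        cudaFree(dOut);
        return 0;
    }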

In the case of this article, they are using hashing/caching, which, yeah, should produce a fairly nice speedup. Basically the old speedup trick of doing the work once and keeping the result. But that might not translate very nicely to the GPU. Oh, you could get it to run, but it may not be as performant. In the game world it would be like what we used to do with sin/cos: instead of calling the instruction, we precalculated it and kept a copy lying around in an array for the most common cases. So it was just a memory lookup and very little compute, with the cached result kept around. But that does come at a cost if you have to branch on a miss.
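
The sin/cos table trick looks roughly like this (a minimal sketch; the table size, indexing scheme, and accuracy trade-off are illustrative choices, not from the article):

    #include <cmath>
    #include <cstdio>

    // The old game-dev trick: precompute sin() once into a table, then
    // replace the (relatively) slow call with an array lookup.
    const int TABLE_SIZE = 4096;
    static float sinTable[TABLE_SIZE];

    void initSinTable()
    {
        for (int i = 0; i < TABLE_SIZE; ++i)
            sinTable[i] = sinf((float)i / TABLE_SIZE * 2.0f * 3.14159265f);
    }

    // Fast approximate sin for angles in [0, 2*pi): just a memory lookup.
    float fastSin(float angle)
    {
        int index = (int)(angle / (2.0f * 3.14159265f) * TABLE_SIZE) & (TABLE_SIZE - 1);
        return sinTable[index];
    }

    int main()
    {
        initSinTable();
        printf("fastSin(1.0) = %f, sinf(1.0) = %f\n", fastSin(1.0f), sinf(1.0f));
        return 0;
    }

You trade a little accuracy and some memory for skipping the compute, which is exactly the kind of deal caching makes.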

Now if you could combine the two ideas, maybe with some sort of mask telling the GPU "do not do any work here, it is already done, work on something else", and pre-fill that data, it could be an even more interesting idea.
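
A rough sketch of what that mask could look like as a CUDA kernel (the doneMask layout and the placeholder per-cell "work" are assumptions, nothing from the article):

    #include <cuda_runtime.h>
    #include <cstdio>

    // A per-cell flag marks work that is already done (e.g. pulled from a
    // cache), and those threads bail out immediately so the GPU spends its
    // time on the remaining cells.
    __global__ void procgenWithMask(const unsigned char* doneMask,
                                    float* out, int n, unsigned int seed)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Cell already filled in by the cache pre-pass: do nothing.
        if (doneMask[i]) return;

        // Otherwise compute it from scratch (placeholder hash as the "work").
        unsigned int h = ((unsigned int)i * 2654435761u) ^ seed;
        h = (h ^ (h >> 13)) * 1274126177u;
        out[i] = (h & 0xFFFF) / 65535.0f;
    }

    int main()
    {
        const int n = 1 << 20;
        unsigned char* dMask;
        float* dOut;
        cudaMalloc(&dMask, n);
        cudaMalloc(&dOut, n * sizeof(float));
        cudaMemset(dMask, 0, n);               // pretend nothing is cached yet
        cudaMemset(dOut, 0, n * sizeof(float));

        procgenWithMask<<<(n + 255) / 256, 256>>>(dMask, dOut, n, 12345u);
        cudaDeviceSynchronize();
        printf("done\n");

        cudaFree(dMask);
        cudaFree(dOut);
        return 0;
    }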



