
In a bringing-a-tank-to-a-knife-fight kind of way, could this be optimized to run on a GPU? Load the contents, do an "and" (a whitespace comparison) across the whole buffer in parallel, and then sum the matches?
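A minimal sketch of that idea in Python with CuPy (my own illustration, assuming a CUDA-capable GPU; not code from any repo mentioned here): copy the bytes to GPU memory, compare every byte against the whitespace set in parallel, then reduce the matches to a single count.

    import numpy as np
    import cupy as cp

    # The six ASCII whitespace bytes that wc-style counting usually cares about.
    WHITESPACE = cp.asarray(np.frombuffer(b" \t\n\r\x0b\x0c", dtype=np.uint8))

    def count_whitespace_gpu(path: str) -> int:
        with open(path, "rb") as f:
            host = np.frombuffer(f.read(), dtype=np.uint8)
        device = cp.asarray(host)           # copy host memory -> GPU memory
        mask = cp.isin(device, WHITESPACE)  # compare all bytes in parallel
        return int(mask.sum())              # sum the matches on the GPU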


These benchmarks are on 92-million-byte files, so we're into the range where bringing a tank is fair (and worth the startup cost).


I doubt you can make it faster on the GPU than on a CPU using SIMD, because the work done per byte is close to trivial. You'd be transferring the data from CPU memory to GPU memory only to do almost nothing with it.
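For contrast, a rough CPU-side sketch of the same pass with NumPy (again a hypothetical illustration): the element-wise comparison and reduction run in vectorized C loops (SIMD where available), and the data never leaves host memory.

    import numpy as np

    WHITESPACE = np.frombuffer(b" \t\n\r\x0b\x0c", dtype=np.uint8)

    def count_whitespace_cpu(path: str) -> int:
        with open(path, "rb") as f:
            data = np.frombuffer(f.read(), dtype=np.uint8)
        # One vectorized pass over the bytes, entirely in host memory.
        return int(np.isin(data, WHITESPACE).sum())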


I've got it working on a T4 via Google Colab. The PDF takes 178 milliseconds versus the 206 listed in the readme for the C version, so about 14% faster?

https://github.com/fragmede/wc-gpu/blob/main/wc_gpu.ipynb


It's only at a limit like that if you don't parallelize. And sure, you could use more cores, but you can go a lot faster on 20% of a GPU than on 20% of your CPU cores.


I got nerd sniped into doing it. https://github.com/fragmede/wc-gpu



