In a bringing-a-tank-to-a-knife-fight kind of way, could this be optimized to run on a GPU? Load the contents, do a compare ("and") across the whole buffer in parallel, and then sum the whitespace matches?
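Roughly this, I imagine (a minimal CUDA sketch; the kernel name, launch configuration, and sample input are just illustrative, not anyone's actual implementation):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread tests one byte; per-block partial counts are merged
    // into shared memory, then one atomicAdd per block hits the global total.
    __global__ void count_whitespace(const char *buf, size_t n,
                                     unsigned long long *total) {
        __shared__ unsigned int block_count;
        if (threadIdx.x == 0) block_count = 0;
        __syncthreads();

        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) {
            char c = buf[i];
            if (c == ' ' || c == '\n' || c == '\t')
                atomicAdd(&block_count, 1u);
        }
        __syncthreads();

        if (threadIdx.x == 0)
            atomicAdd(total, (unsigned long long)block_count);
    }

    int main() {
        const char data[] = "hello world\nfoo\tbar";  // stand-in for the file
        size_t n = sizeof(data) - 1;

        char *d_buf;
        unsigned long long *d_total, total = 0;
        cudaMalloc(&d_buf, n);
        cudaMalloc(&d_total, sizeof total);
        // This host->device copy is the transfer cost discussed below.
        cudaMemcpy(d_buf, data, n, cudaMemcpyHostToDevice);
        cudaMemset(d_total, 0, sizeof total);

        count_whitespace<<<(unsigned)((n + 255) / 256), 256>>>(d_buf, n, d_total);

        cudaMemcpy(&total, d_total, sizeof total, cudaMemcpyDeviceToHost);
        printf("%llu whitespace bytes\n", total);  // prints 3 for the sample
        cudaFree(d_buf);
        cudaFree(d_total);
    }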
I doubt you can make it faster on the GPU than on the CPU when utilizing SIMD. The reason is that the work done per byte is close to trivial, so you would be transferring the data from CPU memory to GPU memory in order to do almost nothing with it.
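For reference, the CPU-SIMD baseline being compared against looks roughly like this (a sketch using SSE2 intrinsics, which compiles as host code in the same CUDA program; __builtin_popcount assumes GCC/Clang):

    #include <emmintrin.h>  // SSE2
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    // Compare 16 bytes at a time against ' ', '\n', '\t'. Each compare sets
    // matching lanes to 0xFF; _mm_movemask_epi8 packs the lane signs into a
    // 16-bit mask whose set bits we count.
    static size_t count_whitespace(const char *buf, size_t n) {
        const __m128i sp = _mm_set1_epi8(' ');
        const __m128i nl = _mm_set1_epi8('\n');
        const __m128i tb = _mm_set1_epi8('\t');
        size_t count = 0, i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
            __m128i m = _mm_or_si128(
                _mm_or_si128(_mm_cmpeq_epi8(v, sp), _mm_cmpeq_epi8(v, nl)),
                _mm_cmpeq_epi8(v, tb));
            count += __builtin_popcount(_mm_movemask_epi8(m));
        }
        for (; i < n; i++)  // scalar tail for the last <16 bytes
            count += (buf[i] == ' ' || buf[i] == '\n' || buf[i] == '\t');
        return count;
    }

    int main(void) {
        const char *s = "hello world\nfoo\tbar";
        printf("%zu\n", count_whitespace(s, strlen(s)));  // prints 3
    }

Each iteration handles 16 bytes in a handful of instructions, which is why the per-byte work is close to trivial and the loop ends up limited by how fast memory can feed it rather than by compute.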
You only hit a limit like that if you don't parallelize. And sure, you could use more CPU cores, but you can go a lot faster on 20% of a GPU than on 20% of your CPU cores.