Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are many neat tricks for branchless algorithms. Once you get a feel for what makes them work, they're easy to invent on your own. Two representative examples:

    // branchless octree traversal
    i   = x <= t->mx;
    i <<= 1;
    i  |= y <= t->my;
    i <<= 1;
    i  |= z <= t->mz;
    t   = t->child[i];

    // unrolled branchless binary search of a 64-element array
    if (p[32] <= x) p += 32;
    if (p[16] <= x) p += 16;
    if (p[ 8] <= x) p +=  8;
    if (p[ 4] <= x) p +=  4;
    if (p[ 2] <= x) p +=  2;
    if (p[ 1] <= x) p +=  1;
    return p;
(In the binary search, you'd implement the if statements with conditional moves or predicated adds, depending on what your platform offers.)

Both of these examples replace branching with data-dependent array indexing, which is a recurring pattern.

Branchless algorithms are often a performance win on consoles and embedded systems. But they really shine when you're writing code for a wide-SIMD architecture with gather-scatter support, where divergent memory accesses are generally much less costly than divergent branches, GPUs being the most widespread example.



The second one has branches, perhaps you meant p += (p[32]<=x)*32; etc?


I thought this caveat made it clear:

> (In the binary search, you'd implement the if statements with conditional moves or predicated adds, depending on what your platform offers.)

It's usually a bad idea to use arithmetical trickery in an effort to get the compiler to generate conditional moves or predicated instructions. If you want to make sure the compiler does the right thing, use macros that are conditionally expanded to the appropriate intrinsics on each compiler/platform.

Since I didn't want to get into that, I used if-statements with the above caveat. Makes sense?


I admit I've overlooked the caveat while posting the response yet even with the caveat this does not make much sense, as every code can be turned into branchless when you use predicated instructions (a bit less so with conditional moves). This is what GPUs do often (many architectures don't have branch instructions so all branches are always executed with predication).


> I admit I've overlooked the caveat while posting the response yet even with the caveat this does not make much sense, as every code can be turned into branchless when you use predicated instructions (a bit less so with conditional moves).

Now you're just being intentionally obtuse. When I write if (x) y += z and say that it should be implemented with conditional moves or predicated adds per your platform, there really isn't much room for confusion.

> This is what GPUs do often (many architectures don't have branch instructions so all branches are always executed with predication).

Actually, all modern desktop-class GPU architectures have branch instructions. They only revert to predication (which in NVIDIA's case is implicit, i.e. not controlled by microcode-visible mask registers, though they of course also have CMOV-like instructions) when the different SIMD lanes take divergent branches. That's been the case since the G80/Tesla microarchitecture, which the GeForce 8800 from 2006 was the first product to use. Mostly-coherent dynamic branching is heavily used in modern shader code, to say nothing of GPGPU code, and is almost free. Incoherent branching is the big issue. Replacing that with incoherent memory accesses using branchless code can be a huge win, even though incoherent memory access is far from free.


>Now you're just being intentionally obtuse. When I write if (x) y += z and say that it should be implemented with conditional moves or predicated adds per your platform, there really isn't much room for confusion.

May be I am obtuse though not intentionally. If you wanted to say that if's can be replaced with predicated instructions why such a convoluted example (which is a nice piece of code, by the way)?

>Actually, all modern desktop-class GPU architectures have branch instructions.

Indeed, and as you say, they are not always executed because a single SIMD core executes threads in a lock-step and it's only possible to branch when all the threads yield the same condition. Besides there are still GPUs without branches, e.g. PS3 fragment shaders.


> If you wanted to say that if's can be replaced with predicated instructions why such a convoluted example (which is a nice piece of code, by the way)?

That's not what I wanted to say; it was a side note to a piece of code. Anyway, sorry if it was confusing. It was a spur-of-the-moment comment on Hacker News, not a carefully wrought essay.

> Besides there are still GPUs without branches, e.g. PS3 fragment shaders.

The RSX is ancient at this point, a turd in a box. It was a turd even at the time when, in the eleventh hour of the PS3 project, Sony begged NVIDIA to give them a GPU after their fantasy of Cell-based rasterization had proved ludicrous. It's so bad that you have to offload a ton of work on the SPUs to get acceptable graphics performance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: