Isn't the real performance gain from eliminating the triple-nested for loops and memoizing over a sliding window, rather than from the use of XOR? You would get the same order-of-magnitude improvement with a map combined with a single-pass approach.
I.e., it's undeniable that the final program runs much faster, but the article should really be called "a smarter memoization trick."
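To make the point concrete, here is a rough sketch of the map-plus-single-pass idea with no XOR involved. It assumes the task is to find the first window of `width` pairwise-distinct items, which is my guess at the problem and not something taken from the article; the function name `first_distinct_window` and its parameters are made up for illustration.

```python
from collections import defaultdict


def first_distinct_window(seq, width):
    """Return the exclusive end index of the first window of `width`
    pairwise-distinct items in `seq`, or -1 if there is none.

    Hypothetical helper: the problem statement is an assumption.
    """
    counts = defaultdict(int)  # item -> occurrences inside the current window
    distinct = 0               # number of keys with a non-zero count

    for i, item in enumerate(seq):
        # Bring the new item into the window.
        counts[item] += 1
        if counts[item] == 1:
            distinct += 1

        # Drop the item that just slid out on the left.
        if i >= width:
            old = seq[i - width]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1

        # The window holds `width` items, so this means all are distinct.
        if distinct == width:
            return i + 1

    return -1
```

Each element is added once and removed once, so the whole scan is a single pass with constant work per step (amortized, given hash-map lookups). If that matches the article's final complexity, the speedup would come from the sliding-window bookkeeping, and XOR would only be a cheaper way to store the counts.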