
Does quantization need to be all-or-nothing? With the kind of low-bit models we have seen, my assumption is that only certain weights benefit from the extra precision. A mixture of precisions, with 2-bit, 3-bit, up to 8-bit weights, might perform well, but I am unsure whether any training process could identify the weights that need the extra precision.
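
To make the question concrete, here is a minimal sketch of sensitivity-guided mixed-precision quantization in numpy. The group size, the 2/3/8-bit split, and the use of per-group standard deviation as a sensitivity proxy are all illustrative assumptions on my part, not anything from the paper; real methods tend to use better sensitivity estimates (e.g. Hessian-based ones).

    import numpy as np

    def quantize_uniform(w, bits):
        # Symmetric uniform quantization of one weight group to the given bit width.
        levels = 2 ** (bits - 1) - 1                  # e.g. 3 positive levels for 3-bit
        scale = max(float(np.max(np.abs(w))), 1e-8) / levels
        return np.clip(np.round(w / scale), -levels, levels) * scale

    def mixed_precision_quantize(weights, group_size=64):
        # Assign 2/3/8-bit precision per group using a crude sensitivity proxy
        # (per-group standard deviation); this rule is hypothetical.
        groups = weights.reshape(-1, group_size)
        out = np.empty_like(groups)
        sensitivity = groups.std(axis=1)
        hi_cut, lo_cut = np.quantile(sensitivity, [0.9, 0.5])
        for i, g in enumerate(groups):
            bits = 8 if sensitivity[i] >= hi_cut else 3 if sensitivity[i] >= lo_cut else 2
            out[i] = quantize_uniform(g, bits)
        return out.reshape(weights.shape)

    w = np.random.randn(4096 * 64).astype(np.float32)
    w_q = mixed_precision_quantize(w)
    print("mean abs error:", float(np.mean(np.abs(w - w_q))))
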


Given that the weights are just a mapping onto a virtual network structure anyway, my guess is that as parameter counts increase, any difference per-weight precision might make will evaporate when the model is trained at low precision from the ground up.

So moving to extremely high-efficiency native ternary hardware, like optics, is going to be a much better outcome than trying to mix precisions on classical hardware (a sketch of why ternary maps so cheaply onto hardware is at the end of this comment).

We'll see, but this is one of those things I wouldn't have expected to be true, yet as soon as I see that it is, it kind of makes sense. If it holds up (and it probably will), it's going to kick off a hardware revolution in AI.
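
For what it's worth, here is a minimal sketch of why ternary weights suit simple hardware, assuming a BitNet-b1.58-style absmean quantizer (my reading of that scheme; details may differ): with weights in {-1, 0, +1}, a matrix-vector product needs no multiplies at all, only adds, subtracts, and skips.

    import numpy as np

    def ternarize(w):
        # Map weights to {-1, 0, +1} with a single per-tensor absmean scale.
        scale = float(np.mean(np.abs(w))) + 1e-8
        w_t = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
        return w_t, scale

    def ternary_matvec(w_t, scale, x):
        # Multiplier-free matvec: only sums of activations selected by +1 / -1 weights.
        pos = (w_t == 1).astype(x.dtype) @ x
        neg = (w_t == -1).astype(x.dtype) @ x
        return scale * (pos - neg)

    w = np.random.randn(256, 512).astype(np.float32)
    x = np.random.randn(512).astype(np.float32)
    w_t, s = ternarize(w)
    print("mean abs error:", float(np.mean(np.abs(w @ x - ternary_matvec(w_t, s, x)))))
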



