This is wrong. The paper states that activations are in int8 during inference.
That being said, pre-LLM-era deep learning already had low-bit quantization down to 1w2f [0] working back in 2016 [1], so it's certainly plausible it would work for LLMs too.
[0] 1-bit weights, 2-bit activations; though in practice people deployed 2w4f instead.

[1] https://arxiv.org/abs/1606.06160
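For concreteness, a rough NumPy sketch of the kind of quantizers DoReFa-Net [1] describes (forward pass only; the straight-through estimator it uses for gradients during training is omitted, and the function names are mine):

    import numpy as np

    def quantize_weights_1bit(w):
        # Binarize weights to {-E[|w|], +E[|w|]}: 1-bit weights with a
        # per-tensor scaling factor, as in the 1-bit-weight case of DoReFa-Net.
        scale = np.mean(np.abs(w))
        return np.where(w >= 0, scale, -scale)

    def quantize_activations_kbit(a, k=2):
        # Uniformly quantize activations clipped to [0, 1] into 2**k levels;
        # k=2 gives the "2f" in 1w2f: values in {0, 1/3, 2/3, 1}.
        a = np.clip(a, 0.0, 1.0)
        steps = 2**k - 1
        return np.round(a * steps) / steps

    # 1w2f: 1-bit weights, 2-bit activations
    w = np.random.randn(4, 4).astype(np.float32)
    a = np.random.rand(4).astype(np.float32)
    y = quantize_weights_1bit(w) @ quantize_activations_kbit(a, k=2)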