
int_19h on Feb 28, 2024 | on: The Era of 1-bit LLMs: ternary parameters for cost...

Quantization does reduce quality of the outputs. But the point is that you save enough memory doing so that you can cram a larger model into the same hardware, and this more than compensates for lost precision.
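To put rough numbers on the memory side of that trade-off, here is a back-of-the-envelope sketch. It counts weight storage only and ignores the KV cache, activations, and per-block quantization metadata, so real footprints will be somewhat larger. The 24 GiB budget and the function names are assumptions chosen for illustration, not figures from the thread. It says nothing about the quality side of the claim, which has to be measured empirically.

```python
# Rough weight-memory arithmetic. Assumes memory = parameters * bits / 8,
# i.e. weights only, with no KV cache, activations, or quantization overhead.

GIB = 2**30  # bytes per GiB


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def max_params_for_budget(budget_gib: float, bits_per_weight: float) -> float:
    """Largest parameter count whose weights fit in the given budget."""
    return budget_gib * GIB * 8 / bits_per_weight


if __name__ == "__main__":
    budget_gib = 24  # hypothetical accelerator memory, for illustration only
    for bits in (16, 8, 4):
        n = max_params_for_budget(budget_gib, bits)
        print(f"{bits:>2}-bit weights: up to ~{n / 1e9:.1f}B parameters in {budget_gib} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights quadruples the parameter count that fits in the same memory, which is the headroom the comment is referring to.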