Quantization is a technique to:
- Compress large models by reducing numerical precision (e.g., float32 → int8; see the sketch below).
- Make them run faster, use less memory, and run on CPUs or mobile devices.
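To make the idea concrete, here is a minimal sketch of the math behind a simple affine (scale + zero-point) int8 scheme, written with NumPy. The function names `quantize_int8` and `dequantize` are illustrative, not part of any library's API:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float32 tensor onto int8 using a scale and zero-point.

    Assumes x is not constant (otherwise scale would be zero).
    """
    qmin, qmax = -128, 127
    # One scale for the whole tensor: spread the value range over 256 levels.
    scale = (x.max() - x.min()) / (qmax - qmin)
    # Zero-point shifts the grid so x.min() lands on qmin.
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
approx = dequantize(q, scale, zp)
print("max reconstruction error:", np.abs(weights - approx).max())
```

Each int8 weight takes 1 byte instead of 4, a 4× memory saving, at the cost of a small reconstruction error. Real tools typically refine this idea, quantizing per channel or per block rather than per tensor, and often going below 8 bits (llama.cpp, for example, offers 4-bit formats).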
Tools such as llama.cpp, ggml, and mlc-llm use quantization to run large models on hardware like Apple M1 chips, Raspberry Pi boards, and Android phones.