quantization

Everything on Ground Truth tagged “quantization” — 3 items.

A model that rivals the frontier now squeezes onto a single high-end desktop News

Aggressive compression shrinks GLM 5.2 by more than 80 percent while keeping most of its accuracy, putting a near-frontier model within reach of local hardware.

Quantization: Shrinking AI Models to Run on Modest Hardware Lesson

Storing a model's numbers with less precision (8, 4, or even fewer bits instead of 16) makes it dramatically smaller and often faster, usually with little loss in quality. It's why big models can run on a laptop or a single GPU.
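To make the idea concrete, here is a minimal sketch of one simple scheme, symmetric per-tensor rounding, in plain Python. It is an illustration under stated assumptions, not the method used by any particular tool: the function names and the 7-billion-parameter model size are hypothetical, and real quantizers typically work per block or per channel with more careful calibration.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers of the given bit width (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step, clamped to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers and the shared scale."""
    return [v * scale for v in q]


def model_size_gb(n_params, bits):
    """Storage for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits / 8 / 1e9


weights = [0.5, -1.0, 0.25, 0.03]              # toy values, not from a real model
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Hypothetical 7-billion-parameter model:
size_16 = model_size_gb(7_000_000_000, 16)     # 14.0 GB
size_4 = model_size_gb(7_000_000_000, 4)       # 3.5 GB
```

Going from 16 bits to 4 cuts the weight storage to a quarter. The price is the rounding error measured by `max_error`, which grows as the bit width shrinks.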

Unsloth Tool

Toolkit and documentation for running and fine-tuning large open models faster and on smaller hardware, including aggressive dynamic quantization recipes that shrink models like GLM 5.2 by 80-plus percent while keeping most of their accuracy. The practical on-ramp to running near-frontier models privately.