2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy
Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU
Continue reading on Towards Data Science »