2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU
