SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot

Resource type
Conference Paper
Authors/contributors
Frantar, Elias; Alistarh, Dan
Title
SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
Date
2023-07-03
Proceedings Title
Proceedings of the 40th International Conference on Machine Learning
Conference Name
International Conference on Machine Learning
Publisher
PMLR
Pages
10323–10337
Language
en
Short Title
SparseGPT
Accessed
24/02/2024, 17:43
Library Catalogue
proceedings.mlr.press
Extra
ISSN: 2640-3498
Citation
Frantar, E., & Alistarh, D. (2023). SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot. Proceedings of the 40th International Conference on Machine Learning, 10323–10337. https://proceedings.mlr.press/v202/frantar23a.html