A Coding Implementation to Compress and Benchmark Instruction-Tuned LLMs with FP8, GPTQ, and SmoothQuant Quantization using llmcompressor
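import subprocess, sys

def pip(*pkgs):
    # Install the given packages with the pip that belongs to the running interpreter.
    subprocess.check_call([sys.executable, "-m", "pip", "install", *pkgs])

pip("llmcompressor", "compressed-tensors", "transformers>=4.45", "accelerate", "datasets")

import os, gc, time, json, math
from pathlib import...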

