PTQ

Category: Quantization and Low-Rank

PTQ (Post-Training Quantization)

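A technique that, once model training is complete, quantizes the model's weights and/or activations from high precision (FP32/FP16) to low precision (INT8/INT4) without retraining.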
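
As an illustration of the float-to-integer mapping, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. The helper names `quantize_symmetric` and `dequantize` are hypothetical, and real toolkits typically add zero-points, finer granularity, and guards against an all-zero tensor.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Map a float array to signed integers using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    scale = np.max(np.abs(x)) / qmax          # per-tensor scale from the abs-max
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float array from the integers."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)

q_weights, scale = quantize_symmetric(weights)
recovered = dequantize(q_weights, scale)

# Rounding to the nearest grid point bounds the error by about half a step.
max_err = float(np.max(np.abs(weights - recovered)))
```

No training step is involved: the scale is read directly off the trained weights.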

It does not require access to the full training set; a small amount of calibration data is usually enough.
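
A rough sketch of how calibration data might be used: a few batches are run through the model, the activation range is recorded, and a fixed scale is derived from it. The function name `calibrate_activation_scale` and the choice of a 99.9th percentile are assumptions for illustration; abs-max, moving averages, and entropy-based methods are common alternatives.

```python
import numpy as np

def calibrate_activation_scale(batches, num_bits=8, percentile=99.9):
    """Estimate one activation scale from a handful of calibration batches."""
    qmax = 2 ** (num_bits - 1) - 1
    # Pool absolute values over all batches; a percentile is less
    # sensitive to rare spikes than the plain maximum.
    pooled = np.concatenate([np.abs(b).ravel() for b in batches])
    clip_value = np.percentile(pooled, percentile)
    return float(clip_value) / qmax

rng = np.random.default_rng(0)
# Stand-in for activations collected on a small calibration set.
calibration_batches = [rng.normal(size=(32, 128)) for _ in range(8)]
act_scale = calibrate_activation_scale(calibration_batches)

# At inference time the fixed scale is reused on unseen inputs.
x_new = rng.normal(size=(4, 128))
x_q = np.clip(np.round(x_new / act_scale), -127, 127).astype(np.int8)
```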

Main challenges: accumulation of quantization error, and outlier activations that cause severe accuracy loss.
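
The outlier problem can be illustrated with a small synthetic experiment. With a single abs-max scale, one large value stretches the quantization grid, so every other value is rounded much more coarsely. The data and the helper `fake_quant` are made up for this sketch.

```python
import numpy as np

def fake_quant(x, num_bits=8):
    """Quantize and immediately dequantize with one abs-max scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
acts = rng.normal(size=4096)

acts_outlier = acts.copy()
acts_outlier[0] = 200.0                      # a single large outlier

# Mean squared error without the outlier.
err_clean = np.mean((acts - fake_quant(acts)) ** 2)
# Error on the ordinary entries once the outlier sets the scale.
err_outlier = np.mean((acts_outlier[1:] - fake_quant(acts_outlier)[1:]) ** 2)
```

Comparing `err_clean` with `err_outlier` shows how much resolution the ordinary values lose when the range is dictated by one entry.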

Common strategies: per-channel scaling, rotation/transformation, low-rank error reconstruction.
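
Two of these strategies can be sketched in a few lines, under the assumption of a weight matrix whose rows (output channels) have very different ranges. Per-channel scaling gives each row its own scale, and low-rank error reconstruction adds back a truncated-SVD approximation of the quantization residual. The function names and the rank of 4 are illustrative choices, not taken from any specific method.

```python
import numpy as np

def quant_per_tensor(w, qmax=7):
    """4-bit style fake quantization with one scale for the whole matrix."""
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def quant_per_channel(w, qmax=7):
    """Same grid size, but one scale per output channel (row)."""
    scale = np.max(np.abs(w), axis=1, keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def low_rank_correction(w, w_q, rank):
    """Best rank-`rank` approximation of the residual w - w_q (truncated SVD)."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
# Rows are given different magnitudes on purpose.
w = rng.normal(size=(32, 64)) * rng.uniform(0.1, 5.0, size=(32, 1))

err_tensor = np.linalg.norm(w - quant_per_tensor(w))
w_pc = quant_per_channel(w)
err_channel = np.linalg.norm(w - w_pc)
# The correction would be stored as two thin matrices in higher precision.
err_low_rank = np.linalg.norm(w - (w_pc + low_rank_correction(w, w_pc, rank=4)))
```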

Compared with QAT (quantization-aware training), PTQ is cheaper to deploy but usually somewhat less accurate.

GPTQ: Hessian-based, layer-by-layer weight quantization.
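
The core idea can be sketched as follows: quantize one input column at a time and use the inverse of a layer-wise Hessian proxy, built from calibration inputs, to push the rounding error onto the columns not yet quantized. This is a simplified illustration and not the reference implementation, which works in blocks and uses a Cholesky factorization. The function name, the fixed per-row grid, and the damping value are assumptions.

```python
import numpy as np

def gptq_quantize_layer(w, x, num_bits=4, damp=0.01):
    """Simplified GPTQ-style pass.

    w: (out_features, in_features) weights
    x: (n_samples, in_features) calibration inputs to this layer
    """
    qmax = 2 ** (num_bits - 1) - 1
    w = w.astype(np.float64).copy()
    scale = np.max(np.abs(w), axis=1) / qmax               # fixed per-row grid
    h = x.T @ x                                            # Hessian proxy of the layer output error
    h += damp * np.mean(np.diag(h)) * np.eye(h.shape[0])   # damping for invertibility
    h_inv = np.linalg.inv(h)
    w_q = np.zeros_like(w)
    for i in range(w.shape[1]):
        col = w[:, i]
        w_q[:, i] = np.clip(np.round(col / scale), -qmax, qmax) * scale
        err = (col - w_q[:, i]) / h_inv[i, i]
        # Compensate the rounding error on the remaining columns.
        w[:, i + 1:] -= np.outer(err, h_inv[i, i + 1:])
        # Eliminate column i from the inverse Hessian.
        h_inv = h_inv - np.outer(h_inv[:, i], h_inv[i, :]) / h_inv[i, i]
    return w_q, scale

rng = np.random.default_rng(0)
mix = rng.normal(size=(32, 32))
x_calib = rng.normal(size=(256, 32)) @ mix                 # correlated input features
w_fp = rng.normal(size=(16, 32))

w_gptq, row_scale = gptq_quantize_layer(w_fp, x_calib)

# Round-to-nearest baseline on the same grid, for comparison.
w_rtn = np.clip(np.round(w_fp / row_scale[:, None]), -7, 7) * row_scale[:, None]
err_gptq = np.linalg.norm(x_calib @ (w_fp - w_gptq).T)
err_rtn = np.linalg.norm(x_calib @ (w_fp - w_rtn).T)
```

On correlated inputs one would typically expect `err_gptq` to come out below `err_rtn`, although this sketch makes no guarantee of that.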

SmoothQuant: migrates quantization difficulty from activations to weights.
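
A minimal sketch of the migration, assuming a linear layer `y = x @ w` with `w` of shape (in, out): each input channel of the activations is divided by a factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), and the matching weight row is multiplied by it, so the output is unchanged while the activation range shrinks. The helper name `smooth_scales` and the synthetic data are illustrative.

```python
import numpy as np

def smooth_scales(x, w, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing factors from activation and weight ranges."""
    act_max = np.max(np.abs(x), axis=0)       # (in,) activation range per channel
    w_max = np.max(np.abs(w), axis=1)         # (in,) weight range per input channel
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1.0 - alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 64))
x[:, 3] *= 50.0                               # one outlier activation channel
w = rng.normal(0.0, 0.1, size=(64, 32))

s = smooth_scales(x, w)
x_smooth = x / s                              # activations become easier to quantize
w_smooth = w * s[:, None]                     # weights absorb the difficulty

# The product is mathematically unchanged by the rescaling.
y_ref = x @ w
y_smooth = x_smooth @ w_smooth
```

In a deployed model the division by `s` would normally be folded into the preceding layer rather than applied at run time.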

AWQ: activation-aware weight quantization.
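
The idea can be sketched as a small search: input channels that see large activations are scaled up before weight quantization (and scaled back afterwards), and the exponent controlling that scaling is picked by minimizing the layer output error on calibration data. This is a simplified illustration; the function names, the 4-bit per-output-channel grid, and the 11-point search grid are assumptions.

```python
import numpy as np

def quant_per_out_channel(w, qmax=7):
    """Fake-quantize w of shape (in, out) with one scale per output channel."""
    scale = np.max(np.abs(w), axis=0, keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def awq_style_search(w, x, num_alphas=11):
    """Grid-search the exponent alpha in s = mean|X_j|^alpha."""
    act_mag = np.mean(np.abs(x), axis=0)      # (in,) activation magnitude per channel
    ref = x @ w
    best = None
    for alpha in np.linspace(0.0, 1.0, num_alphas):
        s = act_mag ** alpha                  # alpha = 0 reduces to plain rounding
        w_hat = quant_per_out_channel(w * s[:, None]) / s[:, None]
        err = float(np.mean((ref - x @ w_hat) ** 2))
        if best is None or err < best[0]:
            best = (err, float(alpha), w_hat)
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, :4] *= 20.0                              # a few salient input channels
w = rng.normal(size=(64, 32))

best_err, best_alpha, w_awq = awq_style_search(w, x)
rtn_err = float(np.mean((x @ w - x @ quant_per_out_channel(w)) ** 2))
```

Because `alpha = 0` is part of the grid, the selected result can never be worse than plain round-to-nearest on the calibration data.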