#quantization
2 entries in total

Papers (2)
Big2Small: A Unifying Neural Network Framework for Model Compression
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference