#LLM-deployment 1 entry in total Papers (1) FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference