#LLM quantization
2 entries in total

Papers (2)
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
RAMP: Reinforcement Adaptive Mixed-Precision Quantization for Efficient On-Device LLM Inference