#Efficient Inference
3 entries in total

Papers (3)
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
RAMP: Reinforcement Adaptive Mixed-Precision Quantization for Efficient On-Device LLM Inference
Fast Inference from Transformers via Speculative Decoding