论文索引

核心论文 · 按分类浏览 · 关联讲座 · 共 73 篇论文

_待整理

1 篇

高效推理与部署

11 篇
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias (2023) · L13
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Bowen Wen, Shaurya Dewan, Stan Birchfield (2025)
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
Wilhelm Tranheden, Shahnawaz Ahmed, Devdatt Dubhashi, Jonna Matthiesen, Hannes von Essen (2026)
Language Agents: Foundations, Prospects, and Risks
Yu Wang, Zhihan Zhang, Jiayu Zhou (2024) · L10
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen (2025)
PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica (2023)
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao (2023) · L10
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela (2020) · L10
Self-Distillation for Multi-Token Prediction
Guoliang Zhao, Ruobing Xie, An Wang, Shuaipeng Li, Huaibing Xie, Xingwu Sun (2026)
TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference
Jaber Jaber, Osama Jaber (2026)
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom (2023) · L10

基础理论

8 篇

剪枝与稀疏化

11 篇
Adaptive MLP Pruning for Large Vision Transformers
Chengchao Shen (2026)
Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep Networks
Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski (2026)
Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej (2026)
Deterministic Differentiable Structured Pruning for Large Language Models
Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen (2026)
Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
Jimyung Hong, Jaehyung Kim (2026)
HiAP
IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
Dong-Jae Lee, Sunghyun Baek, Junmo Kim (2026)
Rényi Entropy: A New Token Pruning Metric for Vision Transformers
Wei-Yuan Su, Ruijie Zhang, Zheng Zhang (2026)
ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models
Xu Li, Yi Zheng, Yuxuan Liang, Zhe Liu, Xiaolei Chen, Haotian Chen, Rui Zhu, Xiangyang Xue (2026)
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carlin (2019) · L9
VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models
Jintao Cheng, Haozhe Wang, Weibin Li, Gang Wang, Yipu Zhang, Xiaoyu Tang, Jin Wu, Xieyuanli Chen, Yunhui Liu, Wei Zhang (2026)

量化与低秩

10 篇

模型增长

4 篇

视觉任务

3 篇

数据集与评估

4 篇

网络架构

10 篇

训练优化

7 篇

NLP基础

4 篇