Paper Index
Core papers · Browse by category · Linked lectures · 73 papers in total
_To Be Sorted
1 paper
Efficient Inference and Deployment
11 papers
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
(2023)
· L13
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Bowen Wen, Shaurya Dewan, Stan Birchfield
(2025)
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
Wilhelm Tranheden, Shahnawaz Ahmed, Devdatt Dubhashi, Jonna Matthiesen, Hannes von Essen
(2026)
Language Agents: Foundations, Prospects, and Risks
Yu Su, Diyi Yang, Shunyu Yao, Tao Yu
(2024)
· L10
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen
(2025)
PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
(2023)
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
(2023)
· L10
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
(2020)
· L10
Self-Distillation for Multi-Token Prediction
Guoliang Zhao, Ruobing Xie, An Wang, Shuaipeng Li, Huaibing Xie, Xingwu Sun
(2026)
TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference
Jaber Jaber, Osama Jaber
(2026)
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
(2023)
· L10
Foundational Theory
8 papers
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
(2022)
· L9 L12
Demystifying When Pruning Works via Representation Hierarchies
Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li
(2026)
Language Models are Few-Shot Learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al.
(2020)
· L9
Learning Representations by Back-propagating Errors
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
(1986)
· L3
Let's Verify Step by Step
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
(2023)
· L13
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
(2013)
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
(2024)
· L13
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
(2023)
· L12 L13
Pruning and Sparsification
11 papers
Adaptive MLP Pruning for Large Vision Transformers
Chengchao Shen
(2026)
Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep Networks
Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski
(2026)
Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej
(2026)
Deterministic Differentiable Structured Pruning for Large Language Models
Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen
(2026)
Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
Jimyung Hong, Jaehyung Kim
(2026)
HiAP
IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
Dong-Jae Lee, Sunghyun Baek, Junmo Kim
(2026)
Rényi Entropy: A New Token Pruning Metric for Vision Transformers
Wei-Yuan Su, Ruijie Zhang, Zheng Zhang
(2026)
ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models
Xu Li, Yi Zheng, Yuxuan Liang, Zhe Liu, Xiaolei Chen, Haotian Chen, Rui Zhu, Xiangyang Xue
(2026)
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin
(2019)
· L9
VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models
Jintao Cheng, Haozhe Wang, Weibin Li, Gang Wang, Yipu Zhang, Xiaoyu Tang, Jin Wu, Xieyuanli Chen, Yunhui Liu, Wei Zhang
(2026)
Quantization and Low-Rank Methods
10 papers
Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach
Hengyuan Zhang, Xinrong Chen, Zunhai Su, Xiao Liang, Jing Xiong, Wendong Xu, He Xiao, Chaofan Tao, Wei Zhang, Ruobing Xie, Lei Jiang, Hayden Kwok-Hay So, Ngai Wong
(2025)
Big2Small: A Unifying Neural Network Framework for Model Compression
Jing-Xiao Liao, Haoran Wang, Tao Li, Daoming Lyu, Yi Zhang, Chengjun Cai, Feng-Lei Fan
(2026)
BinaryAttention
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei
(2025)
LLVQ
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
(2021)
· L9
Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
(2019)
· L9
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
Minjun Kim, Jaehyeon Choi, Hyunwoo Yang, Jongjin Kim, Jinho Song, U Kang
(2025)
RAMP: Reinforcement Adaptive Mixed-Precision Quantization for Efficient On-Device LLM Inference
Arpit Singh Gautam, Saurabh Jha
(2026)
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
Yeonsik Park, Hyeonseong Kim, Seungkyu Choi
(2026)
Model Growth
4 papers
Anatomical Heterogeneity in Transformer Language Models
Tomasz Wietrzykowski
(2026)
Grow, Assess, Compress: Adaptive Backbone Scaling for Memory-Efficient Class Incremental Learning
Adrian Garcia-Castañeda, Jon Irureta, Jon Imaz, Aizea Lojo
(2026)
Grow, Don't Overwrite: Fine-tuning Without Forgetting
Dyah Adila
(2026)
Growing Networks with Autonomous Pruning
Charles de Lambilly, Stefan Duffner
(2026)
Vision Tasks
3 papers
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Meta AI
(2024)
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Weixin Liang, Lili Yu, Liang Luo, Srinivasan Iyer, Ning Dong, Chunting Zhou, Gargi Ghosh, Mike Lewis, Wen-tau Yih, Luke Zettlemoyer, Xi Victoria Lin
(2024)
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
(2024)
Datasets and Evaluation
4 papers
AlpacaEval: An Automatic Evaluator for Instruction-Following Language Models
Xuechen Li, Tianyi Zhang, Yann Dubois, Rohan Taori, Ishaan Gulrajani, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
(2023)
· L11
Challenges and Opportunities in NLP Benchmarking
(Multiple authors)
(2024)
· L11
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, et al.
(2022)
· L11
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
(2021)
· L11
Network Architecture
10 papers
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
(2017)
· L5 L6
Attention Residuals
Kimi Team (Guangyu Chen, Yu Zhang, Jianlin Su, et al.)
(2026)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
(2019)
· L7 L11 L14
Contextual Word Representations: A Contextual Introduction
Noah A. Smith
(2019)
· L7
Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
(2018)
Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
(2019)
· L7
Layer Normalization
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
(2016)
The Illustrated BERT, ELMo, and co.
Jay Alammar
(2018)
· L7
The Illustrated Transformer
Jay Alammar
(2018)
The Llama 3 Herd of Models
Meta AI
(2024)
· L7
Training Optimization
7 papers
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
(2023)
· L8
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
ByteDance Research
(2025)
· L8 L12 L19
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
(2025)
· L8 L12
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
(2023)
· L8 L9 L12
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelvin MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
(2023)
· L8
Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al.
(2022)
· L8
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al.
(2022)
· L8
NLP Foundations
4 papers
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
(2013)
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
(2013)
· L1 L2
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
(2014)
· L1 L2
Improving Distributional Similarity with Lessons Learned from Word Embeddings
Omer Levy, Yoav Goldberg, Ido Dagan
(2015)