#GRPO
6 entries in total

Lectures (1)
  L12: Reasoning 1/2

Papers (2)
  DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Further Reading (3)
  DeepSeek-R1 Training Pipeline and Comparison of RL Methods
  Complete Derivations of DPO and GRPO
  The Relationship Between the GRPO Objective and Pass@K