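#Reasoning
4 entries in total

Papers (1)
Self-Consistency Improves Chain of Thought Reasoning in Language Models

Further Reading (3)
A Probabilistic Perspective on Chain-of-Thought
The Relationship Between the GRPO Objective and Pass@K
Derivation of the RLP Information Gain Reward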