CS224N / Study Notes
Lectures 22
  • Stanford CS224N: NLP with Deep Learning (Winter 2026)
  • CS224N Study Plan
  • L01: Introduction and the History of NLP
  • L02: Word Vectors
  • L03: Backpropagation and Neural Networks
  • L04: Language Models and Recurrent Neural Networks
  • L05: Attention and Transformers
  • L06: Final Projects & Practical Tips
  • L07: Pretraining
  • L08: Post-training
  • L09: Efficient Adaptation (PEFT)
  • L10: RAG and Language Agents
  • L11: Evaluation
  • L12: Reasoning 1/2
  • L13: Reasoning 2/2
  • L14: Tokenization and Multilinguality
  • L15: Interpretability (Guest: Been Kim)
  • L16: AI's Impact on Humanity
  • L17: Multimodality (Guest: Luke Zettlemoyer)
  • L18: LoRA Without Regret (Guest: John Schulman)
  • L19: Open Questions in NLP
  • CS224N Final Project
Assignments 4
  • A1: Introduction to Word Vectors
  • A2: Neural Networks & Dependency Parsing
  • A3: Self-Attention & Transformers
  • A4: LLM Evaluation & Red-Teaming
Concepts 494
NAS & Automated Design 4
  • Neural Architecture Search
  • DARTS
  • Once-for-All
  • slimmable
NLP Fundamentals 19
  • BLEU
  • BPE
  • Co-occurrence Matrix
  • Dependency Parsing
  • Distributional Semantics
  • GloVe
  • GPT
  • Language Model
  • Machine Translation
  • N-gram
  • Negative Sampling
  • NER
  • ROUGE
  • SentencePiece
  • Sentiment Analysis
  • Tokenization
  • Transition-Based Parsing
  • Word Embedding
  • Word2Vec
Pruning & Sparsification 52
  • Magnitude Pruning
  • Pruning Compression Ratio
  • Structured Pruning
  • Learnable Gating
  • Network Pruning
  • ART
  • Block Influence
  • CDPruner
  • DeepHoyer
  • depth pruning
  • DivPrune
  • DynamicViT
  • EfficientVLA
  • EViT
  • FastV
  • GOHSP
  • GraSP
  • IMP
  • LAMP
  • LayerDrop
  • ...and 32 more
Foundational Theory 60
  • Ergodic Dynamical Systems
  • Residual Energy
  • Measure Theory
  • Normalized Second Moment
  • Kernel Methods
  • Kernel Functions
  • Computational Complexity
  • Modular Arithmetic
  • Spherical Codes
  • Total Variation
  • Temporal Difference
  • Greedy Algorithms
  • Sampling Without Replacement
  • Error-Performance Trade-off
  • Ablation
  • Information Retrieval
  • Compression Order Theorem
  • Oscillation Modes
  • Integer Linear Programming
  • Orthogonal Projection
  • ...and 40 more
Uncategorized 7
  • A800
  • AI Safety
  • Jailbreak
  • NVIDIA H20
  • Red-Teaming
  • Sycophancy
  • Value Alignment
Datasets & Evaluation 35
  • BBH
  • Benchmarking
  • BoolQ
  • CIFAR-10
  • CIFAR-100
  • COCO
  • FineWeb-Edu
  • FLOPs
  • GenEval
  • GPQA
  • GSM8K
  • HellaSwag
  • HotpotQA
  • HumanEval
  • ImageNet
  • IoU
  • LIBERO
  • LiveCodeBench
  • LLM Evaluation
  • LLM-as-Judge
  • ...and 15 more
Model Growth 6
  • Training Dynamics
  • Catastrophic Forgetting
  • function-preserving
  • GradMax
  • Loss of Plasticity
  • progressive training
Deep Learning Fundamentals 57
  • Residual Connections
  • Majority Voting
  • Binary Search
  • Focal Frequency Loss
  • Temperature Scaling
  • Information Entropy
  • Morphological Operations
  • Cosine Similarity
  • Autoregressive Decoding
  • Adaptive Thresholding
  • Activation Function
  • AdaLN
  • Backpropagation
  • Batch Normalization
  • Binary Cross-Entropy
  • Computation Graph
  • Cross-Attention
  • Cross-Entropy Loss
  • Cross-Modal Attention
  • DeepNorm
  • ...and 37 more
Knowledge Distillation 15
  • Feature Distillation
  • Knowledge Distillation
  • Self-Distillation
  • DeFeat
  • DistiLLM
  • FGFI
  • FitNet
  • FreeKD
  • GID
  • GKD
  • Hinton KD
  • KL Divergence
  • MasKD
  • MiniLLM
  • self-distillation
Network Architectures 88
  • Residual Connections
  • Recurrent Neural Networks
  • ALiBi
  • BeiT
  • BERT
  • BigBird
  • CaiT
  • classification head
  • CLIP
  • ConvGRU
  • ConvNext
  • CycleNet
  • Decision Transformer
  • DeepSeek
  • DeepViT
  • DeiT
  • DeltaNet
  • DenseFormer
  • DenseNet
  • DINOv2
  • ...and 68 more
Vision Tasks 7
  • Class-Incremental Learning
  • Stereo Matching
  • Diffusion Models
  • EDSR
  • GFL
  • SwinIR
  • Vision-Language Models
Training Optimization 39
  • AdamW
  • Alignment
  • BALD
  • BatchBALD
  • Constitutional AI
  • continual learning
  • Cosine Annealing
  • Cosine Decay
  • CosineAnnealingLR
  • Curriculum Learning
  • DDPG
  • DeepSpeed
  • DiLoCo
  • DPO
  • DreamBooth
  • EASY
  • EMA
  • EWC
  • FedAvg
  • Fine-tuning
  • ...and 19 more
Quantization & Low-Rank 52
  • Scalar Quantization
  • Low-Rank Decomposition
  • Mixed Precision
  • Quantization Distributions
  • Vector Quantization
  • AdaLoRA
  • Adapter
  • AQLM
  • ASVD
  • AutoBit
  • AWQ
  • BitNet
  • BitNet b1.58
  • DFMC
  • DoRA
  • GPTQ
  • Hadamard rotation
  • HALO
  • HGQ
  • HQQ
  • ...and 32 more
Efficient Inference & Deployment 53
  • Dynamic Routing
  • Acceptance Rate
  • ACT
  • adaptive computation
  • ADEPT
  • AnchorAttention
  • CALM
  • collaborative inference
  • Cumulative Acceptance Rate
  • DeeBERT
  • DSA
  • dynamic depth
  • EAGLE
  • early exit
  • edge AI
  • EE-LLM
  • Element-wise LUT
  • ELUT
  • FAST
  • Fast Graph Decoder
  • ...and 33 more
Papers 73
NLP Fundamentals 4
  • Distributed Representations of Words and Phrases and their Compositionality
  • Efficient Estimation of Word Representations in Vector Space
  • GloVe: Global Vectors for Word Representation
  • Improving Distributional Similarity with Lessons Learned from Word Embeddings
_Unsorted 1
  • A Novel Pulse-Agile Waveform Design Based on Random FM Waveforms for Range Sidelobe Suppression and Range Ambiguity Mitigation
Pruning & Sparsification 11
  • Adaptive MLP Pruning for Large Vision Transformers
  • Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep Networks
  • Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
  • Deterministic Differentiable Structured Pruning for Large Language Models
  • Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
  • HiAP
  • IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
  • Rényi Entropy: A New Token Pruning Metric for Vision Transformers
  • ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models
  • The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
  • VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models
Foundational Theory 8
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  • Demystifying When Pruning Works via Representation Hierarchies
  • Language Models are Few-Shot Learners
  • Learning Representations by Backpropagating Errors
  • Let's Verify Step by Step
  • On the difficulty of training Recurrent Neural Networks
  • Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Model Parameters
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models
Datasets & Evaluation 4
  • AlpacaEval: An Automatic Evaluator for Instruction-Following Language Models
  • Challenges and Opportunities in NLP Benchmarking
  • Holistic Evaluation of Language Models
  • Measuring Massive Multitask Language Understanding
Model Growth 4
  • Anatomical Heterogeneity in Transformer Language Models
  • Grow, Assess, Compress: Adaptive Backbone Scaling for Memory-Efficient Class Incremental Learning
  • Grow, Don't Overwrite: Fine-tuning Without Forgetting
  • Growing Networks with Autonomous Pruning
Network Architectures 10
  • Attention Is All You Need
  • Attention Residuals
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Contextual Word Representations: A Contextual Introduction
  • Image Transformer
  • Language Models are Unsupervised Multitask Learners
  • Layer Normalization
  • The Illustrated BERT, ELMo, and co.
  • The Illustrated Transformer
  • The Llama 3 Herd of Models
Vision Tasks 3
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models
  • Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
  • Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Training Optimization 7
  • AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  • DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  • How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
  • Scaling Instruction-Finetuned Language Models
  • Training language models to follow instructions with human feedback
Quantization & Low-Rank 10
  • Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach
  • Big2Small: A Unifying Neural Network Framework for Model Compression
  • BinaryAttention
  • Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
  • LLVQ
  • LoRA: Low-Rank Adaptation of Large Language Models
  • Parameter-Efficient Transfer Learning for NLP
  • Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
  • RAMP: Reinforcement Adaptive Mixed-Precision Quantization for Efficient On-Device LLM Inference
  • SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
Efficient Inference & Deployment 11
  • Fast Inference from Transformers via Speculative Decoding
  • Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
  • FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
  • Language Agents: Foundations, Prospects, and Risks
  • MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
  • PagedAttention
  • ReAct: Synergizing Reasoning and Acting in Language Models
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  • Self-Distillation for Multi-Token Prediction
  • TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference
  • Toolformer: Language Models Can Teach Themselves to Use Tools
Extended Reading 48
Reasoning & Evaluation 17
  • A Probabilistic View of Chain-of-Thought
  • DeepSeek-R1 Training Pipeline and a Comparison of RL Methods
  • Scaling Laws and Chinchilla Optimality
  • Reward Design for Agent Evaluation
  • Formal Derivations for RAG and Agent Systems
  • A Formalization of Goodhart's Law
  • NLP Evaluation Metrics and Protocols
  • Speculative Decoding: Algorithm and Speedup Analysis
  • The Off-Policy Problem and RoPE Position Scaling
  • Analysis of Best-of-N Sampling
  • The Linear Representation Hypothesis and CAVs
  • Calibrated Models Must Hallucinate: An Information-Theoretic Proof
  • A Formalization of Algorithmic Monoculture
  • The Mathematics Behind DAPO's Asymmetric Clipping Ratios
  • G-Vendi Score: Measuring Data Diversity via Gradients
  • Deriving the RLP Information Gain Reward
  • The GRPO Objective and Its Relation to Pass@K
Probabilistic Models 3
  • HMMs in Full: Forward Algorithm, Viterbi Decoding, and Baum-Welch EM
  • PCFGs and the CYK Algorithm: Probabilistic Parsing
  • N-gram Language Models and Smoothing Techniques
Attention & Transformers 4
  • Self-Attention: A Complete Derivation
  • Multi-Head Attention and the Core Transformer Components
  • Computational Complexity of the Transformer
  • Comparing the Attention Matrices of Three Transformer Architectures
Neural Network Fundamentals 6
  • Full Derivation of Neural Network Forward and Backward Propagation
  • Matrix Calculus: Jacobians and the Chain Rule
  • Complete Derivation of Activation Function Derivatives
  • LSTM: Full Derivation and Vanishing-Gradient Analysis
  • Probabilistic Foundations of Language Models
  • Gradient Chain Products and the Conditional Language Model Derivation
Word Vectors & Representation Learning 4
  • Word2Vec Skip-gram: Objective and Gradient Derivation
  • Negative Sampling: Theory and Derivation
  • Deriving the GloVe Objective
  • The Word Analogy Formula and Window Classification Derivation
Pretraining & Fine-tuning 14
  • The BPE Algorithm, Step by Step
  • Probing Tasks
  • Pretraining Objectives and Architecture Comparison
  • RLHF: The Complete Mathematical Derivation
  • Full Derivations of DPO and GRPO
  • SimPO vs. DPO: A Comparative Derivation
  • A General Framework for Structured Pruning
  • Derivations of LoRA, Adapters, and Prompt Tuning
  • Byte-Level Model Analysis
  • Complete Pseudocode for the BPE Algorithm
  • Tokenization Theory and Multilingual Analysis
  • The Two-Stage Constitutional AI Algorithm
  • Transfusion's Mixed Loss Function
  • The Full Mathematical Structure of LoRA (Guest Lecture)

CS224N: NLP with Deep Learning · Stanford Winter 2026 · Personal Study Notes