A3: Self-Attention & Transformers
Assignment Contents
Part 1: Written — Attention Mechanism Analysis
- Mathematical derivation of multi-head self-attention (reference formulas are sketched after this list)
- Geometric interpretation of attention scores
- Analysis of why positional encoding is necessary
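As a starting point for the written derivation, a sketch of the standard formulas; the notation ($d_k$, $h$, $W_i^Q$, $d_{\mathrm{model}}$) follows the original Transformer paper and may differ from the course notes:

```latex
% Scaled dot-product attention over queries Q, keys K, values V
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V

% Multi-head attention: h heads with learned projections W_i^Q, W_i^K, W_i^V, W^O
\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\ KW_i^{K},\ VW_i^{V})
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}

% Sinusoidal positional encoding at position pos, even/odd dimensions 2i and 2i+1
PE_{(pos,\,2i)} = \sin\bigl(pos / 10000^{2i/d_{\mathrm{model}}}\bigr)
PE_{(pos,\,2i+1)} = \cos\bigl(pos / 10000^{2i/d_{\mathrm{model}}}\bigr)
```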
Part 2: Coding — Transformer Implementation
- Implement a Transformer encoder from scratch (see the sketches after this list)
- Scaled dot-product attention
- Multi-head attention layer
- Position-wise feed-forward network
- Layer normalization and residual connections
- Train and evaluate on a sequence task
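A minimal sketch of the two attention pieces, assuming PyTorch; the names `scaled_dot_product_attention`, `MultiHeadAttention`, `d_model`, and `num_heads` are illustrative, not a required API:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over key positions
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        # One projection each for queries, keys, values, plus the output projection
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        batch, seq_len, _ = x.shape
        # Project, then split the model dimension into heads: (batch, heads, seq, d_k)
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Merge heads back to (batch, seq, d_model), then apply the output projection
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(out)
```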
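And one possible wiring of a full encoder layer (post-norm, as in the original paper), building on the `MultiHeadAttention` sketch above; `EncoderLayer` and `d_ff` are again illustrative names:

```python
class EncoderLayer(nn.Module):
    """Post-norm encoder layer: sublayer -> dropout -> residual add -> LayerNorm."""
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)
        # Position-wise feed-forward: applied independently at every position
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        x = self.norm1(x + self.dropout(self.attn(x, mask)))  # attention sublayer
        x = self.norm2(x + self.dropout(self.ffn(x)))         # feed-forward sublayer
        return x

# Quick shape check on random input
layer = EncoderLayer(d_model=64, num_heads=8, d_ff=256)
out = layer(torch.randn(2, 10, 64))  # (batch=2, seq=10, d_model=64)
assert out.shape == (2, 10, 64)
```

The post-norm layout matches the original paper; many modern implementations instead use pre-norm (normalize before each sublayer), which tends to train more stably in deep stacks.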
Related Lectures
Related Concepts
- Self-Attention, Transformer
- Positional Encoding, Layer Normalization
- Multi-Head Attention
Completion Log
- date-started::
- date-completed::
- difficulty::
- notes::