A3: Self-Attention & Transformers

Assignment Contents

Part 1: Written — Attention Mechanism Analysis

  • Mathematical derivation of multi-head self-attention (the standard definitions appear after this list)
  • Geometric interpretation of attention scores
  • Analysis of why positional encoding is necessary
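
For reference, the standard definitions from Vaswani et al. (2017) that the derivation builds on: scaled dot-product attention, its multi-head extension, and the sinusoidal positional encoding whose necessity Part 1 asks you to analyze.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})

PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{\mathrm{model}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{\mathrm{model}}}\right)
```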

Part 2: Coding — Transformer Implementation

  • Implement a Transformer encoder from scratch (sketches for the main pieces follow this list)
  • Scaled dot-product attention
  • Multi-head attention layer
  • Position-wise feed-forward network
  • Layer normalization and residual connections
  • Train and evaluate on a sequence task
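
As a starting point for the attention pieces, here is a minimal PyTorch sketch; it assumes PyTorch is the course framework, and the names `scaled_dot_product_attention` and `MultiHeadAttention` are illustrative rather than a required API.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: (batch, heads, seq_len, d_k); mask broadcasts to the score shape.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)  # output projection W^O

    def forward(self, x, mask=None):
        b, t, _ = x.shape
        # Project, split d_model into (num_heads, d_k), and move heads before time.
        def split(proj):
            return proj(x).view(b, t, self.num_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q), split(self.w_k), split(self.w_v)
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate heads back into d_model and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)
```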
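And a sketch of one encoder block tying together the position-wise FFN, layer normalization, and residual connections; it reuses the hypothetical `MultiHeadAttention` above and follows the post-norm ordering of the original paper (pre-norm is a common variant).

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One post-norm Transformer encoder block: self-attention, then a
    position-wise feed-forward network, each wrapped in residual + LayerNorm."""

    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)  # from the sketch above
        self.ffn = nn.Sequential(      # applied independently at every position
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Sublayer 1: residual around self-attention, then LayerNorm.
        x = self.norm1(x + self.dropout(self.attn(x, mask)))
        # Sublayer 2: residual around the feed-forward network, then LayerNorm.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Smoke test: block = EncoderBlock(512, 8, 2048); block(torch.randn(2, 10, 512))
```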

Related Lectures

Related Concepts

Completion Log

  • date-started::
  • date-completed::
  • difficulty::
  • notes::