A3: Self-Attention & Transformers

Assignment Contents

Part 1: Written — Attention Mechanism Analysis

  • Mathematical derivation of multi-head self-attention (the standard definitions appear after this list)
  • Geometric interpretation of attention scores
  • Analysis of why positional encoding is necessary
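
For reference, the standard definitions from Vaswani et al. (2017) that the derivation builds on: scaled dot-product attention, its multi-head extension, and the sinusoidal positional encoding whose necessity Part 1 asks you to analyze.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})

PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{\mathrm{model}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{\mathrm{model}}}\right)
```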

Part 2: Coding — Transformer Implementation

  • Implement a Transformer encoder from scratch (sketches for the main pieces follow this list)
  • Scaled dot-product attention
  • Multi-head attention layer
  • Position-wise feed-forward network
  • Layer normalization and residual connections
  • Train and evaluate on a sequence task
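
As a starting point for the attention pieces, here is a minimal PyTorch sketch; it assumes PyTorch is the course framework, and the names `scaled_dot_product_attention` and `MultiHeadAttention` are illustrative rather than a required API.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: (batch, heads, seq_len, d_k); mask broadcasts to the score shape.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)  # output projection W^O

    def forward(self, x, mask=None):
        b, t, _ = x.shape
        # Project, split d_model into (num_heads, d_k), and move heads before time.
        def split(proj):
            return proj(x).view(b, t, self.num_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q), split(self.w_k), split(self.w_v)
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate heads back into d_model and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)
```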
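And a sketch of one encoder block tying together the position-wise FFN, layer normalization, and residual connections; it reuses the hypothetical `MultiHeadAttention` above and follows the post-norm ordering of the original paper (pre-norm is a common variant).

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One post-norm Transformer encoder block: self-attention, then a
    position-wise feed-forward network, each wrapped in residual + LayerNorm."""

    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)  # from the sketch above
        self.ffn = nn.Sequential(      # applied independently at every position
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Sublayer 1: residual around self-attention, then LayerNorm.
        x = self.norm1(x + self.dropout(self.attn(x, mask)))
        # Sublayer 2: residual around the feed-forward network, then LayerNorm.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Smoke test: block = EncoderBlock(512, 8, 2048); block(torch.randn(2, 10, 512))
```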

Related Lectures

Related Concepts

Completion Log

  • date-started::
  • date-completed::
  • difficulty::
  • notes::