Word Embedding

Category: NLP Fundamentals

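Definition

A word embedding is a representation that maps discrete word symbols into a continuous, low-dimensional vector space, so that words with similar meanings lie close together in that space. It is a foundational component of modern NLP.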

Mathematical Form

$e_w = E \cdot x_w$

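where $E \in \mathbb{R}^{d \times |V|}$ is the embedding matrix, $|V|$ is the vocabulary size, $x_w \in \{0,1\}^{|V|}$ is the one-hot vector of word $w$, and $d$ is the embedding dimension. Multiplying by a one-hot vector simply selects the column of $E$ that belongs to $w$.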
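The equivalence between the matrix product and a plain column lookup can be checked with a small NumPy sketch. The sizes, the random matrix, and the word index below are illustrative assumptions, not values from any trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab_size = 4, 6                    # toy sizes (assumed for illustration)
E = rng.normal(size=(d, vocab_size))    # embedding matrix, one column per word

w = 2                                   # hypothetical index of word w in the vocabulary
x_w = np.zeros(vocab_size)
x_w[w] = 1.0                            # one-hot vector for w

e_product = E @ x_w                     # e_w = E x_w
e_lookup = E[:, w]                      # same vector, obtained by selecting column w
```

In practice the lookup form is what implementations tend to use, since it avoids materialising the one-hot vector.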

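Word2Vec Skip-gram objective:

$J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \leq j \leq c,\, j \neq 0} \log P(w_{t+j} \mid w_t)$

$P(o \mid c) = \frac{\exp(u_o^T v_c)}{\sum_{w \in V}\exp(u_w^T v_c)}$

Here $T$ is the number of tokens in the corpus. Note that $c$ is used in two senses: in $J(\theta)$ it is the context window radius, while in $P(o \mid c)$ it denotes the center word and $o$ a context (outside) word. $v_c$ is the vector of the center word and $u_o$ is the vector of the context word.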
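As a rough illustration of the two formulas, the sketch below evaluates the full-softmax probability and the averaged negative log-likelihood on a toy corpus of word indices. The function names and the decision to skip window positions that fall outside the corpus are assumptions made for this example; no gradient step is shown.

```python
import numpy as np

def softmax_prob(U, v_c):
    """P(o | c) for every candidate o: softmax over u_w^T v_c. U has shape (|V|, d)."""
    scores = U @ v_c
    scores = scores - scores.max()          # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def skipgram_loss(U, V_center, corpus, c):
    """J(theta): average over positions t of -sum_j log P(w_{t+j} | w_t)."""
    T = len(corpus)
    total = 0.0
    for t, center in enumerate(corpus):
        probs = softmax_prob(U, V_center[center])
        for j in range(-c, c + 1):
            if j == 0 or not (0 <= t + j < T):
                continue                    # skip the center word and out-of-range positions
            total -= np.log(probs[corpus[t + j]])
    return total / T
```

The denominator sums over the whole vocabulary for every center word, which is the cost that approximations such as negative sampling are meant to avoid.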

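Key Points

Distributional hypothesis: the meaning of a word is determined by the contexts in which it occurs. "You shall know a word by the company it keeps" (Firth, 1957).

Word2Vec (Mikolov et al., 2013): two architectures, Skip-gram (predict the context words given the center word) and CBOW (predict the center word given its context). Negative sampling speeds up training by replacing the full softmax over the vocabulary with a few binary classifications against sampled noise words.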
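The difference between the two architectures shows up in how training examples are cut from a sentence. The following sketch uses hypothetical helper names and a fixed window radius `c`; it builds the examples only and does not train a model.

```python
def skipgram_pairs(tokens, c):
    """(center, context) pairs: Skip-gram predicts each context word from the center."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(max(0, t - c), min(len(tokens), t + c + 1)):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, c):
    """(context words, center) examples: CBOW predicts the center from its context."""
    examples = []
    for t, center in enumerate(tokens):
        lo, hi = max(0, t - c), min(len(tokens), t + c + 1)
        context = [tokens[j] for j in range(lo, hi) if j != t]
        examples.append((context, center))
    return examples

tokens = ["the", "cat", "sat"]
sg = skipgram_pairs(tokens, c=1)
cb = cbow_examples(tokens, c=1)
```

Skip-gram yields one example per (center, context) pair, while CBOW yields one example per position with the whole window as input.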

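GloVe (Pennington et al., 2014): fits word vectors to the logarithm of global co-occurrence counts with a weighted least-squares objective, which can be viewed as factorizing the co-occurrence matrix. It combines the strengths of global statistics and local context windows.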
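To make the "global co-occurrence" input concrete, here is a minimal sketch that counts co-occurrences in a symmetric window and defines the weighting function $f(x)$ applied to each count in the GloVe objective. The helper names are hypothetical, the counting is unweighted for simplicity (the original method also discounts distant pairs), and the defaults `x_max=100` and `alpha=0.75` are the values reported in the GloVe paper.

```python
from collections import Counter

def cooccurrence_counts(tokens, window):
    """Count ordered (word, context word) pairs that occur within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): down-weights rare pairs and caps frequent pairs at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

counts = cooccurrence_counts(["a", "b", "a", "b"], window=1)
```

Each nonzero count would then contribute one weighted squared-error term to the training loss.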

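FastText (Bojanowski et al., 2017): represents each word as a bag of character n-grams (subwords), so it can build vectors for out-of-vocabulary (OOV) words from the n-grams they contain.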
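A small sketch of the subword idea: a word is wrapped in boundary markers, split into character n-grams, and its vector is composed from the n-gram vectors. The hashing via `zlib.crc32`, the table size, and the random (untrained) table are assumptions for illustration and do not reproduce the fastText library's actual hashing or parameters.

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def oov_vector(word, table):
    """Compose a vector for any word by summing its hashed n-gram vectors."""
    rows = [zlib.crc32(g.encode("utf-8")) % table.shape[0] for g in char_ngrams(word)]
    return table[rows].sum(axis=0)

rng = np.random.default_rng(0)
table = rng.normal(size=(1000, 8))      # toy, untrained n-gram embedding table
vec = oov_vector("embeddingz", table)   # a made-up word still gets a vector
```

Because the vector depends only on character n-grams, a word never seen in training still receives a representation that shares components with similarly spelled words.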

A classic property of word embeddings: $\vec{king} - \vec{man} + \vec{woman} \approx \vec{queen}$ (analogical reasoning).
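The analogy is usually evaluated by a nearest-neighbour search under cosine similarity, excluding the query words. The sketch below uses hand-made three-dimensional vectors chosen so that the arithmetic works out exactly; they are not trained embeddings, and real models only satisfy the relation approximately.

```python
import numpy as np

# Hand-crafted toy vectors (assumed): axes roughly mean royalty, male, female.
vectors = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.1, 0.1, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the three query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman", vectors)
```

Excluding the query words matters in practice, because the nearest neighbour of the target vector is often one of the inputs themselves.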

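Word embeddings are the core topic of CS224N Lectures 1-2 and the starting point for understanding the NLP models that follow.

Limitation: static embeddings assign a single vector to each word and therefore cannot handle polysemy. They were later largely superseded by contextual embeddings such as ELMo and BERT.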

Representative Works

Word2Vec: Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013)

GloVe: Global Vectors for Word Representation (Pennington et al., 2014)

FastText: Enriching Word Vectors with Subword Information (Bojanowski et al., 2017)
