Mistral-7B

分类: 网络架构

定义

Mistral AI 发布的 7B 参数开源语言模型，采用 Grouped-Query Attention 和 Sliding Window Attention

32 层 Transformer decoder，hidden dim 4096

使用 Grouped-Query Attention (GQA) 提升推理效率

Sliding Window Attention 支持长序列处理

Instruct 版本经过指令微调，广泛用于 benchmark 评估

E5-Mistral 变体专门用于文本检索任务

Pruning-on-Representations: 作为主要实验模型，分析剪枝在不同任务上的差异表现