Transformer

Core Papers and Code

Key Concepts

  • QKV computation in Self-Attention
  • The role of Scaled Dot-Product Attention (see the first sketch after this list)
  • Principles of Multi-Head Attention
  • Tokenization and Tokenizers
  • Word Embedding
  • Positional Encoding (see the second sketch below)
  • Attention Mechanism
  • Feed Forward Network
  • Masking
  • Layer Normalization
  • Decoding Techniques (see the third sketch below)
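
The first few items above reduce to a single formula. Here is a minimal sketch in NumPy (the function name and toy shapes are illustrative, not from any library): scaled dot-product attention computes softmax(QKᵀ / √d_k)·V, and the same code shows where a causal mask slots in.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    mask: optional boolean (seq_len, seq_len) array; True marks
          positions a query must NOT attend to (e.g. future tokens).
    """
    d_k = Q.shape[-1]
    # Scaling by sqrt(d_k) keeps the dot products from growing with the
    # dimension, which would otherwise saturate the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf, so the softmax gives them zero weight.
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax: each query position gets a distribution over keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy run: 4 tokens, d_k = d_v = 8, with a causal (look-ahead) mask.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
causal_mask = np.triu(np.ones((4, 4), dtype=bool), k=1)
print(scaled_dot_product_attention(Q, K, V, causal_mask).shape)  # (4, 8)
```

In the full model, Q, K, and V come from learned linear projections of the same input, and multi-head attention runs h copies of this computation on lower-dimensional projections before concatenating the results.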
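
Positional encoding is equally compact. Below is a sketch of the sinusoidal scheme from the original paper (the function name is made up): each position gets sine values on even dimensions and cosine values on odd ones, at geometrically spaced frequencies.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe  # added element-wise to the word embeddings

print(sinusoidal_positional_encoding(50, 16).shape)  # (50, 16)
```

For any fixed offset k, PE(pos + k) is a linear function of PE(pos), which is the property the paper cites as motivation for this choice.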
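
Finally, decoding techniques determine how a token is actually chosen from the model's output distribution. A minimal sketch (names are illustrative) showing greedy decoding, temperature, and top-k sampling as variations of the same step:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Choose a token id from a vector of logits.

    temperature close to 0 approaches greedy decoding (argmax);
    top_k, if set, restricts sampling to the k most likely tokens.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    if top_k is not None:
        # Drop everything below the k-th largest logit.
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next_token(logits, temperature=1e-8))           # greedy -> 0
print(sample_next_token(logits, temperature=1.0, top_k=2))   # 0 or 1
```

Beam search works differently: instead of committing to one token per step, it keeps the b highest-scoring partial sequences and expands them in parallel.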

Deep Dive

  • A paragraph-by-paragraph reading of the Transformer paper, "Attention Is All You Need" [Paper Reading]

Attention Mechanism Learning Resources



Involution Hell © 2026 by Community, licensed under CC BY-NC-SA 4.0