# Transformer

## Core Papers and Code

## Key Concepts
- QKV computation in Self-Attention
- The role of the scaled dot product (a runnable sketch follows this list)
- Principles of Multi-Head Attention
- Tokenization and Tokenizer
- Word Embedding
- Positional Encoding
- Attention Mechanism
- Feed-Forward Network
- Masking
- Layer Normalization
- Decoding Techniques
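
The concepts in this list compose into a single forward pass. Below is a minimal NumPy sketch of positional encoding, the QKV projections, causal masking, and scaled dot-product attention; the function name, weights, and toy shapes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with optional masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # pairwise query-key similarity
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # masked positions become ~ -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Toy setup: 4 tokens, model dimension 8 (sizes chosen only for readability).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # stand-in word embeddings

# Sinusoidal positional encoding, added to the embeddings.
pos = np.arange(4)[:, None]
dim = np.arange(0, 8, 2)[None, :]
pe = np.zeros((4, 8))
pe[:, 0::2] = np.sin(pos / 10000 ** (dim / 8))
pe[:, 1::2] = np.cos(pos / 10000 ** (dim / 8))
x = x + pe

# QKV computation: three learned projections of the same input sequence.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Causal (decoder-style) mask: a token may only attend to itself and earlier tokens.
causal = np.tril(np.ones((4, 4), dtype=bool))
out = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (4, 8)
```

Multi-Head Attention runs several copies of this in parallel on d_model / h sized projections and concatenates the results; the 1/sqrt(d_k) scaling keeps the dot products from growing with the dimension and saturating the softmax.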
## Deep Dive
- A paragraph-by-paragraph reading of the Transformer paper [Paper Reading]
## Attention Mechanism Learning Resources
- [HD bilingual subtitles] Andrew Ng explains how Transformers work in detail (2025)
- Mastering the attention mechanism once and for all