# Transformer

## Core Papers and Code

## Key Concepts
- QKV computation in Self-Attention
- The role of the scaled dot product (a runnable sketch follows this list)
- Principles of Multi-Head Attention
- Tokenization and Tokenizer
- Word Embedding
- Positional Encoding
- Attention Mechanism
- Feed-Forward Network
- Masking
- Layer Normalization
- Decoding Techniques
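
The concepts in this list compose into a single forward pass. Below is a minimal NumPy sketch of positional encoding, the QKV projections, causal masking, and scaled dot-product attention; the function name, weights, and toy shapes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with optional masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # pairwise query-key similarity
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # masked positions become ~ -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Toy setup: 4 tokens, model dimension 8 (sizes chosen only for readability).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # stand-in word embeddings

# Sinusoidal positional encoding, added to the embeddings.
pos = np.arange(4)[:, None]
dim = np.arange(0, 8, 2)[None, :]
pe = np.zeros((4, 8))
pe[:, 0::2] = np.sin(pos / 10000 ** (dim / 8))
pe[:, 1::2] = np.cos(pos / 10000 ** (dim / 8))
x = x + pe

# QKV computation: three learned projections of the same input sequence.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Causal (decoder-style) mask: a token may only attend to itself and earlier tokens.
causal = np.tril(np.ones((4, 4), dtype=bool))
out = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (4, 8)
```

Multi-Head Attention runs several copies of this in parallel on d_model / h sized projections and concatenates the results; the 1/sqrt(d_k) scaling keeps the dot products from growing with the dimension and saturating the softmax.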
## Deep Dive
- A paragraph-by-paragraph reading of the Transformer paper [Paper Reading]
## Attention Mechanism Learning Resources
- [HD bilingual subtitles] Andrew Ng explains how Transformers work in detail (2025)
- Mastering the attention mechanism once and for all