Embedding Models

Embedding is a technique that maps discrete objects (such as words, sentences, images, and user behaviors) to a continuous vector space.
With this representation, semantically similar objects tend to be closer together in the vector space, making computation and modeling more tractable.
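For concreteness, here is a minimal sketch of this discrete-to-continuous mapping using a learned lookup table (PyTorch's nn.Embedding). The toy vocabulary and the embedding dimension are illustrative assumptions, and the vectors are randomly initialized rather than trained:

```python
import torch
import torch.nn as nn

# Toy vocabulary: discrete symbols mapped to integer ids (illustrative assumption).
vocab = {"cat": 0, "dog": 1, "car": 2}

# A learned lookup table: one 4-dimensional vector per symbol.
# In a real model these weights are trained; here they are random initial values.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

ids = torch.tensor([vocab["cat"], vocab["dog"]])
vectors = embedding(ids)   # shape (2, 4): one continuous vector per discrete input
print(vectors.shape)       # torch.Size([2, 4])
```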

Core Idea

  • Discrete → Continuous: Transforms symbolic inputs into numerical vectors, enabling neural network processing.
  • Semantic preservation: The structure of the vector space retains the semantic relationships between objects.
  • Computability: Vectors support operations such as addition, dot product, and cosine similarity, enabling retrieval, clustering, and classification (see the sketch after this list).
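As a concrete illustration of these operations, the sketch below computes cosine similarity over hand-picked toy vectors; the numbers are invented for the example and carry no real semantics:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors (illustrative only): "cat" and "dog" point in
# similar directions, while "car" points elsewhere.
cat = np.array([0.9, 0.8, 0.1])
dog = np.array([0.8, 0.9, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, dog))  # high: semantically close
print(cosine_similarity(cat, car))  # low: semantically distant
```

The same dot-product machinery scales directly to the retrieval and clustering use cases discussed below.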

Applications in Large Models

  • Word/sentence vectors: The most common representation in NLP models (e.g., Word2Vec, BERT, GPT).
  • Multimodal representations: Mapping images, audio, video, and other modalities into a shared vector space for cross-modal retrieval.
  • Retrieval and recommendation: Semantic retrieval based on vector similarity (vector databases, RAG), and personalized recommender systems (see the retrieval sketch after this list).
  • Fine-tuning and merging: Adapting vector representations to specific tasks via fine-tuning methods such as LoRA, or combining models via merging techniques such as SLERP.
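The retrieval step at the heart of vector databases and RAG reduces to nearest-neighbor search over embeddings. The sketch below shows the brute-force version; the embed function here is a hypothetical placeholder (a seeded random projection), so its scores carry no real semantics, and a real embedding model should be swapped in for meaningful results:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real embedding model: a seeded random
    # projection that returns a unit-normalized vector. No real semantics.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

corpus = [
    "how to train a model",
    "best pizza recipes",
    "vector databases explained",
]
index = np.stack([embed(doc) for doc in corpus])  # shape (num_docs, dim)

query = embed("what is a vector database?")
scores = index @ query                   # dot product = cosine sim (unit vectors)
best = corpus[int(np.argmax(scores))]    # top-1 retrieval, as a RAG system would do
print(best, scores)
```

Production systems replace the brute-force matrix product with an approximate nearest-neighbor index, but the interface — embed, score, rank — stays the same.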

Typical Methods

  • Early methods: Word2Vec, GloVe
  • Contextual representations: ELMo, BERT (see the usage sketch after this list)
  • Embeddings from generative LLMs: GPT series, Qwen Embedding, OpenAI Embedding API
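As a usage sketch, one common way to obtain contextual sentence embeddings in practice is the sentence-transformers library; the model name below is an example choice, not a recommendation from this article:

```python
from sentence_transformers import SentenceTransformer

# Example model choice: a small BERT-family sentence encoder.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["Embeddings map text to vectors.", "Vectors encode meaning."]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With unit-normalized vectors, the dot product equals cosine similarity.
print(embeddings.shape)               # (2, 384) for this model
print(embeddings[0] @ embeddings[1])  # similarity of the two sentences
```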

Summary

Embedding is a foundational component of modern machine learning and large model applications.
It bridges the discrete and continuous worlds, and is a core tool for semantic understanding, retrieval-augmented generation (RAG), and multimodal fusion.

