    The Map of Meaning: How Embedding Models “Understand” Human Language

    By Awais | March 31, 2026

    If you work with Artificial Intelligence development, or if you are studying or planning to work with that technology, you have certainly stumbled upon embedding models along your journey.

    At its heart, an embedding model is a neural network trained to map objects like words or sentences into a continuous vector space, so that objects that are contextually or conceptually similar end up mathematically close to one another.

    To put it in simpler words, imagine a library where the books are categorized not only by author and title, but by many other dimensions, such as vibe, topic, mood, and writing style.

    Another good analogy is a map itself. Think of a map and two cities you don’t know. Let’s say you are not that good with geography and don’t know where Tokyo and New York City are on the map. If I told you that we should have breakfast in NYC and lunch in Tokyo, you might say: “Let’s do it”.

    However, once I give you the coordinates and you check the cities on the map, you will see they are very far away from each other. That is what embeddings give a model: the coordinates!

    Building the Map

    Even before you ever ask a question, the embedding model was already trained. It has read millions of sentences and noted patterns. For example, it sees that “cat” and “kitten” often appear in the same kinds of sentences, while “cat” and “refrigerator” rarely do.

    With those patterns, the model assigns every word a set of coordinates in a mathematical space, like an invisible map.

    • Concepts that are similar (like “cat” and “kitten”) get placed right next to each other on the map.
    • Concepts that are somewhat related (like “cat” and “dog”) are placed near each other, but not right on top of one another.
    • Concepts that are totally unrelated (like “cat” and “quantum physics”) are placed in completely different corners of the map, like NYC and Tokyo.
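These placements can be sketched with made-up 2-D coordinates (real embedding models use hundreds of dimensions; the numbers here are invented purely to illustrate the geometry):

```python
import numpy as np

# Toy 2-D "map" coordinates, invented for illustration only.
coords = {
    "cat":             np.array([1.0, 1.0]),
    "kitten":          np.array([1.1, 0.9]),
    "dog":             np.array([2.0, 1.5]),
    "quantum physics": np.array([9.0, 8.0]),
}

def distance(a, b):
    """Euclidean distance between two points on the map."""
    return float(np.linalg.norm(coords[a] - coords[b]))

print(distance("cat", "kitten"))           # tiny: right next to each other
print(distance("cat", "dog"))              # larger: related, but not on top of each other
print(distance("cat", "quantum physics"))  # huge: different corners of the map
```

The actual numbers don’t matter; what matters is the ordering of the distances, which mirrors the three bullets above.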

    The Digital Fingerprint

    Nice. Now we know how the map was created. What comes next?

    Now we will work with this trained embedding model. Once we give the model a sentence like “The fluffy kitten is sleeping”:

    1. It doesn’t look at the letters. Instead, it visits those coordinates on its map for each word.
    2. It calculates the center point (the average) of all those locations. That single center point becomes the “fingerprint” for the whole sentence.
    3. It puts a pin on the map where your question’s fingerprint is.
    4. It looks around in a circle to see which other fingerprints are nearby.

    Any documents that “live” near your question on this map are considered a match, because they share the same “vibe” or topic, even if they don’t share the exact same words.
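A minimal sketch of that averaging-and-matching idea, again with invented 2-D coordinates:

```python
import numpy as np

# Toy word coordinates, invented for illustration.
word_coords = {
    "fluffy":   np.array([1.2, 0.8]),
    "kitten":   np.array([1.1, 0.9]),
    "sleeping": np.array([1.4, 1.3]),
    "stock":    np.array([8.0, 7.5]),
    "market":   np.array([8.2, 7.9]),
}

def fingerprint(words):
    """Average the word coordinates: one center point for the whole text."""
    return np.mean([word_coords[w] for w in words], axis=0)

query = fingerprint(["fluffy", "kitten", "sleeping"])

docs = {
    "cat care guide": fingerprint(["fluffy", "kitten"]),
    "finance news":   fingerprint(["stock", "market"]),
}

# The document whose fingerprint lives nearest to the query wins,
# even though the documents don't contain every query word.
best = min(docs, key=lambda name: np.linalg.norm(docs[name] - query))
print(best)  # cat care guide
```

Real models build sentence vectors with more sophistication than a plain average, but the "one point per text, nearest point wins" intuition holds.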

    Embeddings: the invisible map. | Image generated by AI. Google Gemini, 2026.

    It’s like searching for a book not by searching for a specific keyword, but by pointing to a spot on a map that says “these are all books about kittens,” and letting the model fetch everything in that neighborhood.

    Embedding Models Steps

    Let’s see next how an embedding model works step-by-step after getting a request.

    1. Input: The computer takes in a text.
    2. Tokenization: It breaks the text down into tokens, the smallest pieces of a phrase that carry meaning. Usually, that’s a word or a part of a word.
    3. Chunking: The input text is split into manageable chunks (often around 512 tokens), so it doesn’t get overwhelmed by too much information at once.
    4. Embedding: It transforms each snippet into a long list of numbers (a vector) that acts like a unique fingerprint representing the meaning of that text.
    5. Vector Search: When you ask a question, the model turns your question into a “fingerprint” too and quickly calculates which stored snippets have the most mathematically similar numbers.
    6. Retrieval: The model returns the most similar vectors, which are associated with text chunks.
    7. Generation: If you are performing Retrieval-Augmented Generation (RAG), those few “winning” snippets are handed to an LLM, which reads them and writes a natural-sounding answer based only on that specific information.
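The chunking in step 3 can be sketched as a simple fixed-size split (production pipelines usually add overlap between chunks or split on sentence boundaries, but the core idea is this):

```python
def chunk_tokens(tokens, max_tokens=512):
    """Split a token list into pieces of at most max_tokens (step 3 above)."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

tokens = ["tok"] * 1100          # stand-in for a tokenized document
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [512, 512, 76]
```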

    Coding

    Great. We did a lot of talking. Now, let’s try to code a little and get those concepts more practical.

    We will start with a simple BERT (Bidirectional Encoder Representations from Transformers) embedding. BERT was created by Google and uses the Transformer architecture with its attention mechanism, so the vector for a word changes based on the words surrounding it.

    # Imports
    from transformers import BertTokenizer
    
    # Load pre-trained BERT tokenizer
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    
    # Sample text for tokenization
    text = "Embedding models are so cool!"
    
    # Step 1: Tokenize the text
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # View
    tokens
    {'input_ids': tensor([[ 101, 7861, 8270, 4667, 4275, 2024, 2061, 4658,  999,  102]]),
     'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 
     'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

    Notice how each piece of text was transformed into an ID. The sentence has only five words, yet there are ten IDs: special tokens were added, and some words were broken down into subwords.

    • The ID 101 is associated with the token [CLS]. That token’s vector is thought to capture the overall meaning of the entire sentence or sequence of sentences. It works like a stamp that signals the meaning of that chunk to the model. [2]
    • The ID 102 is associated with the token [SEP], which separates sentences. [2]
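Those subword splits follow WordPiece’s greedy longest-match strategy. Here is a toy sketch of the idea (BERT’s real vocabulary has roughly 30,000 entries; this tiny vocab is invented for illustration):

```python
def wordpiece(word, vocab):
    """Greedy longest-match subword split (the idea behind BERT's WordPiece)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            # Continuation pieces are marked with "##", as in BERT's vocabulary.
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
            end -= 1
        else:                    # no vocabulary entry matched this position
            return ["[UNK]"]
    return pieces

vocab = {"embed", "##ding", "cool", "##s"}
print(wordpiece("embedding", vocab))  # ['embed', '##ding']
print(wordpiece("cools", vocab))      # ['cool', '##s']
```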

    Next, let’s apply the embedding model to data.

    Embedding

    Here is another simple snippet where we take some text and encode it with the versatile, all-purpose embedding model all-MiniLM-L6-v2.

    from qdrant_client import QdrantClient, models
    from sentence_transformers import SentenceTransformer
    
    # 1. Load embedding model
    model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')
    
    # 2. Initialize Qdrant client
    client = QdrantClient(":memory:")
    
    # 3. Create embeddings
    docs = ["refund policy", "pricing details", "account cancellation"]
    vectors = model.encode(docs).tolist()
    
    # 4. Store Vectors: Create a collection (DB)
    client.create_collection(
        collection_name="my_collection",
        vectors_config=models.VectorParams(size=384,
                                           distance=models.Distance.COSINE)
    )
    
    # Upload embedded docs (vectors)
    client.upload_collection(collection_name="my_collection",
                             vectors=vectors,
                             payload=[{"source": docs[i]} for i in range(len(docs))])
    
    # 5. Search
    query_vector = model.encode("How do I cancel my subscription")
    
    # Result
    result = client.query_points(collection_name="my_collection",
                                 query=query_vector,
                                 limit=2,
                                 with_payload=True)
    
    print("\n\n ======= RESULTS =========")
    result.points
    

    The results are as expected. It points to the account cancellation topic!

     ======= RESULTS =========
    [ScoredPoint(id='b9f4aa86-4817-4f85-b26f-0149306f24eb', version=0, score=0.6616353073200185, payload={'source': 'account cancellation'}, vector=None, shard_key=None, order_value=None),
     ScoredPoint(id='190eaac1-b890-427b-bb4d-17d46eaffb25', version=0, score=0.2760082702501182, payload={'source': 'refund policy'}, vector=None, shard_key=None, order_value=None)]

    What just happened?

    1. We imported a pre-trained embedding model.
    2. We instantiated a vector database of our choice: Qdrant [3].
    3. We embedded the text and uploaded it to the vector DB in a new collection.
    4. We submitted a query.
    5. The results are the documents whose mathematical “fingerprint”, or meaning, is closest to the query’s embedding.
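The “closest fingerprint” in step 5 is measured with the cosine similarity we configured via Distance.COSINE. A minimal NumPy version of the metric:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = same meaning direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))                       # ~1.0: scale is ignored
print(cosine_similarity(a, np.array([-3.0, 0.0, 1.0])))  # ~0.0: orthogonal, unrelated
```

Because only the angle matters, two texts of very different lengths can still score as near-identical in meaning.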

    This is really nice.

    To end this article, let’s see whether we can fine-tune an embedding model.

    Fine Tuning an Embedding Model

    Fine-tuning an embedding model is different from fine-tuning an LLM. Instead of teaching the model to “talk,” you are teaching it to reorganize its internal map so that specific concepts in your domain are pushed further apart or pulled closer together.

    The most common and effective way to do this is using Contrastive Learning with a library like Sentence-Transformers.

    First, we teach the model what closeness looks like using three kinds of data points:

    • Anchor: The reference item (e.g., “Brand A Cola Soda”)
    • Positive: A similar item (e.g., “Brand B Cola Soda”) that the model should pull closer.
    • Negative: A different item (e.g., “Brand A Cola Soda Zero Sugar”) that the model should push away.

    Next, we choose a Loss Function to tell the model how much to change when it makes a mistake. You can choose between:

    • MultipleNegativesRankingLoss: Great if you only have (Anchor, Positive) pairs. It assumes every other positive in the batch is a “negative” for the current anchor.
    • TripletLoss: Best if you have explicit (Anchor, Positive, Negative) sets. It forces the distance between Anchor-Positive to be smaller than Anchor-Negative by a specific margin.
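The core computation behind TripletLoss can be sketched in a few lines of NumPy (using Euclidean distance and invented points; the sentence-transformers implementation works on batches of embeddings, but the idea is the same):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero unless the negative sits within `margin` of the positive's distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 1.0])   # anchor
p = np.array([1.1, 0.9])   # positive: already close
n = np.array([4.0, 4.0])   # negative: already far

print(triplet_loss(a, p, n))                     # 0.0 -> constraint satisfied, nothing to learn
print(triplet_loss(a, p, np.array([1.2, 1.1])))  # > 0 -> this negative is too close, push it away
```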

    These are the model’s similarity results out of the box.

    from sentence_transformers import SentenceTransformer, util
    
    # 1. Load a pre-trained base model
    model = SentenceTransformer('all-MiniLM-L6-v2')
    
    # 2. Define your test cases
    query = "Brand A Cola Soda"
    choices = [
        "Brand B Cola Soda",             # The 'Positive'
        "Brand A Cola Soda Zero Sugar"   # The 'Negative' (a hard negative)
    ]
    
    # 3. Encode the text into vectors
    query_vec = model.encode(query)
    choice_vecs = model.encode(choices)
    
    # 4. Compute Cosine Similarity
    # util.cos_sim returns a matrix, so we convert to a list for readability
    cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
    
    print(f"\n\n ======= Results for: {query} ===============")
    for i, score in enumerate(cos_scores):
        print(f"-> {choices[i]}: {score:.5f}")
     ======= Results for: Brand A Cola Soda ===============
    -> Brand B Cola Soda: 0.86003
    -> Brand A Cola Soda Zero Sugar: 0.81907

    And when we fine-tune it, showing the model that the Cola Sodas should be closer to each other than to the Zero Sugar version, this is what happens.

    from sentence_transformers import SentenceTransformer, InputExample, losses, util
    from torch.utils.data import DataLoader
    
    # 1. Load a pre-trained base model
    fine_tuned_model = SentenceTransformer('all-MiniLM-L6-v2')
    
    # 2. Define your training data (Anchors, Positives, and Negatives)
    train_examples = [
        InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand C Cola Zero Sugar"]),
        InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand A Cola Zero Sugar"]),
        InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand B Cola Zero Sugar"])
    ]
    
    # 3. Create a DataLoader and choose a Loss Function
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
    train_loss = losses.TripletLoss(model=fine_tuned_model)
    
    # 4. Tune the model
    fine_tuned_model.fit(train_objectives=[(train_dataloader, train_loss)], 
                         optimizer_params={'lr': 9e-5},
                         epochs=40)
    
    
    # 5. Define your test cases
    query = "Brand A Cola Soda"
    choices = [
        "Brand B Cola Soda",        # The 'Positive' (should be closer now)
        "Brand A Cola Zero Sugar"   # The 'Negative' (should be further away now)
    ]
    
    # 6. Encode the text into vectors
    query_vec = fine_tuned_model.encode(query)
    choice_vecs = fine_tuned_model.encode(choices)
    
    # 7. Compute Cosine Similarity
    # util.cos_sim returns a matrix, so we convert to a list for readability
    cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
    
    print(f"\n\n ======== Results for: {query} ====================")
    for i, score in enumerate(cos_scores):
        print(f"-> {choices[i]}: {score:.5f}")
     ======== Results for: Brand A Cola Soda ====================
    -> Brand B Cola Soda: 0.86247
    -> Brand A Cola Zero Sugar: 0.75732

    Here, we didn’t get a much better result. The base model was trained on a very large amount of data, so fine-tuning with such a small example set was not enough to reorganize the map the way we expected.

    Still, this is a great lesson. We were able to pull both Cola Soda examples closer together, but that also pulled in the Zero Sugar Cola.

    Alignment and Uniformity

    A good way of checking how the model was updated is to look at these metrics:

    • Alignment: Imagine you have a bunch of related items, like ‘Brand A Cola Soda’ and ‘Cola Soda’. Alignment measures how close these related items are to each other in the embedding space.
      • A high alignment score means that your model is good at placing similar things close together, which is generally what you want for tasks like searching for similar products.
    • Uniformity: Now imagine all your different items, from ‘refund policy’ to ‘Quantum computing’. Uniformity measures how spread out all these items are in the embedding space. You want them to be spread out evenly rather than all clumped together in one corner.
      • Good uniformity means your model can distinguish between different concepts effectively and avoids mapping everything to a small, dense region.
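Both metrics reduce to simple averages of cosine similarities, which we can illustrate with invented 2-D vectors before applying them to real embeddings:

```python
import numpy as np
from itertools import combinations

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Alignment: average similarity over (anchor, positive) pairs -- invented vectors.
pairs = [
    (np.array([1.0, 0.1]), np.array([0.9, 0.2])),
    (np.array([0.2, 1.0]), np.array([0.1, 0.9])),
]
alignment = float(np.mean([cos_sim(a, p) for a, p in pairs]))

# Uniformity (as measured here): average pairwise similarity over a diverse set.
diverse = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
uniformity = float(np.mean([cos_sim(a, b) for a, b in combinations(diverse, 2)]))

print(f"alignment:  {alignment:.3f}")   # high: related pairs sit together
print(f"uniformity: {uniformity:.3f}")  # lower average: diverse items are spread out
```

Note that with this measurement, a *lower* average pairwise similarity among diverse items means *better* uniformity: the items occupy more of the space.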

    A good embedding model should be balanced. It needs to bring similar items close together (good alignment) while simultaneously pushing dissimilar items far apart and ensuring the entire space is well-utilized (good uniformity). This allows the model to capture meaningful relationships without sacrificing its ability to distinguish between distinct concepts.

    Ultimately, the ideal balance often depends on your specific application. For some tasks, like semantic search, you might prioritize very strong alignment, while for others, like anomaly detection, a higher degree of uniformity might be more critical.

    This is the code for the alignment calculation, which is the mean of the cosine similarities between anchor points and their positive matches.

    from sentence_transformers import SentenceTransformer, util
    import numpy as np
    import torch
    
    # --- Alignment Metric for Base Model ---
    base_alignment_scores = []
    
    # Assuming 'train_examples' was defined in a previous cell and contains (anchor, positive, negative) triplets
    for example in train_examples:
        # Encode the anchor and positive texts using the base model
        anchor_embedding_base = model.encode(example.texts[0], convert_to_tensor=True)
        positive_embedding_base = model.encode(example.texts[1], convert_to_tensor=True)
        
        # Calculate cosine similarity between anchor and positive
        score_base = util.cos_sim(anchor_embedding_base, positive_embedding_base).item()
        base_alignment_scores.append(score_base)
    
    average_base_alignment = np.mean(base_alignment_scores)

    And this is the code for the uniformity calculation. It takes a diverse set of embeddings, computes the cosine similarity between every possible pair, and averages those pairwise scores.

    # --- Uniformity Metric for Base Model ---
    # A diverse set of texts (example list; any varied collection of topics works)
    uniformity_texts = ["refund policy", "quantum computing", "cola soda",
                        "cat care", "stock market"]
    uniformity_embeddings_base = model.encode(uniformity_texts, convert_to_tensor=True)
    
    # Calculate all pairwise cosine similarities
    pairwise_cos_sim_base = util.cos_sim(uniformity_embeddings_base, uniformity_embeddings_base)
    
    # Extract unique pairwise similarities (excluding self-similarity and duplicates)
    upper_triangle_indices_base = torch.triu_indices(pairwise_cos_sim_base.shape[0], pairwise_cos_sim_base.shape[1], offset=1)
    uniformity_similarity_scores_base = pairwise_cos_sim_base[upper_triangle_indices_base[0], upper_triangle_indices_base[1]].cpu().numpy()
    
    # Calculate the average of these pairwise similarities
    average_uniformity_similarity_base = np.mean(uniformity_similarity_scores_base)

    And the results. Given the very limited training data used for fine-tuning (only three examples), it’s not surprising that the fine-tuned model doesn’t show a clear improvement over the base model on these metrics.

    The base model kept related items slightly closer together than the fine-tuned model did (a higher alignment score), and it also kept unrelated items slightly more spread out (a lower average pairwise similarity, which here means better uniformity).

    * Base Model:
    Base Model Alignment Score (Avg Cosine Similarity of Positive Pairs): 0.8451
    Base Model Uniformity Score (Avg Pairwise Cos Sim. of Diverse Embeddings): 0.0754
    
    
    * Fine Tuned Model:
    Alignment Score (Average Cosine Similarity of Positive Pairs): 0.8270
    Uniformity Score (Average Pairwise Cosine Similarity of Diverse Embeddings): 0.0777

    Before You Go

    In this article, we learned about embedding models and how they work under the hood, in a practical way.

    These models gained a lot of importance after the surge of AI, serving as a great engine for RAG applications and fast search.

    Computers must have a way to understand text, and embeddings are the key. They encode text into vectors of numbers, making it easy for models to calculate distances and find the best matches.

    If you liked this content, you can find me on my website:

    https://gustavorsantos.me

    GitHub Code

    https://github.com/gurezende/Studying/tree/master/Python/NLP/Embedding_Models

    References

    [1. Modern NLP: Tokenization, Embedding, and Text Classification](https://medium.com/data-science-collective/modern-nlp-tokenization-embedding-and-text-classification-448826f489bf?sk=6e5d94086f9636e451717dfd0bf1c03a)

    [2. A Visual Guide to Using BERT for the First Time](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/)

    [3. Qdrant Docs](https://qdrant.tech/documentation/)
