If you work with Artificial Intelligence development, or if you are studying or planning to work with that technology, you have certainly stumbled upon embedding models along your journey.
At its heart, an embedding model is a neural network trained to map objects like words or sentences into a continuous vector space, with the goal of placing contextually or conceptually similar objects mathematically close to one another.
Putting it in simpler words, imagine a library where the books are categorized not only by author and title, but by many other dimensions, such as vibe, topic, mood, and writing style.
Another good analogy is a map itself. Think of a map and two cities you don’t know. Let’s say you are not that good with geography and don’t know where Tokyo and New York City sit on the map. If I told you we should have breakfast in NYC and lunch in Tokyo, you might say: “Let’s do it”.
However, once I give you the coordinates to check the cities on the map, you will see they are very far away from each other, making that plan impossible. That is what embeddings give a model: they are the coordinates!
Building the Map
Even before you ever ask a question, the embedding model has already been trained. It has read millions of sentences and noted patterns. For example, it sees that “cat” and “kitten” often appear in the same kinds of sentences, while “cat” and “refrigerator” rarely do.
With those patterns, the model assigns every word a set of coordinates on a mathematical space, like an invisible map.
- Concepts that are similar (like “cat” and “kitten”) get placed right next to each other on the map.
- Concepts that are somewhat related (like “cat” and “dog”) are placed near each other, but not right on top of one another.
- Concepts that are totally unrelated (like “cat” and “quantum physics”) are placed in completely different corners of the map, like NYC and Tokyo.
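To make the map concrete, here is a toy sketch with invented 2-D coordinates. Real embeddings have hundreds of dimensions, but the geometry works the same way:

```python
import math

# Hypothetical 2-D coordinates, invented for illustration --
# real embeddings have hundreds of dimensions.
coords = {
    "cat":             (1.0, 1.0),
    "kitten":          (1.1, 0.9),
    "dog":             (2.0, 1.5),
    "quantum physics": (9.0, 8.0),
}

def distance(a, b):
    """Euclidean distance between two concepts on the toy map."""
    return math.dist(coords[a], coords[b])

print(distance("cat", "kitten"))           # tiny: near-synonyms
print(distance("cat", "dog"))              # bigger: related concepts
print(distance("cat", "quantum physics"))  # huge: different corners of the map
```

The absolute numbers mean nothing on their own; what matters is that the distances preserve the ordering of similarity.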
The Digital Fingerprint
Nice. Now we know how the map was created. What comes next?
Now we will work with this trained embedding model. Once we give the model a sentence like “The fluffy kitten is sleeping”:
- It doesn’t look at the letters. Instead, it visits those coordinates on its map for each word.
- It calculates the center point (the average) of all those locations. That single center point becomes the “fingerprint” for the whole sentence.
- It puts a pin on the map where your question’s fingerprint lands.
- It looks around in a circle to see which other fingerprints are nearby.
Any documents that “live” near your question on this map are considered a match, because they share the same “vibe” or topic, even if they don’t share the exact same words.

It’s like searching for a book not by searching for a specific keyword, but by pointing to a spot on a map that says “these are all books about kittens,” and letting the model fetch everything in that neighborhood.
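The averaging step can be sketched with invented toy word vectors. Real models use high-dimensional vectors and often more sophisticated pooling than a plain mean, but averaging is the classic baseline:

```python
# Toy 3-D word vectors, invented for illustration.
word_vectors = {
    "the":      [0.1, 0.0, 0.2],
    "fluffy":   [0.8, 0.3, 0.1],
    "kitten":   [0.9, 0.7, 0.2],
    "is":       [0.1, 0.1, 0.1],
    "sleeping": [0.2, 0.6, 0.9],
}

def sentence_fingerprint(sentence):
    """Average the word vectors: one point that stands for the whole sentence."""
    vectors = [word_vectors[word] for word in sentence.lower().split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

fingerprint = sentence_fingerprint("The fluffy kitten is sleeping")
print(fingerprint)  # a single 3-D point: the sentence's pin on the map
```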
Embedding Model Steps
Let’s see next how an embedding model works step-by-step after getting a request.
- The computer takes in a text.
- Tokenization: It breaks the text down into tokens, the smallest pieces of a phrase that carry meaning. Usually, that’s a word or a part of a word.
- Chunking: The input text is split into manageable chunks (often around 512 tokens), so it doesn’t get overwhelmed by too much information at once.
- Embedding: It transforms each snippet into a long list of numbers (a vector) that acts like a unique fingerprint representing the meaning of that text.
- Vector Search: When you ask a question, the model turns your question into a “fingerprint” too and quickly calculates which stored snippets have the most mathematically similar numbers.
- Model returns the most similar vectors, which are associated with text chunks.
- Generation: If you are performing Retrieval-Augmented Generation (RAG), the model hands those few “winning” snippets to an LLM, which reads them and writes out a natural-sounding answer based only on that specific information.
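The steps above can be sketched end-to-end. The `embed` function below is a deliberately crude stand-in (a normalized character-frequency vector) so the sketch stays self-contained; a real system would call an embedding model at that point:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in 'embedding': a normalized character-frequency vector.
    A real pipeline would call an embedding model here instead."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    vec = [counts.get(chr(ord("a") + i), 0) for i in range(26)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Indexing: chunk, embed, and store up front.
chunks = ["refund policy", "pricing details", "account cancellation"]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query time: embed the question, then rank stored chunks by similarity.
query_vec = embed("how do I cancel my account")
ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
print(ranked[0][0])  # the best-matching chunk
```

In a RAG setup, the top-ranked chunks would then be passed to an LLM as context for the final answer.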
Coding
Great. We did a lot of talking. Now, let’s try to code a little and get those concepts more practical.
We will start with a simple BERT (Bidirectional Encoder Representations from Transformers) embedding. BERT was created by Google and uses the Transformer architecture and its attention mechanism, so the vector for a word changes based on the words surrounding it.
# Imports
from transformers import BertTokenizer
# Load pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Sample text for tokenization
text = "Embedding models are so cool!"
# Step 1: Tokenize the text
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# View
tokens
{'input_ids': tensor([[ 101, 7861, 8270, 4667, 4275, 2024, 2061, 4658, 999, 102]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Notice how each token was transformed into an ID. The sentence has only 5 words plus punctuation, yet there are 8 content IDs between the 101 and the 102, so some words were broken down into subwords.
- The ID 101 is associated with the token [CLS]. That token’s vector is thought to capture the overall meaning or information of the entire sentence or sequence of sentences. It is like a stamp that indicates to the LLMs the meaning of that chunk. [2]
- The ID 102 is associated with the token [SEP] to separate sentences. [2]
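To see how a word such as “embedding” ends up as several IDs (it is typically split into pieces like 'em', '##bed', '##ding'), here is a toy sketch of WordPiece-style greedy longest-match splitting. The vocabulary below is invented for illustration; BERT's real vocabulary has roughly 30,000 entries:

```python
# Toy WordPiece-style vocabulary, invented for illustration.
vocab = {"em", "##bed", "##ding", "models", "are", "so", "cool", "!"}

def wordpiece(word):
    """Greedy longest-match split of one word into subword tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary entry matched
    return tokens

print(wordpiece("embedding"))  # ['em', '##bed', '##ding']
print(wordpiece("cool"))       # ['cool']
```

The `##` prefix marks a piece that continues the previous one rather than starting a new word.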
Next, let’s apply the embedding model to data.
Embedding
Here is another simple snippet where we get some text and encode it with the versatile, all-purpose embedding model all-MiniLM-L6-v2.
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
# 1. Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')
# 2. Initialize Qdrant client
client = QdrantClient(":memory:")
# 3. Create embeddings
docs = ["refund policy", "pricing details", "account cancellation"]
vectors = model.encode(docs).tolist()
# 4. Store Vectors: Create a collection (DB)
client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=384,
                                       distance=models.Distance.COSINE)
)
# Upload embedded docs (vectors)
client.upload_collection(collection_name="my_collection",
                         vectors=vectors,
                         payload=[{"source": docs[i]} for i in range(len(docs))])
# 5. Search
query_vector = model.encode("How do I cancel my subscription")
# Result
result = client.query_points(collection_name='my_collection',
                             query=query_vector,
                             limit=2,
                             with_payload=True)
print("\n\n ======= RESULTS =========")
result.points
The results are as expected. It points to the account cancellation topic!
======= RESULTS =========
[ScoredPoint(id='b9f4aa86-4817-4f85-b26f-0149306f24eb', version=0, score=0.6616353073200185, payload={'source': 'account cancellation'}, vector=None, shard_key=None, order_value=None),
ScoredPoint(id='190eaac1-b890-427b-bb4d-17d46eaffb25', version=0, score=0.2760082702501182, payload={'source': 'refund policy'}, vector=None, shard_key=None, order_value=None)]
What just happened?
- We imported a pre-trained embedding model
- Instantiated a vector database of our choice: Qdrant [3].
- Embedded the text and uploaded it to the vector DB in a new collection.
- We submitted a query.
- The results are those documents whose mathematical “fingerprints” (meanings) are closest to the query’s embedding.
This is really nice.
To end this article, I wonder if we can fine-tune an embedding model. Let’s try.
Fine Tuning an Embedding Model
Fine-tuning an embedding model is different from fine-tuning an LLM. Instead of teaching the model to “talk,” you are teaching it to reorganize its internal map so that specific concepts in your domain are pushed further apart or pulled closer together.
The most common and effective way to do this is using Contrastive Learning with a library like Sentence-Transformers.
First, we teach the model what closeness looks like using three kinds of data points.
- Anchor: The reference item (e.g., “Brand A Cola Soda”)
- Positive: A similar item (e.g., “Brand B Cola Soda”) that the model should pull together.
- Negative: A different item (e.g., “Brand A Cola Soda Zero Sugar”) that the model should push away.
Next, we choose a Loss Function that tells the model how much to change when it makes a mistake. You can choose between:
- MultipleNegativesRankingLoss: Great if you only have (Anchor, Positive) pairs. It assumes every other positive in the batch is a “negative” for the current anchor.
- TripletLoss: Best if you have explicit (Anchor, Positive, Negative) sets. It forces the distance between Anchor-Positive to be smaller than Anchor-Negative by a specific margin.
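The intuition behind TripletLoss can be sketched numerically. The vectors and the margin below are toy values for illustration (the library has its own defaults), but the formula is the same: max(0, d(anchor, positive) − d(anchor, negative) + margin).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): the loss hits zero once the
    positive is closer to the anchor than the negative by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D embeddings, invented for illustration.
anchor   = [1.0, 0.0]   # "Brand A Cola Soda"
positive = [1.2, 0.1]   # close: another cola soda
negative = [4.0, 3.0]   # far: the zero-sugar variant

print(triplet_loss(anchor, positive, negative))  # 0.0 -> geometry already correct
print(triplet_loss(anchor, negative, positive))  # > 0 -> the model must adjust
```

During training, any triplet with a non-zero loss produces gradients that pull the positive in and push the negative out.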
These are the model’s similarity results out-of-the-box.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from sentence_transformers import util
# 1. Load a pre-trained base model
model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",           # The 'Positive' (should be closer)
    "Brand A Cola Soda Zero Sugar" # The 'Negative' (should be further away)
]
# 3. Encode the text into vectors
query_vec = model.encode(query)
choice_vecs = model.encode(choices)
# 4. Compute Cosine Similarity
# util.cos_sim returns a matrix, so we convert to a list for readability
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
print(f"\n\n ======= Results for: {query} ===============")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
======= Results for: Brand A Cola Soda ===============
-> Brand B Cola Soda: 0.86003
-> Brand A Cola Soda Zero Sugar: 0.81907
And when we try to fine-tune it, showing the model that the Cola Sodas should be closer than the Zero Sugar version, this is what happens.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from sentence_transformers import util
# 1. Load a pre-trained base model
fine_tuned_model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Define your training data (Anchors, Positives, and Negatives)
train_examples = [
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand C Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand A Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand B Cola Zero Sugar"])
]
# 3. Create a DataLoader and choose a Loss Function
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=fine_tuned_model)
# 4. Tune the model
fine_tuned_model.fit(train_objectives=[(train_dataloader, train_loss)],
                     optimizer_params={'lr': 9e-5},
                     epochs=40)
# 5. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",      # The 'Positive' (should be closer now)
    "Brand A Cola Zero Sugar" # The 'Negative' (should be further away now)
]
# 6. Encode the text into vectors
query_vec = fine_tuned_model.encode(query)
choice_vecs = fine_tuned_model.encode(choices)
# 7. Compute Cosine Similarity
# util.cos_sim returns a matrix, so we convert to a list for readability
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()
print(f"\n\n ======== Results for: {query} ====================")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
======== Results for: Brand A Cola Soda ====================
-> Brand B Cola Soda: 0.86247
-> Brand A Cola Zero Sugar: 0.75732
Here, we didn’t get a much better result. The base model was trained on a very large amount of data, so fine-tuning with such a tiny example set was not enough to reshape it the way we expected.
But still, this is a great lesson. We were able to pull both Cola Soda examples closer together, but that also brought the Zero Sugar Cola Soda closer.
Alignment and Uniformity
A good way of checking how the model was updated is to look at these metrics:
- Alignment: Imagine you have a bunch of related items, like ‘Brand A Cola Soda’ and ‘Cola Soda’. Alignment measures how close these related items are to each other in the embedding space.
- A high alignment score means that your model is good at placing similar things close together, which is generally what you want for tasks like searching for similar products.
- Uniformity: Now imagine all your different items, from ‘refund policy’ to ‘Quantum computing’. Uniformity measures how spread out all these items are in the embedding space. You want them to be spread out evenly rather than all clumped together in one corner.
- Good uniformity means your model can distinguish between different concepts effectively and avoids mapping everything to a small, dense region.
A good embedding model should be balanced. It needs to bring similar items close together (good alignment) while simultaneously pushing dissimilar items far apart and ensuring the entire space is well-utilized (good uniformity). This allows the model to capture meaningful relationships without sacrificing its ability to distinguish between distinct concepts.
Ultimately, the ideal balance often depends on your specific application. For some tasks, like semantic search, you might prioritize very strong alignment, while for others, like anomaly detection, a higher degree of uniformity might be more critical.
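Using the definitions adopted in this article (alignment as the average cosine similarity over related pairs, where higher is better; uniformity proxied by the average pairwise cosine similarity over a diverse set, where lower means better spread), here is a toy computation with invented 2-D vectors:

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2-D embeddings, invented for illustration.
positive_pairs = [
    ([1.0, 0.1], [0.9, 0.2]),   # two related items
    ([0.2, 1.0], [0.3, 0.9]),   # another related pair
]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.1], [0.1, -1.0]]

# Alignment: average cosine similarity of related pairs (higher = better).
alignment = sum(cos_sim(a, p) for a, p in positive_pairs) / len(positive_pairs)

# Uniformity proxy: average pairwise cosine similarity of diverse items
# (lower = more evenly spread = better).
pairs = [(i, j) for i in range(len(diverse)) for j in range(i + 1, len(diverse))]
uniformity = sum(cos_sim(diverse[i], diverse[j]) for i, j in pairs) / len(pairs)

print(f"alignment:  {alignment:.3f}")   # near 1: related items sit together
print(f"uniformity: {uniformity:.3f}")  # near or below 0: space is well spread
```

Note that the research literature defines these metrics with distance-based formulas; the cosine-based versions here simply mirror the definitions this article uses.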
This is the code for the alignment calculation, which is the mean of the cosine similarities between anchor points and their positive matches.
from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch
# --- Alignment Metric for Base Model ---
base_alignment_scores = []
# Assuming 'train_examples' was defined in a previous cell and contains (anchor, positive, negative) triplets
for example in train_examples:
    # Encode the anchor and positive texts using the base model
    anchor_embedding_base = model.encode(example.texts[0], convert_to_tensor=True)
    positive_embedding_base = model.encode(example.texts[1], convert_to_tensor=True)
    # Calculate cosine similarity between anchor and positive
    score_base = util.cos_sim(anchor_embedding_base, positive_embedding_base).item()
    base_alignment_scores.append(score_base)
average_base_alignment = np.mean(base_alignment_scores)
And this is the code for the uniformity calculation. It is computed by taking a diverse set of embeddings, computing the cosine similarity between every possible pair, and averaging all those pairwise similarity scores.
# --- Uniformity Metric for Base Model ---
# Use the same diverse set of texts
# ('uniformity_texts' is assumed to be a list of unrelated sentences)
uniformity_embeddings_base = model.encode(uniformity_texts, convert_to_tensor=True)
# Calculate all pairwise cosine similarities
pairwise_cos_sim_base = util.cos_sim(uniformity_embeddings_base, uniformity_embeddings_base)
# Extract unique pairwise similarities (excluding self-similarity and duplicates)
upper_triangle_indices_base = torch.triu_indices(pairwise_cos_sim_base.shape[0], pairwise_cos_sim_base.shape[1], offset=1)
uniformity_similarity_scores_base = pairwise_cos_sim_base[upper_triangle_indices_base[0], upper_triangle_indices_base[1]].cpu().numpy()
# Calculate the average of these pairwise similarities
average_uniformity_similarity_base = np.mean(uniformity_similarity_scores_base)
And the results: given the very limited training data used for fine-tuning (only 3 examples), it’s not surprising that the fine-tuned model doesn’t show a clear improvement over the base model on these metrics.
The base model kept related items slightly closer together than the fine-tuned model did (a higher alignment score), and also kept unrelated items slightly more spread out (a lower average pairwise similarity, i.e., better uniformity).
* Base Model:
Base Model Alignment Score (Avg Cosine Similarity of Positive Pairs): 0.8451
Base Model Uniformity Score (Avg Pairwise Cos Sim. of Diverse Embeddings): 0.0754
* Fine-Tuned Model:
Alignment Score (Average Cosine Similarity of Positive Pairs): 0.8270
Uniformity Score (Average Pairwise Cosine Similarity of Diverse Embeddings): 0.0777
Before You Go
In this article, we learned about embedding models and how they work under the hood, in a practical way.
These models gained a lot of importance with the rise of AI, serving as a great engine for RAG applications and fast search.
Computers must have a way to understand text, and the embeddings are the key. They encode text into vectors of numbers, making it easy for the models to calculate distances and find the best matches.
If you liked this content, you can find my contact information on my website.
Git Hub Code
https://github.com/gurezende/Studying/tree/master/Python/NLP/Embedding_Models
References
[1. Modern NLP: Tokenization, Embedding, and Text Classification](https://medium.com/data-science-collective/modern-nlp-tokenization-embedding-and-text-classification-448826f489bf?sk=6e5d94086f9636e451717dfd0bf1c03a)
[2. A Visual Guide to Using BERT for the First Time](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/)
[3. Qdrant Docs](https://qdrant.tech/documentation/)


