
    The Machine Learning “Advent Calendar” Day 22: Embeddings in Excel

    By Awais, December 23, 2025

    In this part of the series, we will talk about deep learning.

    When people talk about deep learning, we immediately think of images of deep neural network architectures, with many layers, neurons, and parameters.

    In practice, the real shift introduced by deep learning is elsewhere.

    It is about learning data representations.

    In this article, we focus on text embeddings, explain their role in the machine learning landscape, and show how they can be understood and explored in Excel.

    1. Classic Machine Learning vs. Deep Learning

    In this part, we discuss why embeddings were introduced.

    1.1 Where does deep learning fit?

    To understand embeddings, we first need to clarify the place of deep learning.

    We will use the term classic machine learning to describe methods that do not rely on deep architectures.

    All the previous articles deal with classic machine learning, which can be described in two complementary ways.

    Learning paradigms

    • Supervised learning
    • Unsupervised learning

    Model families

    • Distance-based models
    • Tree-based models
    • Weight-based models

    Across this series, we have already studied the learning algorithms behind these models. In particular, we have seen that gradient descent applies to all weight-based models, from linear regression to neural networks.

    Deep learning is often reduced to neural networks with many layers.

    But this explanation is incomplete.

    From an optimization point of view, deep learning does not introduce a new learning rule.

    So what does it introduce?

    1.2 Deep learning as data representation learning

    Deep learning is about how features are created.

    Instead of manually designing features, deep learning learns representations automatically, often through multiple successive transformations.

    This also raises an important conceptual question:

    Where is the boundary between feature engineering and model learning?

    Some examples make this clearer:

    • Polynomial regression is still a linear model, but the features are polynomial
    • Kernel methods project data into a high-dimensional feature space
    • Density-based methods implicitly transform the data before learning
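
    The first example in the list can be sketched in a few lines. This is a minimal illustration with NumPy, using toy data generated from a known quadratic (the values are assumptions chosen for the example): the features are engineered by hand, and the learning step is plain linear least squares.

```python
import numpy as np

# Toy data generated from a known quadratic (illustrative values only).
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Feature engineering: expand x into the columns [1, x, x^2].
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# Model learning: ordinary least squares, still linear in the weights.
weights, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

print(np.round(weights, 3))
```

    The model never sees a nonlinear equation: all the nonlinearity lives in the feature columns, which is exactly the part that deep learning learns instead of fixing in advance.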

    Deep learning continues this idea, but at scale.

    From this perspective, deep learning belongs to:

    • the feature engineering philosophy, for representation
    • the weight-based model family, for learning

    1.3 Images and convolutional neural networks

    Images are represented as pixels.

    From a technical point of view, image data is already numerical and structured: a grid of numbers. However, the information contained in these pixels is not structured in a way that classical models can easily exploit.

    Pixels do not explicitly encode edges, shapes, textures, or objects.

    Convolutional Neural Networks (CNNs) are designed to extract information from pixels. They apply filters to detect local patterns, then progressively combine them into higher-level representations.
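
    To make the idea of a filter concrete, here is a small sketch in NumPy. The image and the kernel are hand-written assumptions (in a real CNN the kernel values are learned), and the helper `slide_filter` is a hypothetical name. It slides a 3x3 filter over a tiny image whose left half is dark and right half is bright.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is what CNN layers compute in practice)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 4x6 image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 0, 1, 1, 1]] * 4, dtype=float)

# Hand-written vertical-edge filter (a learned filter would play this role).
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = slide_filter(image, kernel)
print(feature_map)
```

    The output is zero where the patch is uniform and non-zero only around the dark-to-bright boundary, so the filter turns raw pixels into an explicit "there is an edge here" signal.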

    I have published an article showing how CNNs can be implemented in Excel to make this process explicit.

    CNN in Excel – all images by author

    For images, the challenge is not to make the data numerical, but to extract meaningful representations from already numerical data.

    1.4 Text data: a different problem

    Text presents a fundamentally different challenge.

    Unlike images, text is not numerical by nature.

    Before modeling context or order, the first problem is more basic:

    How do we represent words numerically?

    Creating a numerical representation for text is the first step.

    In deep learning for text, this step is handled by embeddings.

    Embeddings transform discrete symbols (words) into vectors that models can work with. Once embeddings exist, we can then model context, order, and relationships between words.

    In this article, we focus on this first and essential step:
    how embeddings create numerical representations for text, and how this process can be explored in Excel.

    2. Two ways to learn text embeddings

    In this article, we will use the IMDB movie reviews dataset to illustrate both approaches. The dataset is distributed under the Apache License 2.0.

    There are two main ways to learn embeddings for text, and we will do both with this dataset:

    • supervised: we will create embeddings to predict the sentiment
    • unsupervised or self-supervised: we will use the Word2Vec algorithm

    In both cases, the goal is the same:
    to transform words into numerical vectors that can be used by machine learning models.

    Before comparing these two approaches, we first need to clarify what embeddings are and how they relate to classic machine learning.

    IMDB dataset, image by author – Apache License 2.0

    2.1 Embeddings and classic machine learning

    In classic machine learning, categorical data is usually handled with:

    • label encoding, which assigns fixed integers but introduces artificial order
    • one-hot encoding, which removes order but produces high-dimensional sparse vectors

    How they can be used depends on the nature of the model.

    Distance-based models cannot effectively use one-hot encoding, because all categories end up being equally distant from each other. Label encoding works only if we can assign meaningful numerical values to the categories, which is generally not the case in classic models.
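
    The distance argument can be checked directly. The sketch below uses three made-up categories and a small hypothetical helper, `pairwise_distances`, to compare the Euclidean distances produced by the two encodings.

```python
import numpy as np

categories = ["red", "green", "blue"]          # illustrative categories

label_encoded = np.array([[0.0], [1.0], [2.0]])  # red=0, green=1, blue=2
one_hot = np.eye(len(categories))                # one column per category

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

label_dist = pairwise_distances(label_encoded)
one_hot_dist = pairwise_distances(one_hot)

print(label_dist)    # red-blue is twice as far as red-green: artificial order
print(one_hot_dist)  # every pair of distinct categories is equally far apart
```

    With label encoding, the distances depend on an arbitrary numbering. With one-hot encoding, they carry no information at all about which categories resemble each other.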

    Weight-based models can use one-hot encoding, because the model learns a weight for each category. In contrast, with label encoding, the numerical values are fixed and cannot be adjusted to represent meaningful relationships.

    Tree-based models treat all variables as categorical splits rather than numerical magnitudes, which makes label encoding acceptable in practice. However, most implementations, including scikit-learn, still require numerical inputs. As a result, categories must be converted to numbers, either through label encoding or one-hot encoding. If the numerical values carried semantic meaning, this would again be beneficial.

    Overall, this highlights a limitation of classic approaches:
    category values are fixed and not learned.

    Embeddings extend this idea by learning the representation itself.
    Each word is associated with a trainable vector, turning the representation of categories into a learning problem rather than a preprocessing step.
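
    One way to picture a trainable vector per word is as a row in a table. The sketch below is an illustration under assumptions (a three-word vocabulary, four dimensions, random initial values, and a hypothetical helper `embed`). It shows that looking up a row gives the same result as multiplying a one-hot vector by the table, which is why an embedding layer can be read as a learned replacement for one-hot encoding.

```python
import numpy as np

vocab = {"good": 0, "bad": 1, "movie": 2}   # toy vocabulary (assumption)
embedding_dim = 4                            # arbitrary size for the example

# The embedding table: one row per word, randomly initialised before training.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    """Look up the row of the table that belongs to the word."""
    return embedding_table[vocab[word]]

# Same result via a one-hot vector multiplied by the table.
one_hot = np.zeros(len(vocab))
one_hot[vocab["bad"]] = 1.0
via_matmul = one_hot @ embedding_table

print(np.allclose(embed("bad"), via_matmul))
```

    During training, the rows of `embedding_table` are updated by gradient descent like any other weights.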

    2.2 Supervised embeddings

    In supervised learning, embeddings are learned as part of a prediction task.

    For example, each review in the IMDB dataset is labeled with a sentiment (positive or negative), so we can use a very simple architecture: each word is mapped to a one-dimensional embedding.

    This is possible because the objective is binary sentiment classification.
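
    As a rough sketch of what such a training loop can look like, the code below learns one number per word with plain gradient descent. It is not the author's implementation: the four tiny reviews, the learning rate, and the number of passes are assumptions made for the example, and the real experiment uses the full IMDB dataset.

```python
import numpy as np

# Toy labelled reviews (1 = positive, 0 = negative), for illustration only.
reviews = [
    (["excellent", "movie"], 1),
    (["wonderful", "film"], 1),
    (["worst", "movie"], 0),
    (["waste", "film"], 0),
]
vocab = sorted({word for words, _ in reviews for word in words})
index = {word: i for i, word in enumerate(vocab)}

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=len(vocab))  # one trainable number per word
w, b = 1.0, 0.0                               # logistic regression layer
lr = 0.5                                      # learning rate (assumption)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(words):
    avg = emb[[index[t] for t in words]].mean()
    return sigmoid(w * avg + b)

for _ in range(500):
    for words, label in reviews:
        ids = [index[t] for t in words]
        avg = emb[ids].mean()
        err = sigmoid(w * avg + b) - label   # gradient of log-loss w.r.t. the logit
        grad_emb = err * w / len(ids)        # same gradient for each word in the review
        grad_w, grad_b = err * avg, err
        np.add.at(emb, ids, -lr * grad_emb)  # safe even if a word repeats
        w -= lr * grad_w
        b -= lr * grad_b

for word in vocab:
    print(f"{word:10s} {emb[index[word]]:+.2f}")
```

    Words that appear only in positive reviews drift to one side, words that appear only in negative reviews drift to the other, and words shared by both stay near zero.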

    Once training is complete, we can export the embeddings and explore them in Excel.

    When plotting the embeddings on the x-axis and word frequency on the y-axis, a clear pattern appears:

    • positive values are associated with words such as excellent or wonderful,
    • negative values are associated with words such as worst or waste

    Depending on the initialization, the sign can be inverted, since the logistic regression layer also has parameters that influence the final prediction.

    Finally, in Excel, we reconstruct the full pipeline that corresponds to the architecture we defined earlier.

    Input column
    The input text (a review) is cut into words, and each row corresponds to one word.

    Embedding search
    Using a lookup function, the embedding value associated with each word is retrieved from the embedding table learned during training.

    Global average
    The global average embedding is computed by averaging the embeddings of all words seen so far. This corresponds to a very simple sentence representation: the mean of word vectors.

    Probability prediction
    The averaged embedding is then passed through a logistic function to produce a sentiment probability.
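
    The four steps above can be mirrored in a short script. The embedding values and the logistic parameters below are hypothetical placeholders, not the values learned in the article, and treating unknown words as 0 is an assumption of this sketch. The function name `running_sentiment` is also made up for the example.

```python
import math

# Hypothetical one-dimensional embeddings (placeholders, not learned values).
embedding_table = {"excellent": 2.0, "love": 1.5, "worst": -2.5, "waste": -1.8}
weight, bias = 1.0, 0.0   # hypothetical logistic layer parameters

def running_sentiment(review):
    """Return one row per word: (word, embedding, running average, probability)."""
    rows = []
    total = 0.0
    for n, word in enumerate(review.lower().split(), start=1):
        value = embedding_table.get(word, 0.0)   # lookup, 0 for unknown words
        total += value
        average = total / n                      # mean of the words seen so far
        prob = 1.0 / (1.0 + math.exp(-(weight * average + bias)))
        rows.append((word, value, average, prob))
    return rows

for row in running_sentiment("an excellent film that I love"):
    print(row)
```

    Each tuple corresponds to one spreadsheet row, so the effect of every new word on the running average and on the predicted probability can be followed line by line.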

    What we observe

    • Words with strongly positive embeddings (for example excellent, love, fun) push the average upward.
    • Words with strongly negative embeddings (for example worst, horrible, waste) pull the average downward.
    • Neutral or weakly weighted words have little influence.

    As more words are added, the global average embedding stabilizes, and the sentiment prediction becomes more confident.

    2.3 Word2Vec: embeddings from co-occurrence

    In Word2Vec, similarity does not mean that two words have the same meaning.
    It means that they appear in similar contexts.

    Word2Vec learns word embeddings by looking at which words tend to co-occur within a fixed window in the text. Two words are considered similar if they often appear around the same neighboring words, even if their meanings are opposite.

    As shown in the Excel sheet below, we compute the cosine similarity for the word good and retrieve the most similar words.

    From the model’s perspective, the surrounding words are almost identical. The only thing that changes is the adjective itself.

    As a result, Word2Vec learns that “good” and “bad” play a similar role in language, even though their meanings are opposite.

    So, Word2Vec captures distributional similarity, not semantic polarity.

    A useful way to think about it is:

    Words are close if they are used in the same places.
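
    The sketch below illustrates this idea with simple co-occurrence counts rather than Word2Vec itself, which trains a small neural network on the same kind of context windows. The five sentences and the window size are assumptions chosen to keep the example readable.

```python
import numpy as np

# Tiny corpus (illustrative): "good" and "bad" appear in the same positions.
corpus = [
    "the movie was good",
    "the movie was bad",
    "the acting was good",
    "the acting was bad",
    "i watched the movie yesterday",
]
window = 2  # number of neighbours counted on each side

sentences = [s.split() for s in corpus]
vocab = sorted({t for sent in sentences for t in sent})
index = {t: i for i, t in enumerate(vocab)}

# counts[i, j] = how often word j appears within the window around word i.
counts = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for pos, word in enumerate(sent):
        lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:
                counts[index[word], index[sent[ctx_pos]]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_good_bad = cosine(counts[index["good"]], counts[index["bad"]])
sim_good_watched = cosine(counts[index["good"]], counts[index["watched"]])
print(sim_good_bad, sim_good_watched)
```

    In this toy corpus, "good" and "bad" have identical neighbours, so their similarity is maximal even though their meanings are opposite, while "good" and "watched" share few contexts and score much lower.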

    2.4 How embeddings are used

    In modern systems such as RAG (Retrieval-Augmented Generation), embeddings are often used to retrieve documents or passages for question answering.

    However, this approach has limitations.

    Most commonly used embeddings are trained in a self-supervised way, based on co-occurrence or contextual prediction objectives. As a result, they capture general language similarity, not task-specific meaning.

    This means that:

    • embeddings may retrieve text that is linguistically similar but not relevant
    • semantic proximity does not guarantee answer correctness
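
    A minimal sketch of the retrieval step shows how this can happen. The three-dimensional vectors below are hand-written stand-ins for real embeddings, chosen only to illustrate the ranking mechanics, and `rank_by_cosine` is a hypothetical helper name.

```python
import numpy as np

def rank_by_cosine(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(-scores), scores

passages = [
    "The film was bad.",             # same wording pattern as the query
    "Critics gave it five stars.",   # actually answers the question
    "Train schedules for Monday.",   # unrelated
]
# Hand-written stand-in vectors (assumption), one row per passage.
passage_vecs = np.array([
    [0.9, 0.4, 0.0],
    [0.5, 0.8, 0.1],
    [0.0, 0.1, 1.0],
])
query_vec = np.array([1.0, 0.5, 0.0])   # stands for "Was the film good?"

order, scores = rank_by_cosine(query_vec, passage_vecs)
for i in order:
    print(f"{scores[i]:.3f}  {passages[i]}")
```

    With these stand-in vectors, the passage that merely looks like the query is ranked above the one that would support a correct answer, which is the failure mode described in the list above.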

    Other embedding strategies can be used, including task-adapted or supervised embeddings, but they often remain self-supervised at their core.

    Understanding how embeddings are created, what they encode, and what they do not encode is therefore essential before using them in downstream systems such as RAG.

    Conclusion

    Embeddings are learned numerical representations of words that make similarity measurable.

    Whether learned through supervision or through co-occurrence, embeddings map words to vectors based on how they are used in data. By exporting them to Excel, we can inspect these representations directly, compute similarities, and understand what they capture and what they do not.

    This makes embeddings less mysterious and clarifies their role as a foundation for more complex systems such as retrieval or RAG.
