    The Machine Learning “Advent Calendar” Day 5: GMM in Excel

    By Awais | December 6, 2025

    In the previous article, we explored distance-based clustering with K-Means.

    We then went one step further: to improve how the distance is measured, we added variance, which gave us the Mahalanobis distance.

    So, if k-Means is the unsupervised version of the Nearest Centroid classifier, then the natural question is:

    What is the unsupervised version of QDA?

    This means that, like QDA, each cluster now has to be described not only by its mean, but also by its variance (plus covariance terms as soon as there is more than one feature). But here, everything is learned without labels.

    So you see the idea, right?

    And well, the name of this model is the Gaussian Mixture Model (GMM)…

    GMM and the names of these models…

    As is often the case, model names exist for historical reasons. They were not designed to highlight the connections between models, especially when those models were not developed together.

    Different researchers, different periods, different use cases… and we end up with names that sometimes hide the true structure behind the ideas.

    Here, the name “Gaussian Mixture Model” simply means that the data is represented as a mixture of several Gaussian distributions.

    If we followed the same naming logic as k-means, it would have been clearer to call it something like “k-Gaussian Mixture”.

    Because, in practice, instead of using only the means, we add the variance. We could just use the Mahalanobis distance, or another weighted distance built from both means and variances. But the Gaussian distribution gives us probabilities, which are easier to interpret.

    So we choose a number k of Gaussian components.
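
    Formally, this is just the standard mixture density (a textbook formulation, not anything specific to this article):

        p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2), \qquad \sum_{j=1}^{k} \pi_j = 1

    where each weight \pi_j describes how large a share of the data Gaussian j claims.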

    And by the way, GMM is not the only one.

    In fact, the entire machine learning framework is actually much more recent than many of the models it contains. Most of these techniques were originally developed in statistics, signal processing, econometrics, or pattern recognition.

    Then, much later, the field we now call “machine learning” emerged and gathered all these models under one umbrella. But the names did not change.

    So today we use a mixture of vocabularies coming from different eras, different communities, and different intentions.

    This is why the relationships between models are not always obvious when you look only at the names.

    If we had to rename everything with a modern, unified machine-learning style, the landscape would actually be much clearer:

    • GMM would become k-Gaussian Clustering
    • QDA would become Nearest Gaussian Classifier
    • LDA, well, Nearest Gaussian Classifier with the same variance across classes.

    And suddenly, all the links appear:

    • k-Means ↔ Nearest Centroid
    • GMM ↔ Nearest Gaussian (QDA)

    This is why GMM is so natural after K-Means. If K-Means groups points by their closest centroid, then GMM groups them by their closest Gaussian shape.

    Why devote an entire section to the names?

    Well, the truth is that since we have already covered the k-means algorithm, and already made the transition from the Nearest Centroid classifier to QDA, we already know almost everything about this model, and the training algorithm will not change…

    And what is the NAME of this training algorithm?

    Oh, Lloyd’s algorithm.

    Actually, before k-means was called that, it was simply known as Lloyd’s algorithm, devised by Stuart Lloyd in 1957. Only later did the name “k-means” take over in the machine learning community.

    And this algorithm manipulated only the means, so we need another name, right?

    You see where this is going: the Expectation-Maximization algorithm!

    EM is simply the general form of Lloyd’s idea. Lloyd’s algorithm updates only the means; EM updates everything: means, variances, weights, and assignment probabilities.

    So, you already know everything about GMM!

    But since this article is called “GMM in Excel”, I cannot end it here…

    GMM in 1 Dimension

    Let us start with this simple dataset, the same one we used for k-means: 1, 2, 3, 11, 12, 13.

    Hmm, with this dataset the two Gaussians will end up with the same variance by symmetry. So think about playing with other numbers in Excel!

    And we naturally want 2 clusters.

    Here are the different steps.

    Initialization

    We start with guesses for means, variances, and weights.

    GMM in Excel – initialization step – image by author
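
    For readers rebuilding the sheet themselves, here is one possible layout. The cell addresses and starting values below are my own assumptions for illustration, not the author’s exact spreadsheet:

        A2:A7     the data points 1, 2, 3, 11, 12, 13
        F2, G2    initial means, e.g. 2 and 8
        F3, G3    initial variances, e.g. 4 and 4
        F4, G4    initial weights, e.g. 0.5 and 0.5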

    Expectation step (E-step)

    For each point, we compute how likely it is to belong to each Gaussian.

    GMM in Excel – expectation step – image by author
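
    With the layout assumed above, the whole E-step fits in four formulas. Note that Excel’s NORM.DIST expects a standard deviation, hence the SQRT around the variance:

        B2: =$F$4*NORM.DIST($A2,$F$2,SQRT($F$3),FALSE)    weight × density under Gaussian 1
        C2: =$G$4*NORM.DIST($A2,$G$2,SQRT($G$3),FALSE)    weight × density under Gaussian 2
        D2: =B2/(B2+C2)                                   responsibility of Gaussian 1
        E2: =1-D2                                         responsibility of Gaussian 2

    Copy B2:E2 down to row 7 to cover all six points.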

    Maximization step (M-step)

    Using these probabilities, we update the means, variances, and weights.

    GMM in Excel – maximization step – image by author
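
    Still with the same assumed layout, the M-step is nothing but weighted averages, which SUMPRODUCT handles in one cell each. For Gaussian 1, assuming the freshly updated mean lands in F6:

        new mean:      =SUMPRODUCT($D$2:$D$7,$A$2:$A$7)/SUM($D$2:$D$7)
        new variance:  =SUMPRODUCT($D$2:$D$7,($A$2:$A$7-$F$6)^2)/SUM($D$2:$D$7)
        new weight:    =SUM($D$2:$D$7)/COUNT($A$2:$A$7)

    Gaussian 2 uses the responsibilities in column E in exactly the same way.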

    Iteration

    We repeat the E-step and the M-step until the parameters stabilize.

    GMM in Excel – iterations – image by author
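
    One convenient way to iterate (a personal habit, not necessarily how the author’s sheet is built) is to lay each iteration out as a block of rows whose formulas reference the parameters of the block above, then copy the block down the sheet. If the means of two consecutive blocks sit in F2/G2 and F12/G12, a simple stopping check could be:

        =MAX(ABS(F12-F2),ABS(G12-G2))<0.001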

    Each step is extremely simple once the formulas are visible.
    You will see that EM is nothing more than updating averages, variances, and probabilities.

    We can also do some visualization to see how the Gaussian curves move during the iterations.

    At the beginning, the two Gaussian curves overlap heavily because the initial means and variances are just guesses.

    The curves slowly separate, adjust their widths, and finally settle exactly on the two groups of points.

    By plotting the Gaussian curves at each iteration, you can literally watch the model learn:

    • the means slide toward the centers of the data
    • the variances shrink to match the spread of each group
    • the overlap disappears
    • the final shapes match the structure of the dataset

    This visual evolution is extremely helpful for intuition. Once you see the curves move, EM is no longer an abstract algorithm. It becomes a dynamic process you can follow step by step.

    GMM in Excel – image by author
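
    To reproduce this kind of plot, one simple recipe (cell references again assumed for illustration): fill a column, say J, with x values from 0 to 14 in steps of 0.1, compute each weighted density next to it, and chart the result as a scatter with smooth lines:

        =$F$4*NORM.DIST($J2,$F$2,SQRT($F$3),FALSE)    curve of Gaussian 1 at the x value in J2

    Recompute this column after each EM iteration and you can watch the curves drift into place.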

    GMM in 2 Dimensions

    The logic is exactly the same as in 1D. Nothing new conceptually. We simply extend the formulas…

    Instead of having one feature per point, we now have two.

    Each Gaussian must now learn:

    • a mean for x1
    • a mean for x2
    • a variance for x1
    • a variance for x2
    • AND a covariance term between the two features.

    Once you write the formulas in Excel, you will see that the process stays exactly the same.

    Well, the truth is that if you look at the screenshot, you might think: “Wow, the formula is so long!” And that is not even all of it.

    2D GMM in Excel – image by author

    But do not be fooled. The formula is long only because we write out the 2-dimensional Gaussian density explicitly:

    • one part for the distance in x1
    • one part for the distance in x2
    • the covariance term
    • the normalization constant

    Nothing more.

    It is simply the density formula expanded cell by cell.
    Long to type, but perfectly understandable once you see the structure: a weighted distance, inside an exponential, divided by a normalization constant built from the determinant.

    So yes, the formula looks big… but the idea behind it is extremely simple.
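
    For reference, here is what such a formula spells out, written as a single illustrative Excel expression. I assume x1 in A2 and x2 in B2, the means in F2 and F3, the variances in F4 and F5, the covariance in F6, and the determinant =F4*F5-F6^2 precomputed in H2:

        =EXP(-0.5*((A2-$F$2)^2*$F$5-2*(A2-$F$2)*(B2-$F$3)*$F$6+(B2-$F$3)^2*$F$4)/$H$2)/(2*PI()*SQRT($H$2))

    All four pieces listed above are visible: the distance terms in x1 and x2, the covariance cross-term, and the determinant inside the normalization constant.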

    Conclusion

    K-Means gives hard boundaries.

    GMM gives probabilities.

    Once the EM formulas are written in Excel, the model becomes simple to follow: the means move, the variances adjust, and the Gaussians naturally settle around the data.

    GMM is just the next logical step after k-Means, offering a more flexible way to represent clusters and their shapes.
