
    The Machine Learning “Advent Calendar” Day 16: Kernel Trick in Excel

By Awais, December 17, 2025

After the article about SVM, the next natural step is Kernel SVM.

    At first sight, it looks like a completely different model. The training happens in the dual form, we stop talking about a slope and an intercept, and suddenly everything is about a “kernel”.

    In today’s article, I will make the word kernel concrete by visualizing what it really does.

    There are many good ways to introduce Kernel SVM. If you have read my previous articles, you know that I like to start from something simple that you already know.

A classic way to introduce Kernel SVM is this: SVM is a linear model. If the relationship between the features and the target is non-linear, a straight line will not separate the classes well. So we create new features. Polynomial regression is still a linear model; we simply add polynomial features (x, x², x³, …). From this point of view, a polynomial kernel adds polynomial features implicitly, and an RBF kernel can be seen as using an infinite series of polynomial features…

    Maybe another day we will follow this path, but today we will take a different one: we start with KDE.

    Yes, Kernel Density Estimation.

    Let’s get started.

You can use this link to get the Google Sheet.

    Kernel trick in Excel – all images by author

    1. KDE as a sum of individual densities

    I introduced KDE in the article about LDA and QDA, and at that time I said we would reuse it later. This is the moment.

    We see the word kernel in KDE, and we also see it in Kernel SVM. This is not a coincidence, there is a real link.

    The idea of KDE is simple:
    around each data point, we place a small distribution (a kernel).
    Then, we add all these individual densities together to obtain a global distribution.

    Keep this idea in mind. It will be the key to understanding Kernel SVM.
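The KDE idea fits in a few lines of Python. This is a minimal sketch with made-up points: one Gaussian bell per data point, summed into a global density, with a bandwidth parameter controlling the smoothness.

```python
import math

def gaussian_bell(x, center, bandwidth):
    # One small bell placed around a single data point
    return math.exp(-((x - center) ** 2) / (2 * bandwidth ** 2))

def kde(x, points, bandwidth):
    # Global density = sum of all individual bells (normalized)
    total = sum(gaussian_bell(x, p, bandwidth) for p in points)
    return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))

points = [1.0, 1.5, 3.2, 4.0]      # made-up 1-D sample
print(kde(2.0, points, bandwidth=0.5))   # high: close to the data
print(kde(10.0, points, bandwidth=0.5))  # near zero: far from the data
```

A larger bandwidth merges the bells into one smooth curve; a smaller one keeps each bell local, exactly as in the GIF below.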

KDE in Excel

    We can also adjust one parameter to control how smooth the global density is, from very local to very smooth, as illustrated in the GIF below.

KDE in Excel

As you know, KDE is a distance-based (density) model, so here we are going to create a link between two models from two different families.

    2. Turning KDE into a model

    Now we reuse exactly the same idea to build a function around each point, and then this function can be used for classification.

Do you remember that, with weight-based models, a classification task is first a regression task, because the value y is always treated as continuous? We only do the classification part after we obtain the decision function f(x).

    2.1. (Still) using a simple dataset

    Someone once asked me why I always use around 10 data points to explain machine learning, saying it is meaningless.

    I strongly disagree.

    If someone cannot explain how a Machine Learning model works with 10 points (or less) and one single feature, then they do not really understand how this model works.

So this will not be a surprise for you. Yes, I will still use the very simple dataset that I already used for logistic regression and SVM. I know this dataset is linearly separable, but it is interesting to compare the results of the models.

    And I also generated another dataset with data points that are not linearly separable and visualized how the kernelized model works.

Dataset for kernel SVM in Excel

    2.2. RBF kernel centered on points

    Let us now apply the KDE idea to our dataset.

    For each data point, we place a bell-shaped curve centered on its x value. At this stage, we do not care about classification yet. We are only doing one simple thing: creating one local bell around each point.

    This bell has a Gaussian shape, but here it has a specific name: RBF, for Radial Basis Function.

On this figure, we can see the RBF (Gaussian) kernel centered on the point x₇:

k(x) = exp(−γ(x − x₇)²)

    The name sounds technical, but the idea is actually very simple.

    Once you see RBFs as “distance-based bells”, the name stops being mysterious.

    How to read this intuitively

    • x is any position on the x-axis
    • x₇ is the center of the bell (the 7th point)
    • γ (gamma) controls the width of the bell

    So the bell reaches its maximum exactly at the point.

    As x moves away from x₇, the value decreases smoothly toward 0.

    Role of γ (gamma)

    • Small γ means wide bell (smooth, global influence)
    • Large γ means narrow bell (very local influence)

    So γ plays the same role as the bandwidth in KDE.
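To make the bell concrete, here is a minimal sketch of an RBF centered on one point, showing the effect of γ:

```python
import math

def rbf(x, center, gamma):
    # Bell with maximum 1.0 at its center, decaying with squared distance
    return math.exp(-gamma * (x - center) ** 2)

print(rbf(0.0, 0.0, 1.0))        # 1.0 at the center, whatever gamma is
print(rbf(1.0, 0.0, gamma=0.1))  # small gamma: wide bell, still high at distance 1
print(rbf(1.0, 0.0, gamma=10))   # large gamma: narrow bell, nearly zero at distance 1
```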

    At this stage, nothing is combined yet. We are just building the elementary blocks.

    2.3. Combining bells with class labels

    On the figures below, you first see the individual bells, each centered on a data point.

    Once this is clear, we move to the next step: combining the bells.

    This time, each bell is multiplied by its label yi.
    As a result, some bells are added and others are subtracted, creating influences in two opposite directions.

    This is the first step toward a classification function.

In Excel, we can see the components from each data point being added together to get the final score.

    This already looks extremely similar to KDE.
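The label-weighted sum of bells can be sketched in a few lines. This uses a toy 1-D dataset (not the article's Excel data): each bell is multiplied by its label before summing, so the two classes pull the score in opposite directions.

```python
import math

def rbf(x, center, gamma):
    return math.exp(-gamma * (x - center) ** 2)

def score(x, xs, ys, gamma):
    # Each bell is multiplied by its label (+1 or -1) before summing
    return sum(y * rbf(x, xi, gamma) for xi, y in zip(xs, ys))

xs = [0.0, 1.0, 3.0, 4.0]   # toy points
ys = [-1, -1, +1, +1]       # their labels
print(score(0.5, xs, ys, gamma=1.0))  # negative: near the -1 points
print(score(3.5, xs, ys, gamma=1.0))  # positive: near the +1 points
```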

    But we are not done yet.

    2.4. From equal bells to weighted bells

    We said earlier that SVM belongs to the weight-based family of models. So the next natural step is to introduce weights.

    In distance-based models, one major limitation is that all features are treated as equally important when computing distances. Of course, we can rescale features, but this is often a manual and imperfect fix.

    Here, we take a different approach.

    Instead of simply summing all the bells, we assign a weight to each data point and multiply each bell by this weight.

    At this point, the model is still linear, but linear in the space of kernels, not in the original input space.

    To make this concrete, we can assume that the coefficients αi are already known and directly plot the resulting function in Excel. Each data point contributes its own weighted bell, and the final score is just the sum of all these contributions.

    If we apply this to a dataset with a non-linearly separable boundary, we clearly see what Kernel SVM is doing: it fits the data by combining local influences, instead of trying to draw a single straight line.
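As a sketch of that decision function, here is the weighted sum of bells with hand-picked coefficients. The alphas below are purely illustrative, not fitted values:

```python
import math

def rbf(x, center, gamma):
    return math.exp(-gamma * (x - center) ** 2)

def decision(x, xs, ys, alphas, bias, gamma):
    # Weighted, signed sum of bells plus a bias term
    return sum(a * y * rbf(x, xi, gamma)
               for a, y, xi in zip(alphas, ys, xs)) + bias

xs = [0.0, 1.0, 3.0, 4.0]
ys = [-1, -1, +1, +1]
alphas = [0.8, 0.0, 0.0, 0.8]   # hypothetical: some points carry zero weight

print(decision(0.5, xs, ys, alphas, bias=0.0, gamma=1.0))  # negative side
print(decision(3.5, xs, ys, alphas, bias=0.0, gamma=1.0))  # positive side
```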

    3. Loss function: where SVM really starts

    Up to now, we have only talked about the kernel part of the model. We have built bells, weighted them, and combined them.

    But our model is called Kernel SVM, not just “kernel model”.

    The SVM part comes from the loss function.

    And as you may already know, SVM is defined by the hinge loss.

    3.1 Hinge loss and support vectors

    The hinge loss has a very important property.

    If a point is:

    • correctly classified, and
    • far enough from the decision boundary,

    then its loss is zero.
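The hinge loss itself is a one-liner, using the standard max(0, 1 − y·f(x)) form:

```python
def hinge_loss(y, f):
    # Zero when y*f >= 1: correct side of the boundary, outside the margin
    return max(0.0, 1.0 - y * f)

print(hinge_loss(+1, 2.5))  # 0.0 : correct and far from the boundary
print(hinge_loss(+1, 0.3))  # 0.7 : correct but inside the margin
print(hinge_loss(-1, 0.3))  # 1.3 : wrong side of the boundary
```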

    As a direct consequence, its coefficient αi becomes zero.

    Only a few data points remain active.

    These points are called support vectors.

    So even though we started with one bell per data point, in the final model, only a few bells survive.

    In the example below, you can see that for some points (for instance points 5 and 8), the coefficient αi is zero. These points are not support vectors and do not contribute to the decision function.

    Depending on how strongly we penalize violations (through the parameter C), the number of support vectors can increase or decrease.

    This is a crucial practical advantage of SVM.

    When the dataset is large, storing one parameter per data point can be expensive. Thanks to hinge loss, SVM produces a sparse model, where only a small subset of points is kept.
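To make the sparsity concrete, here is a tiny sketch with hypothetical, hand-picked coefficients (not fitted ones): prediction only needs the points whose αᵢ is non-zero.

```python
import math

def rbf(x, center, gamma):
    return math.exp(-gamma * (x - center) ** 2)

# Hypothetical fitted model: most alphas are exactly zero
xs     = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
ys     = [-1, -1, -1, +1, +1, +1]
alphas = [0.9, 0.0, 0.0, 0.0, 0.0, 0.9]

# The sparse model stores only the support vectors
support = [(a, y, x) for a, y, x in zip(alphas, ys, xs) if a != 0.0]
print(len(support))  # 2 of the 6 training points are enough for prediction

def decision(x, gamma=1.0):
    return sum(a * y * rbf(x, xi, gamma) for a, y, xi in support)

print(decision(0.2) < 0, decision(3.8) > 0)  # True True
```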

    3.2 Kernel ridge regression: same kernels, different loss

    If we keep the same kernels but replace the hinge loss with a squared loss, we obtain kernel ridge regression:

    Same kernels.
    Same bells.
    Different loss.

    This leads to a very important conclusion:

    Kernels define the representation.
    The loss function defines the model.

    With kernel ridge regression, the model must store all training data points.

    Since squared loss does not force any coefficient to zero, every data point keeps a non-zero weight and contributes to the prediction.

    In contrast, Kernel SVM produces a sparse solution: only support vectors are stored, all other points disappear from the model.
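To see the contrast, here is a toy kernel ridge regression fitted by gradient descent in pure Python. As a simplification, it uses a plain ridge penalty on the coefficients instead of the usual αᵀKα penalty; the point is only that squared loss leaves every α non-zero.

```python
import math

def rbf(x, c, gamma):
    # Same RBF bells as in Kernel SVM
    return math.exp(-gamma * (x - c) ** 2)

# Tiny regression dataset: a bump in the middle
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
n = len(xs)
gamma, lam, lr = 1.0, 0.1, 0.1

K = [[rbf(a, b, gamma) for b in xs] for a in xs]
alpha = [0.0] * n

# Gradient descent on squared loss + ridge penalty on the alphas
for _ in range(5000):
    pred = [sum(alpha[j] * K[i][j] for j in range(n)) for i in range(n)]
    grad = [sum((pred[i] - ys[i]) * K[i][j] for i in range(n)) + lam * alpha[j]
            for j in range(n)]
    alpha = [a - lr * g for a, g in zip(alpha, grad)]

print(alpha)  # every coefficient is non-zero: squared loss gives no sparsity
```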

    3.3 A quick link with LASSO

    There is an interesting parallel with LASSO.

    In linear regression, LASSO uses an L1 penalty on the primal coefficients. This penalty encourages sparsity, and some coefficients become exactly zero.

    In SVM, hinge loss plays a similar role, but in a different space.

    • LASSO creates sparsity in the primal coefficients
    • SVM creates sparsity in the dual coefficients αi

    Different mechanisms, same effect: only the important parameters survive.

    Conclusion

    Kernel SVM is not just about kernels.

    • Kernels build a rich, non-linear representation.
    • Hinge loss selects only the essential data points.

    The result is a model that is both flexible and sparse, which is why SVM remains a powerful and elegant tool.

    Tomorrow, we will look at another model that deals with non-linearity. Stay tuned.
