    The Machine Learning “Advent Calendar” Day 20: Gradient Boosted Linear Regression in Excel

By Awais · December 22, 2025

In the previous article, we explored ensemble learning with voting, bagging and Random Forest.

Voting itself is only an aggregation mechanism: it does not create diversity, but combines predictions from models that are already different.
    Bagging, on the other hand, explicitly creates diversity by training the same base model on multiple bootstrapped versions of the training dataset.

    Random Forest extends bagging by additionally restricting the set of features considered at each split.

    From a statistical point of view, the idea is simple and intuitive: diversity is created through randomness, without introducing any fundamentally new modeling concept.

    But ensemble learning does not stop there.

    There exists another family of ensemble methods that does not rely on randomness at all, but on optimization. Gradient Boosting belongs to this family. And to truly understand it, we will start with a deliberately strange idea:

    We will apply Gradient Boosting to Linear Regression.

Yes, I know. This is probably the first time you have heard of Gradient Boosted Linear Regression.

(We will see Gradient Boosted Decision Trees tomorrow.)

    In this article, here is the plan:

    • First, we will step back and revisit the three fundamental steps of machine learning.
    • Then, we will introduce the Gradient Boosting algorithm.
    • Next, we will apply Gradient Boosting to linear regression.
    • Finally, we will reflect on the relationship between Gradient Boosting and Gradient Descent.

1. Machine Learning in Three Steps

    To make machine learning easier to learn, I always separate it into three clear steps. Let us apply this framework to Gradient Boosted Linear Regression.

Because, unlike with bagging, each step here reveals something interesting.

The three learning steps in Machine Learning – all images by author

    1. Model

    A model is something that takes input features and produces an output prediction.

    In this article, the base model will be Linear Regression.

    1 bis. Ensemble Method Model

    Gradient Boosting is not a model itself. It is an ensemble method that aggregates several base models into a single meta-model. On its own, it does not map inputs to outputs. It must be applied to a base model.

    Here, Gradient Boosting will be used to aggregate linear regression models.

    2. Model fitting

    Each base model must be fitted to the training data.

    For Linear Regression, fitting means estimating the coefficients. This can be done numerically using Gradient Descent, but also analytically. In Google Sheets or Excel, we can directly use the LINEST function to estimate these coefficients.
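Outside the spreadsheet, the same estimate can be reproduced with a few lines of code. Here is a minimal Python sketch (not part of the article's file; the toy data is made up) of the ordinary least-squares fit that LINEST returns for a single feature:

import numpy as np

# Made-up toy data: one feature x, one target y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# LINEST(y, x) in Excel / Google Sheets returns the slope and intercept
# of the ordinary least-squares line; np.polyfit computes the same estimate.
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # about 1.96 and 0.14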

    2 bis. Ensemble model learning

    At first, Gradient Boosting may look like a simple aggregation of models. But it is still a learning process. As we will see, it relies on a loss function, exactly like classical models that learn weights.

    3. Model tuning

    Model tuning consists of optimizing hyperparameters.

    In our case, the base model Linear Regression itself has no hyperparameters (unless we use regularized variants such as Ridge or Lasso).

    Gradient Boosting, however, introduces two important hyperparameters: the number of boosting steps and the learning rate. We will see this in the next section.

    In a nutshell, that is machine learning, made easy, in three steps!

2. Gradient Boosting Regressor Algorithm

    2.1 Algorithm principle

    Here are the main steps of the Gradient Boosting algorithm, applied to regression.

    1. Initialization: We start with a very simple model. For regression, this is usually the average value of the target variable.
    2. Residual Errors Calculation: We compute residuals, defined as the difference between the actual values and the current predictions.
    3. Fitting Linear Regression to Residuals: We fit a new base model (here, a linear regression) to these residuals.
4. Update the ensemble: We add this new model to the ensemble, scaled by a learning rate (also called shrinkage).
    5. Repeating the process: We repeat steps 2 to 4 until we reach the desired number of boosting iterations or until the error converges.

That’s it! This is the basic procedure for Gradient Boosting applied to Linear Regression.
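To make the loop concrete, here is a minimal Python sketch of the same procedure (a rough translation of the spreadsheet logic, not the article's file; the function name and toy data are made up for illustration):

import numpy as np

def boost_linear(x, y, n_steps=2, learning_rate=0.5):
    # Step 1: initialize with the average of the target variable
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_steps):
        # Step 2: residuals = actual values minus current predictions
        residuals = y - pred
        # Step 3: fit a linear regression to the residuals (the role of LINEST in the sheet)
        a, b = np.polyfit(x, residuals, 1)
        # Step 4: add the fitted correction, scaled by the learning rate
        pred = pred + learning_rate * (a * x + b)
    # Step 5: the loop simply repeats for the chosen number of iterations
    return pred

# Made-up toy data, just to run the sketch
x = np.linspace(0, 9, 10)
y = 3.0 * x + 5.0 + np.random.default_rng(0).normal(0.0, 1.0, 10)
print(boost_linear(x, y, n_steps=5, learning_rate=0.5))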

    2.2 Algorithm expressed with formulas

Now we can write the formulas explicitly; this makes each step concrete.

    Step 1 – Initialization
    We start with a constant model equal to the average of the target variable:
    f0 = average(y)

    Step 2 – Residual computation
    We compute the residuals, defined as the difference between the actual values and the current predictions:
    r1 = y − f0

    Step 3 – Fit a base model to the residuals
    We fit a linear regression model to these residuals:
    r̂1 = a0 · x + b0

Step 4 – Update the ensemble
We update the model by adding the fitted regression, scaled by the learning rate:
f1 = f0 + learning_rate · (a0 · x + b0)

Next iteration
We repeat the same procedure:
r2 = y − f1
r̂2 = a1 · x + b1
f2 = f1 + learning_rate · (a1 · x + b1)

By expanding this expression, we obtain:
f2 = f0 + learning_rate · (a0 · x + b0) + learning_rate · (a1 · x + b1)

The same process continues at each iteration. Residuals are recomputed, a new model is fitted, and the ensemble is updated by adding this model, scaled by the learning rate.

    This formulation makes it clear that Gradient Boosting builds the final model as a sum of successive correction models.
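Written compactly, in the same notation as above, the model after M iterations is:

fM = f0 + learning_rate · [ (a0 · x + b0) + (a1 · x + b1) + … ]

with one correction term per boosting iteration: a constant starting point plus a learning-rate-scaled sum of fitted corrections.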

    3. Gradient Boosted Linear Regression

    3.1 Base model training

    We start with a simple linear regression as our base model, using a small dataset of ten observations that I generated.

To fit the base model, we use the LINEST function in Google Sheets (it also works in Excel) to estimate the coefficients of the linear regression.

Gradient Boosted Linear Regression: simple dataset with linear regression — Image by author

    3.2 Gradient Boosting algorithm

The implementation of these formulas is straightforward in Google Sheets or Excel.

The table below shows the training dataset along with the different steps of the gradient boosting procedure:

    Gradient Boosted Linear Regression with all steps in Excel — Image by author

    For each fitting step, we use the Excel function LINEST:

    Gradient Boosted Linear Regression with formula for coefficient estimation — Image by author

We will only do 2 iterations, but we can guess how it goes for more iterations. The graphic below shows the models at each iteration. The different shades of red illustrate the convergence of the model, and we also show the final model obtained by fitting a linear regression directly to y.

    Gradient Boosted Linear Regression — Image by author

    3.3 Why Boosting Linear Regression is purely pedagogical

    If you look carefully at the algorithm, two important observations emerge.

First, at each boosting iteration we fit a linear regression to the residuals, which costs extra fitting steps. But instead of fitting a linear regression to the residuals, we could fit a linear regression directly to the actual values of y and immediately obtain the final optimal model!

Second, adding one linear regression to another linear regression still gives a linear regression.

    For example, we can rewrite f2 as:

f2 = f0 + learning_rate · (b0 + b1) + learning_rate · (a0 + a1) · x

    This is still a linear function of x.
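This collapse is easy to check numerically. Here is a minimal Python sketch (made-up toy data, not the article's dataset) that tracks the ensemble as a single slope and intercept and compares it with a linear regression fitted directly to y:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 9, 10)
y = 3.0 * x + 5.0 + rng.normal(0.0, 1.0, 10)  # made-up toy data

learning_rate = 0.5
intercept, slope = float(y.mean()), 0.0       # f0 is the constant mean
for _ in range(20):
    residuals = y - (slope * x + intercept)
    a, b = np.polyfit(x, residuals, 1)        # base model fitted to the residuals
    slope += learning_rate * a                # the ensemble stays a single line:
    intercept += learning_rate * b            # the coefficients simply accumulate

print(slope, intercept)                       # boosted coefficients
print(np.polyfit(x, y, 1))                    # direct fit to y: (nearly) the same line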

    This explains why Gradient Boosted Linear Regression does not bring any practical benefit. Its value is purely pedagogical: it helps us understand how the Gradient Boosting algorithm works, but it does not improve predictive performance.

    In fact, it is even less useful than bagging applied to linear regression. With bagging, the variability between bootstrapped models allows us to estimate prediction uncertainty and construct confidence intervals. Gradient Boosted Linear Regression, on the other hand, collapses back to a single linear model and provides no additional information about uncertainty.

    As we will see tomorrow, the situation is very different when the base model is a decision tree.

    3.4 Tuning hyperparameters

    There are two hyperparameters we can tune: the number of iterations and the learning rate.

For the number of iterations, we only implemented two, but it is easy to add more, and we can decide when to stop by examining the magnitude of the residuals.

For the learning rate, we can change it in Google Sheets and see what happens. When the learning rate is small, the “learning process” is slow. And if the learning rate is 1, convergence is achieved at iteration 1.

Gradient Boosted Linear Regression with learning rate = 1 — Image by author

    And the residuals of iteration 1 are already zeros.

Gradient Boosted Linear Regression with learning rate = 1 — Image by author

If the learning rate is too large, the model will diverge.

Gradient Boosted Linear Regression: divergence — Image by author
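The same behaviour can be reproduced outside the spreadsheet. Here is a minimal Python sketch (made-up toy data) that tracks the residual error for two learning rates:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 9, 10)
y = 3.0 * x + 5.0 + rng.normal(0.0, 1.0, 10)  # made-up toy data

def residual_rmse(learning_rate, n_steps=5):
    pred = np.full_like(y, y.mean())
    errors = []
    for _ in range(n_steps):
        residuals = y - pred
        a, b = np.polyfit(x, residuals, 1)
        pred = pred + learning_rate * (a * x + b)
        errors.append(float(np.sqrt(np.mean((y - pred) ** 2))))
    return errors

print(residual_rmse(1.0))  # drops to the noise level after the first step, then stays flat
print(residual_rmse(2.5))  # grows at every step: the model diverges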

    4. Boosting as Gradient Descent in Function Space

    4.1 Comparison with Gradient Descent Algorithm

    At first glance, the role of the learning rate and the number of iterations in Gradient Boosting looks very similar to what we see in Gradient Descent. This naturally leads to confusion.

    • Beginners often notice that both algorithms contain the word “gradient” and follow an iterative procedure. It is therefore tempting to assume that Gradient Descent and Gradient Boosting are closely related, without really knowing why.
    • Experienced practitioners usually react differently. From their perspective, the two methods appear unrelated. Gradient Descent is used to fit weight-based models by optimizing their parameters, while Gradient Boosting is an ensemble method that combines multiple models fitted to the residuals. The use cases, the implementations, and the intuition seem completely different.
    • At a deeper level, however, experts will say that these two algorithms are in fact the same optimization idea. The difference does not lie in the learning rule, but in the space where this rule is applied. Or we can say that the variable of interest is different.

    Gradient Descent performs gradient-based updates in parameter space. Gradient Boosting performs gradient-based updates in function space.

That is the only difference between the two optimization procedures. Let’s look at the equations for the regression case and for the general case below.

    4.2 The Mean Squared Error Case: Same Algorithm, Different Space

    With the Mean Squared Error, Gradient Descent and Gradient Boosting minimize the same objective and are driven by the same quantity: the residual.

    In Gradient Descent, residuals influence the updates of the model parameters.

    In Gradient Boosting, residuals directly update the prediction function.

    In both cases, the learning rate and the number of iterations play the same role. The difference lies only in where the update is applied: parameter space versus function space.

    Once this distinction is clear, it becomes evident that Gradient Boosting with MSE is simply Gradient Descent expressed at the level of functions.
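To make the parallel explicit, here are the two update rules side by side, with w denoting the parameters of a weight-based model and f the prediction function:

Gradient Descent (parameter space): w ← w − learning_rate · ∂L/∂w
Gradient Boosting with MSE (function space): f ← f + learning_rate · r̂, where r̂ is the base model fitted to the residuals r = y − f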

    4.3 Gradient Boosting with any loss function

    The comparison above is not limited to the Mean Squared Error. Both Gradient Descent and Gradient Boosting can be defined with respect to different loss functions.

    In Gradient Descent, the loss is defined in parameter space. This requires the model to be differentiable with respect to its parameters, which naturally restricts the method to weight-based models.

    In Gradient Boosting, the loss is defined in prediction space. Only the loss must be differentiable with respect to the predictions. The base model itself does not need to be differentiable, and of course, it does not need to have its own loss function.

    This explains why Gradient Boosting can combine arbitrary loss functions with non–weight-based models such as decision trees.
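In the general case, the role played by the residuals under MSE is taken over by pseudo-residuals, the negative gradient of the loss with respect to the current predictions. At each iteration:

rm = − ∂L(y, f) / ∂f, evaluated at the current predictions
r̂m = base model fitted to rm
new model = current model + learning_rate · r̂m

With the MSE loss, this negative gradient is exactly y − f, which is why the ordinary residual appears in the algorithm above.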

    Conclusion

    Gradient Boosting is not just a naive ensemble technique but an optimization algorithm. It follows the same learning logic as Gradient Descent, differing only in the space where the optimization is performed: parameters versus functions. Using linear regression allowed us to isolate this mechanism in its simplest form.

    In the next article, we will see how this framework becomes truly powerful when the base model is a decision tree, leading to Gradient Boosted Decision Tree Regressors.


    All the Excel files are available through this Kofi link. Your support means a lot to me. The price will increase during the month, so early supporters get the best value.

    All Excel/Google sheet files for ML and DL