The Machine Learning “Advent Calendar” Day 13: LASSO and Ridge Regression in Excel

One day, a data scientist told me that Ridge Regression was a complicated model, because he saw that its training formula is more complicated.

Well, this is exactly the objective of my Machine Learning “Advent Calendar”: to clarify this kind of complexity.

So, today, we will talk about penalized versions of linear regression.

First, we will see why regularization (or penalization) is necessary, and how the model is modified.
Then we will explore different types of regularization and their effects.
We will also train the model with regularization and test different hyperparameters.
We will also ask a further question about how to weight the weights in the penalization term. (Confused? You will see.)

Linear regression and its “conditions”

When we talk about linear regression, people often mention that some conditions should be satisfied.

You may have heard statements like:

the residuals should be Gaussian (it is sometimes confused with the target being Gaussian, which is false)
the explanatory variables should not be collinear

In classical statistics, these conditions are required for inference. In machine learning, the focus is on prediction, so these assumptions are less central, but the underlying issues still exist.

Here is an example of two collinear features. To make the point clear, let’s make them exactly equal.

And we have the relationship: y = x1 + x2, and x1 = x2

I know that if they are exactly equal, we can simply write y = 2*x1. But the point is that features can be very similar, and we can still build a model using both of them, right?

Then what is the problem?

When features are perfectly collinear, the solution is not unique. Here is an example in the screenshot below.

y = 10000*x1 – 9998*x2

Ridge and Lasso in Excel – all images by author

And we can notice that the norm of the coefficients is huge.
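A quick way to check this outside Excel is the minimal NumPy sketch below. The five data points are hypothetical; only the two coefficient vectors come from the example above. It shows that (1, 1) and (10000, -9998) produce identical predictions when x1 = x2, while their norms are wildly different.

```python
import numpy as np

# Hypothetical toy data: x2 is an exact copy of x1 and y = x1 + x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1.copy()
X = np.column_stack([x1, x2])
y = x1 + x2

# Two very different coefficient vectors (no intercept needed here).
coef_small = np.array([1.0, 1.0])
coef_huge = np.array([10000.0, -9998.0])

# Both reproduce y exactly: when x1 == x2, only the sum a1 + a2 matters.
pred_small = X @ coef_small
pred_huge = X @ coef_huge
print(np.allclose(pred_small, y), np.allclose(pred_huge, y))  # True True

# ...but their L2 norms differ by about four orders of magnitude.
print(np.linalg.norm(coef_small), np.linalg.norm(coef_huge))
```

Because the data cannot tell these solutions apart, something else has to choose between them, and the norm of the coefficients is a natural criterion.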

So, the idea is to limit the norm of the coefficients.

And after applying the regularization, the conceptual model is the same!

That is right: the coefficients of the linear regression change, but the form of the model stays the same.

Different Versions of Regularization

So the idea is to combine the MSE and the norm of the coefficients.

Instead of just minimizing the MSE, we try to minimize the sum of the two terms.

Which norm? We can use the L1 norm, the L2 norm, or even a combination of both.

There are three classical ways to do this, each with its own model name.
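Written out, the three objectives look like this. This is a sketch under one common convention: the MSE is scaled by 1/n, λ ≥ 0 is the regularization strength, α ∈ [0, 1] balances L1 against L2 in Elastic Net, and the intercept b is not penalized. Libraries and textbooks scale these terms differently (for example 1/(2n) in front of the squared error, or a factor 1/2 on the L2 part), so λ values are not directly comparable across tools.

```latex
\begin{aligned}
\text{Ridge:}\quad & \min_{a,b}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2+\lambda\sum_{j=1}^{p}a_j^2\\
\text{Lasso:}\quad & \min_{a,b}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2+\lambda\sum_{j=1}^{p}\lvert a_j\rvert\\
\text{Elastic Net:}\quad & \min_{a,b}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2+\lambda\left(\alpha\sum_{j=1}^{p}\lvert a_j\rvert+(1-\alpha)\sum_{j=1}^{p}a_j^2\right)
\end{aligned}
\qquad\text{with}\quad \hat{y}_i=\sum_{j=1}^{p}a_j x_{ij}+b
```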

Ridge regression (L2 penalty)

Ridge regression adds a penalty on the squared values of the coefficients.

Intuitively:

large coefficients are heavily penalized (because of the square)
coefficients are pushed toward zero
but they never become exactly zero

Effect:

all features remain in the model
coefficients are smoother and more stable
very effective against collinearity

Ridge shrinks, but does not select.

Ridge regression in Excel – All images by author
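To see how Ridge deals with the collinear example, here is a minimal sketch using the closed-form Ridge solution instead of gradient descent, without an intercept (an assumption that keeps the formula short). With two identical hypothetical features, ordinary least squares has no unique solution, but any λ > 0 gives a unique one that splits the weight evenly between the two copies.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Ridge without intercept: solve (X^T X + lam * I) a = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical toy data: two identical features, y = x1 + x2 (= 2 * x1).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1])
y = 2.0 * x1

# With lam = 0, X^T X is singular (the two columns are identical), so OLS has no
# unique solution. Any lam > 0 makes the system solvable; by symmetry a1 == a2,
# close to (1, 1) for small lam and shrinking toward 0 as lam grows.
for lam in [0.01, 1.0, 100.0]:
    print(lam, ridge_fit(X, y, lam))
```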

Lasso regression (L1 penalty)

Lasso uses a different penalty: the absolute value of the coefficients.

This small change has a big consequence.

With Lasso:

some coefficients can become exactly zero
the model automatically ignores some features

This is also where the name LASSO comes from: it stands for Least Absolute Shrinkage and Selection Operator.

Least: it is derived from a least-squares regression framework
Absolute: it uses the absolute value of the coefficients (L1 norm)
Shrinkage: it shrinks coefficients toward zero
Selection: it can set some coefficients exactly to zero, performing feature selection
Operator: it refers to the regularization operator added to the loss function

Important nuance:

we can say that the model still has the same number of coefficients
but some of them are forced to zero during training

The model form is unchanged, but Lasso effectively removes features by driving coefficients to zero.
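Why does Lasso produce exact zeros while Ridge does not? A minimal sketch of the special case where the problem reduces to one coefficient (for example, a single standardized feature with the squared error scaled by 1/(2n)) makes it visible. If z is the unpenalized least-squares coefficient, the Lasso solution is a soft-thresholding of z and the Ridge solution is a rescaling of z. The function names and the values in z are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution of min_a 0.5 * (a - z)**2 + lam * |a|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Ridge solution of min_a 0.5 * (a - z)**2 + 0.5 * lam * a**2."""
    return z / (1.0 + lam)

z = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])  # hypothetical unpenalized coefficients
lam = 1.0
print(soft_threshold(z, lam))  # every |z| <= lam becomes exactly 0
print(ridge_shrink(z, lam))    # every value is halved, none becomes 0
```

Lasso subtracts a fixed amount and clips at zero, so small coefficients vanish; Ridge divides by a constant, so coefficients get smaller but never reach zero.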

Elastic Net (L1 + L2)

Elastic Net is a combination of Ridge and Lasso.

It uses:

an L1 penalty (like Lasso)
and an L2 penalty (like Ridge)

Why combine them?

Because:

Lasso can be unstable when features are highly correlated
Ridge handles collinearity well but does not select features

Elastic Net gives a balance between:

stability
shrinkage
sparsity

It is often the most practical choice in real datasets.
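To illustrate the instability of Lasso with correlated features, here is a minimal sketch using coordinate descent, a standard solver for these penalties that is swapped in here in place of the gradient descent used in the rest of this article. It assumes the objective (1/(2n))·||y - Xa||² + λ(α·||a||₁ + (1 - α)/2·||a||²), which is the form documented for scikit-learn's ElasticNet, no intercept, and two identical hypothetical features. With α = 1 (Lasso) the solution is not unique, and the solver puts essentially all the weight on whichever copy it updates first; with α = 0.5 the L2 part makes the problem strictly convex and the weight is split evenly.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=1000):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - X a||^2 + lam * (alpha * ||a||_1 + (1 - alpha) / 2 * ||a||^2).
    alpha = 1 gives Lasso. No intercept: assumes X and y are centered."""
    n, p = X.shape
    a = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: the part of y not explained by the other features.
            r_j = y - X @ a + X[:, j] * a[j]
            rho = X[:, j] @ r_j / n
            a[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
    return a

# Hypothetical centered data: two identical features, y = 2 * x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x1 = x1 - x1.mean()
X = np.column_stack([x1, x1])
y = 2.0 * x1

a_lasso = elastic_net_cd(X, y, lam=0.1, alpha=1.0)
a_enet = elastic_net_cd(X, y, lam=0.1, alpha=0.5)
print("Lasso:      ", a_lasso)
print("Elastic Net:", a_enet)
```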

What really changes: model, training, tuning

Let us look at this from a Machine Learning point of view.

The model does not really change

For the model, for all the regularized versions, we still write:

y = a x + b

Same number of coefficients
Same prediction formula
But different coefficient values

From a certain perspective, Ridge, Lasso, and Elastic Net are not different models.

The training principle is also the same

We still:

define a loss function
minimize it
compute gradients
update coefficients

The only difference is:

the loss function now includes a penalty term

That is it.

The hyperparameters are added (this is the real difference)

With standard linear regression, we have no control over the “complexity” of the model.

Standard linear regression: no hyperparameter
Ridge: one hyperparameter (lambda)
Lasso: one hyperparameter (lambda)
Elastic Net: two hyperparameters
- one for overall regularization strength
- one to balance L1 vs L2

So:

standard linear regression does not need tuning
penalized regressions do
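Tuning means choosing λ on data the model was not trained on. Here is a minimal sketch with a single hold-out split and closed-form Ridge (cross-validation is the more common choice in practice). The data, the split sizes and the grid of λ values are all hypothetical. The intercept is left unpenalized by centering the training data first.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Ridge with an unpenalized intercept (fit on centered data)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    a = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ a
    return a, b

# Hypothetical data: 10 features, only the first 3 actually matter.
rng = np.random.default_rng(42)
n, p = 60, 10
X = rng.normal(size=(n, p))
true_a = np.zeros(p)
true_a[:3] = [3.0, -2.0, 1.5]
y = X @ true_a + 1.0 + rng.normal(scale=2.0, size=n)

# Train on the first 40 rows, validate on the last 20.
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
val_mse = {}
for lam in lambdas:
    a, b = ridge_fit(X_train, y_train, lam)
    val_mse[lam] = np.mean((X_val @ a + b - y_val) ** 2)

best_lam = min(val_mse, key=val_mse.get)
print("validation MSE per lambda:", val_mse)
print("selected lambda:", best_lam)
```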

This is why standard linear regression is often seen as “not really Machine Learning”, while regularized versions clearly are.

Implementation of Regularized gradients

We keep the gradient descent of OLS regression as a reference. For Ridge regression, we only have to add the regularization term to the gradient of the coefficient.
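For the single-feature model y = a x + b, the gradients can be written as follows, assuming the MSE is scaled by 1/n and the penalties are λa² (Ridge) and λ|a| (Lasso); the spreadsheet may use a different scaling. Only ∂L/∂a picks up a penalty term. Since |a| is not differentiable at 0, λ·sign(a) is a subgradient for Lasso (at a = 0, any value in [-λ, λ] is valid).

```latex
\begin{aligned}
L(a,b) &= \frac{1}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)^2 + P(a)\\
\frac{\partial L}{\partial b} &= \frac{2}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)\\
\frac{\partial L}{\partial a} &= \frac{2}{n}\sum_{i=1}^{n}\left(a x_i + b - y_i\right)x_i + P'(a)
\end{aligned}
\qquad
P'(a)=\begin{cases}
0 & \text{OLS},\ P(a)=0\\
2\lambda a & \text{Ridge},\ P(a)=\lambda a^2\\
\lambda\,\operatorname{sign}(a) & \text{Lasso},\ P(a)=\lambda\lvert a\rvert
\end{cases}
```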

We will use a simple dataset that I generated (the same one we already used for Linear Regression).

We can see that the three “models” give different coefficients. The goal of this section is to implement gradient descent for all three models and compare them.

Ridge lasso regression in Excel – All images by author

Ridge with penalized gradient

Let’s start with Ridge: we only have to change the gradient of a.

This does not mean that b is unchanged, since the gradient of b at each step also depends on a.
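The same logic can be sketched in Python. This is a minimal translation, not the author's spreadsheet: it assumes the loss MSE + λa² (MSE scaled by 1/n), a fixed learning rate, and a hypothetical dataset y = 2x + 1 plus noise. Setting λ = 0 gives plain OLS, and the only Ridge-specific line is the extra 2λa in the gradient of a.

```python
import numpy as np

def fit_gd(x, y, lam=0.0, lr=0.01, n_iter=5000):
    """Gradient descent for y ≈ a*x + b on MSE + lam * a**2.
    lam = 0 gives plain OLS; the intercept b is never penalized."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        err = a * x + b - y                  # residuals of the current model
        grad_a = 2.0 / n * np.sum(err * x)   # MSE part, shared by OLS and Ridge
        grad_a += 2.0 * lam * a              # the only change for Ridge
        grad_b = 2.0 / n * np.sum(err)       # same formula, but depends on a via err
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical dataset: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

a_ols, b_ols = fit_gd(x, y, lam=0.0)
a_ridge, b_ridge = fit_gd(x, y, lam=5.0)
print(f"OLS:   a={a_ols:.3f}, b={b_ols:.3f}")
print(f"Ridge: a={a_ridge:.3f}, b={b_ridge:.3f}")
```

With this loss, the converged Ridge slope also has a closed form, a = cov(x, y) / (var(x) + λ) using 1/n variances, which is a handy check for a spreadsheet implementation.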

LASSO with penalized gradient

Then we can do the same for LASSO.

Again, the only difference is the gradient of a.

For each model, we can also calculate the MSE and the regularized MSE. It is quite satisfying to see how they decrease over the iterations.
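Here is a minimal Lasso version that also records both curves, under the same assumptions as a Python sketch (loss MSE + λ|a| with the MSE scaled by 1/n, hypothetical data y = 2x + 1 plus noise). The penalty enters the gradient of a as λ·sign(a), a subgradient that NumPy evaluates as 0 at a = 0.

```python
import numpy as np

def lasso_gd(x, y, lam, lr=0.01, n_iter=5000):
    """(Sub)gradient descent for y ≈ a*x + b on MSE + lam * |a|.
    Also records the plain MSE and the regularized MSE at every iteration."""
    a, b = 0.0, 0.0
    n = len(x)
    mse_hist, reg_hist = [], []
    for _ in range(n_iter):
        err = a * x + b - y
        mse = np.mean(err ** 2)
        mse_hist.append(mse)
        reg_hist.append(mse + lam * abs(a))
        grad_a = 2.0 / n * np.sum(err * x) + lam * np.sign(a)  # only change vs OLS
        grad_b = 2.0 / n * np.sum(err)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, mse_hist, reg_hist

# Hypothetical dataset: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

a, b, mse_hist, reg_hist = lasso_gd(x, y, lam=5.0)
print(f"a={a:.3f}, b={b:.3f}")
print(f"MSE:             {mse_hist[0]:.2f} -> {mse_hist[-1]:.2f}")
print(f"Regularized MSE: {reg_hist[0]:.2f} -> {reg_hist[-1]:.2f}")
```

When the optimum is far from 0, as here, the result matches the closed form a = S(cov(x, y), λ/2) / var(x), where S is soft-thresholding.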

Comparison of the coefficients

Now we can visualize the coefficient a for all three models. To make the differences visible, we use very large values of lambda.

Impact of lambda

For large values of lambda, the coefficient a becomes small.

If the Lasso lambda becomes large enough, the theoretical value of a is exactly 0. Numerically, however, plain gradient descent does not reach it: the sign-based gradient step keeps overshooting and oscillating around 0, so we have to improve the gradient descent.
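One common way to make this improvement (shown as one option, not necessarily the author's approach) is proximal gradient descent, also known as ISTA: take a gradient step on the MSE only, then apply soft-thresholding to a. The minimal sketch below compares it with the plain subgradient step on hypothetical data, with a λ large enough that the exact Lasso solution is a = 0.

```python
import numpy as np

def fit_lasso(x, y, lam, proximal, lr=0.01, n_iter=5000):
    """Fit y ≈ a*x + b on MSE + lam * |a| with either a subgradient step
    or a proximal (soft-thresholding) step on a."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        err = a * x + b - y
        grad_a = 2.0 / n * np.sum(err * x)  # gradient of the MSE part only
        grad_b = 2.0 / n * np.sum(err)
        if proximal:
            # Gradient step on the MSE, then shrink toward 0 by lr * lam.
            z = a - lr * grad_a
            a = np.sign(z) * max(abs(z) - lr * lam, 0.0)
        else:
            # Plain subgradient step: jumps back and forth across 0.
            a -= lr * (grad_a + lam * np.sign(a))
        b -= lr * grad_b
    return a, b

# Hypothetical dataset: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

a_sub, b_sub = fit_lasso(x, y, lam=100.0, proximal=False)
a_prox, b_prox = fit_lasso(x, y, lam=100.0, proximal=True)
print("subgradient:", a_sub)   # small but not exactly 0
print("proximal:   ", a_prox)  # exactly 0
```

With a = 0, the model falls back to a constant, so b converges to the mean of y.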

Regularized Logistic Regression?

We saw Logistic Regression yesterday, and one question we can ask is whether it can also be regularized, and if so, what the regularized versions are called.

The answer is of course yes: Logistic Regression can be regularized.

Exactly the same idea applies.

Logistic regression can also be:

L1 penalized
L2 penalized
Elastic Net penalized

There are no special names like “Ridge Logistic Regression” in common usage.

Why?

Because the concept is no longer new.

In practice, libraries like scikit-learn simply let you specify:

the loss function
the penalty type
the regularization strength

The naming mattered when the idea was new.
Now, regularization is just a standard option.
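The same recipe carries over to logistic regression: add the penalty to the log-loss and one extra term to the weight gradient. Below is a minimal NumPy sketch of the L2 case on hypothetical, linearly separable data. Without a penalty, the weights of a separable problem keep growing as iterations continue; the L2 term keeps them bounded. In practice you would not write this by hand: scikit-learn's `LogisticRegression`, for example, controls the penalty through parameters such as `C`, the inverse of the regularization strength.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_gd(X, y, lam=0.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + lam * ||w||^2 (intercept b not penalized)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = sigmoid(X @ w + b)
        grad_w = X.T @ (prob - y) / n + 2.0 * lam * w  # L2 penalty adds 2 * lam * w
        grad_b = np.mean(prob - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical separable data: the label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w_plain, _ = logistic_gd(X, y, lam=0.0)
w_l2, _ = logistic_gd(X, y, lam=0.1)
print("no penalty:", w_plain, np.linalg.norm(w_plain))
print("L2 penalty:", w_l2, np.linalg.norm(w_l2))
```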

Conclusion

Ridge and Lasso do not change the linear model itself, they change how the coefficients are learned. By adding a penalty, regularization favors stable and meaningful solutions, especially when features are correlated. Seeing this process step by step in Excel makes it clear that these methods are not more complex, just more controlled.
