
    Linear Regression Is Actually a Projection Problem, Part 1: The Geometric Intuition

By Awais · March 19, 2026 · 15 Mins Read

Anyone learning machine learning usually starts with linear regression, not just because it is simple, but because it introduces the key concepts we later use in neural networks and deep learning.

    We already know that linear regression is used to predict continuous values.

Now that we have this data, we need to build a simple linear regression model to predict the price of a house from its size.

    Image by Author

We generally use Python to implement this algorithm.

    Code:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    
    # 1. Data
    X = np.array([1, 2, 3]).reshape(-1, 1)
    y = np.array([11, 12, 19])
    
    # 2. Train the scikit-learn model
    model = LinearRegression()
    model.fit(X, y)
    
    # 3. Extract the parameters and predictions
    intercept = model.intercept_
    slope = model.coef_[0]
    y_pred = model.predict(X)
    errors = y - y_pred
    
    print("--- Scikit-Learn Results ---")
    print(f"Intercept (Beta 0): {intercept:.0f}")
    print(f"Slope (Beta 1):     {slope:.0f}")
    print(f"Predictions:        {y_pred}")
    print(f"Errors (Residuals): {errors}")
    
    # 4. Create the 2D Scatterplot
    plt.figure(figsize=(8, 6))
    
    # Plot the actual data points
    plt.scatter(X, y, color='blue', s=100, label='Actual Data (y)')
    
    # Plot the scikit-learn line of best fit
    plt.plot(X, y_pred, color='red', linewidth=2, label='scikit-learn Best Fit Line')
    
    # Draw the vertical residual lines (errors)
    for i in range(len(X)):
        plt.plot([X[i][0], X[i][0]], [y[i], y_pred[i]], color='green', linestyle='--', linewidth=2)
        plt.text(X[i][0] + 0.05, (y[i] + y_pred[i])/2, f'e={errors[i]:.0f}', color='green', fontsize=12)
    
    plt.xlabel('Size (x, in 1000 sq ft)')
    plt.ylabel('Price (y, in $100k)')
    plt.title('scikit-learn Simple Linear Regression')
    plt.legend()
    plt.grid(True, linestyle=':', alpha=0.7)
    
    # Display the plot
    plt.show()

    Then we get these values:

    Image by Author

    Get ready!

    This time we are going to take a different route to solve this problem.

    Since we are taking a less explored route, let’s prepare ourselves so we don’t get lost midway through our journey of understanding.

    The route we are going to take is Vector Projection and for that let’s recollect our basics on vectors.

    In this first part, we are going to build our geometric intuition around vectors, dot products, and projections. These are the absolute fundamentals we need to learn to understand linear regression clearly.

    Once we have that foundation down, Part 2 will dive into the exact implementation.

    Let’s go.


    What is a Vector?

    Let’s go back to high school where we first got introduced to vectors.

    One of the first examples we learn about vectors is Speed vs. Velocity.

    This example tells us that 50 km/hour is the speed, and it is a scalar quantity because it only has magnitude, whereas 50 km/hour in the east direction is a vector quantity because it has both magnitude and direction.

    Now, let’s draw it on a graph.

    Image by Author

    If we plot the coordinates (2, 4), we consider it a point in 2D space.

    But if we connect the origin to that point with an arrow, we now consider it a vector because it has both magnitude and direction.

We can think of (2, 4) as a set of instructions: it says to take 2 steps to the right along the x-axis, and then 4 steps up parallel to the y-axis.

    The way it points gives us the direction.

    The length of an arrow gives us the magnitude of the vector.

    Image by Author

    \[
    \text{From the plot we can observe the formation of a right-angled triangle.}
    \\
    \text{From the Pythagorean theorem we know that}
    \\
    c = \sqrt{a^2 + b^2}
    \\
    \text{For a vector } v = (x,y), \text{ the magnitude is}
    \\
    ||v|| = \sqrt{x^2 + y^2}
    \\
    \text{Substituting the values of the vector } (2,4)
    \\
    ||v|| = \sqrt{2^2 + 4^2}
    \\
    ||v|| = \sqrt{4 + 16}
    \\
    ||v|| = \sqrt{20}
    \\
    ||v|| \approx 4.47 \text{ units}
    \]
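As a quick check (my sketch, not part of the original article), NumPy's `np.linalg.norm` computes exactly this magnitude:

```python
import numpy as np

v = np.array([2, 4])
magnitude = np.linalg.norm(v)  # sqrt(2^2 + 4^2) = sqrt(20)
print(magnitude)  # ≈ 4.47
```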


    Now let’s draw another vector (6,2) in our graph.

    Image by Author

    By looking at the vectors, we can see that they are generally pointing up and to the right.

    They aren’t pointing in the exact same direction, but they are clearly leaning the same way.

    The angle between the vectors is small.

    Instead of just observing and stating this, we can measure how two vectors actually agree with each other. For that, we use the dot product.

    From the plot, we have two vectors:

    \[
    \mathbf{A} = (2,4)
    \]

    \[
    \mathbf{B} = (6,2)
    \]

    We already know that we can interpret these numbers as movements along the axes.

    Vector \(A\):

    \[
    A = (2,4)
    \]

    means

    \[
    2 \text{ units in the } x\text{-direction}
    \]

    \[
    4 \text{ units in the } y\text{-direction}
    \]

    Vector \(B\):

    \[
    B = (6,2)
    \]

    means

    \[
    6 \text{ units in the } x\text{-direction}
    \]

    \[
    2 \text{ units in the } y\text{-direction}
    \]

    To measure how much the two vectors agree with each other along each axis, we multiply their corresponding components.

    Along the \(x\)-axis:

    \[
    2 \times 6
    \]

    Along the \(y\)-axis:

    \[
    4 \times 2
    \]

    Then we add these contributions together:

    \[
    2 \times 6 + 4 \times 2
    \]

    \[
    = 12 + 8
    \]

    \[
    = 20
    \]

    This operation is called the Dot Product.

    In general, for two vectors

    \[
    \mathbf{A} = (a_1,a_2)
    \]

    \[
    \mathbf{B} = (b_1,b_2)
    \]

    the dot product is defined as

    \[
    \mathbf{A} \cdot \mathbf{B} = a_1 b_1 + a_2 b_2
    \]

    We got a dot product of 20, but what does that mean?

Since 20 is a positive number, we can conclude that the angle between the vectors is less than 90°.

    We can also think of it as a positive relationship between the two variables represented by these vectors. This idea will become clearer when we begin discussing the Simple Linear Regression problem.
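For readers who want to verify this in code, here is a minimal NumPy sketch of the component-wise dot product (assuming NumPy is available, as in the earlier example):

```python
import numpy as np

A = np.array([2, 4])
B = np.array([6, 2])
print(np.dot(A, B))  # 2*6 + 4*2 = 20
print(A @ B)         # same result, via the @ operator
```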


You may wonder how the dot product is related to the angle between two vectors, and how we can say that the angle is less than 90°.

    Before understanding this relationship, we will look at two more cases of the dot product so that our understanding becomes clearer about what the dot product is actually measuring. After that, we will move on to the angle between vectors.

    Now let’s look at two more examples to better understand the dot product.

    Image by Author

    When we look at vectors with a dot product equal to 0, we can say that they are orthogonal, meaning they are perpendicular to each other. In this case, the vectors have no linear relationship, which corresponds to zero correlation.

    Now, when we observe vectors with a negative dot product, we can see that the angle is obtuse, which means the vectors are pointing in opposite directions, and this represents a negative correlation.
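The three sign cases can be summarized in a small helper function; the name `relationship` is my own illustration, not standard terminology:

```python
import numpy as np

def relationship(a, b):
    """Classify the angle between two vectors by the sign of their dot product."""
    d = np.dot(a, b)
    if d > 0:
        return "acute (< 90°): positive relationship"
    if d == 0:
        return "orthogonal (= 90°): no linear relationship"
    return "obtuse (> 90°): negative relationship"

print(relationship(np.array([2, 4]), np.array([6, 2])))    # acute
print(relationship(np.array([1, 0]), np.array([0, 1])))    # orthogonal
print(relationship(np.array([1, 1]), np.array([-2, -1])))  # obtuse
```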


Now, once again, let’s consider the two vectors (2, 4) and (6, 2).

    Image by Author

    We got the dot product of 20.

    Now, there is another way to find the dot product, which involves the lengths of the vectors and the angle between them.

    This shows that 20 is not a random number; it indicates that the vectors are leaning in the same direction.

    Let the two vectors be

    \[
    A = (2,4), \qquad B = (6,2)
    \]

    First compute the lengths (magnitudes) of the vectors.

    \[
    \|A\| = \sqrt{2^2 + 4^2}
    \]

    \[
    \|A\| = \sqrt{4 + 16} = \sqrt{20}
    \]

    \[
    \|B\| = \sqrt{6^2 + 2^2}
    \]

    \[
    \|B\| = \sqrt{36 + 4} = \sqrt{40}
    \]

    Now using the dot product formula

    \[
    A \cdot B = \|A\| \|B\| \cos(\theta)
    \]

    From the component formula of the dot product

    \[
    A \cdot B = 2 \times 6 + 4 \times 2
    \]

    \[
    A \cdot B = 12 + 8 = 20
    \]

    Substitute into the angle formula

    \[
    20 = \sqrt{20} \times \sqrt{40} \times \cos(\theta)
    \]

    \[
    \cos(\theta) = \frac{20}{\sqrt{20}\sqrt{40}}
    \]

    Now simplify the denominator

    \[
    \sqrt{20} = 2\sqrt{5}, \qquad \sqrt{40} = 2\sqrt{10}
    \]

    \[
    \sqrt{20}\sqrt{40} = (2\sqrt{5})(2\sqrt{10})
    \]

    \[
    = 4\sqrt{50}
    \]

    \[
    = 4(5\sqrt{2})
    \]

    \[
    = 20\sqrt{2}
    \]

    So,

    \[
    \cos(\theta) = \frac{20}{20\sqrt{2}}
    \]

    \[
    \cos(\theta) = \frac{1}{\sqrt{2}}
    \]

    \[
    \theta = 45^\circ
    \]

    We get this formula by using the Law of Cosines. This is the geometric way of solving the dot product.

    From this equation, we can understand that if we have the lengths of the vectors and the angle between them, we can easily find the dot product of the two vectors.
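The same angle calculation can be reproduced numerically; this sketch simply rearranges \(A \cdot B = \|A\|\|B\|\cos\theta\) to solve for \(\theta\):

```python
import numpy as np

A = np.array([2, 4])
B = np.array([6, 2])

cos_theta = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
theta_deg = np.degrees(np.arccos(cos_theta))
print(cos_theta)  # 1/sqrt(2) ≈ 0.7071
print(theta_deg)  # ≈ 45.0
```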


    Up to this point, we’ve gotten a basic idea of what vectors are, their dot products, and the angles between them.

    I know everything has been mostly mathematical up to this point. However, we are now going to use what we’ve learned to discuss projections, and things will become even clearer when we finally solve a simple linear regression problem.


    Vector Projections

Now imagine we are driving along a highway through a forest, on our way to a house somewhere deep inside the forest, far away from the highway.

Let’s say our house is at a fixed point, (2, 4). There is a mud road through the forest that leads directly there, but due to incessant rains, we cannot take that route.

Our other option is the highway through the forest, which runs in the direction of (6, 2).

What we have to do is travel along the highway, park the car beside it, and then take our luggage and walk to our home.

We are carrying heavy luggage, and we don’t want to walk much. So we need to stop the car at the point on the highway where the walking distance to our home is shortest.

    Image by Author

    The question now is: How far do we need to travel along that highway (the [6, 2] direction) to get as close as possible to our home at (2, 4)?

    Image by Author

    Now, by looking at the visual above, let’s see what we can observe.

    If we stop our car at point A, it is too early; we can see that the red line connecting to our home is a long walk.

    Next, if we stop the car at point C, we have already gone past our home, so we need to turn back, and this is also a long walk.

We can observe that point B is the best spot to stop the car, because our walk home forms a perfect 90° angle with the highway.


    We need to find the exact point on the highway to park our car.

    Let’s start from the origin and first find the direct distance to our home, which is located at (2, 4).

In linear algebra, this distance is simply the length of the vector. Here, the length is \(\sqrt{20}\), which is about 4.47. We can say that our home is 4.47 kilometers from the origin in the direction of (2, 4).

    But we cannot take that direct route because of the rain; it is a muddy, unpaved road. We only have one option: drive along the highway in the direction of (6, 2).

    Image by Author

    We have a highway pointing in the direction of (6, 2), which we call a vector.

    On this highway, we can only move forward or backward and this is the only dimension we have.

    Every point we can possibly reach makes up the span of the vector.

    It is important to understand that the highway is an infinite road. It doesn’t actually start at (0, 0); we are just starting our journey from that specific point in the middle of it.

    To minimize our walk through the mud, we need to find the spot on the highway closest to our home, which is always a perpendicular path.

    To know our driving distance along the highway, let’s consider a milestone signpost on our highway at (6, 2) which we use as a reference for direction on the highway.

If we calculate the physical distance from our starting point (0, 0) to this signpost, the length is \(\sqrt{40}\), which is about 6.32. So, our reference signpost is exactly 6.32 km down the road.


    There are several ways to find our exact parking point. First, if we look at any two known points on the highway like our start at (0, 0) and our signpost at (6, 2), we can calculate the slope of the road:

$$
m = \frac{y_2 - y_1}{x_2 - x_1}
$$

$$
m = \frac{2 - 0}{6 - 0}
$$

$$
m = \frac{2}{6} = \frac{1}{3}
$$

    A slope of 1/3 means that for every 3 units of increase in x, there is 1 unit increase in y. Because every point on the highway vector follows this exact rule, we can write the equation for our road as:

    $$
    y = \frac{1}{3}x
    $$

This means every point on the highway, including the parking point, has the coordinates \((x, \frac{1}{3}x)\).

We just need to find the perfect x that minimizes the walk between our car at \((x, \frac{1}{3}x)\) and our home at (2, 4).

    To avoid dealing with square roots in our calculus and to make the calculation easier, let’s minimize the squared distance.

We want to minimize the squared distance, f(x), between our car at \((x, \frac{1}{3}x)\) and our home at (2, 4). The squared distance formula is:

$$
f(x) = (x_2 - x_1)^2 + (y_2 - y_1)^2
$$

$$
f(x) = (x - 2)^2 + \left(\frac{1}{3}x - 4\right)^2
$$

    Expanding the binomials:

$$
f(x) = (x^2 - 4x + 4) + \left(\frac{1}{9}x^2 - \frac{8}{3}x + 16\right)
$$

    Grouping the terms together:

$$
f(x) = \left(1 + \frac{1}{9}\right)x^2 - \left(4 + \frac{8}{3}\right)x + (4 + 16)
$$

$$
f(x) = \frac{10}{9}x^2 - \left(\frac{12}{3} + \frac{8}{3}\right)x + 20
$$

$$
f(x) = \frac{10}{9}x^2 - \frac{20}{3}x + 20
$$

    If we graph the error function f(x), it forms a U-shaped parabola.
    To find the minimum point, we take the derivative and set it to zero.

$$
f'(x) = \frac{d}{dx}\left(\frac{10}{9}x^2 - \frac{20}{3}x + 20\right)
$$

$$
f'(x) = \frac{20}{9}x - \frac{20}{3}
$$

    Setting the derivative equal to zero:

$$
\frac{20}{9}x - \frac{20}{3} = 0
$$

    $$
    \frac{20}{9}x = \frac{20}{3}
    $$

    $$
    x = \frac{20}{3} \times \frac{9}{20}
    $$

    $$
    x = 3
    $$

    Plug x=3 back into the highway equation:

    $$
    y = \frac{1}{3}x
    $$

    $$
    y = \frac{1}{3}(3) = 1
    $$

    The perfect parking spot is (3,1).

    By using the calculus method, we found the parking spot at (3, 1). If we compare this to our milestone signpost at (6, 2), we can observe that the parking point is exactly half the distance to the signpost.

    This means that if we drive halfway to the signpost, we reach the exact point where we can park and take the shortest path to home.

    \[
    \begin{gathered}
    \text{Our Parking Spot: } \mathbf{P} = (3, 1) \\
    \text{Our Signpost: } \mathbf{V} = (6, 2) \\
    \\
    \text{Relationship:} \\
    (3, 1) = 0.5 \times (6, 2) \\
    \\
    \text{Therefore, our optimal multiplier } c \text{ is } 0.5.
    \end{gathered}
    \]

    This 0.5 is exactly what we find in linear regression. We will get an even clearer idea of this when we apply these concepts to solve a real-world regression problem.
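We can double-check the calculus result numerically by evaluating the squared-distance function on a fine grid. This brute-force scan is just a verification sketch, not how regression is normally solved:

```python
import numpy as np

# Squared distance from a point (x, x/3) on the highway to the home at (2, 4)
def f(x):
    return (x - 2) ** 2 + (x / 3 - 4) ** 2

# Scan a fine grid; the minimizer should sit at x = 3
xs = np.linspace(0, 6, 60001)
x_best = xs[np.argmin(f(xs))]
print(x_best)      # ≈ 3.0, so the parking spot is (3, 1)
print(x_best / 6)  # scaling factor c ≈ 0.5
```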


    From the plot, we can say that the vector from the origin (0,0) to the parking point (3,1) is the projection of the home vector onto the highway, whose length is

    $$
    \text{Driving Distance} = \sqrt{x^2 + y^2}
    $$

    $$
    = \sqrt{3^2 + 1^2}
    $$

    $$
    = \sqrt{9 + 1}
    $$

    $$
    = \sqrt{10}
    $$

    $$
    \approx 3.16 \text{ km}
    $$

This is how we calculate vector projections.


    Now, we also have a shortcut to find the parking point.

    Earlier, we calculated the dot product of these two vectors, which is 20.

    Now, let’s multiply the length of the projection vector by the length of the highway vector (from the origin to the signpost).

That gives \(\sqrt{10} \times \sqrt{40} = \sqrt{400} = 20\), or approximately 3.16 × 6.32. From this, we can understand that the dot product gives us the length of the projection multiplied by the length of the highway.

We have a dot product of 20, and the squared length of the highway is 40. We use the squared length because the dot product itself carries squared units; when we multiply \(a_1 b_1\) and \(a_2 b_2\) and add them, the units also get multiplied.

    Now, if we divide the dot product by the squared length (20 / 40), we get 0.5. We call this the scaling factor.

    Because we want to find the exact point along the highway, we scale the highway vector by 0.5, which gives us (3, 1).

    In vector vocabulary, we call the highway the base vector and the home vector the target vector.

    And that is how we get our parking point at (3, 1).


    What we discussed so far can be expressed using a simple mathematical formula called the projection formula.

    $$
    \text{proj}_{\mathbf{b}}(\mathbf{a}) =
    \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{b}\|^2}\mathbf{b}
    $$

    Let

    $$
    A = (2,4), \quad B = (6,2)
    $$

    First compute the dot product.

    $$
    A\cdot B = 2\times6 + 4\times2
    $$

    $$
    A\cdot B = 12 + 8
    $$

    $$
    A\cdot B = 20
    $$

    Now compute the squared length of the highway vector.

    $$
    \|B\|^2 = 6^2 + 2^2
    $$

    $$
    \|B\|^2 = 36 + 4
    $$

    $$
    \|B\|^2 = 40
    $$

    Now divide the dot product by the squared length.

    $$
    \frac{A\cdot B}{\|B\|^2} = \frac{20}{40}
    $$

    $$
    = 0.5
    $$

    This value is the scaling factor.

    Now scale the highway vector.

    $$
    \text{proj}_{B}(A) = 0.5(6,2)
    $$

    $$
    = (3,1)
    $$

    So the projection point (the parking point) is

    $$
    (3,1)
    $$
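The whole projection formula collapses to two lines of NumPy. Again, this is a sketch using the same home and highway vectors as above:

```python
import numpy as np

a = np.array([2, 4])  # home (target vector)
b = np.array([6, 2])  # highway direction (base vector)

c = np.dot(a, b) / np.dot(b, b)  # scaling factor: 20 / 40 = 0.5
proj = c * b                     # parking point
print(c)     # 0.5
print(proj)  # [3. 1.]
```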


    Now, what can we say about this 3.16 km distance along the highway?

    Let’s say, for example, we take the direct mud route and ignore the highway. As we move along our path to home, we are actually traveling in two directions simultaneously: parallel to the highway and sideways toward our home.

    By the time we finally reach our home, we have effectively traveled 3.16 km in the direction of the highway.

    On the other hand, what if we travel along the highway? If we drive exactly 3.16 km along the highway, then we reach our parking point at (3, 1).

    This specific point is where the path to our home is perfectly perpendicular to the highway.

    Most importantly, this means it represents the absolute shortest walking path from the highway to our home!


I hope you are walking away with an intuitive understanding of vectors, dot products, and projections!

    In Part 2, we will take exactly what we learned today and use it to solve a real linear regression problem.

    If anything in this post felt unclear, feel free to comment.

    Meanwhile, I recently wrote a deep dive on the Chi-Square test. If you are interested, you can read it here.

    Thanks so much for reading!
