Finally Understand Linear Regression (Full Code Included)
Finally understand Linear Regression from the ground up. This guide breaks down the core concepts and includes full Python code, both with scikit-learn and from scratch.
Alex Carter
Data Scientist and educator passionate about making complex machine learning concepts simple.
Ever stared at a chart of data points and thought, "There’s a trend here, but how do I prove it?" Or maybe you've heard terms like "machine learning" and "model training" and felt like you were on the outside of a very complex, very exclusive club. What if I told you that one of the most fundamental concepts in machine learning is something you probably learned the basics of in high school algebra?
Welcome to Linear Regression. It’s the "Hello, World!" of predictive modeling, but don't let its simplicity fool you. It's an incredibly powerful tool for everything from predicting house prices based on square footage to forecasting sales based on ad spend. It’s the bedrock on which many more complex algorithms are built. For years, it felt like a black box to me, too. I could use a library to get an answer, but I didn't truly understand what was happening under the hood.
This post is designed to change that. We're going to pull back the curtain and demystify Linear Regression once and for all. We'll walk through the core concepts intuitively, touch on the (gentle) math that makes it work, and then—the best part—we'll build it. Twice. First, using the popular Scikit-learn library, and then, we'll build it again from scratch in Python. By the end, you'll have that "Aha!" moment and be ready to apply this powerful tool with confidence.
What is Linear Regression, Really?
At its heart, linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. Let's break that down:
- Independent Variable (Feature, or 'X'): This is the variable we believe influences the other. Think: hours spent studying, square footage of a house, or advertising budget.
- Dependent Variable (Target, or 'y'): This is the variable we are trying to predict. Think: exam score, house price, or total sales.
Linear regression assumes there's a linear relationship. In simple terms, it tries to draw a straight line that best explains how 'X' predicts 'y'. If you have one independent variable, it's called Simple Linear Regression. If you have more than one (e.g., predicting a house price using square footage AND number of bedrooms), it's called Multiple Linear Regression. Today, we'll master the simple version to build a solid foundation.
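Written out, the two forms look like this (the multiple version simply gives each feature its own weight):

\text{Simple:}\quad \hat{y} = m x + b
\qquad\qquad
\text{Multiple:}\quad \hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b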
The Core Idea: Finding the "Best Fit" Line
Imagine you have a scatter plot of data. On the x-axis, you have the size of several houses, and on the y-axis, you have their selling prices. You'd probably see a cloud of points trending upwards—larger houses tend to cost more.
The goal of linear regression is to draw a single straight line through these points that best captures this trend. This line is our "model." Why is this useful? Because once we have this line, we can use it to make predictions. If someone tells you the size of a new house not in your original data, you can find that size on the x-axis, trace up to your line, and see the predicted price on the y-axis. Simple, right? The key question, of course, is: how do we find the best line?
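As a quick, concrete illustration (the numbers here are made up for the example, not fitted from real data): suppose the best-fit line for our housing data came out to price = 150 * square_feet + 50,000. Making a prediction is then just plugging in:

# Hypothetical best-fit line: price = 150 * sqft + 50,000 (made-up numbers for illustration)
m, b = 150, 50_000

sqft = 2000                        # a new house that wasn't in our original data
predicted_price = m * sqft + b     # "trace up to the line and read off the y-axis"
print(predicted_price)             # 350000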
The Math Behind the Magic (It's Gentle!)
This is where things get interesting. To find the "best fit" line, we need three key ingredients: a way to represent a line, a way to measure how "good" our line is, and a way to improve it.
The Equation: y = mx + b
Remember this from school? This is the equation for a straight line. In machine learning, we just use slightly different names for the parts:
- y is our predicted value (the target).
- x is our input value (the feature).
- m is the slope, or weight. It tells us how much y changes for a one-unit increase in x.
- b is the y-intercept, or bias. It's the value of y when x is zero.
The entire process of "training" a linear regression model is just about finding the optimal values for m and b that create the best-fitting line for our data.
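To make that concrete, here is a tiny sketch (the parameter values below are arbitrary guesses, not trained): different choices of m and b define different lines, and therefore different predictions for the same input. Training is simply the search for the pair that fits our data best.

def predict(x, m, b):
    """The whole model: one slope, one intercept."""
    return m * x + b

# Two arbitrary candidate lines evaluated at the same input
print(predict(4, m=0.5, b=1.0))   # 3.0
print(predict(4, m=0.9, b=1.8))   # 5.4 -- a different line gives a different prediction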
Cost Function: How Do We Measure "Wrong"?
So, how do we know if one line is better than another? We measure its error. For each data point, we can see the difference between the actual value (the dot) and the value our line predicts (the point on the line). This difference is called the residual or error.
We need a single number that tells us the total error for the entire dataset. A common way to do this is with the Mean Squared Error (MSE) cost function. Here's how it works:
- For each data point, calculate the error (actual y - predicted y).
- Square each error. (This makes all errors positive and penalizes larger errors more heavily).
- Calculate the average of all these squared errors.
Our goal is to find the m and b that make this MSE value as small as possible.
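In code, the whole cost function is a couple of numpy operations. Here's a minimal sketch using the same toy targets we'll fit later in this post (the two candidate lines are arbitrary guesses, just to show that a worse line produces a larger MSE):

import numpy as np

y_actual = np.array([2.5, 3.5, 4.0, 5.2, 6.1, 6.8])

def mse(y_actual, y_predicted):
    errors = y_actual - y_predicted   # residual for each data point
    return np.mean(errors ** 2)       # square them, then average

# Two arbitrary candidate lines evaluated over x = 1..6
x = np.arange(1, 7)
print(mse(y_actual, 0.5 * x + 1.0))   # a poor fit   -> larger MSE
print(mse(y_actual, 0.9 * x + 1.7))   # a closer fit -> smaller MSE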
Gradient Descent: The Path to the Best Fit
This is the algorithm that minimizes our cost function. Imagine that the MSE for every possible combination of m and b forms a 3D bowl shape. Our goal is to find the very bottom of that bowl. Gradient Descent is how we get there.
Think of it like being on a foggy mountain and wanting to get to the lowest valley. You'd feel the slope of the ground beneath your feet (the gradient) and take a step in the steepest downward direction. You'd repeat this process, taking small steps, until you couldn't go any lower.
That's exactly what Gradient Descent does:
1. Start with random values for m and b.
2. Calculate the gradient (the "slope" of the cost function) with respect to m and b. This tells us which direction will decrease the error most.
3. Update m and b by taking a small step in that downward direction. The size of this step is controlled by a parameter called the learning rate.
4. Repeat steps 2 and 3 for a set number of iterations or until the error stops changing much.
By the end of this process, we'll have the m and b that correspond to the bottom of the bowl: the line with the minimum possible error.
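For the curious, the gradients themselves have simple closed forms; this is the only calculus the from-scratch code later in this post relies on (that code drops the constant factor 2, which just rescales the learning rate):

\frac{\partial\,\mathrm{MSE}}{\partial m} = \frac{2}{n} \sum_{i=1}^{n} x_i \,(\hat{y}_i - y_i),
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)

m \leftarrow m - \alpha \,\frac{\partial\,\mathrm{MSE}}{\partial m},
\qquad
b \leftarrow b - \alpha \,\frac{\partial\,\mathrm{MSE}}{\partial b}

Here \hat{y}_i = m x_i + b is the prediction for the i-th point and \alpha is the learning rate.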
Let's Build It! Linear Regression in Python
Theory is great, but code makes it real. Let's get our hands dirty.
Setting Up Our Environment
You'll need a few common data science libraries. If you don't have them, install them via pip:
pip install numpy scikit-learn matplotlib
Using Scikit-Learn (The Easy Way)
Scikit-learn is the go-to library for machine learning in Python. It makes implementing models like linear regression incredibly simple.
import numpy as np
from sklearn.linear_model import LinearRegression
# Let's create some sample data
X = np.array([[1], [2], [3], [4], [5], [6]]) # Feature (e.g., years of experience)
y = np.array([2.5, 3.5, 4.0, 5.2, 6.1, 6.8]) # Target (e.g., salary in 10k)
# 1. Create the model instance
model = LinearRegression()
# 2. Train (fit) the model to our data
model.fit(X, y)
# 3. Check the results
m = model.coef_[0]
b = model.intercept_
print(f"Scikit-learn found: m = {m:.4f}, b = {b:.4f}")
# 4. Make a prediction
new_experience = np.array([[7]])
predicted_salary = model.predict(new_experience)
print(f"Prediction for {new_experience[0][0]} years: {predicted_salary[0]:.4f}")
In just a few lines, we have a working model! Scikit-learn handles all the optimization for us (under the hood, LinearRegression actually solves for m and b directly with a closed-form least-squares solution rather than Gradient Descent). But what's actually happening when a model "learns"? Let's build it ourselves, with Gradient Descent, to find out.
Building from Scratch (The "I Get It Now!" Way)
This is where the magic becomes logic. We'll implement the Gradient Descent process we just discussed.
import numpy as np
class MyLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None  # This will be 'm'
        self.bias = None     # This will be 'b'

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # 1. Initialize parameters
        self.weights = 0  # m
        self.bias = 0     # b
        # 2. Gradient Descent
        for _ in range(self.n_iterations):
            # Predict y using current m and b
            y_predicted = self.weights * X.flatten() + self.bias
            # Calculate gradients (derivatives of MSE cost function)
            dw = (1 / n_samples) * np.sum(X.flatten() * (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            # 3. Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return self.weights * X.flatten() + self.bias
# Let's use the same data as before
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.5, 3.5, 4.0, 5.2, 6.1, 6.8])
# Create and train our custom model
my_model = MyLinearRegression(learning_rate=0.05, n_iterations=2000)
my_model.fit(X, y)
m_custom = my_model.weights
b_custom = my_model.bias
print(f"Our model found: m = {m_custom:.4f}, b = {b_custom:.4f}")
Look at that! The values for m and b should be extremely close to what Scikit-learn found. We've successfully implemented the learning process from scratch.
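For a numeric sanity check on top of eyeballing the coefficients, one option (assuming the Scikit-learn model and our custom model from above are both still in memory) is to compare their mean squared errors on the training data; the two numbers should be nearly identical:

from sklearn.metrics import mean_squared_error

print("Scikit-learn MSE:", mean_squared_error(y, model.predict(X)))
print("Our model's MSE: ", mean_squared_error(y, my_model.predict(X)))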
Putting It All Together: A Visual Payoff
Let's visualize our results to see how well both models performed. A picture is worth a thousand lines of code!
import matplotlib.pyplot as plt
# Predictions from both models
sklearn_predictions = model.predict(X)
custom_predictions = my_model.predict(X)
plt.figure(figsize=(10, 6))
# Plot original data points
plt.scatter(X, y, color='blue', label='Actual Data')
# Plot Scikit-learn's line
plt.plot(X, sklearn_predictions, color='red', linewidth=3, label='Scikit-learn Fit')
# Plot our custom model's line
plt.plot(X, custom_predictions, color='green', linestyle='--', linewidth=2, label='Our Custom Fit')
plt.title('Linear Regression: Scikit-learn vs. From Scratch')
plt.xlabel('Years of Experience')
plt.ylabel('Salary (in 10k)')
plt.legend()
plt.grid(True)
plt.show()
When you run this code, you'll see your original data points and two lines (one red, one green) running right through them, likely overlapping almost perfectly. This is the proof: you've successfully built a machine learning model that performs just like the industry-standard library.
Conclusion: Your First Step into Machine Learning
Congratulations! You've gone from the high-level concept of Linear Regression right down to the nitty-gritty of its implementation. You now know that it's not magic, but math—and intuitive math at that. You've seen how to find the "best fit" line by defining an error (cost function) and systematically minimizing it (Gradient Descent).
Most importantly, you've built it yourself. This hands-on understanding is the key to unlocking more complex topics in data science and machine learning. From here, you can explore multiple linear regression, polynomial regression, or logistic regression for classification problems. The journey has just begun, but you've taken the most important step: the first one.
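If you want a taste of that next step, here is a minimal sketch of multiple linear regression with Scikit-learn; the two-feature dataset below is entirely made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: [square footage, number of bedrooms] -> price (in $1000s)
X = np.array([[850, 2], [900, 2], [1200, 3], [1500, 3], [1800, 4], [2100, 4]])
y = np.array([155, 160, 210, 250, 295, 330])

model = LinearRegression()
model.fit(X, y)

print("Weights (one per feature):", model.coef_)
print("Bias:", model.intercept_)
print("Prediction for a 1,600 sq ft, 3-bedroom house:", model.predict([[1600, 3]])[0])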