Overfitting and Underfitting in Machine Learning
Learn about overfitting and underfitting in machine learning
Image generated by NVIDIA FLUX.1-schnell
The Goldilocks Dilemma: Overfitting and Underfitting Explained
Have you ever met someone who could recite every fact from a textbook but froze when faced with a problem they'd never seen before? That's overfitting in human form. Conversely, have you met someone who oversimplifies complex situations to the point of uselessness? That's underfitting. Today, we're diving into the single most important challenge in machine learning: finding that "just right" sweet spot where your model actually learns the pattern rather than the noise.
Prerequisites
While this guide stands perfectly well on its own, you'll get extra value if you've read Part 3: Understanding Model Parameters and Hyperparameters. We'll naturally connect back to those concepts, especially how hyperparameters like regularization strength act as the "volume knobs" for model complexity. Otherwise, just bring your curiosity and a basic understanding of how models learn from data.
When Your Model Becomes a Memorization Machine
Imagine training a student for a math exam by showing them exactly 50 practice problems. An overfit student would memorize every single answer (including the coffee stains and handwriting quirks on the page) without understanding the underlying algebra. When exam day brings problem #51, they fail spectacularly.
That's overfitting in a nutshell: your model learns the training data too well, capturing not just the genuine patterns but also the random noise and outliers. It has high variance, meaning it's overly sensitive to the specific training set and performs poorly on new data.
Key Insight: Overfitting often happens when you have too many parameters relative to your amount of training data. Remember those model parameters we discussed in Part 3? If you have millions of parameters but only thousands of examples, your model can essentially "store" the training data instead of generalizing from it.
The telltale signs? A massive gap between your training accuracy (95%!) and validation accuracy (65%). Your model is essentially "cheating" by memorizing answers rather than learning the rules.
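You can reproduce that gap in a few lines. Here is a minimal sketch (the dataset and settings are invented for illustration): an unconstrained decision tree on mostly-noise data reaches perfect training accuracy precisely because it memorizes the noise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Mostly-noise data: only the first feature weakly predicts the label,
# so near-perfect training accuracy can only come from memorization.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + rng.normal(scale=2.0, size=200) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until every training point is correct.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f"training accuracy:   {tree.score(X_tr, y_tr):.2f}")  # 1.00
print(f"validation accuracy: {tree.score(X_val, y_val):.2f}")  # far lower
```

The tree "aces the practice problems" and stumbles on the held-out exam, exactly the student from the analogy above.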
The Opposite Trap: When Your Model Oversimplifies
Now picture a different student who glances at the textbook summary and declares, "Math is just adding numbers." They breeze through the exam with simple addition, ignoring that half the questions involve calculus. They'll be consistently wrong.
This is underfitting: your model is too simple to capture the underlying structure of the data. It has high bias, making strong assumptions that prevent it from learning the true relationship. Think of a straight line trying to fit a parabola, or a decision tree with only one split trying to classify complex images.
Underfitting often stems from:
- Too few parameters (the model lacks capacity to learn)
- Overly aggressive regularization (those hyperparameters from Part 3 can backfire if set too high!)
- Missing important features (trying to predict house prices without knowing the square footage)
Watch Out: Beginners often assume underfitting means "my model is stupid," but that's not fair! An underfit model is usually too rigid rather than unintelligent. It's like trying to sculpt a masterpiece with concrete shoes on: limited flexibility, not limited intelligence.
The Bias-Variance Tango
Here's where it gets philosophical (and mathematical). Every prediction error in your model breaks down into three components:
Total Error = Bias² + Variance + Irreducible Error
- Bias is your model's tendency to consistently miss in a particular direction (underfitting)
- Variance is your model's sensitivity to small fluctuations in the training set (overfitting)
- Irreducible error is the noise inherent in your data that no model can capture
The brutal truth? Bias and variance are in a tug-of-war. As you reduce one, the other tends to increase. Finding the sweet spot is the art of machine learning.
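You can watch the tug-of-war directly by refitting the same model on many freshly sampled training sets and measuring, at a single query point, how far the average prediction misses (bias) and how much individual predictions scatter (variance). A sketch with synthetic sine data; the helper `predictions_at_x0` and all settings are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x0 = np.array([[2.0]])               # the single query point we evaluate at
true_val = float(np.sin(x0[0, 0]))   # ground truth there

def predictions_at_x0(degree, n_datasets=300, n_points=30, noise=0.3):
    """Refit a polynomial of the given degree on many fresh datasets,
    returning the prediction each fit makes at x0."""
    preds = []
    for _ in range(n_datasets):
        X = rng.uniform(0, 2 * np.pi, size=(n_points, 1))
        y = np.sin(X).ravel() + rng.normal(scale=noise, size=n_points)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds.append(float(model.fit(X, y).predict(x0)[0]))
    return np.array(preds)

results = {}
for degree in (1, 4, 12):
    p = predictions_at_x0(degree)
    results[degree] = ((p.mean() - true_val) ** 2, p.var())  # (bias^2, variance)
    print(f"degree {degree:2d}: bias^2={results[degree][0]:.3f}  "
          f"variance={results[degree][1]:.3f}")
```

Expect the low-degree model to miss in the same direction every time (large bias squared, tiny variance) and the high-degree model to scatter wildly around the truth (small bias, large variance), with the middle degree balancing the two.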
Pro Tip: I like to visualize this as adjusting the zoom on a camera. Underfitting is when everything's blurry: you can't see the details. Overfitting is when you're zoomed in so close that you see individual pixels and skin texture but miss that you're looking at a face. You want that crisp, clear middle ground.
Becoming a Model Doctor: Diagnosis Techniques
So how do we tell if our model is overfitting, underfitting, or just right? We become diagnosticians using learning curves.
Plot your training error and validation error as functions of training set size or training iterations:
- Overfitting: Training error stays low while validation error plateaus high (big gap)
- Underfitting: Both errors are high and close together (your model isn't learning enough)
- Just Right: Both errors converge to a low point with minimal gap
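Scikit-learn's `learning_curve` utility computes exactly these numbers. Here is a sketch on a synthetic classification task (dataset and settings invented for illustration), using an unconstrained decision tree so the overfitting signature is easy to see:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# A synthetic task with irrelevant features and ~10% label noise.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

# An unconstrained tree memorizes its training set at every size,
# so expect training accuracy near 1.0 with validation accuracy trailing.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, shuffle=True, random_state=0)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train={tr:.2f}  validation={va:.2f}")
```

The persistent gap between the two columns is the overfitting pattern from the list above; if both columns sat high and close together, you would suspect underfitting instead.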
Another powerful tool is the validation curve, where you plot model performance against different hyperparameter values (like regularization strength or polynomial degree). This directly connects to our previous discussion about hyperparameters: you're literally tuning those knobs to find where validation performance peaks.
Key Insight: If your validation curve shows the training score much higher than the validation score across all hyperparameter settings, you need more data or stronger regularization. If both scores are low and close, you need a more complex model or less regularization.
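Here is a hedged sketch of a validation curve in score form (no plot), reusing this series' sine-plus-noise setup and sweeping the polynomial degree; the data and settings are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Sweep the model-complexity knob: polynomial degree.
degrees = [1, 4, 15]
train_scores, val_scores = validation_curve(
    make_pipeline(PolynomialFeatures(), LinearRegression()), X, y,
    param_name="polynomialfeatures__degree", param_range=degrees, cv=5)

# Expect: degree 1 scores low everywhere (underfit), degree 4 validates
# well, degree 15 trains near-perfectly while validation typically lags.
for d, tr, va in zip(degrees, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"degree {d:2d}: train R^2 = {tr:.2f}  validation R^2 = {va:.2f}")
```

Reading it with the key insight above: low-and-close scores at degree 1 say "more complexity," while a train/validation gap at high degree says "more data or more regularization."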
The Cure: From Regularization to Data Augmentation
Once diagnosed, how do we fix these ailments?
For Overfitting:
- Regularization (L1/L2): Penalize large parameter values, effectively simplifying the model without changing the architecture. This is where those hyperparameters shine!
- Dropout (for neural networks): Randomly ignore neurons during training to prevent co-adaptation
- More data: The ultimate cure; more examples make memorization harder
- Data augmentation: Create synthetic training examples (rotating images, adding noise to audio)
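As a quick sanity check on the regularization bullet, here is a sketch comparing plain least squares against Ridge (L2) on the same degree-15 polynomial features; the synthetic sine data and all settings are invented for illustration. The point is the mechanism, not the exact numbers: the L2 penalty shrinks the coefficient vector while barely sacrificing fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Expand to degree-15 features, scaled so the penalty treats them evenly.
features = make_pipeline(PolynomialFeatures(15), StandardScaler())
Xp = features.fit_transform(X)

ols = LinearRegression().fit(Xp, y)   # no penalty on coefficient size
ridge = Ridge(alpha=1.0).fit(Xp, y)   # L2 penalty shrinks coefficients

print(f"OLS   coef norm={np.linalg.norm(ols.coef_):10.2f}  "
      f"train R^2={ols.score(Xp, y):.3f}")
print(f"Ridge coef norm={np.linalg.norm(ridge.coef_):10.2f}  "
      f"train R^2={ridge.score(Xp, y):.3f}")
```

Smaller coefficients mean a smoother, less wiggly curve: the architecture is unchanged, but the effective complexity drops, which is exactly what the first bullet claims.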
For Underfitting:
- Increase model complexity: Add more layers, neurons, or features
- Reduce regularization: Turn down that regularization strength hyperparameter
- Feature engineering: Create better input features that capture the underlying pattern
- Train longer: Sometimes your model just needs more epochs to converge
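To make the feature-engineering bullet concrete, here is a sketch (synthetic data invented for illustration): the same linear model that underfits a parabola fits it well once we hand it a squared feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # parabola + noise

plain = LinearRegression().fit(X, y)
# Same linear model, but now it also sees x^2 as an input feature.
engineered = make_pipeline(PolynomialFeatures(degree=2),
                           LinearRegression()).fit(X, y)

print(f"straight line R^2:        {plain.score(X, y):.2f}")
print(f"with squared feature R^2: {engineered.score(X, y):.2f}")
```

The model family didn't get smarter; the inputs got better, which is often the cheapest cure for underfitting.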
Pro Tip: I always tell my students: "When in doubt, add data." It's shocking how often collecting 10x more training data solves problems that hours of hyperparameter tuning couldn't touch. Data is the ultimate regularizer!
Real-World Examples That Actually Matter
Let me share why this isn't just academic theory.
The Military Tank Detector (Classic Overfitting): In the 1980s, researchers trained a neural network to detect tanks hiding in forests. It performed perfectly on training data but failed in the field. They eventually discovered the model had learned to recognize weather conditions: all training photos of tanks were taken on cloudy days, and non-tank photos on sunny days. The model "cheated" by detecting sky brightness rather than camouflage patterns.
Netflix's "Everyone Gets The Same Recommendations" Phase (Underfitting): Early recommendation systems were so simple they suggested the same popular movies to everyone. They had high bias, assuming all users were identical. Netflix had to increase model complexity dramatically (moving to matrix factorization and deep learning) to capture individual taste nuances.
Facial Recognition Bias: Many commercial facial recognition systems overfit to training data containing predominantly light-skinned faces, then failed spectacularly on darker skin tones. This isn't just a technical failure; it's a consequence of unrepresentative training data combined with models too flexible for the data they actually had.
These examples matter because they show that overfitting and underfitting aren't just statistical inconveniences. They have real consequences for fairness, safety, and business success.
Try It Yourself
Ready to see this in action? Here's a concrete experiment using Python and scikit-learn:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

# Generate synthetic data (sine wave + noise)
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 100)

# Try polynomial degrees 1, 4, and 15
degrees = [1, 4, 15]
plt.figure(figsize=(15, 4))
for i, degree in enumerate(degrees):
    ax = plt.subplot(1, 3, i + 1)

    # Fit model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)

    # Predict on a dense grid so the fitted curve plots smoothly
    X_test = np.linspace(0, 10, 1000).reshape(-1, 1)
    y_pred = model.predict(X_test)

    # Plot
    ax.scatter(X, y, color='blue', alpha=0.5, label='Training data')
    ax.plot(X_test, y_pred, color='red', label=f'Degree {degree}')
    ax.set_ylim(-2, 2)
    ax.legend()
plt.show()
```
What you'll see:
- Degree 1 (Underfitting): A straight line missing the sine wave pattern entirely
- Degree 4 (Just Right): Smooth curve following the underlying pattern
- Degree 15 (Overfitting): Wiggly line hitting every noise point, failing to generalize
Next step: Use validation_curve to plot training and validation scores across different polynomial degrees. Watch that gap appear as you increase complexity!
Key Takeaways
- Overfitting is memorization: high variance, low training error but high validation error. Your model is "too flexible" for the data available.
- Underfitting is oversimplification: high bias, high error on both training and validation. Your model makes assumptions that prevent learning.
- The bias-variance tradeoff is unavoidable; your goal is finding the sweet spot where total error is minimized.
- Learning curves are your diagnostic tool: use them to identify which problem you have before trying to fix it.
- Hyperparameters (especially regularization) are your primary controls for complexity, but more data is often the best medicine for overfitting.
- Always test on truly held-out validation data that simulates real-world deployment conditions.
Further Reading đ
- Scikit-learn Validation Curves - Official documentation with code examples for generating those crucial diagnostic plots we discussed
- An Introduction to Statistical Learning (Free PDF) - Chapters 2 and 5 provide rigorous mathematical foundations for these concepts with excellent intuitive explanations
Congratulations on completing the Machine Learning Training Essentials series! You now understand loss functions, optimization, parameters/hyperparameters, and the crucial balance of overfitting vs. underfitting. You're ready to train models that actually generalize to the real world. Go forth and find that Goldilocks zone!