Why Use Bagging

The Problem: The "Know-It-All" Single Model

Imagine you are training a complex model, like a deep Decision Tree, on your entire dataset. Decision trees are incredibly eager learners. If you feed them all your data, they will memorize everything—including the noise, the outliers, and the random quirks specific to that exact dataset.

Because the tree has memorized the training data so thoroughly, it performs poorly on new, unseen data. Furthermore, if you were to change even 5% of your training data, the resulting tree could look completely different. This extreme sensitivity to the training data is called high variance.
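
To make that instability concrete, here is a small sketch (assuming scikit-learn and a synthetic dataset from make_classification; the variable names are illustrative) that trains one deep tree on the original labels and another after flipping roughly 5% of them, then counts how often the two trees disagree:

```python
# Toy illustration of high variance: a small change to the training labels
# produces a tree that makes noticeably different predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)

# Flip the labels of ~5% of the training points and retrain.
y_perturbed = y.copy()
flip = rng.choice(len(y), size=len(y) // 20, replace=False)
y_perturbed[flip] = 1 - y_perturbed[flip]

tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X, y_perturbed)

# Fraction of inputs on which the two "almost identical" trees disagree.
disagreement = np.mean(tree_a.predict(X) != tree_b.predict(X))
print(f"fraction of predictions that changed: {disagreement:.2%}")
```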

The Solution: Bagging's "Wisdom of the Crowd"

Bagging solves this by relying on a committee of models rather than one dictator.

Here is exactly how it improves upon the single-model approach:

1. Creating Diversity through Bootstrapping

Instead of giving the whole dataset to one model, Bagging creates multiple new datasets (bootstrap samples), each the same size as the original, by randomly picking data points from the original set with replacement. Because the sampling is done with replacement, some points appear multiple times in a given sample while others are left out entirely, so every model in the committee trains on a slightly different view of the data.
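
A minimal sketch of the bootstrap step using NumPy on a toy array (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (toy data)
y = np.array([0, 1] * 5)

n_samples = X.shape[0]
# Draw indices with replacement: some rows appear more than once, others not at all.
indices = rng.integers(0, n_samples, size=n_samples)
X_boot, y_boot = X[indices], y[indices]

print(indices)             # duplicated and missing row indices
print(len(set(indices)))   # typically only ~63% of the original rows are unique
```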

2. Canceling Out the Noise through Aggregation

When it is time to make a prediction, Bagging asks all of its models to vote (for classification) or averages their predictions (for regression). Because each model overfit a different bootstrap sample, their individual errors are largely uncorrelated, so they tend to cancel out when the predictions are combined.
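
A sketch of the aggregation step, assuming a list of already-fitted models; the helper names below are made up for illustration, and class labels are assumed to be non-negative integers:

```python
import numpy as np

def bagged_predict_classification(models, X):
    # Each model votes; the majority class wins.
    votes = np.stack([m.predict(X) for m in models])   # shape (n_models, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), axis=0, arr=votes
    )

def bagged_predict_regression(models, X):
    # Continuous predictions are simply averaged.
    return np.mean([m.predict(X) for m in models], axis=0)
```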

Summary of Improvements

By shifting from one model trained on all data to many models trained on bootstrapped data, you gain:

- Lower variance: the quirks and noise each individual model memorizes tend to cancel out in the aggregate.
- Better generalization: the combined prediction is more reliable on new, unseen data than any single deep tree.
- Stability: changing a small fraction of the training data only nudges a few bootstrap samples, so the ensemble's output barely moves.

Comparison of Ensemble Learning Techniques

While all four are Ensemble Learning techniques designed to combine multiple "base models" into one highly accurate "super model," they take fundamentally different approaches to how they train and aggregate those models.

High-Level Comparison Table

| Feature | Bagging | Random Forest | Boosting | Stacking |
|---|---|---|---|---|
| Primary Goal | Reduce variance (fix overfitting) | Reduce variance (even better than Bagging) | Reduce bias (improve predictive accuracy) | Combine strengths of entirely different algorithms |
| Training Style | Parallel (independent) | Parallel (independent) | Sequential (iterative) | Layered (parallel base models, then a sequential meta-model) |
| Base Learners | Homogeneous (usually deep Decision Trees) | Homogeneous (deep Decision Trees) | Homogeneous (usually shallow, "weak" Decision Trees) | Heterogeneous (mix of SVM, KNN, Trees, Neural Networks, etc.) |
| Data Sampling | Bootstrap (random, with replacement) | Bootstrap (random, with replacement) | Weighted (focuses on previously misclassified data) | Original dataset (usually via cross-validation to generate training data for the meta-model) |
| Final Output | Simple majority vote / average | Simple majority vote / average | Weighted vote (better models get more say) | Meta-model prediction (a final algorithm decides the output) |

1. Bagging (Bootstrap Aggregating)

Bagging is all about creating a "wisdom of the crowd" effect to stabilize eager algorithms that tend to overfit.
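
A minimal scikit-learn sketch of this idea, comparing a single deep tree against a bagged committee of trees on a synthetic dataset (the `estimator` parameter name assumes a recent scikit-learn version; older releases call it `base_estimator`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # unpruned, high-variance base learner
    n_estimators=100,                    # size of the "committee"
    bootstrap=True,                      # sample rows with replacement
    random_state=0,
).fit(X_train, y_train)

print("single tree :", single_tree.score(X_test, y_test))
print("bagged trees:", bagged_trees.score(X_test, y_test))
```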

2. Random Forest

Random Forest is essentially "Bagging Version 2.0." The lecture sums it up as: Bagging with trees + random feature subsets. On top of bootstrapping the rows, each tree is only allowed to consider a random subset of the features at every split, which decorrelates the trees and reduces variance even further than plain Bagging.
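
The same idea sketched with scikit-learn's RandomForestClassifier, where `max_features="sqrt"` is the random-feature-subset part:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # at each split, consider only a random subset of features
    bootstrap=True,        # each tree still sees a bootstrap sample of the rows
    random_state=0,
).fit(X, y)

print(forest.score(X, y))
```

Restricting each split to a random subset of features is what distinguishes this from plain Bagging: it forces the trees to differ even when a few strong features would otherwise dominate every tree.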

3. Boosting (e.g., AdaBoost, Gradient Boosting)

Boosting takes a completely different philosophical approach. Instead of building independent models in parallel, it builds them sequentially, with each new model trying to fix the mistakes of the previous one.
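
A sketch of this sequential idea using scikit-learn's AdaBoostClassifier with decision stumps as the weak learners (the parameter values are illustrative, and the `estimator` argument assumes a recent scikit-learn version):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner: a one-split "stump"
    n_estimators=200,      # models are added one after another
    learning_rate=0.5,     # how strongly each new model corrects the previous ones
    random_state=0,
).fit(X, y)

print(boosted.score(X, y))
```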

4. Stacking (Stacked Generalization)

Stacking completely discards the idea of simple voting or averaging. Instead, it assumes that different types of algorithms (e.g., an SVM vs. a Decision Tree) look at the data in fundamentally different ways. It uses a machine learning model to learn exactly how to combine the outputs of other machine learning models.
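
A sketch of stacking heterogeneous base models with scikit-learn's StackingClassifier, where a logistic regression acts as the meta-model (the particular mix of base models is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier()),
    ],
    final_estimator=LogisticRegression(),  # the meta-model that learns how to combine them
    cv=5,  # out-of-fold predictions are used to train the meta-model
).fit(X, y)

print(stack.score(X, y))
```

Passing cv=5 means the meta-model is trained on out-of-fold predictions rather than on predictions for data the base models have already seen, which is the usual guard against leakage mentioned in the comparison table.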