1. Gradient Descent Recap

The lecture briefly re-introduces Gradient Descent as a method for finding a local minimum of a differentiable function. The algorithm takes steps in the direction of the negative gradient (the direction of steepest decrease) to reduce the function's value. The update rule is defined as:

$$x_{t+1} = x_t - \gamma_t \nabla f(x_t)$$

Here, $\gamma_t$ represents the step size, also commonly known as the learning rate.
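The update rule above can be sketched in a few lines of Python. The objective $f(x) = x^2$ and the parameter values below are assumptions chosen for illustration, not from the lecture:

```python
def gradient_descent(grad, x0, step_size=0.1, n_steps=100):
    """Repeatedly apply x_{t+1} = x_t - gamma * grad f(x_t) with a fixed step size."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * grad(x)
    return x

# Example: f(x) = x**2 has gradient f'(x) = 2x and its minimum at x = 0.
minimum = gradient_descent(grad=lambda x: 2 * x, x0=5.0)
```

Starting from $x_0 = 5$, each step shrinks $x$ by a factor of $1 - 2\gamma = 0.8$, so the iterates converge toward the minimizer at 0.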


2. The Naïve Bayes Classifier

Naïve Bayes is a classification algorithm rooted in probability, specifically Bayes' Theorem.

The core formula is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$


Naïve Bayes on Categorical Data

When dealing with categorical features (like weather conditions), the algorithm calculates probabilities based on the frequency of occurrences in the training data.
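A minimal sketch of this counting approach, using an assumed toy weather dataset (the feature values and labels are made up for illustration):

```python
from collections import Counter, defaultdict

# Assumed toy dataset: (outlook, play?) pairs.
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no"),
        ("overcast", "yes"), ("sunny", "yes")]

total = len(data)
class_counts = Counter(label for _, label in data)   # for the prior P(class)

# Co-occurrence counts, for the likelihood P(feature | class).
cond_counts = defaultdict(Counter)
for feature, label in data:
    cond_counts[label][feature] += 1

def score(feature, label):
    """Proportional to P(label | feature): prior * likelihood (P(B) cancels)."""
    prior = class_counts[label] / total
    likelihood = cond_counts[label][feature] / class_counts[label]
    return prior * likelihood

# Classify "sunny" by picking the class with the larger score.
pred = max(class_counts, key=lambda c: score("sunny", c))
```

Since the evidence $P(B)$ is the same for every class, it can be dropped and only the numerator of Bayes' Theorem compared across classes.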

Naïve Bayes on Numerical Data

When your features are continuous numbers (like Age or test scores) rather than categories, you cannot simply count frequencies. Instead, Naïve Bayes assumes the data follows a Normal (Gaussian) distribution.

[Figure: Gaussian (Normal) probability distribution curve]

The probability density function used is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
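The density can be evaluated directly; in Gaussian Naïve Bayes, $\mu$ and $\sigma$ are estimated per feature and per class from the training data. The specific values below are assumptions for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density: 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# E.g., likelihood of observing Age = 30 under a class whose Age values
# have sample mean 30 and standard deviation 5 (assumed numbers).
density_at_mean = gaussian_pdf(30.0, mu=30.0, sigma=5.0)
```

These per-feature densities replace the frequency-based likelihoods used in the categorical case, and are multiplied together in the same way.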



3. Feature Selection vs. Feature Reduction

As datasets grow, using every single feature becomes inefficient. The lecture highlights two ways to handle this: feature selection, which keeps a subset of the original features and discards the rest, and feature reduction, which transforms the data into a smaller set of new, derived features.

Why reduce dimensionality? Fewer dimensions mean faster training, lower memory use, less risk of overfitting, and easier visualization.


4. Principal Component Analysis (PCA)

PCA is a powerful feature reduction technique. It condenses data while retaining as much variance as possible through the following steps:

  1. Standardize the continuous initial variables.

  2. Compute the covariance matrix to identify how variables correlate with one another.

  3. Compute the eigenvectors and eigenvalues of the covariance matrix. The eigenvector corresponding to the highest eigenvalue is the "Principal Component".

  4. Create a feature vector to decide which components to keep (usually dropping those with very small eigenvalues).

  5. Recast the data by projecting the original points onto these new principal component axes.
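The five steps above can be sketched with NumPy. The random data and the choice to keep two components are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # assumed toy data: 100 samples, 3 features

# 1. Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix (rowvar=False: columns are variables).
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvectors and eigenvalues; eigh suits symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort by explained variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Feature vector: keep the top-k components (k = 2 here, an arbitrary choice).
W = eigvecs[:, :2]

# 5. Recast the data by projecting onto the principal-component axes.
X_pca = X_std @ W
```

The first column of `W` is the Principal Component from step 3; the projected data `X_pca` has one column per retained component.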