Part 1: K-Nearest Neighbors (KNN) Classification
Let’s classify a new data point based on its similarity to existing training data using Euclidean distance.
The Setup:
- Training Data:
  - $P_1$: (2, 4), Class A
  - $P_2$: (4, 2), Class A
  - $P_3$: (6, 8), Class B
  - $P_4$: (8, 6), Class B
- Test Point ($Q$): (4, 5)
- Hyperparameter: $K = 3$
Step 1: Calculate Euclidean Distances
The formula is $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.
- Distance to $P_1$: $\sqrt{(4-2)^2 + (5-4)^2} = \sqrt{5} \approx 2.24$
- Distance to $P_2$: $\sqrt{(4-4)^2 + (5-2)^2} = \sqrt{9} = 3.00$
- Distance to $P_3$: $\sqrt{(4-6)^2 + (5-8)^2} = \sqrt{13} \approx 3.61$
- Distance to $P_4$: $\sqrt{(4-8)^2 + (5-6)^2} = \sqrt{17} \approx 4.12$
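These distances are easy to verify in code. Here is a minimal NumPy sketch (the point names mirror the labels used above):

```python
import numpy as np

# Training points and the test point from the setup above
points = {"P1": (2, 4), "P2": (4, 2), "P3": (6, 8), "P4": (8, 6)}
q = np.array([4, 5])

for name, p in points.items():
    # Euclidean distance between q and this training point
    print(name, round(np.linalg.norm(np.array(p) - q), 2))
# P1 2.24, P2 3.0, P3 3.61, P4 4.12
```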
Step 2: Sort and Select the K Nearest Neighbors
Sorting the distances in ascending order and keeping the top $K = 3$:
- $P_1$ (Distance: 2.24), Class A
- $P_2$ (Distance: 3.00), Class A
- $P_3$ (Distance: 3.61), Class B
Step 3: Majority Voting
Among the 3 nearest neighbors, Class A appears twice and Class B appears once.
Result: The test point (4, 5) is classified as Class A.
For regression problems, the steps are identical, except that instead of taking a majority vote, we average the target values of the $K$ nearest neighbors.
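The whole procedure fits in a few lines of code. Below is a minimal sketch (the function name `knn_predict` and its signature are my own) that reproduces the classification above and includes the averaging variant for regression:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3, regression=False):
    """Predict the class (or value) of x_test from its k nearest neighbors."""
    # Step 1: Euclidean distance from the test point to every training point
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Step 2: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    if regression:
        # Regression variant: average the neighbors' target values
        return np.mean(y_train[nearest])
    # Step 3: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[2, 4], [4, 2], [6, 8], [8, 6]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([4, 5])))  # -> A
```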
Part 2: Linear Regression
For the following regression examples, we will use a tiny dataset to predict a target $y$ from a single feature $x$.
Dataset:
- $x$: [1, 2, 3]
- $y$: [2, 3, 5]
We are trying to fit the line equation: $\hat{y} = b + wx$
Method A: The Normal Equation (Analytical)
The Normal Equation finds the exact optimal weights in one mathematical operation. The formula is:

$$\theta = (X^T X)^{-1} X^T y$$

where $\theta = [b, w]^T$.
Step 1: Create the Augmented Matrix
We add a column of 1s to $x$ to account for the bias term:

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad y = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$$

Step 2: Calculate $X^T X$

$$X^T X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$

Step 3: Calculate the Inverse
The determinant is $(3)(14) - (6)(6) = 6$, so:

$$(X^T X)^{-1} = \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}$$

Step 4: Calculate $X^T y$

$$X^T y = \begin{bmatrix} 10 \\ 23 \end{bmatrix}$$

Step 5: Multiply to get $\theta$

$$\theta = (X^T X)^{-1} X^T y = \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}\begin{bmatrix} 10 \\ 23 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 2 \\ 9 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 3/2 \end{bmatrix}$$

Result: The optimal bias is $b = 1/3 \approx 0.33$ and the optimal weight is $w = 1.5$, giving the fitted line $\hat{y} = 0.33 + 1.5x$.
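The same computation takes a few lines of NumPy. A minimal sketch (using `np.linalg.solve` rather than an explicit inverse, which is numerically safer and gives the same $\theta$):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([2, 3, 5])

# Step 1: augment x with a column of 1s for the bias term
X = np.column_stack([np.ones_like(x), x])

# Steps 2-5 in one call: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [0.3333... 1.5]  -> b = 1/3, w = 1.5
```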
Method B: Gradient Descent (Iterative)
Instead of matrix inversion, Gradient Descent updates weights iteratively.
- Initialization: $w = 0$, $b = 0$
- Learning Rate ($\alpha$): 0.05
- Gradient Formula for $w$: $\frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N} x_i (y_i - \hat{y}_i)$
- Gradient Formula for $b$: $\frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)$

These are the gradients of the MSE loss $L = \frac{1}{N}\sum_{i}(y_i - \hat{y}_i)^2$.
Below is exactly one iteration (epoch) for the three different types of gradient descent using our dataset. Since the initial weights are 0, every prediction starts at $\hat{y}_i = 0$, so each error $(y_i - \hat{y}_i)$ is simply $y_i$.
1. Batch Gradient Descent (BGD)
BGD calculates the error across the entire dataset ($N = 3$) before performing a single update.
- Errors ($y_i - \hat{y}_i$): (2 - 0) = 2, (3 - 0) = 3, (5 - 0) = 5
- Gradients: $\frac{\partial L}{\partial w} = -\frac{2}{3}(1 \cdot 2 + 2 \cdot 3 + 3 \cdot 5) = -\frac{46}{3} \approx -15.33$ and $\frac{\partial L}{\partial b} = -\frac{2}{3}(2 + 3 + 5) = -\frac{20}{3} \approx -6.67$
- Update: $w = 0 - 0.05 \cdot (-15.33) \approx 0.77$ and $b = 0 - 0.05 \cdot (-6.67) \approx 0.33$
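In code, one BGD epoch looks like this (a minimal NumPy sketch following the MSE gradient convention above):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([2, 3, 5])
w, b, alpha = 0.0, 0.0, 0.05

errors = y - (w * x + b)           # [2, 3, 5]
grad_w = -2 * np.mean(x * errors)  # -46/3 ~ -15.33
grad_b = -2 * np.mean(errors)      # -20/3 ~ -6.67
w -= alpha * grad_w                # ~ 0.77
b -= alpha * grad_b                # ~ 0.33
print(w, b)
```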
2. Stochastic Gradient Descent (SGD)
SGD updates the weights after evaluating a single, randomly chosen data point. Let's assume the first point chosen is $(x, y) = (1, 2)$.
- Error: (2 - 0) = 2
- Gradients (with $N = 1$): $\frac{\partial L}{\partial w} = -2 \cdot 1 \cdot 2 = -4$ and $\frac{\partial L}{\partial b} = -2 \cdot 2 = -4$
- Update: $w = 0 - 0.05 \cdot (-4) = 0.2$ and $b = 0 - 0.05 \cdot (-4) = 0.2$
(The algorithm would then immediately perform another update using the next single data point).
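The same step in code (hard-coding the first point rather than sampling it randomly, to match the walkthrough):

```python
# One SGD step on the single point (x, y) = (1, 2)
xi, yi = 1.0, 2.0
w, b, alpha = 0.0, 0.0, 0.05

error = yi - (w * xi + b)       # 2
w -= alpha * (-2 * xi * error)  # 0 - 0.05 * (-4) = 0.2
b -= alpha * (-2 * error)       # 0 - 0.05 * (-4) = 0.2
print(w, b)  # 0.2 0.2
```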
3. Mini-Batch Gradient Descent (MBGD)
MBGD uses a small subset of the data. Let's use a batch size of 2, picking the first two points: $(1, 2)$ and $(2, 3)$.
- Errors: (2 - 0) = 2, (3 - 0) = 3
- Gradients (with $N = 2$): $\frac{\partial L}{\partial w} = -\frac{2}{2}(1 \cdot 2 + 2 \cdot 3) = -8$ and $\frac{\partial L}{\partial b} = -\frac{2}{2}(2 + 3) = -5$
- Update: $w = 0 - 0.05 \cdot (-8) = 0.4$ and $b = 0 - 0.05 \cdot (-5) = 0.25$
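And the mini-batch step in code (a sketch; in practice the batch is usually sampled randomly each step):

```python
import numpy as np

# One MBGD step on the mini-batch of the first two points
xb = np.array([1, 2])
yb = np.array([2, 3])
w, b, alpha = 0.0, 0.0, 0.05

errors = yb - (w * xb + b)                # [2, 3]
w -= alpha * (-2 * np.mean(xb * errors))  # 0 - 0.05 * (-8) = 0.4
b -= alpha * (-2 * np.mean(errors))       # 0 - 0.05 * (-5) = 0.25
print(w, b)  # 0.4 0.25
```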