Part 1: K-Nearest Neighbors (KNN) Classification

Let’s classify a new data point based on its similarity to existing training data using Euclidean distance.

The Setup: We have three labeled training points, P1 and P2 (Class A) and P3 (Class B), and we want to classify a new test point at (4, 5).

Step 1: Calculate Euclidean Distances

The formula is $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.

Step 2: Sort and Select the K Nearest Neighbors

Sorting the distances in ascending order:

  1. P1 (Distance: 2.24) Class A

  2. P2 (Distance: 3.00) Class A

  3. P3 (Distance: 3.61) Class B

Step 3: Majority Voting

Among the 3 nearest neighbors, Class A appears twice and Class B appears once.

Result: The test point (4, 5) is classified as Class A.

For regression problems, the same steps apply, except that instead of taking a majority vote, we average the target values of the K nearest neighbors.
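A minimal sketch of this procedure in plain Python is shown below. The training coordinates here are hypothetical placeholders (the original coordinates of P1–P3 are not reproduced), chosen only so the code runs end to end; the distance, sorting, and voting logic follows the three steps above.

```python
import math
from collections import Counter

def euclidean_distance(p, q):
    # d = sqrt((x2 - x1)^2 + (y2 - y1)^2), generalized to any dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, test_point, k=3):
    # train: list of (point, label) pairs
    # Step 1: compute distances; Step 2: sort and keep the k nearest
    neighbors = sorted(train, key=lambda item: euclidean_distance(item[0], test_point))[:k]
    # Step 3: majority vote over the labels of the k nearest neighbors
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, test_point, k=3):
    # Same steps, but average the target values instead of voting
    neighbors = sorted(train, key=lambda item: euclidean_distance(item[0], test_point))[:k]
    return sum(target for _, target in neighbors) / k

# Hypothetical training set (coordinates are illustrative, not from the worked example)
train = [((3, 3), "A"), ((4, 2), "A"), ((6, 7), "B"), ((8, 8), "B")]
print(knn_classify(train, (4, 5), k=3))  # majority vote among the 3 nearest -> "A"
```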


Part 2: Linear Regression

For the following regression examples, we will use a tiny dataset to predict a target Y from a single feature X.

Dataset: the three points $(x, y) = (1, 2),\ (2, 3),\ (3, 5)$.

We are trying to fit the line equation $\hat{y} = mx + b$ (where $m$ is the weight/slope and $b$ is the bias/intercept).

Method A: The Normal Equation (Analytical)

The Normal Equation finds the exact optimal weights in one mathematical operation. The formula is:

$$W = (X^T X)^{-1} X^T Y$$

Step 1: Create the Augmented Matrix X and Vector Y

We add a column of 1s to X to account for the bias term (b).

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \qquad Y = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$$

Step 2: Calculate $X^T X$

$$X^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}, \qquad X^T X = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$

Step 3: Calculate the Inverse $(X^T X)^{-1}$

The determinant is $(3 \times 14) - (6 \times 6) = 42 - 36 = 6$.

$$(X^T X)^{-1} = \frac{1}{6} \begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix} = \begin{bmatrix} 7/3 & -1 \\ -1 & 1/2 \end{bmatrix}$$

Step 4: Calculate $X^T Y$

$$X^T Y = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 10 \\ 23 \end{bmatrix}$$

Step 5: Multiply to get W

$$W = \begin{bmatrix} 7/3 & -1 \\ -1 & 1/2 \end{bmatrix} \begin{bmatrix} 10 \\ 23 \end{bmatrix} = \begin{bmatrix} (70/3) - 23 \\ -10 + (23/2) \end{bmatrix} = \begin{bmatrix} 1/3 \\ 3/2 \end{bmatrix}$$

Result: The optimal bias is $b \approx 0.33$ (i.e., $1/3$) and the optimal weight is $m = 1.5$.
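These hand calculations are easy to verify with a few lines of NumPy. The snippet below is just a check of the formula above on this dataset, written to mirror the steps literally:

```python
import numpy as np

# Augmented design matrix (column of 1s for the bias b) and target vector
X = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
Y = np.array([2, 3, 5], dtype=float)

# Normal Equation: W = (X^T X)^{-1} X^T Y
W = np.linalg.inv(X.T @ X) @ X.T @ Y
print(W)  # [0.3333... 1.5]  ->  b = 1/3, m = 3/2
```

In practice, `np.linalg.solve(X.T @ X, X.T @ Y)` or `np.linalg.lstsq` is preferred over forming an explicit inverse, but for this tiny example the result is identical.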


Method B: Gradient Descent (Iterative)


Instead of matrix inversion, Gradient Descent updates weights iteratively.

Below is exactly one update step for each of the three types of gradient descent using our dataset. Since the initial weights are $m = 0$ and $b = 0$, $\hat{y}$ is initially 0 for all points.
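All three variants apply the same update rule; only the set of points entering the sums changes. Assuming the usual mean squared error loss $L = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ and a learning rate $\alpha$ (left symbolic here, since no particular value is assumed), the updates are:

$$\frac{\partial L}{\partial m} = -\frac{2}{n}\sum_{i} x_i\,(y_i - \hat{y}_i), \qquad \frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i} (y_i - \hat{y}_i)$$

$$m \leftarrow m - \alpha\,\frac{\partial L}{\partial m}, \qquad b \leftarrow b - \alpha\,\frac{\partial L}{\partial b}$$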

1. Batch Gradient Descent (BGD)

BGD calculates the error across the entire dataset (n=3) before making a single update.

Update:
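Plugging in the full dataset ($n = 3$) with $m = b = 0$, so every $\hat{y}_i = 0$, the assumed-MSE gradients and the resulting update are:

$$\frac{\partial L}{\partial m} = -\frac{2}{3}\left(1\cdot 2 + 2\cdot 3 + 3\cdot 5\right) = -\frac{46}{3} \approx -15.33, \qquad \frac{\partial L}{\partial b} = -\frac{2}{3}\left(2 + 3 + 5\right) = -\frac{20}{3} \approx -6.67$$

$$m \leftarrow 0 + \frac{46}{3}\alpha, \qquad b \leftarrow 0 + \frac{20}{3}\alpha$$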

2. Stochastic Gradient Descent (SGD)

SGD updates the weights after evaluating a single, randomly chosen data point. Let's assume the first point chosen is (x=1,y=2). Here, n=1.

Update:
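For the single point $(x = 1, y = 2)$ with $n = 1$, under the same assumed loss:

$$\frac{\partial L}{\partial m} = -2 \cdot 1 \cdot (2 - 0) = -4, \qquad \frac{\partial L}{\partial b} = -2 \cdot (2 - 0) = -4$$

$$m \leftarrow 0 + 4\alpha, \qquad b \leftarrow 0 + 4\alpha$$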

3. Mini-Batch Gradient Descent (MBGD)

MBGD uses a small subset of the data. Let's use a batch size of 2, picking the first two points: (1,2) and (2,3). Here, n=2.

Update:
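For the mini-batch $\{(1, 2), (2, 3)\}$ with $n = 2$, under the same assumed loss:

$$\frac{\partial L}{\partial m} = -\frac{2}{2}\left(1\cdot 2 + 2\cdot 3\right) = -8, \qquad \frac{\partial L}{\partial b} = -\frac{2}{2}\left(2 + 3\right) = -5$$

$$m \leftarrow 0 + 8\alpha, \qquad b \leftarrow 0 + 5\alpha$$

As a quick check, the sketch below reproduces all three single-step updates in NumPy; the learning rate of 0.1 is an arbitrary assumption made only so the code prints concrete numbers.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.0])

def one_step(x_batch, y_batch, m=0.0, b=0.0, lr=0.1):
    """One gradient-descent update for y_hat = m*x + b under MSE loss."""
    n = len(x_batch)
    y_hat = m * x_batch + b
    grad_m = -(2.0 / n) * np.sum(x_batch * (y_batch - y_hat))
    grad_b = -(2.0 / n) * np.sum(y_batch - y_hat)
    return m - lr * grad_m, b - lr * grad_b

print(one_step(x, y))          # BGD:  all 3 points
print(one_step(x[:1], y[:1]))  # SGD:  the single point (1, 2)
print(one_step(x[:2], y[:2]))  # MBGD: the mini-batch (1, 2), (2, 3)
```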