1. Data Pre-processing

Before you even touch a CNN, your data needs to be prepped. Raw data is messy, and neural networks perform best when the input data is uniform.

The lecture highlights four key preprocessing techniques to improve performance and prevent overfitting:
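The specific four techniques are on the slides; as one common illustration (normalization is assumed here as an example, not taken from the slide list), here is a minimal sketch of scaling raw pixel values into a uniform [0, 1] range:

```python
# Min-max normalization: scale raw 8-bit pixel values (0-255)
# into [0, 1] so every input feature lives on the same range.
def normalize(pixels):
    return [p / 255.0 for p in pixels]

raw = [0, 64, 128, 255]
print(normalize(raw))  # values between 0.0 and 1.0
```

In practice a framework utility (e.g. a rescaling layer) does the same thing, but the operation itself is just this division.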

2. CNN Architecture and Feature Maps

The slides walk through a highly specific architectural example to show exactly how images are transformed from raw pixels into flattened feature vectors.

The slides trace the math of how an image moves through this network step by step.

3. Weight Updating & The Softmax Function

At the very end of the network, those final 10 neurons use the Softmax Activation function to convert raw mathematical scores into clean, readable probabilities that sum to 1.

The Softmax formula is:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

Here, $e^{z_i}$ is the exponential of the target class score, and it is divided by the sum of the exponentials of all $K$ class scores to generate the final probability (e.g., $P(\text{class B}) = 0.95$).
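A minimal sketch of the formula in plain Python (the scores below are made-up example values):

```python
import math

def softmax(scores):
    # Exponentiate each score, then divide by the sum of all the
    # exponentials so the outputs form a probability distribution.
    exps = [math.exp(z) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three probabilities, largest for the largest score
print(sum(probs))  # 1.0 (up to floating-point error)
```

One practical note: real implementations subtract `max(scores)` from every score before exponentiating, which leaves the result unchanged but avoids overflow for large scores.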

Once the prediction is made, the network calculates the Cross-Entropy loss by comparing the prediction against the True One-Hot Labels, and uses backpropagation to update the weights across all the layers.
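The loss step can be sketched the same way. With a one-hot label, cross-entropy reduces to the negative log of the probability assigned to the true class (the 0.95 figure below echoes the $P(\text{class B})$ example above):

```python
import math

def cross_entropy(predicted, one_hot):
    # Sum of -label * log(prob); with a one-hot label only the
    # true class contributes, so this is just -log(p_true).
    return -sum(y * math.log(p) for y, p in zip(one_hot, predicted))

# Confident, correct prediction -> small loss
print(cross_entropy([0.025, 0.95, 0.025], [0, 1, 0]))
# Confident, wrong prediction -> large loss
print(cross_entropy([0.95, 0.025, 0.025], [0, 1, 0]))
```

Backpropagation then pushes this loss back through the network to update the weights; the gradient of softmax-plus-cross-entropy with respect to the scores has the conveniently simple form (predicted probability minus one-hot label).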

4. Tracking Spatial Dimensions (The Math)

The lecture includes a slide breaking down the famous AlexNet architecture, which reveals the exact formula for how spatial dimensions shrink during convolutions and pooling.

If you have an input size (W), a filter size (F), padding (P), and a stride (S), the output dimension is calculated as:

$$\text{Output Size} = \frac{W - F + 2P}{S} + 1$$

Example from the slides:
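The slide's worked example isn't reproduced in these notes, but the formula is easy to check against AlexNet's well-known first conv layer (227×227 input, 11×11 filters, stride 4, no padding):

```python
def output_size(w, f, p, s):
    # (W - F + 2P) / S + 1, using integer division since output
    # dimensions must be whole numbers of pixels.
    return (w - f + 2 * p) // s + 1

# AlexNet conv1: 227x227 input, 11x11 filter, padding 0, stride 4
print(output_size(227, 11, 0, 4))  # 55 -> a 55x55 feature map

# A 3x3 max pool with stride 2 on that 55x55 map
print(output_size(55, 3, 0, 2))    # 27 -> a 27x27 map
```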

4.2. Num of Parameters

To compute the number of parameters in a Convolutional (Conv2D) layer, you need to know the dimensions of the filter (kernel) being used, the depth (number of channels) of the input, and the number of filters applied in the current layer.

The general formula is:

$$\text{Parameters} = (F_w \times F_h \times D_{in} + 1) \times K$$

Where:

- $F_w$, $F_h$: the filter (kernel) width and height
- $D_{in}$: the depth (number of channels) of the input
- $+1$: one bias term per filter
- $K$: the number of filters in the current layer

We'll assume a 3×3 filter size ($F_w = F_h = 3$) throughout.

Here is the step-by-step mathematical breakdown for the first four Conv2D layers, based on the lecture's VGG16-style example (an RGB input, so $D_{in} = 3$ for the first layer, with filter counts 64, 64, 128, and 128):

1. block1_conv1: $(3 \times 3 \times 3 + 1) \times 64 = 1{,}792$ parameters

2. block1_conv2: $(3 \times 3 \times 64 + 1) \times 64 = 36{,}928$ parameters

(Note: The block1_pool layer has 0 parameters because pooling layers only perform a fixed mathematical operation like taking the maximum value; they do not have learnable weights).

3. block2_conv1: $(3 \times 3 \times 64 + 1) \times 128 = 73{,}856$ parameters

4. block2_conv2: $(3 \times 3 \times 128 + 1) \times 128 = 147{,}584$ parameters
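These counts can be verified by coding the formula directly (the filter counts 64, 64, 128, 128 follow the standard VGG16 configuration assumed above):

```python
def conv2d_params(f_w, f_h, d_in, k):
    # Each of the K filters has F_w x F_h x D_in weights plus 1 bias.
    return (f_w * f_h * d_in + 1) * k

print(conv2d_params(3, 3, 3, 64))     # block1_conv1 -> 1792
print(conv2d_params(3, 3, 64, 64))    # block1_conv2 -> 36928
print(conv2d_params(3, 3, 64, 128))   # block2_conv1 -> 73856
print(conv2d_params(3, 3, 128, 128))  # block2_conv2 -> 147584
```

These match the numbers Keras reports in `model.summary()` for VGG16's first four conv layers.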

5. Transfer Learning (TBC; not in the midterm)

Finally, the lecture formally touches on Transfer Learning. Rather than building and training massive architectures like the one above from scratch—which requires enormous amounts of data and compute—you take an existing Model (Model 01) that was already trained on a massive dataset (Data 01).

You then transfer that pre-learned knowledge (the weights and feature-extracting capabilities) to a new, structurally similar Model (Model 02) to make predictions on a new, smaller dataset (Data 02).
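A minimal sketch of the idea (the layer names and weight format here are hypothetical, not from the lecture): copy the pre-trained feature-extractor weights into the new model, and give it a fresh classification head for the new task.

```python
# Hypothetical Model 01: each layer name maps to a flat list of weights.
model_01 = {
    "conv1": [0.2, -0.1, 0.5],   # features learned on the large Data 01
    "conv2": [0.7, 0.3, -0.4],
    "classifier": [1.2, -0.8],   # head specific to Data 01's classes
}

def transfer(source, new_head):
    # Copy every pre-trained layer except the old head, then attach
    # a freshly initialized classifier to train on the smaller Data 02.
    target = {name: list(w) for name, w in source.items() if name != "classifier"}
    target["classifier"] = new_head
    return target

model_02 = transfer(model_01, new_head=[0.0, 0.0, 0.0])
print(model_02["conv1"])       # same learned features as Model 01
print(model_02["classifier"])  # fresh head, trained from scratch on Data 02
```

In a real framework this is a one-liner, e.g. loading a pre-trained backbone with its classification head removed and stacking a new dense layer on top; the copied layers are often also frozen so only the new head trains at first.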