Q1

Pasted image 20260407190636.png

To solve questions 2 and 3, we need to use the input values provided in question 4: X=[1,5,1], where X1=1, X2=5, and X3=1. We will also apply the given bias b=0.5 to each node.

First, let's calculate the pre-activation values (the weighted sum plus bias) for the hidden nodes H1 and H2. We'll call these ZH1 and ZH2.

Z_H1 = (X1 · W_X1H1) + (X2 · W_X2H1) + (X3 · W_X3H1) + b
Z_H1 = (1 · 0.1) + (5 · 0.2) + (1 · 0.4) + 0.5 = 0.1 + 1.0 + 0.4 + 0.5 = 2.0

Z_H2 = (X1 · W_X1H2) + (X2 · W_X2H2) + (X3 · W_X3H2) + b
Z_H2 = (1 · 0.4) + (5 · 0.4) + (1 · 0.3) + 0.5 = 0.4 + 2.0 + 0.3 + 0.5 = 3.2

1. What is the number of classes in the above graph?

There are 2 classes.

The output layer consists of two nodes (O1 and O2), and the arrows pointing outward denote the respective class labels: 0 and 1.

2. Use Activation Function ReLU to Find h1 and h2

The ReLU (Rectified Linear Unit) activation function outputs the input directly if it is positive and zero otherwise: f(z) = max(0, z). Applying it to the pre-activation values:

h1 = max(0, 2.0) = 2.0
h2 = max(0, 3.2) = 3.2

3. Use Sigmoid Activation Function to Find h1 and h2

The Sigmoid activation function is defined as σ(z) = 1 / (1 + e^(−z)). Applying it to the pre-activation values:

h1 = σ(2.0) = 1 / (1 + e^(−2.0)) ≈ 0.8808
h2 = σ(3.2) = 1 / (1 + e^(−3.2)) ≈ 0.9608
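As a quick check of the sigmoid arithmetic, the function can be evaluated on the pre-activation values Z_H1 = 2.0 and Z_H2 = 3.2 computed above (a minimal sketch, using only the standard library):

```python
import math

def sigmoid(z):
    # Logistic function: squashes any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Pre-activation values from the hidden-layer calculation above
z_h1, z_h2 = 2.0, 3.2

h1 = sigmoid(z_h1)  # ≈ 0.8808
h2 = sigmoid(z_h2)  # ≈ 0.9608
print(f"h1 = {h1:.4f}, h2 = {h2:.4f}")
```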

To find the recommended class, we need to compute the final values for the output nodes O1 and O2. We will use the standard practice of carrying forward the ReLU activations (h1=2.0, h2=3.2) from step 2, as ReLU is standard for hidden layers, and apply the given bias b=0.5.

Let's calculate the pre-activation outputs for O1 (Class 0) and O2 (Class 1):

O1 = (h1 · W_H1O1) + (h2 · W_H2O1) + b
O1 = (2.0 · 0.6) + (3.2 · 0.2) + 0.5 = 1.2 + 0.64 + 0.5 = 2.34

O2 = (h1 · W_H1O2) + (h2 · W_H2O2) + b
O2 = (2.0 · 0.3) + (3.2 · 0.2) + 0.5 = 0.6 + 0.64 + 0.5 = 1.74

Conclusion:

The recommended class is Class 0.

Why: the calculated value for the O1 node (2.34) is greater than the value for the O2 node (1.74). In classification networks, the node with the highest activation determines the predicted class. (Note: using the Sigmoid values from step 3 would also give O1 a higher value than O2, leading to the exact same classification.)
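The full hand calculation above can be reproduced in a short script. The weight values below are the ones read off the graph in the worked equations (e.g. W_X1H1 = 0.1, W_H1O1 = 0.6); the layout as nested lists is just an implementation choice:

```python
# Forward pass reproducing the hand calculation above
X = [1, 5, 1]
b = 0.5

W_hidden = [
    [0.1, 0.4],  # weights from X1 to H1, H2
    [0.2, 0.4],  # weights from X2 to H1, H2
    [0.4, 0.3],  # weights from X3 to H1, H2
]
W_out = [
    [0.6, 0.3],  # weights from H1 to O1, O2
    [0.2, 0.2],  # weights from H2 to O1, O2
]

def relu(z):
    return max(0.0, z)

# Hidden layer: weighted sum plus bias, then ReLU
z_hidden = [sum(X[i] * W_hidden[i][j] for i in range(3)) + b for j in range(2)]
h = [relu(z) for z in z_hidden]  # [2.0, 3.2]

# Output layer: the pre-activation values are enough to pick the class
o = [sum(h[i] * W_out[i][j] for i in range(2)) + b for j in range(2)]
print(f"O1 = {o[0]:.2f}, O2 = {o[1]:.2f}")  # O1 = 2.34, O2 = 1.74
print("Predicted class:", o.index(max(o)))  # class 0
```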


Q2

Pasted image 20260407192136.png

Completed Table

| Layer | Feature Map Dimension | Number of Parameters (Weights) | Number of Biases |
| --- | --- | --- | --- |
| INPUT | 256×256×3 | 0 | 0 |
| CONV-9-32 | 248×248×32 | 7,776 | 32 |
| POOL-2 | 124×124×32 | 0 | 0 |
| CONV-5-64 | 120×120×64 | 51,200 | 64 |
| POOL-2 | 60×60×64 | 0 | 0 |
| CONV-5-64 | 56×56×64 | 102,400 | 64 |
| POOL-2 | 28×28×64 | 0 | 0 |
| FC-3 | 3 | 150,528 | 3 |

Formulas Used

The problem explicitly asks to separate the "Number of Parameters (Weights)" from the "Number of Biases".

1. Feature Map Dimension: for a CONV-F-K layer (no padding, stride 1), the output width and height are (input − F + 1) and the depth is K. For POOL-2, width and height are halved and the depth is unchanged.

2. Number of Parameters (Weights): for CONV-F-K, weights = F × F × D_in × K, where D_in is the input depth. For FC-N, weights = (flattened input size) × N. Pooling layers have no weights.

3. Number of Biases: one per filter for a convolution (K biases for CONV-F-K), one per output neuron for a fully connected layer (N biases for FC-N), and none for pooling layers.


Step-by-Step Breakdown

1. CONV-9-32
Output width: 256 − 9 + 1 = 248, so the feature map is 248×248×32. Weights: 9 × 9 × 3 × 32 = 7,776. Biases: 32.

2. POOL-2
248 / 2 = 124, so the feature map is 124×124×32. No weights or biases.

3. CONV-5-64
124 − 5 + 1 = 120, so the feature map is 120×120×64. Weights: 5 × 5 × 32 × 64 = 51,200. Biases: 64.

4. POOL-2
120 / 2 = 60, so the feature map is 60×60×64. No weights or biases.

5. CONV-5-64
60 − 5 + 1 = 56, so the feature map is 56×56×64. Weights: 5 × 5 × 64 × 64 = 102,400. Biases: 64.

6. POOL-2
56 / 2 = 28, so the feature map is 28×28×64. No weights or biases.

7. FC-3
Flattened input: 28 × 28 × 64 = 50,176. Weights: 50,176 × 3 = 150,528. Biases: 3.
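The whole table can be double-checked programmatically. A minimal sketch, assuming valid convolutions with stride 1 and 2×2 pooling as in the problem:

```python
# Recompute each layer's output size and parameter counts
# (valid convolutions, stride 1; 2x2 pooling halves width and height)
def conv(size, depth, f, k):
    out = size - f + 1          # no padding, stride 1
    weights = f * f * depth * k # one f x f x depth kernel per filter
    return out, k, weights, k   # new size, new depth, weights, biases

def pool(size, depth):
    return size // 2, depth, 0, 0  # pooling has no parameters

size, depth = 256, 3
layers = [("CONV-9-32", conv, (9, 32)),
          ("POOL-2",    pool, ()),
          ("CONV-5-64", conv, (5, 64)),
          ("POOL-2",    pool, ()),
          ("CONV-5-64", conv, (5, 64)),
          ("POOL-2",    pool, ())]

for name, fn, args in layers:
    size, depth, wts, biases = fn(size, depth, *args)
    print(f"{name}: {size}x{size}x{depth}, weights={wts:,}, biases={biases}")

# FC-3: every flattened input connects to each of the 3 output neurons
fc_weights = size * size * depth * 3
print(f"FC-3: weights={fc_weights:,}, biases=3")  # 150,528 weights
```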


Q3

Deep_Learning_Eq.jpeg

Proof

Pasted image 20260407193825.png


Q4: MCQ

Question 1

In K-Nearest Neighbors (KNN), what is the effect of choosing a very small value for 'K' (e.g., K=1)?

Question 2

What is a potential consequence of setting the learning rate (η) too high in gradient descent?

Question 3

Suppose you are implementing a KNN model with 10 features, but you suspect that some of the features are more important than others for prediction. What can you do to account for this in the distance calculation?
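One common way to do what this question describes is to scale each feature by an importance weight inside the distance metric. A minimal sketch of a weighted Euclidean distance (the points and weight values below are invented for illustration):

```python
import math

def weighted_euclidean(a, b, weights):
    # A larger weight makes that feature contribute more to the distance
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

a = [1.0, 2.0, 3.0]
b = [2.0, 2.0, 5.0]
w_uniform = [1.0, 1.0, 1.0]  # reduces to ordinary Euclidean distance
w_skewed  = [4.0, 1.0, 1.0]  # hypothetical: first feature matters most

print(weighted_euclidean(a, b, w_uniform))  # sqrt(1 + 0 + 4) ≈ 2.236
print(weighted_euclidean(a, b, w_skewed))   # sqrt(4 + 0 + 4) ≈ 2.828
```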

Question 4

What is the main advantage of Mini-Batch Gradient Descent compared to both Stochastic Gradient Descent (SGD) and Batch Gradient Descent (BGD)?
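To make the trade-off this question asks about concrete, here is a minimal mini-batch gradient descent sketch fitting a 1-D linear model y = w·x (the toy data, η, and batch size are invented for illustration):

```python
import random

# Toy data generated from y = 3x; the goal is to recover the slope w = 3
data = [(x, 3.0 * x) for x in range(1, 11)]
w, eta, batch_size = 0.0, 0.001, 5

random.seed(0)
for epoch in range(200):
    random.shuffle(data)
    # Update on small batches: cheaper per step than full-batch GD,
    # less noisy than single-sample SGD
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of mean squared error (w*x - y)^2 w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= eta * grad

print(f"learned w = {w:.3f}")  # ≈ 3.0
```

Incidentally, raising η to around 0.1 in this loop makes the updates overshoot and the weight diverge, which is the failure mode Question 2 asks about.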

Question 5

Which one is NOT an advantage of KNN?