12. Softmax Activation: Understanding Probability Distributions in Neural Networks
What is Softmax?
Softmax is a function that converts a vector of raw scores (logits) into a vector of probabilities. Its output is a probability distribution: each element lies between 0 and 1, and the elements sum to 1.
Mathematically, for a vector z of logits, the softmax function is defined as:
softmax(z_{i}) = e^{z_{i}} / Σ_{j} e^{z_{j}}
Where:
z_{i} is the raw score (logit) for class i.
The numerator e^{z_{i}} is the exponential of the raw score.
The denominator is the sum of the exponentials of all the logits in the vector, ensuring that the output probabilities sum to 1.
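The definition above can be sketched in plain Python. This is a minimal illustration rather than how any particular library implements it; the function name softmax and the max-subtraction step are additions not found in the text. Subtracting the largest logit does not change the result, because the common factor cancels between numerator and denominator, but it keeps the exponentials from overflowing.

```python
import math

def softmax(logits):
    """Softmax of a list of floats, following the definition above."""
    # Assumption (not from the text): shift by the max logit for numerical
    # stability. The factor e^{-shift} cancels, so the output is unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # very close to 1
```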
Applying Softmax in PyTorch
In PyTorch, F.softmax(output, dim=1) applies the softmax function to the output tensor along the specified dimension, dim=1.
1. Output Tensor (Logits)
The output tensor is typically a 2D tensor of shape (batch_size, num_classes), where:
batch_size is the number of samples in the batch.
num_classes is the number of classes your model is predicting.
For example, if output has the shape (1, 3), it might look like this:
[[2.0, 1.0, 0.1]]
Here, the output represents the logits for three classes.
2. Softmax Transformation
F.softmax(output, dim=1) converts these logits into probabilities along the num_classes dimension (dim=1). After applying softmax, the output is approximately:
[[0.659, 0.242, 0.099]]
This means that the model assigns a probability of 65.9% to class 0, 24.2% to class 1, and 9.9% to class 2.
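As a quick check, the transformation can be reproduced with the example logits. This is a small sketch; the variable name probs is chosen here for illustration.

```python
import torch
import torch.nn.functional as F

# Logits for one sample and three classes, shape (1, 3)
output = torch.tensor([[2.0, 1.0, 0.1]])

# Normalize across the class dimension
probs = F.softmax(output, dim=1)

print(probs)                # approximately [[0.659, 0.242, 0.099]]
print(probs.sum(dim=1))     # each row sums to 1
print(probs.argmax(dim=1))  # index of the most probable class: tensor([0])
```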
3. Dimension Argument (dim=1)
dim=1 specifies that the softmax function should be applied across the num_classes dimension. This ensures that for each sample in the batch, the logits are converted into probabilities that sum to 1 across all classes.
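The effect of the dim argument is easier to see with more than one sample. The sketch below uses a made-up batch of two samples (the second row of logits is illustrative only) and compares dim=1 with dim=0.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 2 samples, 3 classes, shape (2, 3)
output = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

per_sample = F.softmax(output, dim=1)  # normalizes across classes, per sample
per_column = F.softmax(output, dim=0)  # normalizes across the batch, per class

print(per_sample.sum(dim=1))  # each sample's probabilities sum to 1
print(per_column.sum(dim=0))  # each column sums to 1 instead
print(per_column.sum(dim=1))  # rows no longer sum to 1
```

With dim=0 the values are normalized across samples rather than across classes, which is usually not what a classifier needs.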
Why Use Softmax?
Probabilities: Softmax transforms raw scores into probabilities, making the output interpretable as a likelihood for each class.
Multiclass Classification: Softmax is typically used in the last layer of a neural network for multiclass classification tasks, where the model needs to assign a probability to each class.
Loss Calculation: Softmax probabilities underlie the negative log-likelihood and cross-entropy losses, which compare the predicted probabilities with the true labels. In PyTorch, note that nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, while nn.NLLLoss expects log-probabilities (for example from F.log_softmax), so softmax output should not be passed to either one directly.
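The relationship between softmax and these losses can be sketched as follows. In PyTorch, F.cross_entropy takes raw logits and F.nll_loss takes log-probabilities, so both should agree with the negative log of the softmax probability of the true class. The target label used here is a hypothetical value chosen for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])  # hypothetical true class for this sample

# Cross-entropy on raw logits (log-softmax is applied internally)
loss_ce = F.cross_entropy(logits, target)

# Negative log-likelihood on explicit log-probabilities
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Manual version: -log of the softmax probability of the true class
manual = -torch.log(F.softmax(logits, dim=1)[0, target.item()])

print(loss_ce.item(), loss_nll.item(), manual.item())  # all three agree
```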
Example Calculation
For a clearer picture, let’s calculate the softmax manually. Assume the logits are [2.0, 1.0, 0.1]:
Calculate the exponentials:
e^{2.0} ≈ 7.389
e^{1.0} ≈ 2.718
e^{0.1} ≈ 1.105
Sum of exponentials:
7.389 + 2.718 + 1.105 ≈ 11.212
Compute softmax for each class:
softmax(2.0) ≈ 7.389 / 11.212 ≈ 0.659
softmax(1.0) ≈ 2.718 / 11.212 ≈ 0.242
softmax(0.1) ≈ 1.105 / 11.212 ≈ 0.099
So the resulting probabilities are approximately [0.659, 0.242, 0.099].
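The same steps can be replayed in a few lines of Python to check the arithmetic. This is a sketch using only the standard library; the variable names are illustrative.

```python
import math

logits = [2.0, 1.0, 0.1]

# Step 1: exponentials of each logit
exps = [math.exp(z) for z in logits]
print([round(e, 3) for e in exps])  # [7.389, 2.718, 1.105]

# Step 2: sum of exponentials (about 11.2125 at full precision)
total = sum(exps)

# Step 3: divide each exponential by the sum
probs = [e / total for e in exps]
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

At full precision the sum is about 11.2125, slightly different from adding the rounded exponentials, but the probabilities agree to three decimal places.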
Summary
Softmax converts logits to a probability distribution.
dim=1 indicates that softmax is applied across the class scores for each sample.
The output probabilities sum to 1 and represent the model’s confidence in each class for a given input.