# What is Softmax?

Softmax is a function that converts a vector of raw scores (logits) into a vector of probabilities. The output of the softmax function is a probability distribution—each element in the output vector is between 0 and 1, and the sum of all elements is 1.

## Mathematically

For a vector `z` of logits, the softmax function is defined as:

```
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
```

Where:

- `z_i` is the raw score (logit) for class `i`.
- The numerator `e^(z_i)` is the exponential of the raw score.
- The denominator is the sum of the exponentials of all the logits in the vector, ensuring that the output probabilities sum to 1.
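
As a quick illustration of the formula, here is a minimal softmax sketch in NumPy. Subtracting the maximum logit first is a standard numerical-stability trick and does not change the result, since softmax is invariant to shifting all logits by the same constant:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Shift by the max logit for numerical stability;
    # the result is unchanged because the shift cancels out.
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]
```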

## Applying Softmax in PyTorch

In PyTorch, `F.softmax(output, dim=1)` applies the softmax function to the output tensor along the specified dimension `dim=1`.

### 1. Output Tensor (Logits)

The output tensor is typically a 2D tensor of shape `(batch_size, num_classes)`, where:

- `batch_size` is the number of samples in the batch.
- `num_classes` is the number of classes your model is predicting.

For example, if `output` has shape `(1, 3)`, it might look like this:

```
[[2.0, 1.0, 0.1]]
```

Here, the output represents the logits for three classes.

### 2. Softmax Transformation

`F.softmax(output, dim=1)` converts these logits into probabilities along the `num_classes` dimension (`dim=1`). After applying softmax, the output becomes:

```
[[0.659, 0.242, 0.099]]
```

This means that the model is 65.9% confident in class 0, 24.2% confident in class 1, and 9.9% confident in class 2.
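
Putting steps 1 and 2 together, here is a minimal runnable sketch using the same illustrative logits as above:

```python
import torch
import torch.nn.functional as F

output = torch.tensor([[2.0, 1.0, 0.1]])  # logits, shape (batch_size=1, num_classes=3)
probs = F.softmax(output, dim=1)          # probabilities along the class dimension
print(probs)  # tensor([[0.6590, 0.2424, 0.0986]])
```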

### 3. Dimension Argument (`dim=1`)

`dim=1` specifies that the softmax function should be applied across the `num_classes` dimension. This ensures that for each sample in the batch, the logits are converted into probabilities that sum to 1 across all classes.
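
For instance, with a batch of two samples (made-up logits below), applying softmax along `dim=1` produces one probability distribution per row:

```python
import torch
import torch.nn.functional as F

output = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 3.0, 1.5]])  # shape (2, 3): two samples, three classes
probs = F.softmax(output, dim=1)          # softmax across classes, separately per sample
print(probs.sum(dim=1))                   # tensor([1.0000, 1.0000]): each row sums to 1
```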

## Why Use Softmax?

- Probabilities: Softmax transforms raw scores into probabilities, making the output interpretable as a likelihood for each class.
- Multi-Class Classification: Softmax is typically used in the last layer of a neural network for multi-class classification tasks, where the model needs to assign a probability to each class.
- Loss Calculation: The output of softmax is often used with the negative log-likelihood loss or cross-entropy loss, which compares the predicted probabilities with the true labels (see the sketch after this list).
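
One practical note on the loss point above: in PyTorch, `F.cross_entropy` (and `nn.CrossEntropyLoss`) expects raw logits and applies `log_softmax` internally, so you should not apply softmax yourself before computing the loss. A small sketch of the equivalence:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])  # index of the true class

# cross_entropy fuses log_softmax and negative log-likelihood in one call
loss_fused = F.cross_entropy(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(loss_fused, loss_manual)  # the two losses are identical
```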

## Example Calculation

For a clearer picture, let’s calculate the softmax manually:

Assume the logits are `[2.0, 1.0, 0.1]`:

- Calculate the exponentials:

```
e^2.0 ≈ 7.389
e^1.0 ≈ 2.718
e^0.1 ≈ 1.105
```

- Sum the exponentials:

```
7.389 + 2.718 + 1.105 ≈ 11.212
```

- Compute the softmax for each class:

```
softmax(2.0) ≈ 7.389 / 11.212 ≈ 0.659
softmax(1.0) ≈ 2.718 / 11.212 ≈ 0.242
softmax(0.1) ≈ 1.105 / 11.212 ≈ 0.099
```

So the resulting probabilities are approximately `[0.659, 0.242, 0.099]`.
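
The same arithmetic can be verified in a few lines of plain Python:

```python
import math

logits = [2.0, 1.0, 0.1]
exps = [math.exp(z) for z in logits]   # [7.389, 2.718, 1.105]
total = sum(exps)                      # ≈ 11.212
probs = [e / total for e in exps]
print([round(p, 3) for p in probs])    # [0.659, 0.242, 0.099]
```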

## Summary

- Softmax converts logits into a probability distribution.
- `dim=1` indicates that softmax is applied across the class scores for each sample.
- The output probabilities sum to 1 and represent the model’s confidence in each class for a given input.