Softmax
Softmax is an activation function used in multi-class classification to convert a vector of real-valued scores (logits) into a probability distribution over classes.
Definition
Given logits $z = (z_1, z_2, \dots, z_K)$,
\[\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^K e^{z_j}}, \quad i = 1,\dots,K\]
Logits are the raw, unnormalized outputs of a model before an activation function (such as sigmoid or softmax) is applied.
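A minimal NumPy sketch of this definition (the function name `softmax` and the max-subtraction trick for numerical stability are illustrative choices, not part of the formula itself):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map a vector of logits to a probability distribution."""
    # Subtracting the max logit does not change the result
    # (it cancels in the ratio) but prevents overflow in exp.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)
```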
Key properties
- Outputs values in (0, 1)
- Probabilities sum to 1 (checked numerically in the sketch after this list)
- Preserves ordering: larger $z_i$ ⇒ larger probability
- Smooth and differentiable (good for backpropagation)
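A quick numerical check of the first three properties, reusing the `softmax` sketch above on some arbitrary logits:

```python
z = np.array([3.0, -1.0, 0.5, 2.0])
p = softmax(z)

assert np.all((p > 0) & (p < 1))                      # outputs lie in (0, 1)
assert np.isclose(p.sum(), 1.0)                       # probabilities sum to 1
assert np.array_equal(np.argsort(z), np.argsort(p))   # ordering preserved
```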
Intuition
- Each $z_i$ is a score for class $i$
- Exponentiation emphasizes larger scores
- Normalization forces competition between classes
- Produces a probability-like output
Example: \(z = (2, 1, 0) \;\Rightarrow\; \text{softmax}(z) \approx (0.67,\;0.24,\;0.09)\)
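Reproducing this example with the `softmax` sketch from the Definition section:

```python
z = np.array([2.0, 1.0, 0.0])
print(np.round(softmax(z), 2))   # [0.67 0.24 0.09]
```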
Why softmax is used
- Enables multi-class classification
- Works naturally with cross-entropy loss (see the sketch after this list)
- Output can be interpreted as class probabilities
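A hedged sketch of the softmax/cross-entropy pairing: the loss is the negative log-probability of the true class. The helper name `cross_entropy` and the log-sum-exp formulation are illustrative choices, not a specific library API.

```python
def cross_entropy(z: np.ndarray, true_class: int) -> float:
    """Negative log-probability of the true class under softmax(z)."""
    # log softmax(z)_i = z_i - logsumexp(z); the max-shift keeps exp stable.
    m = np.max(z)
    log_sum_exp = m + np.log(np.sum(np.exp(z - m)))
    return float(log_sum_exp - z[true_class])
```

For example, with $z = (2, 1, 0)$ and true class 0, this gives $-\log(0.67) \approx 0.41$.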
Decision rule
\(\hat{y} = \arg\max_i \text{softmax}(z_i)\) (Note: this is equivalent to $\arg\max_i z_i$.)
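Because softmax preserves ordering, predicting from the logits or from the probabilities selects the same class; a small check using the `softmax` sketch above:

```python
z = np.array([2.0, 1.0, 0.0])
assert np.argmax(softmax(z)) == np.argmax(z)   # both pick class 0
```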
Softmax vs Sigmoid
| Sigmoid | Softmax |
|---|---|
| Binary classification | Multi-class classification |
| Single output | Multiple outputs |
| Independent probabilities | Competing probabilities |
| $\sigma(z)\in(0,1)$ | $\sum_i p_i = 1$ |
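A sketch contrasting the two on the same logits (the `sigmoid` helper is illustrative): sigmoid scores each class independently, so the outputs need not sum to 1, while softmax forces the classes to compete for probability mass.

```python
def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([2.0, 1.0, 0.0])
print(sigmoid(z))   # ~[0.88 0.73 0.50] -- independent scores, sum ~2.11
print(softmax(z))   # ~[0.67 0.24 0.09] -- competing probabilities, sum 1
```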