Softmax

Deep Learning

Softmax is a function that converts a vector of raw scores into a probability distribution where all values sum to 1. It is the standard output activation for multi-class classification and attention mechanisms.

Understanding Softmax

The softmax function converts a vector of raw scores (logits) into a probability distribution: every output is positive, the outputs sum to one, and each probability is proportional to the exponential of its score, so larger scores receive a disproportionately large share of the probability mass. It serves as the standard output activation for multi-class classification in neural networks, appearing in the final layer of models for image recognition, text classification, and similar tasks. Softmax is also central to the attention mechanism in transformers, where it normalizes attention weights across sequence positions. An optional temperature parameter, which divides the logits before the exponential is applied, controls the sharpness of the distribution: lower temperatures produce more confident, peaked outputs, while higher temperatures yield flatter, more exploratory distributions. During training, softmax outputs are typically paired with cross-entropy loss, which penalizes low probability on the correct class and yields well-behaved gradients for backpropagation.
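
As a rough illustration of the description above, the following sketch computes softmax(z)_i = exp(z_i / T) / sum_j exp(z_j / T) using only the Python standard library. The function names `softmax` and `cross_entropy`, the `temperature` argument, and the example logits are assumptions made for this sketch and do not come from any particular framework. Subtracting the maximum before exponentiating is a common numerical-stability step; it does not change the result.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn a list of raw scores into probabilities that sum to 1.

    `temperature` is a hypothetical parameter name for this sketch:
    values below 1 sharpen the distribution, values above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Shift by the maximum so exp() never sees a large positive number.
    # The shift cancels in the ratio, so the probabilities are unchanged.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]                    # illustrative scores
probs = softmax(logits)                     # default temperature of 1
sharp = softmax(logits, temperature=0.5)    # more peaked
flat = softmax(logits, temperature=5.0)     # closer to uniform
loss = cross_entropy(probs, target_index=0)
```

With these example logits, the largest score receives the most probability at every temperature, but its share grows as the temperature falls and shrinks toward 1/3 as the temperature rises. The cross-entropy value is small when the correct class already has high probability and grows as that probability drops.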

Related in Deep Learning

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size