What is Stochastic Gradient Descent?

Deep Learning

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model weights using the gradient computed from a random subset (mini-batch) of the training data. Each update is far cheaper than a full pass over the dataset, and the noise in the gradient estimate can help the optimizer escape poor local minima.

Understanding Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters using the gradient computed from a single randomly selected training example or a small mini-batch, rather than the entire dataset. Each step moves the parameters a small distance, set by the learning rate, in the direction opposite to that gradient estimate. Because the estimate is noisy, the updates can help the model escape local minima, and in practice this often leads to better generalization than full-batch gradient descent. SGD is the foundational optimizer behind virtually all deep learning training, though modern variants such as Adam, RMSProp, and AdaGrad add adaptive per-parameter learning rates that often speed up convergence. Key hyperparameters include the learning rate, momentum, and batch size, each of which significantly affects training dynamics. Despite its simplicity, well-tuned SGD remains competitive with more sophisticated optimizers in many training scenarios.
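
The update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name sgd_linear_regression, the synthetic data, the mean-squared-error loss, and the hyperparameter values (lr, momentum, batch_size, epochs) are all assumptions chosen for the example.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.02, momentum=0.9, batch_size=16, epochs=50, seed=0):
    """Fit y ~ X @ w + b with mini-batch SGD plus momentum (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    v_w = np.zeros(n_features)  # momentum buffer for the weights
    v_b = 0.0                   # momentum buffer for the bias

    for _ in range(epochs):
        # Reshuffle every epoch so each mini-batch is a random subset.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]

            # Gradient of the mean squared error on this mini-batch only.
            error = X_batch @ w + b - y_batch
            grad_w = 2.0 * X_batch.T @ error / len(idx)
            grad_b = 2.0 * error.mean()

            # Momentum update: accumulate a velocity, then step along it.
            v_w = momentum * v_w - lr * grad_w
            v_b = momentum * v_b - lr * grad_b
            w = w + v_w
            b = b + v_b
    return w, b

# Synthetic data (hypothetical): a known linear model plus a little noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(512, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 0.3
y = X @ true_w + true_b + 0.01 * data_rng.normal(size=512)

w, b = sgd_linear_regression(X, y)
print("estimated weights:", w)
print("estimated bias:", b)
```

Setting batch_size to 1 gives the single-example form of SGD, while setting it to the dataset size recovers full-batch gradient descent. Changing lr or momentum in this sketch is a quick way to see how strongly those hyperparameters affect convergence.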

Related Deep Learning Terms

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size