What is Masked Language Model?

Natural Language Processing

Masked Language Model

A masked language model is a training approach where random tokens in a sentence are hidden and the model learns to predict them from context. BERT popularized masked language modeling as a pre-training objective.

Understanding Masked Language Model

A masked language model is a pre-training approach where a model learns to predict randomly hidden (masked) tokens within a text sequence, developing deep bidirectional understanding of language. BERT pioneered this technique by randomly masking 15% of input tokens and training the model to reconstruct them using surrounding context from both directions. This bidirectional approach gives masked language models strong performance on natural language understanding tasks like sentiment analysis, named entity recognition, and question answering. Unlike autoregressive language models that read text left to right, masked language models process the entire input simultaneously, capturing richer contextual relationships. The pre-trained representations transfer effectively to downstream tasks through fine-tuning, making masked language models a cornerstone of modern natural language processing. Variants like RoBERTa and DeBERTa refined the masking strategy and training procedure to achieve even stronger results.

Is AI recommending your brand?

Find out if ChatGPT, Perplexity, and Gemini mention you when people search your industry.

Check your brand — $9

Related Natural Language Processing Terms

Abstractive Summarization

Abstractive summarization generates new text that captures the key points of a longer document, rather than simply extracting existing sentences. It requires deep language understanding and generation capabilities.

Beam Search

Beam search is a decoding algorithm that explores multiple candidate sequences simultaneously, keeping only the top-k most promising at each step. It balances between greedy decoding and exhaustive search in text generation.

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that reads text in both directions simultaneously. BERT revolutionized NLP by enabling deep bidirectional pre-training for language understanding tasks.

Meta-Learning

Back to full glossary

Masked Language Model

Understanding Masked Language Model

Is AI recommending your brand?

Related Natural Language Processing Terms

Abstractive Summarization

Beam Search

BERT

Bigram

Byte Pair Encoding

Corpus

Extractive Summarization

Grounding