Perplexity

Perplexity is a metric that measures how well a language model predicts a text sequence — lower perplexity indicates better prediction. It is also the name of an AI-powered search engine that provides cited, conversational answers.

Understanding Perplexity

Perplexity is an evaluation metric for language models that measures how well a model predicts a sequence of words, with lower values indicating better predictive performance. Mathematically, perplexity is the exponentiation of the average negative log-likelihood per token, essentially quantifying how "surprised" the model is by the test data. A model with a perplexity of 20 on a given text is as uncertain as if it were choosing uniformly among 20 possible next tokens at each step. Researchers use perplexity to compare language models during pre-training and to track improvements across training checkpoints. While perplexity correlates with model quality, it does not directly capture factors like coherence, factual accuracy, or helpfulness that matter in real-world natural language generation applications. This is why modern evaluation increasingly supplements perplexity with human preference ratings and task-specific benchmarks that better reflect how large language models perform in practice.
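
As a concrete illustration of the calculation, here is a minimal sketch that assumes per-token log-probabilities have already been obtained from a model; the function name and interface are illustrative, not taken from any particular library.

    import math

    def perplexity(token_log_probs):
        """Compute perplexity from per-token natural-log probabilities.

        token_log_probs: a list of log p(w_i | w_1..w_{i-1}) values, one per
        token in the evaluated sequence.
        """
        avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
        return math.exp(avg_neg_log_likelihood)

    # A model that assigns each of 5 tokens probability 0.05 has perplexity
    # 1 / 0.05 = 20, matching the "choosing uniformly among 20 options"
    # intuition above.
    print(perplexity([math.log(0.05)] * 5))  # 20.0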

Category

Natural Language Processing

Related Natural Language Processing Terms

Abstractive Summarization

Abstractive summarization generates new text that captures the key points of a longer document, rather than simply extracting existing sentences. It requires deep language understanding and generation capabilities.
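
As a brief illustration, the sketch below uses the Hugging Face transformers summarization pipeline, assuming that library is installed; its default checkpoint is an abstractive sequence-to-sequence model that writes new sentences rather than copying them from the source.

    from transformers import pipeline

    # Downloads the pipeline's default abstractive summarization checkpoint.
    summarizer = pipeline("summarization")

    article = "..."  # a longer document to condense
    result = summarizer(article, max_length=60, min_length=20, do_sample=False)
    print(result[0]["summary_text"])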

Beam Search

Beam search is a decoding algorithm that explores multiple candidate sequences simultaneously, keeping only the top-k most promising at each step. It strikes a balance between greedy decoding and exhaustive search in text generation.
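
To make the procedure concrete, here is a minimal sketch of beam search over a hypothetical next_token_scores(sequence) function that returns log-probabilities for candidate next tokens; that interface is assumed purely for illustration.

    def beam_search(next_token_scores, start_token, beam_width=3, max_len=10, eos="<eos>"):
        """Keep the beam_width highest-scoring partial sequences at each step."""
        beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:  # finished sequences carry over unchanged
                    candidates.append((seq, score))
                    continue
                for token, log_prob in next_token_scores(seq).items():
                    candidates.append((seq + [token], score + log_prob))
            # Prune to the top-k most promising candidates
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]

Setting beam_width=1 reduces this to greedy decoding, while an unbounded beam approaches exhaustive search.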

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that reads text in both directions simultaneously. BERT revolutionized NLP by enabling deep bidirectional pre-training for language understanding tasks.
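
A short sketch of obtaining contextual token representations, assuming the Hugging Face transformers library and the publicly released bert-base-uncased checkpoint are available:

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
    outputs = model(**inputs)

    # One 768-dimensional vector per token, each conditioned on both its
    # left and right context.
    print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)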

Bigram

A bigram is a contiguous sequence of two items (typically words or characters) from a given text. Bigram models estimate the probability of a word based on the immediately preceding word.
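
A minimal sketch of estimating bigram probabilities from raw counts (illustrative only; practical bigram models add smoothing to handle unseen pairs):

    from collections import Counter

    def bigram_probabilities(tokens):
        """Estimate P(w2 | w1) as count(w1, w2) / count(w1)."""
        unigram_counts = Counter(tokens[:-1])  # last token never starts a bigram
        bigram_counts = Counter(zip(tokens, tokens[1:]))
        return {pair: count / unigram_counts[pair[0]]
                for pair, count in bigram_counts.items()}

    tokens = "the cat sat on the mat".split()
    probs = bigram_probabilities(tokens)
    print(probs[("the", "cat")])  # 0.5: "the" is followed by "cat" in 1 of its 2 occurrences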

Byte Pair Encoding

Byte Pair Encoding (BPE) is a subword tokenization algorithm that iteratively merges the most frequent pairs of characters or character sequences. BPE is widely used in modern language models to handle rare words and multilingual text.
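
The merge loop is simple enough to sketch. The version below learns merges from a small word-frequency dictionary whose words are written as space-separated symbols; it illustrates the idea and is not production tokenizer code.

    from collections import Counter

    def merge_pair(word, pair):
        """Merge every adjacent occurrence of `pair` within one word."""
        symbols, merged, i = word.split(), [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        return " ".join(merged)

    def learn_bpe_merges(word_freqs, num_merges):
        """Repeatedly merge the most frequent adjacent symbol pair."""
        vocab, merges = dict(word_freqs), []
        for _ in range(num_merges):
            pair_counts = Counter()
            for word, freq in vocab.items():
                symbols = word.split()
                for pair in zip(symbols, symbols[1:]):
                    pair_counts[pair] += freq
            if not pair_counts:
                break
            best = max(pair_counts, key=pair_counts.get)
            merges.append(best)
            vocab = {merge_pair(w, best): f for w, f in vocab.items()}
        return merges

    print(learn_bpe_merges({"l o w": 5, "l o w e r": 2, "n e w e s t": 6}, 3))
    # e.g. [('w', 'e'), ('l', 'o'), ('n', 'e')]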

Corpus

A corpus is a large, structured collection of text documents used for training and evaluating natural language processing models. The quality and diversity of a training corpus significantly impact model performance.

Extractive Summarization

Extractive summarization selects and combines the most important sentences directly from a source document to create a summary. It preserves the original wording but may lack the coherence of abstractive approaches.
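
As a rough illustration of the extractive idea, the sketch below scores sentences by summed word frequency and keeps the top ones in their original order; it is a simplification for illustration, not a specific published algorithm.

    import re
    from collections import Counter

    def extractive_summary(text, num_sentences=2):
        """Return the highest-scoring sentences in document order."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        word_freq = Counter(re.findall(r"[a-z']+", text.lower()))

        def score(sentence):
            return sum(word_freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

        top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
        keep = sorted(top[:num_sentences])  # preserve the source document's order
        return " ".join(sentences[i] for i in keep)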

Grounding

Grounding in AI refers to connecting a model's language understanding to real-world knowledge, data, or sensory experience. Grounded AI systems produce more factual and contextually relevant outputs.