Natural Language Processing

N-gram

An N-gram is a contiguous sequence of N items from a text, used in language modeling and text analysis. Unigrams, bigrams, and trigrams capture local word patterns and co-occurrence statistics.

Understanding N-gram

The items in an N-gram are typically words or characters. In a word-level model, unigrams (N=1) are single words, bigrams (N=2) are pairs of consecutive words, and trigrams (N=3) are sequences of three consecutive words. For example, the sentence "the cat sat" contains the bigrams "the cat" and "cat sat". Before the deep learning era, N-gram language models were the dominant approach to language modeling; they estimate the probability of the next word from the preceding N-1 words. N-grams remain useful for feature extraction in text classification, spell checking, and keyboard prediction on mobile devices, and as baselines for neural language models. They also underlie evaluation metrics such as BLEU, which scores machine translation output by its N-gram overlap with reference translations. Although transformers have largely surpassed N-gram models in capability, N-grams still provide useful intuition about how language models capture sequential patterns.
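To make the idea concrete, here is a minimal Python sketch of word-level N-gram extraction and an unsmoothed bigram probability estimate. The function names `ngrams` and `bigram_probability`, the whitespace tokenization, and the toy sentence are illustrative assumptions, not part of any particular library; practical models would also add smoothing for unseen word pairs.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word), with no smoothing."""
    bigrams = ngrams(tokens, 2)
    bigram_counts = Counter(bigrams)
    # Count prev_word only where it starts a bigram, so the
    # estimates for a given context sum to 1.
    context_counts = Counter(first for first, _ in bigrams)
    if context_counts[prev_word] == 0:
        return 0.0  # unseen context: assumed fallback for this sketch
    return bigram_counts[(prev_word, word)] / context_counts[prev_word]


# Hypothetical toy corpus, tokenized by whitespace.
tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
print(bigram_probability(tokens, "the", "cat"))  # "the" is followed by "cat" once out of twice: 0.5
```

In this toy corpus "the" appears twice as a context and is followed by "cat" once, so the estimate is 0.5. Any pair that never occurs gets probability 0, which is the sparsity problem that smoothing techniques are meant to address.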

Related Natural Language Processing Terms

Abstractive Summarization

Abstractive summarization generates new text that captures the key points of a longer document, rather than simply extracting existing sentences. It requires deep language understanding and generation capabilities.

Beam Search

Beam search is a decoding algorithm that explores multiple candidate sequences simultaneously, keeping only the top-k most promising at each step. It balances between greedy decoding and exhaustive search in text generation.

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that reads text in both directions simultaneously. BERT revolutionized NLP by enabling deep bidirectional pre-training for language understanding tasks.
