What Is a Context Window?
The context window is the maximum amount of text that a Large Language Model (LLM) can process at once. It includes both your input (the prompt) and the model's output.
Think of the context window as the model's working memory: everything that fits inside can be "seen" and considered by the model. What's outside doesn't exist for the model.
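Since input and output share one budget, a long prompt directly shrinks the maximum possible response. A tiny sketch with illustrative numbers (the window size below isn't tied to any particular model):

```python
# Input and output tokens share one context window, so a long prompt
# leaves less room for the response. Numbers are purely illustrative.
CONTEXT_WINDOW = 200_000  # hypothetical total token budget

def remaining_output_budget(prompt_tokens: int,
                            context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the model's response after the prompt is counted."""
    return max(context_window - prompt_tokens, 0)

print(remaining_output_budget(150_000))  # -> 50000
```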
Context Windows of Current Models
Here's an overview of the context windows of over 140 current LLMs from Anthropic, Google, OpenAI, Meta, and other leading providers, summarized by developer:
| Developer | Context windows (number of models) |
|---|---|
| Meta | 10M (1), 1M (1), 128K (8) |
| Alibaba | 10M (1), 1M (2), 262.14K (1), 256K (1), 128K (12), 32.77K (3) |
| Google | 2M (2), 1M (5), 128K (3), 32K (1), 8.19K (2) |
| xAI | 2M (2), 256K (2), 131.07K (1), 128K (1) |
| Anthropic | 1M (2), 200K (9) |
| OpenAI | 1M (3), 400K (6), 200K (4), 128K (5), 8.19K (1) |
| Amazon | 1M (3), 300K (2), 128K (1) |
| MiniMax | 1M (1), 245.76K (1) |
| Mistral | 256K (2), 128K (5), 65.54K (1), 32.77K (2) |
| Cohere | 256K (2), 128K (2), 4.1K (1) |
| AI21 Labs | 256K (3), 8.19K (1) |
| 01.AI | 200K (2), 128K (2), 32K (1), 16K (1) |
| DeepSeek | 128K (10) |
| Microsoft | 128K (6), 64K (1), 32.77K (1), 16.38K (1) |
| Nvidia | 128K (3), 4.1K (1) |
| Reka | 128K (3) |
| Zhipu AI | 128K (2), 8.19K (1) |
| Baidu | 128K (1), 8K (1) |
| Databricks | 32.77K (1) |
| Stability AI | 4.1K (2) |
Context window sizes of current AI language models (as of January 2026)
The table clearly shows the rapid progress: while early models like GPT-3.5 could process only 4,000 to 16,000 tokens, current models like Llama 4 Scout reach 10 million tokens, the equivalent of about 30 Harry Potter books or 25,000 book pages.
What Are Tokens?
Tokens are the basic units into which text is broken down for LLMs. A token isn't always a whole word: common words are often one token, while rare words are split into multiple tokens.
Rule of thumb for English: 1 token ≈ 0.75 words. A typical blog post with 1,000 words requires about 1,300 tokens.
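For exact numbers instead of the rule of thumb, you can use a tokenizer library. A small example with OpenAI's tiktoken (the cl100k_base encoding applies to OpenAI's GPT-3.5/GPT-4 family; other providers ship their own tokenizers):

```python
# pip install tiktoken -- OpenAI's open-source tokenizer library
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4

text = "The context window is the working memory of the model."
tokens = enc.encode(text)
print(f"{len(text.split())} words -> {len(tokens)} tokens")
```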
Why Is the Context Window Important?
For Conversations
The model "forgets" earlier parts of a long conversation when they no longer fit in the context window. That's why chatbots can lose track in very long conversations.
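Most chat applications deal with this via a sliding window: once the history exceeds the token budget, the oldest turns are dropped. A minimal sketch of that behavior, with a crude word-based estimate standing in for a real tokenizer:

```python
# Keep only the most recent messages that fit into the token budget.
def count_tokens(message: dict) -> int:
    # Crude estimate from the 1 token ~ 0.75 words rule of thumb;
    # use the provider's tokenizer in practice.
    return max(1, len(message["content"].split()) * 4 // 3)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                       # everything older is "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # back to chronological order
```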
For Document Analysis
A larger context window enables analysis of longer documents. With Gemini 1.5 Pro you can analyze entire books at once; with older models, you have to split texts.
For Code Assistants
AI code assistants like Claude Code benefit from large context windows as they can "see" and understand more files simultaneously.
Strategies for Limited Context
- Summarizing: Condense long texts before putting them into the prompt
- Chunking: Split documents into sections and process them individually (see the sketch after this list)
- RAG (Retrieval-Augmented Generation): Retrieve only the relevant passages via vector search instead of inserting everything
- Conversation Reset: Restate important information in long chats
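As an illustration of the chunking strategy, here's a minimal sketch. It budgets in words via the 0.75-words-per-token rule and splits on whitespace; real pipelines usually split on sentence or paragraph boundaries and count with a proper tokenizer:

```python
def chunk_text(text: str, max_tokens: int = 1000,
               overlap_words: int = 100) -> list[str]:
    """Split text into overlapping chunks that stay under a token budget."""
    words = text.split()
    max_words = int(max_tokens * 0.75)   # token budget -> word budget
    step = max_words - overlap_words     # overlap preserves context at the seams
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), step)]

# Each chunk is sent to the model in its own request; the per-chunk
# results are then merged or summarized in a final pass.
```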
Lost in the Middle
Studies show that LLMs process information at the beginning and end of the context window better than in the middle. This phenomenon is called "Lost in the Middle." Important information should therefore be placed at the beginning or end of your prompt.
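A cheap way to act on this finding is to put the instructions first, the bulk material in the middle, and repeat the actual question at the end. A sketch of such a prompt layout (the function and its wording are just one possible convention):

```python
def build_prompt(instruction: str, documents: list[str], question: str) -> str:
    """Place critical text at the start and end, bulk material in the middle."""
    middle = "\n\n".join(documents)  # the part models attend to least reliably
    return f"{instruction}\n\n{middle}\n\nTo repeat the question: {question}"
```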
Cost Aspect
When using APIs, you pay per token, for both input and output. Filling a long context window is therefore more expensive: with Claude 3.5 Sonnet, processing 100,000 input tokens costs about $0.30.
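The arithmetic behind that figure is simple; the price below matches Claude 3.5 Sonnet's input rate at the time of writing, but rates change, so verify against your provider's current pricing:

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed input rate

def input_cost(tokens: int,
               price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Cost in USD for processing the given number of input tokens."""
    return tokens / 1_000_000 * price_per_million

print(f"${input_cost(100_000):.2f}")  # -> $0.30
```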
Conclusion
The context window is one of the most important limitations of modern LLMs. With models like Gemini 1.5 Pro that can process millions of tokens, many previous workarounds become unnecessary. Still, it remains important to design prompts efficiently, both for cost reasons and because of the "Lost in the Middle" effect.
