The History of AI as a Timeline
From the 2012 deep-learning breakthrough through the generative revolution to well-grounded forecasts for 2040. This interactive timeline maps the most important AI models and tools, with charts on the intelligence explosion.
The intelligence explosion
On demanding benchmarks, AI models jumped from beginner to expert level in three years. GPQA Diamond is a PhD-level test; SWE-bench Verified measures real coding tasks.
Bigger, longer, then more efficient
First, models grew by orders of magnitude in parameters and context length. Today, parameter counts are often no longer disclosed, and the contest has shifted to price and performance.
The pace accelerates
Each cell counts this timeline's models and tools per month. Yearly leaps have become monthly releases.
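The per-month cell counts described above can be sketched as a simple grouping by year and month. This is a minimal illustration, not the page's actual implementation: the function name `releases_per_month` and the sample dates are hypothetical placeholders, not entries from this timeline.

```python
from collections import Counter
from datetime import date


def releases_per_month(release_dates):
    """Count entries per (year, month) cell, one cell per calendar month."""
    return Counter((d.year, d.month) for d in release_dates)


# Placeholder dates for illustration only; they are not real timeline entries.
sample_dates = [date(2025, 1, 10), date(2025, 1, 25), date(2025, 3, 5)]
counts = releases_per_month(sample_dates)

# Print the non-empty cells in chronological order.
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d}: {n}")
```

A `Counter` returns 0 for months with no entries, so empty cells need no special handling.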
What has changed
Four dimensions compared, from 2022 to 2026.
Who leads the market
Chatbot market share by web traffic
AI on the world map
Where AI models are built, how heavily AI is used, and where the most valuable AI companies sit, on a rotatable globe.
Where the curve points
Past trends can be extrapolated, but the result is never certain. This section shows two robust data series and the wide range of serious expert forecasts, each labeled for what it is.
Measured through late 2025. Over the long run, the task horizon has doubled roughly every seven months, and lately about every four. METR expects month-long projects "by the end of the decade" (range roughly 2027 to 2031). Logarithmic axis; the projection is not a guarantee.
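The arithmetic behind a doubling-time extrapolation can be sketched as follows: under steady exponential growth, the time to reach a target horizon is the doubling time multiplied by the base-2 logarithm of the ratio between the target and the current horizon. The starting horizon and the length of a work month below are hypothetical placeholders, not METR measurements, and the function name is illustrative.

```python
import math


def months_to_reach(current_hours: float, target_hours: float,
                    doubling_months: float) -> float:
    """Months until the task horizon grows from current_hours to target_hours,
    assuming steady exponential growth with the given doubling time."""
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    doublings = math.log2(target_hours / current_hours)
    # A target at or below the current horizon is already reached.
    return max(0.0, doublings * doubling_months)


# Hypothetical inputs for illustration only, not measured values.
START_HOURS = 4.0          # assumed current task horizon
WORK_MONTH_HOURS = 160.0   # assumed length of one human work month

for doubling in (4.0, 7.0):
    months = months_to_reach(START_HOURS, WORK_MONTH_HOURS, doubling)
    print(f"doubling every {doubling:.0f} months: about {months:.1f} months")
```

Because the result scales linearly with the doubling time, the gap between a four-month and a seven-month doubling time alone nearly doubles the projected wait, which is one reason the range of dates is wide.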
These estimates measure different things (scenario, market, model, survey), which is why they span ~2027 to 2047. The same researcher survey puts full automation of all jobs at around 2116. Forecasts are not guarantees.
The complete timeline
Curated milestones from the 2012 deep-learning breakthrough to well-grounded forecasts for 2040, filterable by modality, license, developer, and year.
Bio-anchors: transformative AI around 2040
Ajeya Cotra's "biological anchors" place transformative AI at a median of around 2040, ten years earlier than her original estimate of 2050. It is one of several model-based estimates with a wide spread.
Markets expect the first general AI
Forecasting markets and the Metaculus community place the first general AI at a median of around 2033, with a wide range (roughly 2027 to 2043). This estimate has moved more than 25 years earlier since 2020.
Model estimate: transformative AI
Epoch's "Direct Approach" extrapolates scaling laws and estimates transformative AI at a median of around 2033. The result depends heavily on its assumptions (plausibly 2033 to 2076).
Training runs reach 2·10²⁹ FLOP
Epoch AI considers training runs of around 2·10²⁹ FLOP feasible by 2030, about 10,000 times more than GPT-4. Power is the first constraint, then chip production.
Agents handle month-long projects
If METR's task horizon holds its pace (doubling every four to seven months), by around 2030 agents will autonomously handle tasks that take humans a month. Range depending on the trend: roughly 2027 to 2031.
Scenario: a "superhuman coder"
The forecasting scenario "AI 2027" dates the arrival of a system that solves any coding task faster and cheaper than the best humans to March 2027. It is a scenario, not a median; the authors now cite around 2030.
Claude Fable 5 and Mythos 5: the Mythos class
Anthropic launches Fable 5 and Mythos 5, a new class above Opus with a 1-million-token context. Days later, a US export control directive suspends access.
MiniMax M3: 1 million tokens from Shanghai
Shanghai lab MiniMax ships M3, an API model with a 1-million-token context. Another Chinese provider pushes toward the frontier.
Mistral Medium 3.5: tuned for coding
Mistral AI updates its Medium line with a coding-focused API model. The parameter count stays undisclosed.
Kimi K2.7 Code: an open coding model
Moonshot AI releases Kimi K2.7 Code, an open MoE model with 1 trillion parameters (32 billion active) and a 256,000-token context, with thinking always on.
GPT-5.6: Sol, Terra, and Luna
OpenAI previews the GPT-5.6 family. The flagship, Sol, hits 88.8% on Terminal-Bench 2.1 (Ultra mode 91.9%) and launches first as a limited preview for around 20 partners.
Claude Opus 4.8: dynamic workflows
Anthropic releases Opus 4.8 with hundreds of parallel subagents per session, effort control, and a 3x cheaper fast mode. SWE-bench Verified rises to 88.6%.
Gemini 3.5 Flash: a fast all-rounder
At Google I/O 2026, Gemini 3.5 Flash arrives with four times faster output and strong scores on agentic benchmarks.
Claude Opus 4.7: adaptive thinking and task budgets
Anthropic releases Opus 4.7 with a new tokenizer, adaptive thinking, task budgets for agents, and higher-resolution vision. SWE-bench Verified: 87.6%.
GPT-5.5: agentic workflows over hours
GPT-5.5 plans and uses tools autonomously across long-running tasks, reaches 82.7% on Terminal-Bench 2.0, and consumes far fewer tokens doing so.
DeepSeek-V4-Pro: 1.6 trillion parameters, open
DeepSeek releases an open MoE model with 1.6 trillion parameters (49 billion active) under an MIT license. Open models reach trillion scale.
Kimi K2.6: an open trillion-scale model from China
Moonshot AI releases Kimi K2.6 with 1 trillion parameters, native INT4 quantization, and a 262,000-token context under a modified MIT license.
Qwen 3.6 Max: Alibaba's trillion-scale MoE
Alibaba ships Qwen 3.6 Max-Preview, a sparse MoE model with around a trillion parameters, an integrated thinking mode, and a 262,000-token context.
GPT-5.4: OpenAI keeps the pace
OpenAI ships GPT-5.4, hitting 92.8% on GPQA Diamond with gains in knowledge and multilingual tasks.
GPT-5.3-Codex: coding and reasoning unified
OpenAI unifies frontier coding and professional reasoning in a single model for the first time. It runs about 25% faster than its predecessor.
Claude 4.6: 1 million tokens and agent teams
Anthropic releases Opus 4.6 and Sonnet 4.6 with a 1-million-token context. Opus 4.6 hits 76% on the MRCR v2 benchmark and coordinates "agent teams".
Gemini 3.1 Pro: a double reasoning leap
Google releases Gemini 3.1 Pro with more than double the reasoning performance of Gemini 3 Pro and a record 94.3% on GPQA Diamond.
Gemini 3 Pro: Google's next big leap
Google releases Gemini 3 Pro with markedly improved reasoning and multimodal understanding, setting new highs.
GPT-5.2: faster answers on the GPT-5 base
OpenAI follows up with GPT-5.2, improving speed and reasoning over the original GPT-5.
DeepSeek-V3.2: cheap open frontier performance
DeepSeek updates its open MoE model and keeps the gap to proprietary leaders small, at a fraction of the cost.
Mistral Large 3: Europe's flagship model
Paris-based Mistral releases a 675-billion-parameter MoE model, keeping the European flag flying in the AI race.
Claude Opus 4.5: new flagship
Anthropic releases Claude Opus 4.5, cementing its position on agentic and coding tasks.
Claude Sonnet 4.5: hours-long agent runs
Anthropic releases Claude Sonnet 4.5, which, by the company's account, can work autonomously on complex tasks for over 30 hours.
Sora 2 and its own social app
OpenAI ships Sora 2 with synchronized audio and a dedicated app for AI videos. Cameos let users place themselves into generated clips.
GPT-5: one model for everything
OpenAI ships GPT-5, automatically blending fast answers and deep reasoning based on the task. Manual model switching disappears.
Nano Banana: image editing by language
Google's image model nicknamed "Nano Banana" edits photos consistently across multiple steps. Targeted editing instead of regenerating becomes the norm.
Grok 4: xAI at the benchmark frontier
xAI releases Grok 4, reporting top scores on several reasoning benchmarks. The Colossus GPU cluster pays off.
Claude Opus 4 and Sonnet 4: agentic coding
Anthropic releases Claude Opus 4 and Sonnet 4, able to work on code autonomously for hours. Claude takes the lead in programming.
Google Veo 3: video with synchronized audio
Veo 3 generates clips with matching sound and dialogue for the first time. Generated video becomes hard to tell from real footage.
Llama 4: Meta's mixture-of-experts generation
Meta introduces Llama 4 Scout and Maverick as open MoE models, with Scout offering a ten-million-token context window.
OpenAI o3 and o4-mini: reasoning with tools
OpenAI releases o3 and o4-mini, which use tools like web search and code on their own while thinking. Agentic reasoning becomes standard.
ChatGPT image generation: the "Ghibli" moment
OpenAI builds native image generation into GPT-4o. Millions turn photos into anime styles, and the servers hit their limits for days.
Gemini 2.5 Pro: Google takes the lead
Google releases Gemini 2.5 Pro, topping many leaderboards clearly for the first time. The company is back in the race for the best model.
Claude 3.7 Sonnet: first hybrid reasoning model
Anthropic combines fast answers and visible thinking in one model. Users control how long Claude "thinks" about a task.
Claude Code: agentic coding in the terminal
Anthropic introduces Claude Code, an agent that handles whole tasks across many files in the terminal. It becomes the template for agentic coding.
Frequently asked questions about the history of AI
The key questions about the development of AI models and tools.
Prehistory & forecasts
- Added the deep-learning prehistory: milestones from AlexNet (2012) to AlphaGo and WaveNet (2016)
- New forecast section with trend extrapolation (METR task horizon, Epoch compute) and sources
- Forecasts for 2027-2040 as marked entries in the timeline, with uncertainty ranges
Initial release
- Interactive timeline of generative-AI milestones from 2017 to 2026
- Filter by modality, license, developer, and year, plus timeline and grid views
- Stats dashboard with charts on the intelligence explosion (GPQA, SWE-bench, parameters, context)
- Benchmark scores for older models individually sourced