Graphs That Explain the State of AI in 2026

The 2026 AI Index report from Stanford University reveals a landscape of accelerating compute capacity, record-breaking investment, and a deepening US-China rivalry, while highlighting new challenges in environmental impact and model reliability.

The capabilities of leading AI models continue to accelerate, and the largest AI companies, including OpenAI and Anthropic, are hurtling toward IPOs later this year. Yet resentment toward AI continues to simmer, and in some cases has boiled over, especially in the United States, where local governments are beginning to embrace restrictions or outright bans on new data center development.

It’s a lot to keep track of, but the 2026 edition of the AI Index from Stanford University’s Human-Centered Artificial Intelligence center pulls it off. The report, which comes in at over 400 pages, includes dozens of data points and graphs that approach the topic from multiple angles, from benchmark scores to investment and public perception.

As in prior years, we’ve read the report and identified the trends that encapsulate the state of AI in 2026.

US companies lead in AI models

The United States has led the charge in AI model releases over the past decade, and that remains as true in 2025 as in any year prior. According to research institute Epoch AI, organizations based in the United States released 50 “notable” models in 2025. However, China’s output is beginning to close the gap.

Nearly all the notable models originated within industry (as opposed to academic or government institutions). Epoch AI tracked 87 notable model releases from industry in 2025, compared to just seven from all other sources. This is a major long-term trend. Models released by industry now make up over 90 percent of notable models, up from just under 50 percent in 2015, and zero in 2003.
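The "over 90 percent" figure can be checked directly against the two counts quoted above. The short Python sketch below does only that arithmetic; the variable names are illustrative and do not come from the report.

```python
# Counts of notable 2025 model releases, as quoted above from Epoch AI.
industry_models = 87
other_models = 7  # academic, government, and all other sources

total_models = industry_models + other_models
industry_share = industry_models / total_models

print(f"Total notable models: {total_models}")    # 94
print(f"Industry share: {industry_share:.1%}")    # 92.6%
```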

While US companies released the largest number of notable AI models, China has an equally clear lead in the deployment of robotics. According to data from the International Federation of Robotics, China installed 295,000 industrial robots in 2024. Japan installed roughly 44,500, and the United States installed 34,200.

World AI compute capacity has grown 3.3x yearly since 2022

The latest Stanford AI Index report has no shortage of head-turning numbers on the AI build-out, but none beats Epoch AI’s gauge of total AI compute.

This graph, which measures compute in H100 equivalents (H100e), a unit pegged to the compute power of Nvidia’s H100 GPU, shows that the world’s AI compute capacity has increased more than threefold every year since 2022. Total AI compute has increased 30-fold since 2021, the first year tracked.

Nvidia has benefited most from this build-out, as its GPUs account for over 60 percent of the total AI compute capacity in the world today. Amazon and Google, each of which designs its own hardware for AI workloads, come in second and third.

Training AI models can generate enormous carbon emissions

Stanford’s AI Index has called out the carbon emissions from AI training in prior years, and the issue continues to trend in a worrying direction.

The report estimates that training the latest frontier large language models, such as xAI’s Grok 4, can generate over 72,000 tons of carbon-equivalent emissions. That’s a huge increase from estimates in prior years. OpenAI’s GPT-4 was estimated at 5,184 tons, and Meta’s Llama 3.1 405B was estimated at 8,930 tons.
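To put the size of that increase in perspective, the sketch below divides the Grok 4 estimate by the two earlier ones. It assumes the quoted figures are directly comparable, which may not hold given how rough such estimates are, and it treats "over 72,000" as exactly 72,000, so the ratios are lower bounds. The dictionary and function names are illustrative.

```python
# Estimated training emissions in tons of CO2-equivalent, as quoted above.
estimates_tons = {
    "Grok 4": 72_000,  # the text says "over 72,000", so this is a floor
    "GPT-4": 5_184,
    "Llama 3.1 405B": 8_930,
}

def ratio_to(baseline: str, model: str = "Grok 4") -> float:
    """Return how many times larger `model`'s estimate is than `baseline`'s."""
    return estimates_tons[model] / estimates_tons[baseline]

for name in ("GPT-4", "Llama 3.1 405B"):
    print(f"Grok 4 vs {name}: {ratio_to(name):.1f}x")
# Grok 4 vs GPT-4: 13.9x
# Grok 4 vs Llama 3.1 405B: 8.1x
```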

Ray Perrault, co-director of the AI Index steering committee, stresses how uncertain these figures are. “These estimates should be interpreted with caution. In the case of Grok, they rely heavily on inferred inputs drawn from public reporting, xAI statements, and other non-verifiable sources,” says Perrault. At the same time, Perrault notes that “Epoch AI independently estimates Grok 4’s emissions to be significantly higher at approximately 140,000 tons of CO₂.”

Emissions from AI inference also continue to increase, though results again vary by model. The report estimates that carbon emissions from models with the least efficient inference are over 10 times as high as those with the most efficient inference. DeepSeek’s V3 models were estimated to consume around 23 watt-hours when responding to a “medium-length” prompt, while Claude 4 Opus was estimated to consume about 5 watt-hours.

LLMs are rapidly defeating new benchmarks

The capabilities of AI models have improved with incredible speed over the past decade, and progress seems to be accelerating. Multimodal LLMs, in particular, are conquering benchmarks nearly as quickly as they can be invented.

Agentic AI has experienced the most extreme gains. The two steep lines at the right of the chart represent OSWorld, which tests autonomous computer use, and SWE-Bench Verified, which tests autonomous coding on software engineering tasks.

Models are also rapidly improving on Humanity’s Last Exam. This benchmark includes questions contributed by subject-matter experts and designed to represent the toughest problems in their fields. The 2025 Stanford AI Index reported that the top-ranking model, OpenAI’s o1, correctly answered just 8.8 percent of questions. Since then, accuracy has increased to 38.3 percent, and even that number is a bit out of date: the best-scoring models as of April 2026 (such as Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1 Pro) top 50 percent.

AI research in medicine sees gains

Gains in AI benchmarks seem to be reflected in medicine, where AI adoption has increased at a rapid pace. Medical research shows particularly quick adoption. The number of publications on the topic of AI use for drug discovery has more than doubled over the past two years. There are 2.7 times as many publications on multimodal biomedical AI, which is used to examine medical images alongside text, as there were two years ago.

LLMs still have trouble reading an analog clock

While AI models have improved rapidly in some areas, they remain remarkably bad at some common tasks, like reading clocks and understanding calendars. ClockBench, which measures a multimodal LLM’s ability to read an analog clock, found that even the model best at this task, OpenAI’s GPT-5.4, had just 50-50 odds of getting it right.

Most models scored far worse. Anthropic’s Claude Opus 4.6 read the time correctly with just 8.9 percent accuracy. That’s surprising, because the model often scores well on other benchmarks. Perrault says it reflects a more general issue in multimodal models, where the language component carries a surprisingly large part of the burden, even to the extent of ignoring non-language information completely.

AI investment hit a new peak in 2025

The gains in AI model performance have gone hand-in-hand with investment in AI companies. According to data from AI analytics company Quid, 2025 set a new record for AI investment with over US $581 billion spent.

That’s more than double the $253 billion spent in 2024 and speeds past the previous record of $360 billion, which was set in 2021. And unlike 2021, when investment was led by mergers and acquisitions, 2025’s record-setting result was led by private investment in AI companies. Most of that money is flowing into the United States, where over $344 billion was invested in AI last year.
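The comparisons above reduce to a few divisions, sketched here in Python. Because the 2025 total and the US figure are both quoted as "over" a round number, the results are approximations rather than figures from the report, and the variable names are illustrative.

```python
# Global AI investment in billions of US dollars, as quoted above from Quid.
investment_busd = {2021: 360, 2024: 253, 2025: 581}
us_2025_busd = 344  # US portion of the 2025 total

growth_vs_2024 = investment_busd[2025] / investment_busd[2024]
growth_vs_2021 = investment_busd[2025] / investment_busd[2021]
us_share_2025 = us_2025_busd / investment_busd[2025]

print(f"2025 vs 2024: {growth_vs_2024:.2f}x")         # 2.30x
print(f"2025 vs 2021 record: {growth_vs_2021:.2f}x")  # 1.61x
print(f"US share of 2025 total: {us_share_2025:.1%}") # 59.2%
```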

Source: Hacker News