THE SCARIEST CHART IN AI JUST GOT SCARIER

AI News Today | Julian Goldie Podcast

February 26, 2026

Julian Goldie breaks down the most terrifying chart in AI: the METR Time Horizon. With AI agents now doubling their capabilities every 89 days, Claude Opus 4.6 is performing tasks that take human experts 14.5 hours to complete.
Speakers: Julian Goldie
**Julian Goldie** (0:00)
The scariest chart in AI just got scarier. Your job has a timer on it, and I don't say that to scare you. I say it because the data is sitting right in front of us, and the people building this technology are no longer hiding what they see. Sam Altman just said out loud on stage this week, the world is not prepared. He runs OpenAI. He knows what's in the lab, and he's scared.
So let me start with a chart, just a chart. No flashy launch, no new company, no billion-dollar funding round. Just a chart that a nonprofit research group called METR quietly updated on February 20th, 2026. And when they updated it, the AI world collectively held its breath, because the new data point on the chart, one single dot, is the most terrifying and the most exciting thing I've seen in years covering this space. The dot represents Claude Opus 4.6, and it sits at 14 and a half hours. I'm going to explain what that means in a second. And when I do, I need you to pay close attention, because most people who share this chart online get it wrong. They misread it, and when you misread it, you actually undersell how scary it is. So let me walk you through exactly what this chart is, what it measures, why Claude Opus 4.6's new number jarred researchers and lab leaders across the industry, and what it means for you, your job, your business, and the next few years of your life. Let's strap in.
So first of all, what is METR, and why does this chart matter so much? METR stands for Model Evaluation and Threat Research. They are a nonprofit. They don't work for OpenAI, they don't work for Anthropic, they don't work for Google. They have no financial interest in making AI look better or worse than it is. Their whole job is to figure out how capable these AI models actually are. And a little over a year ago, in March 2025, METR published something that quietly changed the conversation in AI research circles. They published what they call the Time Horizon Chart.
Here is what the Time Horizon Chart actually measures. METR took hundreds of complex tasks, coding tasks, machine learning tasks, cybersecurity tasks, software engineering problems, and they handed those tasks to human experts, professional people who do this work for a living. And they measured how long it took each expert to complete each task. Then they handed those same tasks to AI agents, and they measured whether the AI could complete each task successfully. The y-axis on the chart, the vertical line, shows the length of tasks in terms of how long a human expert would take to complete them.
Here is where most people get confused. The chart is not measuring how long the AI takes to do the task. Read that again. The chart is measuring how long it takes a human to do the task that the AI can now complete. This is a subtle but massive difference. The AI might finish in minutes what a human takes eight hours to do, or it might take all day. That's not what the chart is tracking. The chart is tracking the difficulty of the work, measured in human effort.
The 50% time horizon, which is the main number people talk about, means that at this task length, the AI succeeds about half the time. When METR says Claude Opus 4.6 has a 50% time horizon of 14 and a half hours, what they're saying is: give Opus a task that would take a trained, experienced, professional human expert 14 and a half hours to complete, nearly two full working days, and that AI will finish it successfully about half the time. That is extraordinary. That is not a chatbot answering a question. That is an AI agent completing two days of expert professional work from start to finish, autonomously, with no human guiding it step by step. Half the time. And the reason everyone in the AI world is losing their minds is not just the number itself, it's the trend line behind it. So let's talk about the trend line that is breaking every prediction.
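To make the idea of a 50% time horizon concrete, here is a minimal Python sketch. It is a simplified stand-in, not METR's actual estimation procedure: it assumes success rates have already been grouped by human task length and simply interpolates on a log-time axis to find where the success rate crosses 50%. The function name `fifty_percent_horizon` and every number in `buckets` are hypothetical and chosen only for illustration.

```python
import math


def fifty_percent_horizon(buckets):
    """Estimate the task length at which the success rate crosses 50%.

    buckets: list of (human_minutes, success_rate) pairs, sorted by
    human_minutes ascending. Interpolates linearly in log2(minutes)
    between the two buckets that straddle 0.5. Returns None if the
    success rate never crosses 0.5 in the given data.
    """
    for (t_lo, p_lo), (t_hi, p_hi) in zip(buckets, buckets[1:]):
        if p_lo >= 0.5 >= p_hi and p_lo != p_hi:
            # Fraction of the way from the shorter bucket to the longer one.
            frac = (p_lo - 0.5) / (p_lo - p_hi)
            log_t = math.log2(t_lo) + frac * (math.log2(t_hi) - math.log2(t_lo))
            return 2 ** log_t
    return None


# Hypothetical data: (task length for a human expert in minutes, AI success rate).
buckets = [(15, 0.95), (60, 0.85), (240, 0.70), (960, 0.45), (3840, 0.20)]

horizon = fifty_percent_horizon(buckets)
print(f"50% time horizon: about {horizon / 60:.1f} human-hours")
```

Note that the input is human time per task, not AI runtime, which is the distinction the chart depends on.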
Let me take you back through the progression of this chart, because the speed of what has happened is almost impossible to believe unless you see it laid out. In mid-2024, less than two years ago, frontier AI models like GPT-4o had time horizons measured in single digits. We're talking tasks that human experts could complete in five or ten minutes. That was the ceiling. By early 2025, Claude 3.7 Sonnet hit roughly 59 minutes. That felt like a big deal at the time. By December 2025, nine months later, Claude Opus 4.5 jumped to 4 hours and 49 minutes. The AI community freaked out. Articles were written, researchers issued statements. And then two months after that, on February 20, 2026, METR added Claude Opus 4.6: 14.5 hours. In two months, the number nearly tripled. That is a jump from 4 hours and 49 minutes to 14.5 hours, and a 2.2x increase over the previous best model, which was GPT-5.2. Claude Opus 4.6 did not just beat the previous record, it more than doubled it.
Now, here is the trend line underneath all of this. METR originally found, when they published the chart in March 2025, that AI agent capabilities had been doubling roughly every seven months for the past six years. Every seven months, the AI could handle tasks twice as complex. That alone would have been shocking. Seven-month doublings are fast. When METR updated their methodology in January 2026 and released what they call Time Horizon 1.1, they revised the doubling rate. And the doubling time since 2024 is now 89 days. Not seven months, 89 days. That is roughly every three months. So let me say that differently. Every three months, these AI systems are doubling in their ability to handle complex work. Every three months.
When METR member Sydney von Arx was asked about this, she said, and I want to be fair here and give you her full quote, "You should absolutely not tie your life to this graph, but I bet this trend is going to hold."
That is a researcher who helped build the benchmark telling you, I wouldn't bet everything on these exact numbers, but I believe the direction is real, and I believe it continues. And every single time a new model has dropped, the trend has not just continued, it has recently accelerated.
So the benchmark is breaking under the weight of its own results. Here is something METR is very open about, and it's important context. With Claude Opus 4.6 at 14 and a half hours, the confidence interval, the range of where the true number might actually be, goes from six hours all the way up to 98 hours. That is a huge range. And the reason the range is so wide is telling. METR says their task suite is now nearly saturated. That means Claude Opus 4.6 is completing so many of the tasks successfully that METR is running out of hard enough tasks to measure where the ceiling actually is. Think about it. The benchmark designed to measure how capable these AI systems are is struggling to keep up with how capable these systems are. METR is actually building harder, longer tasks to try to get a tighter measurement. They doubled the number of tasks that take humans eight or more hours, from 14 tasks to 31 tasks, specifically to try to find Opus 4.6's real ceiling. What that means in plain English is this: 14 and a half hours might be an underestimate. The real capability could be significantly higher, and the measurement instrument isn't powerful enough to tell us yet. That is not a reason to relax. If anything, it's a reason to pay closer attention.
Now, here is what the leaders of the AI labs are actually saying. I want to give you the words directly from the people running the companies that are building these systems, because what they are saying right now, in the last two weeks, is more direct and more alarming than anything I've ever heard from this community before. Let's start with Sam Altman. Sam Altman is the CEO of OpenAI.
He was speaking at the India AI Impact Summit in New Delhi on February 23rd, 2026, just days ago. And he said, and I'm paraphrasing closely here, from the inside, looking at what's gonna happen, the world is not prepared. He said AGI, which stands for Artificial General Intelligence, an AI that can do basically anything a human professional can do, feels pretty close at this point. He said it's going to be a faster takeoff than he originally thought, and that is stressful and anxiety-inducing. The CEO of OpenAI just told a crowd of thousands of people that he's stressed and anxious about how fast this is moving. He also said OpenAI already has models internally that go beyond what is publicly available. He said they plan to have an intern-level AI research assistant built by September 2026 and a fully autonomous AI researcher by March 2028. Just two years from now, right? And then Altman said something that should stop you in your tracks. He said that by the end of 2028, more of the world's intellectual capacity could be sitting inside data centers than outside them. More intelligence inside computers than walking around in human heads. That is the CEO of the most powerful AI company in the world saying this, out loud, in public, last week.
Now, let's go to Dario Amodei. Dario runs Anthropic, the company that makes Claude. He sat down for a three-hour interview with journalist Dwarkesh Patel roughly ten days ago. The title of the interview was "We Are Near the End of the Exponential." When people heard that title, a lot of them thought, oh, good, the pace is slowing down. The exponential is ending. That is the opposite of what Dario means. He's not saying the curve is flattening. He's saying we are approaching the end game. We are getting close to the point where all the benchmarks that measure AI against human performance will be fully saturated. AI will be at or above human level on everything we know how to measure.
He's saying the exponential is ending because the AI is running out of human benchmarks to surpass. In that same interview, Dario shared Anthropic's financials in a way that he has never done before. He said Anthropic went from $100 million in revenue in 2023 to $1 billion in 2024 to roughly $10 billion in 2025. And then he said they added another few billion in January of 2026 alone. That's 10x revenue growth year after year for three years straight. And he credited a huge portion of that to Claude Code, Anthropic's coding agent, which started as an internal experiment that his own agents and engineers became obsessed with.
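The growth figures quoted in this episode (a seven-month doubling, an 89-day doubling, and 10x yearly revenue growth) can be put on a common scale with a little arithmetic. The Python sketch below is only an illustration of that arithmetic: the helper names `yearly_multiple` and `doubling_days` are hypothetical, a month is assumed to be one twelfth of a 365-day year, and the gap between the Opus 4.5 and Opus 4.6 data points is assumed to be about 60 days, based on the "two months" mentioned above.

```python
import math


def yearly_multiple(doubling_period_days, days_per_year=365.0):
    """Growth factor over one year for a quantity that doubles every N days."""
    return 2 ** (days_per_year / doubling_period_days)


def doubling_days(multiple, elapsed_days):
    """Doubling time implied by growing `multiple`-fold over `elapsed_days`."""
    return elapsed_days * math.log(2) / math.log(multiple)


# Assumption: seven months is treated as 7/12 of a 365-day year.
seven_months_in_days = 7 * 365.0 / 12

print("Yearly growth at a 7-month doubling:", round(yearly_multiple(seven_months_in_days), 2))
print("Yearly growth at an 89-day doubling:", round(yearly_multiple(89), 2))

# 10x per year, as in the revenue figures quoted above, as a doubling time.
print("Doubling time for 10x per year (days):", round(doubling_days(10, 365), 1))

# Assumption: roughly 60 days between the 4 h 49 min and 14.5 h data points.
opus_ratio = 14.5 / (4 + 49 / 60)
print("Opus 4.5 to 4.6 ratio:", round(opus_ratio, 2))
print("Implied doubling time (days):", round(doubling_days(opus_ratio, 60), 1))
```

Two data points are a thin basis for a trend, so the last figure is best read as a description of that single jump rather than a forecast.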
