**Joel Becker** (0:02)
Bloomberg Audio Studios, podcasts, radio, news.
**Joe Weisenthal** (0:18)
Hello, and welcome to another episode of the Odd Lots Podcast. I'm Joe Weisenthal.
**Tracy Alloway** (0:23)
And I'm Tracy Alloway.
**Joe Weisenthal** (0:24)
Tracy, one thing about AI is that there are lots of lines that go up.
**Tracy Alloway** (0:30)
Yes, famously, there is perhaps one line that has captured the attention more than others when it comes to lines going up.
**Joe Weisenthal** (0:37)
Yes, but we're recording this April 7th. Did you see the Anthropic revenue chart, by the way? Ooh. It's just like straight.
It's just on the number of lines going up. There are some really-
**Tracy Alloway** (0:49)
All right, let me caveat that. Up until recently, there was one chart of a line going up exponentially that became, I think it's fair to say, the most viral chart in AI, right?
**Joe Weisenthal** (1:00)
Yes, I would absolutely agree with that. One of the many lines that go up, and there are various lines that capture this, is essentially just measures of AI progress of what they could do, what the models are capable of and so forth.
There are all kinds of different benchmarks out there, and hobbyist benchmark creators, etc. There's an organization called METR, based out in San Francisco, and they measure how well AI models are doing at various engineering tasks, etc.
They have these charts showing certain tasks, how long it would take a human to do them, and then whether AI could do them. Yes, the line is just almost vertical. I think there was one of the ones that came out maybe very early this year or late last year, showing the latest Claude model, and I was just like, this is crazy.
**Tracy Alloway** (1:51)
When I look at these charts, they're called time horizon charts. When I look at them, intuitively, I kind of understand what they're saying and you can kind of see the leap in progress between some of the previous models and Claude, the latest Claude model. That's what got everyone excited was you had this big exponential shift up in the capability of that particular AI model. But then when I start diving into what it actually says on METR's website about what these charts represent, I start getting really confused. I know everyone wants to get excited about AI and charts going up in general, but I think there's a lot of nuance here and we should probably talk about it. Because the other thing going on with METR right now is they've become sort of the industry standard benchmark. A lot of investment decisions are being based on these charts.
If you oversimplify them as just like, okay, lines going up and then suddenly it goes up even more, obviously, people are going to start to get maybe a little overexcited.
**Joe Weisenthal** (2:49)
Can I say one other thing too that I'm very curious about? I'm really glad that there are people designing various benchmarks for measuring AI progress, seems like an important thing to get a handle on.
But like if I were like say, like talented or smart enough to be like doing these things, I would go work for one of the labs and make $10 million a year or something like that. So I'm actually curious because a lot of non-profits, et cetera, it's like, do you really want to be working at the cutting edge of AI in a non-profit? I guess OpenAI is owned by a non-profit weirdly enough. But you know what I'm saying? I would want the money.
**Tracy Alloway** (3:24)
We should talk about it with our guests who are currently sitting right here.
**Joe Weisenthal** (3:28)
That's exactly right. I'm very excited to say we have the two perfect guests to talk about the most viral and maybe important chart in AI right now. We're going to be speaking with Joel Becker. He is a member of the technical staff at METR. And we're also going to be speaking with Chris Painter, the president of METR. So Joel and Chris, thank you so much for coming on Odd Lots.
**Joel Becker** (3:47)
Thank you so much for having us.
**Chris Painter** (3:48)
Thank you for having us.
**Joe Weisenthal** (3:49)
Yeah, really excited to chat with both of you. Chris, since you're the president, I'll start with you. What is METR? How long has it been around? What is this organization? What's its goal? Just give us the sort of 60-second synopsis of METR.
**Chris Painter** (4:02)
Yeah, totally.
Sometimes I give a long version, I can try and do a short version here. So METR is a research nonprofit based in the Bay Area, like you said, dedicated to advancing the science of measuring whether and when AI systems might pose catastrophic risks to humanity as a whole, focused specifically on threats that come from AI autonomy or AI systems themselves. So when you talk about this whole field in AI of dangerous capability evaluations, people seeing, can this AI system assist with a chemical or biological weapon attack? Can it advance bad actors' ability to execute cyber attacks on a really large scale? METR is specialized in specifically assessing how autonomous are AI systems, what is the scale and length and difficulty of tasks that they're able to do by themselves, partially because we think it sets the stakes for conversations about AI misalignment. So we see ourselves as being on the hook for at any given point in time, giving humanity the bits of evidence that are most informative for establishing the stakes of, are we reliant on AI systems as a society in a way that could make it really bad if they are misaligned?