**Swyx** (0:05)
Welcome to Lukas and Axel from Andon Labs. I'm joined by my favorite guest co-host for anything security, safety, and alignment, Vibhu. Welcome.
**Lukas Petersson** (0:15)
Thank you for having us.
**Axel Backlund** (0:16)
Thank you.
**Swyx** (0:17)
Let's match names to voices. Maybe you want to take turns introducing yourselves.
**Lukas Petersson** (0:21)
Yeah, I'm Lukas.
**Axel Backlund** (0:22)
And I'm Axel.
**Swyx** (0:24)
Let's introduce Andon Labs a bit. How did you guys come together? You have different backgrounds, but you're both Swedish.
Was that a big part of it?
**Lukas Petersson** (0:33)
Yeah, so when I went to high school, there was this really cool guy who had a superpower. He could code, so he made the web app for the school and stuff. And he was super cool. And I wanted to be like him. And that was that guy.
**Axel Backlund** (0:47)
I don't know about this.
**Swyx** (0:48)
So you went to different universities, right?
**Lukas Petersson** (0:50)
Yeah, but same high school. So we always said, once we graduate university, then we should start a company. And that's what we did.
**Swyx** (0:58)
There you go.
And about a year ago, you burst onto the scene with Vending Bench. But was there a thing before that that was the inception?
**Axel Backlund** (1:07)
Yeah, so we did work with Anthropic, which was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark. And one thing that we really started thinking about was long-running agents, and specifically agents managing businesses.
And this was early 2025. And I think that was when the first mentions came up that people would be running one-person unicorns or even autonomous companies. So we thought, let's make a benchmark of how well an agent can run probably the simplest business possible. And that's probably running a vending machine. So that's the first public one we did. And there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year. And then I think around Easter last year, we got like the first semi-viral tweet about it that someone else did.
**Lukas Petersson** (2:11)
Yeah, I mean, we tweeted a bunch when it came out. We, like, tried our best. We tried.
**Vibhu** (2:16)
It's the one at Anthropic, right?
**Lukas Petersson** (2:17)
No, no, no.
**Swyx** (2:19)
So this is a classic thing we should get out of the way.
**Lukas Petersson** (2:20)
Exactly. There's two versions. There's Vending Bench, which is the simulated one, which we did completely independently in February.
And then, like Axel said, that was the thing that didn't get any traction in the beginning. But then some random person made a tweet about it. And you have the paper. That is the paper. Correct. Yeah. And then since we thought this was very fun, we thought like, oh, I think this is also one thing with Andon Labs, the way we decide what to do next and what projects to do. It's like, what is the heuristic we use? It's like, what is fun?
What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So then we basically had this idea, and then we needed a place for it. And putting it out in the public would probably not really work, would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic. And they were like, yeah, you can have space. This sounds fun.
**Swyx** (3:21)
I mean, it's like a small fridge, right? It's like a mini fridge. You know, people, there's like a Stripe thing.
**Vibhu** (3:26)
Or like an iPad. This was very OG. That's the OG one.
**Lukas Petersson** (3:29)
Yeah.
**Vibhu** (3:30)
We saw it in June, like two months after it had been there. They upgraded it a little bit. There's a security camera for making sure you actually Venmo the thing.
**Swyx** (3:40)
Yeah. So like my impression, I mean, okay, we're going straight into Project Vend because it's such an iconic thing.
I do want to cover a little bit of the origin story, even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves, like smart, interested in the future of AI, interested in developing evals. But how the hell do you just like walk into Anthropic's doors and work with them? Like what are they looking for? What works? And then maybe when you launch, I always think like, obviously it would be better to launch with a lab, but sometimes...