**Jacob Effron** (0:00)
Is reasoning enough to get to generalization, or is another method needed?
**Lukasz Kaiser** (0:04)
It does feel like there is something else that possibly could generalize much better.
**Jacob Effron** (0:08)
Why do you think Anthropic was the first to be really successful on the coding side?
**Lukasz Kaiser** (0:12)
Anthropic made this very good decision to focus on code. OpenAI was like, we're doing ChatGPT. Part of why Anthropic made this decision was that they just could not compete.
**Jacob Effron** (0:21)
What's your gut intuition on the gap we'll see between closed source and open source models, and whether that widens or shrinks in the next few years?
**Lukasz Kaiser** (0:26)
I think it's a fair question, but...
**Jacob Effron** (0:28)
Lukasz Kaiser is one of the authors of the Transformer paper, and has had amazing roles at both Google and OpenAI.
On Unsupervised Learning, I got to ask him all the top-of-mind questions about what's happening in AI today. Of course, we had to talk about the Transformer: how he thinks about its persistence, whether it will remain the dominant architecture, and what its shortcomings are. We also got his thoughts on what changed in the fall to really make coding models so much better and why Anthropic was really first to code. We talked about the future research directions that he's really excited about, and we also hit on a bunch of things around how he thinks the ecosystem will evolve, from open versus closed source models to application companies. I think folks will really enjoy this episode with a top researcher whose research really set off a lot in the space. Without further ado, here's Lukasz.
It's a pleasure to have a Transformer paper co-author on the podcast. I feel like you've been at the forefront of so many major changes in the AI world. And our goal is really to get your thoughts on all the questions around the AI frontiers today. So I really appreciate you coming on the podcast.
**Lukasz Kaiser** (1:27)
Thank you very much. Thank you for having me.
**Jacob Effron** (1:29)
I can think of no better place to start than generalization, right? It feels like that's the question in the air right now.
And I think in November, I heard you pose basically this big question: is reasoning enough to get to generalization, or is another method needed? I guess you said that maybe six months ago now, which is dog years in the AI world, so years ago. How has your thinking on that question evolved since then?
**Lukasz Kaiser** (1:54)
If we take the current transformers with reasoning, and agents, and they have access to a shell and stuff, they can do amazing things, right? It's incredible how far we've gotten compared to even two years ago, not to mention before transformers. I would have never believed that you just take this next word predictor, give it then chain of thought and RL that, and tools, and that it will... I now every day spend hours talking to Codex, in my case, or other people, and it works, right? You talk to it about hard problems at work, and it makes sense, and it implements things, and so that's incredible.
On the other hand, there is this feeling that it is not quite like us, right? That it's not quite at the edge of what's possible. That we all feel that it possibly should be even better, right? That we can generalize from less data, like somehow make bigger leaps, get these concepts from way less. I recently have this saying: people say Americans will do the right thing after exhausting all other options, and LLMs, they will learn a concept. They will learn it, but after exhausting all other options. You need this trillion tokens, you need to learn all the surface-level things, and only when that doesn't explain something, they will finally learn the concept. That's not how we learn. We just get concepts from, like, sometimes we make them up and they're not great.
But so it does feel like there is something else that possibly could generalize much better, that could possibly have this, like, a bit of a different form of understanding, more like long-term. But it's a feeling, right? And every time we try to put our thumb on it, it seems to evaporate, or more like it doesn't even evaporate, but it's like the transformer just catches up, right? It was like...
So both sides in this time have grown, right? Like transformers have gotten even better. But the case for something else has also gotten even better, I would say.
There is now like a number of labs that pursue post-transformers and people see interesting results. There is certainly interesting things out there. So, you know, who wins? I still don't know, to be honest.