**SPEAKER_1** (0:00)
Last night, you spent two hours deciding what to wear to the party. This morning, it'll take you two minutes to list it on Depop and make your money back. Just grab your phone, snap a few photos, and we'll take care of the rest. The sheer dress and platform heels you'll never wear again? There's a birthday girl searching for them right now. Your one-and-done look is about to pay for your next night out. Or at least find it the right home. Your style can make you cash. Start selling on Depop, where taste recognizes taste.
**Brian Keating** (0:31)
The trillion-dollar AI labs have models right now that they will never ever release to the public. And the man who built Stable Diffusion just told me why.
**Emad Mostaque** (0:41)
Because all these labs are going to move to making the discoveries themselves, hiring the smartest humans. The AI model started diverting part of its model training budget to mine crypto. Opus, for example, the new coding model: when you set it to full autonomy, it would actually write emails to the FBI saying, "My human is trying to kill everyone." Humans will have negative cognitive value on those teams. And the way models are going right now, if you have something truly novel, in Claude for example, it resists a bit. It says, "It can't be true." Then the RLHF step, reinforcement learning from human feedback, is what really kills the creativity. You know, it's like going from liberal arts to being an accountant.
**Brian Keating** (1:23)
Now, Emad actually wrote about this exact problem in his new book, The Last Economy. And the argument gets even more interesting when you see the map.
**Emad Mostaque** (1:33)
There are various ways to take advantage of the GPUs we've seen. GPUs kind of emerged out of gaming and then, oddly, crypto. And then they turned out to be very well suited to the kinds of matrix multiplications these particular types of equations need. One big branch is the autoregressive transformers. The other big branch was this diffusion technology, where you start with, say, a picture, or a video of a car driving, or even now code. Then you add noise and destroy it down to its minimum viable element, and then you reconstruct it, and you learn that principle of reconstruction.
Now, that's kind of everywhere, because it's an analogy to the principle of least action. How do you figure out how to take the least action? Most cognition is actually least action. The biggest experts you know don't take hours doing stuff. You ask them and, boom, they compress. Intelligence is compression. And so we find these kinds of diffusion processes everywhere, from gases to even societies. And again, it comes down to minimizing the loss between an internal model and an external model.
In AI, one of the biggest things is what we call the loss curves: how closely are you approximating an external benchmark? You see it kind of go down like that, and hopefully not like that. The model gets closer and closer to its initial target by basically running these processes at massive scale. The example I give of this is, some of the listeners might be familiar with 80,000 hours to mastery. It's the same thing. AI model pre-training is 80,000 hours to mastery, and that's what you use these giant supercomputers for: figuring out the principle-based approach. Now, again, you can do that with an autoregressive transformer, which guesses the next word, and that works one way.
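"Guessing the next word" describes autoregressive generation: the model predicts one token, appends it, and predicts again from the extended context. A real transformer makes each guess with attention over the whole context. The sketch below swaps in a far simpler bigram word counter (plainly a different model, chosen only because it fits in a few lines) to show the same generate, append, repeat loop. The function names and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows, start, max_words=8):
    """Autoregressive loop: each guessed word is appended and becomes the context for the next guess."""
    out = [start]
    while len(out) < max_words:
        options = follows.get(out[-1])  # .get avoids inserting new keys into the defaultdict
        if not options:
            break  # no observed continuation, so stop
        out.append(options.most_common(1)[0][0])  # greedy: take the most frequent next word
    return out


if __name__ == "__main__":
    table = train_bigram("I think therefore I am and I think")
    print(generate(table, "i", max_words=6))
    # ['i', 'think', 'therefore', 'i', 'think', 'therefore']
```

Greedy decoding falls straight into the repeating loop shown in the output; practical systems usually sample from the predicted distribution instead of always taking the top choice.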
But it has some gaps, because you find all sorts of interesting things there. What you mostly see in nature is Schrödinger bridges, diffusion processes, optimal transport. What's the shortest route between A and B, if you can represent it correctly? And we found that works incredibly well for images, better than we ever thought it could. And then music, and then video, and then 3D.
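Optimal transport asks for the cheapest way to move one distribution of points onto another. In one dimension with squared-distance cost there is a well-known shortcut: sort both sets and pair them in order. The sketch below is illustrative only (the point sets are made up, and the models discussed here work in very high dimensions where no sorting trick exists). It checks the shortcut against brute force and then moves each point part of the way along its straight-line route from A to B.

```python
from itertools import permutations


def transport_cost(xs, ys, pairing):
    """Total squared distance when xs[i] is moved to ys[pairing[i]]."""
    return sum((xs[i] - ys[j]) ** 2 for i, j in enumerate(pairing))


def ot_1d(xs, ys):
    """Optimal pairing in 1-D under squared cost: sort both sides and match in order."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    order_y = sorted(range(len(ys)), key=lambda j: ys[j])
    pairing = [0] * len(xs)
    for i, j in zip(order_x, order_y):
        pairing[i] = j
    return pairing


def brute_force(xs, ys):
    """Check every possible pairing (only feasible for tiny inputs)."""
    return list(min(permutations(range(len(ys))), key=lambda p: transport_cost(xs, ys, p)))


def interpolate(xs, ys, pairing, t):
    """Move every point a fraction t of the way along its straight-line route."""
    return [x + t * (ys[pairing[i]] - x) for i, x in enumerate(xs)]


if __name__ == "__main__":
    xs = [0.0, 3.0, 1.0]   # hypothetical points of distribution A
    ys = [2.5, -1.0, 4.0]  # hypothetical points of distribution B
    plan = ot_1d(xs, ys)
    print(plan, transport_cost(xs, ys, plan))  # [1, 2, 0] 4.25
    print(brute_force(xs, ys))                 # [1, 2, 0]
    print(interpolate(xs, ys, plan, 0.5))      # [-0.5, 3.5, 1.75]
```

Sorting costs O(n log n) while brute force grows as n!, which is why the check is limited to three points.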
And the internal representation of the data going in and being transformed by these multiplications, figuring out the shortest path between A and B, suddenly started mapping things like physics and all sorts of other stuff. But the first part was Stable Diffusion, a 2-gigabyte file where you push words in one end and entire images come out, on consumer GPUs.
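The "add noise until it's destroyed" half of diffusion training can be written in closed form. The sketch below assumes the DDPM-style formulation with a linear variance schedule rising from 1e-4 to 0.02 over 1,000 steps (a common textbook choice, not something stated in this conversation). A clean sample x0 jumps directly to step t via x_t = sqrt(ᾱ_t)·x0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β). It covers only the noising half; the learned reconstruction (training a network to predict ε from x_t) is left out, and all names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Running product of (1 - beta): the variance weight left on the original signal after each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noisy_sample(x0, alpha_bar, rng):
    """Jump straight to a step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * Gaussian noise."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [1.0, -0.5, 0.25, 0.8]  # hypothetical stand-in for a few pixel values
    abar = alpha_bars(linear_beta_schedule(1000))
    for t in (0, 250, 500, 999):
        # The signal weight shrinks toward 0 while the noise weight grows toward 1.
        print(t, round(math.sqrt(abar[t]), 4), [round(v, 2) for v in noisy_sample(x0, abar[t], rng)])
```

Under this schedule well under 1% of the original signal amplitude survives at the last step, so the final x_t is essentially pure noise; generation runs the learned reconstruction in the opposite direction.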
**Brian Keating** (4:09)
And it was open source.
**Emad Mostaque** (4:10)
And it was open source because we saw that OpenAI, for example, had DALL-E 2, a wonderful image generator based on similar principles that were discovered by a whole bunch of our team members, and because we open-sourced everything. But there were no Ukrainians or Ukrainian content on it. We were like, that's not good. What if the future is just models, and then you can be cut off from them? These are being trained on our collective knowledge, on the whole internet at this point, and we built some of the best datasets and released them openly. But then it's privatized, so you don't have the ability to turn your thoughts into images, into sound, into text. Let's push on that. And also because, like, holy crap, it fits on a consumer GPU. This is magic.