**Paige Bailey** (0:04)
Human creativity is about to have this explosion of progress, and there's this promise of everyone being able to become a creator, and not just a creator in one specific discipline, but to also be able to expand that out into many other disciplines.
**Hannah Fry** (0:22)
Welcome to Google DeepMind: The Podcast. I'm Professor Hannah Fry. Now, one of the things that we've always done with this podcast is to bring you access to the people who are working on some of the biggest breakthroughs in AI. And a lot of the time, the researchers are talking about the techniques and technology that underpin big ideas. But now, we are at a stage where more and more of the tools that we have seen the early iterations of here are live. They are out there in the world for you to interact with. So what we wanted to do in this episode is just to pause, to look at the array of tools that have been released, talk about how they have changed since we first encountered them, and to explore the myriad ways that they can be used. And if that is our objective for today, well, there is no one better to show us this progress than Paige Bailey, AI Developer Relations Engineering Lead at Google DeepMind. Paige, welcome to the podcast.
**Paige Bailey** (1:17)
Thank you so much for having me. I'm so excited to talk more about what we've been building at Google DeepMind.
**Hannah Fry** (1:23)
The thing is that we get to see a lot of the early iterations of this stuff.
**Paige Bailey** (1:27)
Yes.
**Hannah Fry** (1:27)
And last year we had Doug Eck on the show, and he was showing us, I think, the very first iteration of Veo.
**Paige Bailey** (1:34)
Yes.
**Hannah Fry** (1:35)
Which now, with the launch of Veo 3, is quite a different beast.
**Paige Bailey** (1:39)
Exactly. The first implementation of the Veo model was visual only, without all of the really enriching sound qualities that we see from the Veo 3 model. And you also had to give it pretty significant guidance to get the model to produce something that looked photorealistic, or even like something that you might see in a cinematic film. But we've come a long way. I would be really curious to see what that first video looked like.
**Hannah Fry** (2:03)
Yeah. So I think we have it, actually. Yeah.
**Paige Bailey** (2:05)
Let's do it.
**SPEAKER_3** (2:05)
If I were to describe what you should see, we're going to come in from the top. It's a tracking shot. We're going to come down from the tracking shot. We're going to have this neon hologram of a car driving at the speed of light, cinematic. And then the car leaves the tunnel back into the real world city of Hong Kong. So we should expect a kind of transition back to Hong Kong.
**Hannah Fry** (2:28)
All right. OK. So we're starting off. There's no other way to describe it than what the prompt said. You got these buildings covered in neon lights. It's very smooth, this tracking shot. And then you speed up and zoom in closer and closer. You're in between the buildings now. Oh, and then we have this car racing through the streets. You can see the neon lights reflected in the wet pavement below. There's other cars jostling for position around. And it's almost like everything is blurred because you're just going so fast. But it's really consistent. Now it's gone through a tunnel. There are these big lights overhead. And it's come out of the tunnel into an extremely realistic modern scene.
**SPEAKER_3** (3:11)
It's a wow moment.
**Hannah Fry** (3:12)
I mean, that is really good, isn't it?
**Paige Bailey** (3:14)
It is so good.
**Hannah Fry** (3:15)
So some things that I noticed now, it is quite blurry. And I guess that's sort of part of the vibe that it's going for. But you're not seeing this pristine detail on the car.
**Paige Bailey** (3:25)
Not at all. I think if we looked really closely, we might also see that some of the physics expressed in the shots is not quite right. And the way the light gets reflected on things is also not necessarily consistent. So let's see how well the new Veo 3 model does with this.
**Hannah Fry** (3:42)
You're using exactly the same prompt here.
**Paige Bailey** (3:43)
Exactly the same prompt. And here we're in the Gemini app generating this video. It can take two to three minutes to produce the right outputs.
**Hannah Fry** (3:52)
Well, let me ask you then, one thing that was really noticeable was the way that Doug's prompt was very poetic.