**Vincent Vanhoucke** (0:04)
I could picture a future in which my grandkids ask us, hey, is it true that in your day, we used to drive by hand? So it's entirely possible that we're going towards a future where the experience of driving by hand is no longer the norm, and most of the driving happens automatically.
**Hannah Fry** (0:28)
Welcome back to Google DeepMind the Podcast with me, your host, Professor Hannah Fry. The idea of autonomous vehicles has been a science fiction dream for decades, and now they are a reality. I join you from the back of a Waymo, a driverless car that is operating in numerous US cities: in San Francisco, where I am, in LA, in Phoenix, and in Atlanta. And they're very noticeable on the streets: big white cars with lots of sensors on top, and crucially, nobody sitting behind the steering wheel. But getting to this stage, where cars can be out on the roads with passengers without the need for human intervention, and doing it reliably and safely, has been an incredibly complex journey. So today, I get to talk about that with Vincent Vanhoucke, a Distinguished Engineer at Waymo. Welcome to the podcast, Vincent.
**Vincent Vanhoucke** (1:24)
Thanks for having me.
**Hannah Fry** (1:25)
I mean, I know you've worked at Google for a number of years, previously on robotics. How does the driverless car problem differ from a more generic robotics problem?
**Vincent Vanhoucke** (1:34)
Well, in some ways, the autonomous driving problem is the simplest robotics problem. You have basically two things you need to do. You have to know if you're going to turn left or right. That's one number. And then you have to know if you're going to accelerate or decelerate. That's two numbers.
**Hannah Fry** (1:51)
Yeah.
**Vincent Vanhoucke** (1:51)
In most robotics problems, you have to predict hundreds of numbers to figure out all the degrees of freedom of your robot. This is the simplest robot, with only two degrees of freedom. But that hides all the complexity of the actual problem. Predicting those two numbers is actually a very deep and hard problem. You have to understand the environment. You have to understand the people around the car, how they are going to behave, what the environment is going to look like in the future. You have to understand the rules of the road: what you're allowed to do, what you're not allowed to do. And the mix of all this makes the problem hard. Conceptually, it is a robotics problem. These are robots, but they're very social robots.
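To make that contrast concrete, here is a minimal Python sketch of the point Vincent makes: the driving action space is just two numbers, while a general manipulation robot needs one command per degree of freedom. All names here are illustrative assumptions, not Waymo's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class DrivingAction:
    """The full action space of driving: just two numbers."""
    steering: float      # left/right, e.g. -1.0 (full left) to +1.0 (full right)
    acceleration: float  # e.g. -1.0 (full brake) to +1.0 (full throttle)

@dataclass
class ArmAction:
    """A typical manipulation robot: one command per degree of freedom."""
    joint_torques: list[float]  # e.g. 7 values for a 7-DOF arm

def plan(scene) -> DrivingAction:
    # All of the difficulty hides in here: perceiving the environment,
    # predicting other road users, and respecting the rules of the road,
    # just to produce those two numbers.
    ...
```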
**Hannah Fry** (2:41)
And they're also embedded in the real world, which I imagine could be quite humbling.
**Vincent Vanhoucke** (2:44)
The real world is extremely challenging to work in. The expectation in a lot of robotics contexts is that you have a robot in an environment that you more or less control, or where you have a reasonable expectation about what the other agents in that environment will do.
**Hannah Fry** (3:03)
Like a factory floor, for example, where you have total control.
**Vincent Vanhoucke** (3:06)
Yeah. And in the autonomous car context, we have to basically understand and mesh with the environment, and be respectful of the people that live in it, blend into the environment as best we can, so that we can serve the public, right? And that enables us to drive and have the freedom to operate.
**Hannah Fry** (3:30)
Okay. So in terms of choosing those two numbers, as you put it, I mean, first, the car has to perceive the world around it before it plans what to do next. So I guess if we start there, then, in terms of the perception, I mean, Waymo has a number of different senses. You've got cameras, LiDAR and radar. What are the benefits of each of those? And perhaps, where do they struggle more as well?
**Vincent Vanhoucke** (3:52)
Yeah, the different sensors have different strengths and weaknesses. A camera is basically like your eye, right? You see the world as a human would, but it gives you slightly less information about depth, until you actually put multiple cameras together and can reason about depth. In contrast, LiDAR is very good at sensing depth. That's what it does. LiDAR is basically a laser that you shoot out; it bounces off objects, comes back, and gives you an estimate of how far away the objects are. But LiDAR doesn't see in color, right? So it only gives you geometric information, a lot less about the semantics of the scene.
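For readers who want the physics behind that, the time-of-flight principle Vincent describes reduces to one formula: the pulse travels out and back at the speed of light, so the distance is half the round-trip path. A textbook sketch in Python, not Waymo's implementation:

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def lidar_distance(round_trip_seconds: float) -> float:
    """Distance to the reflecting object from a LiDAR pulse's
    round-trip time: the pulse travels out and back, so halve
    the total path length."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# Example: a return pulse arriving after 400 nanoseconds means
# the object is roughly 60 metres away.
print(lidar_distance(400e-9))  # ~59.96
```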
**Hannah Fry** (4:32)
Those lasers also bounce off things quite easily, don't they?