Robotics Research Update, with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

April 22, 2024

Google DeepMind researchers Keerthana Gopalakrishnan and Ted Xiao discuss their latest breakthroughs in AI robotics, including models that enable robots to understand novel objects, learn from human demonstrations, and operate under ethical constraints.
Speakers: Keerthana Gopalakrishnan, Ted Xiao, Nathan Labenz
**Keerthana Gopalakrishnan** (0:00)
People used to think that all the robots are so different, all of their data is so different, and people moved in the direction of thinking that all robots are similar. It's only as different as English and Chinese or something, and the concepts are similar. It's just the manner of expression that's different.

**Ted Xiao** (0:16)
The data sets that resulted, even starting from those criteria, were so, so diverse, right? We have everything from using baby toys all the way to industrial arms, all the way to very dexterous, like cable routing robots. Even with this very limiting assumption, you still get so many different morphologies.

**Keerthana Gopalakrishnan** (0:35)
If you train on VLMs and if you build on top of the knowledge of VLMs, then you can stitch a lot of concepts from the internet along with the motions that you have in robotics data sets.

**Ted Xiao** (0:46)
You could, under the same initial conditions, just change the prompt a little bit, do some prompt engineering, and actually see qualitatively different behavior from the robot.

**Keerthana Gopalakrishnan** (0:55)
Ultimately, if you want to do a lot of tasks and be useful in environments where humans operate, you kind of need to go as close to a human environment as possible.

**Nathan Labenz** (1:06)
Hello and welcome to The Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host, Erik Torenberg. Hello and welcome back to The Cognitive Revolution. Today I'm thrilled to be speaking with Keerthana Gopalakrishnan and Ted Xiao, researchers at Google DeepMind Robotics who are developing AI systems for general-purpose robots. This is Keerthana's second appearance on the show. Just a year ago, fresh off the publication of RT-1 and PaLM-E, she described the state of AI for robotics as being somewhere between GPT-2 and GPT-3, noting that the lack of internet-scale data was a major barrier to progress.
Since then, Keerthana, Ted, and their colleagues at Google DeepMind have published a remarkable flurry of papers demonstrating new techniques that allow robots to leverage large multimodal models, to control different physical form factors, to learn more efficiently from human examples and instructions, and even to use a prototype robot constitution to guide their behavior. In this conversation, we cover six different papers: RT-2, which shows how internet-scale vision-language models allow robots to understand and manipulate objects that they have never seen in training. RT-X, a collaboration with academic labs across the country that demonstrates how a single model, trained to control a diverse range of robot embodiments, can outperform specialist models trained for individual robots.
RT-Trajectory, a project that shows how robots can learn new skills in context from a single human demonstration, represented by a simple line drawing.
AutoRT, a system that scales human oversight of robots, even in previously unseen environments, using a combination of large language models and a robot constitution to power first-line ethical and safety checks.
Learning to Learn Faster, an approach that enables robots to learn more efficiently from human verbal feedback. And finally, PIVOT, another project that shows how vision-language models can be used to guide robot action, this time with no special fine-tuning required.
Progress in robotics is still trailing behind the advances in language and vision. There are still challenges to overcome before robotics models have the scale of data or the sample efficiency needed to achieve reliable general-purpose capabilities. And the study of robot safety and alignment is still in its infancy. But nevertheless, I see this rapid-fire series of papers as strong evidence that the same core architectures and scaling techniques that have worked so well in other domains will ultimately succeed in robotics as well. The work being done at Google DeepMind is pushing the boundaries of what's possible. Investment in a new generation of robotic startups is heating up. And the pace of progress shows no signs of slowing down.
As always, if you're finding value in the show, please do take a moment to share it with friends. This one would be perfect for anyone who's ever daydreamed of having a robot that could fold their laundry or pick up their kids' toys. I certainly count myself among them. And especially as we are just building the new feed, a review on Apple Podcasts, Spotify, or a comment on YouTube would be much appreciated. Now, here's my conversation with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind Robotics. Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind Robotics, welcome to The Cognitive Revolution.

72 more minutes of transcript below

Feed this to your agent

Try it now — copy, paste, done:

curl -H "x-api-key: pt_demo" \
  https://spoken.md/transcripts/1000653176030

Works with Claude, ChatGPT, Cursor, and any agent that makes HTTP calls.
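
For example, you can save the response to a local file for your agent to read. A minimal sketch using standard curl flags (assumption: the endpoint returns the transcript as Markdown, per the .md domain):

# -o writes the response (assumed Markdown) to transcript.md
curl -H "x-api-key: pt_demo" \
  -o transcript.md \
  https://spoken.md/transcripts/1000653176030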

Get the full transcript

From $0.10 per transcript. No subscription. Credits never expire.

Using your own key:

curl -H "x-api-key: YOUR_KEY" \
  https://spoken.md/transcripts/1000653176030