
Essentials: Machines, Creativity & Love | Dr. Lex Fridman

Huberman Lab

May 29, 2025

In this Huberman Lab Essentials episode my guest is Lex Fridman, PhD, a research scientist at the Massachusetts Institute of Technology (MIT), an expert in robotics and host of the Lex Fridman Podcast.
Speakers: Andrew Huberman, Lex Fridman
**Andrew Huberman** (0:00)
Welcome to Huberman Lab Essentials, where we revisit past episodes for the most potent and actionable science-based tools for mental health, physical health, and performance. And now, my conversation with Dr. Lex Fridman.

**Lex Fridman** (0:15)
We meet again.

**Andrew Huberman** (0:16)
We meet again. I have a question that I think is on a lot of people's minds, or ought to be on a lot of people's minds. What is artificial intelligence? And how is it different from things like machine learning and robotics?

**Lex Fridman** (0:32)
So, I think of artificial intelligence first as a big philosophical thing. It's our longing to create other intelligent systems, perhaps systems more powerful than us. At the more narrow level, I think it's also a set of computational, mathematical tools to automate different tasks. And then it's also our attempt to understand our own mind: to build systems that exhibit some intelligent behavior in order to understand what intelligence is in our own selves. So, all of those things are true. Of course, what AI really means in practice is a community, a set of researchers and engineers, and a set of computational techniques that allow you to solve various problems. There's a long history of approaching the problem from different perspectives. One thread that has run throughout, one of the communities, goes under the flag of machine learning, which emphasizes the learning side of the AI space: how do you make a machine that knows very little in the beginning, follows some kind of process, and becomes better and better at a particular task? What has been most effective in roughly the last 15 years is a set of techniques that fall under the flag of deep learning, which utilize neural networks. A neural network is a network of little basic computational units called neurons, artificial neurons. These architectures have an input and an output; they know nothing in the beginning, and they're tasked with learning something interesting.
That something interesting usually involves a particular task.
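As a rough sketch of what "knows nothing in the beginning, then learns a task" looks like in code, here is a minimal neural network, a couple of layers of artificial neurons, taught the XOR function by gradient descent. All details here (task, layer sizes, training length) are illustrative choices, not anything specified in the episode.

```python
import numpy as np

# A tiny neural network learning by example: neurons with random
# weights (the network knows nothing at first) gradually reduce
# their error on a task, here the XOR function.

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # ground-truth XOR outputs

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    return h, sigmoid(h @ W2 + b2)  # network's current guesses

_, out = forward(X)
loss_start = float(np.mean((out - y) ** 2))  # error before learning

for step in range(5000):
    h, out = forward(X)
    # Backpropagation: nudge each weight in the direction that
    # reduces the squared error on the training examples.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out
    b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(axis=0)

_, out = forward(X)
loss_end = float(np.mean((out - y) ** 2))
print(f"error before: {loss_start:.3f}, after: {loss_end:.3f}")
```

The network's outputs move toward the XOR targets purely through repeated exposure to examples, which is the sense in which it "learns something interesting" from an initial state of ignorance.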
There are a lot of ways to talk about this and break it down. One of them is how much human supervision is required to teach the thing. Supervised learning, this broad category, is where the neural network knows nothing in the beginning, and then it's given a bunch of examples; in computer vision, that would be examples of cats, dogs, cars, traffic signs. You're given the image, and you're given the ground truth of what's in that image. And when you have a large database of such image examples where you know the truth, the neural network is able to learn by example. That's called supervised learning. There are a lot of fascinating questions within that, such as: how do you provide the truth? When you're given an image of a cat, how do you tell the computer that this image contains a cat? Do you just say the entire image is a picture of a cat? Do you do what's very commonly been done, which is a bounding box: a very crude box around the cat's face saying this is a cat? Do you do semantic segmentation? Mind you, this is a 2D image of a cat, so the computer knows nothing about our three-dimensional world; it's just looking at a set of pixels. Semantic segmentation is drawing a nice, very crisp outline around the cat and saying that's a cat. It's really difficult to provide that truth, and one of the fundamental open questions in computer vision is whether that's even a good representation of the truth. Now, there's another contrasting set of ideas, adjacent and overlapping: what used to be called unsupervised learning, and what's commonly now called self-supervised learning, which tries to get less and less human supervision into the task. Self-supervised learning has been very successful in the domain of language models, natural language processing, and now more and more is succeeding in computer vision tasks.
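The supervised recipe described here, examples paired with ground-truth labels, can be sketched with a toy classifier. The numeric "features" below are hypothetical stand-ins for whatever a vision system would extract from pixels, and nearest-centroid is just one simple choice of learner.

```python
import numpy as np

# Supervised learning in miniature: every training example comes
# with a ground-truth label, and the model generalizes from those
# pairs. The two numbers per example are invented stand-ins for
# features a vision model would extract from an image.

train_X = np.array([
    [0.9, 0.2],  # labeled "cat"
    [0.8, 0.3],  # labeled "cat"
    [0.2, 0.9],  # labeled "dog"
    [0.1, 0.8],  # labeled "dog"
])
train_y = np.array(["cat", "cat", "dog", "dog"])

# Nearest-centroid classifier: average the examples of each class...
centroids = {label: train_X[train_y == label].mean(axis=0)
             for label in np.unique(train_y)}

def predict(x):
    # ...then label a new example by its closest class average.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

print(predict(np.array([0.85, 0.25])))  # -> cat
```

The hard part Lex highlights, deciding what form the "truth" takes (whole image, bounding box, segmentation mask), happens before any of this: it determines what `train_y` even contains.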
The idea there is to let the machine, without any ground-truth annotation, just look at pictures on the internet, or look at text on the internet, and try to learn something generalizable about the ideas that are at the core of language or at the core of vision. At its best, we humans like to call that common sense. We have this giant base of knowledge on top of which we build more sophisticated knowledge, this kind of common-sense knowledge. And so the idea with self-supervised learning is to build that common-sense knowledge about what the fundamental visual ideas are that make up a cat and a dog and all those kinds of things, without ever having human supervision. The dream there is that you just let a self-supervised AI system run around the internet for a while, watch YouTube videos for millions and millions of hours, and, without any supervision, be primed and ready to actually learn with very few examples once the human shows up. We think of human children in this way: your parents only give one or two examples to teach a concept. The dream with self-supervised learning is that it would be the same with machines: they would watch millions of hours of YouTube videos and then come to a human and be able to understand, when the human shows them a cat and says remember, this is a cat, what that means. They would understand that a cat is not just a thing with pointy ears, or a thing that's orange or furry; they'll see something more fundamental that we humans might not actually be able to introspect and understand. If I asked you what makes a cat versus a dog, you probably wouldn't be able to answer that, but if I brought you a cat and a dog, you'd be able to tell the difference. What are the ideas your brain uses to make that distinction?
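One way to make "the truth comes from the data itself" concrete is next-word prediction over unlabeled text: the pretext task that, vastly scaled up, underlies modern language models. The tiny corpus and bigram-counting "model" below are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Self-supervision in miniature: no human provides labels; the raw
# data supplies its own targets. The pretext task here is next-word
# prediction over unlabeled text.

corpus = "the cat sat on the mat the cat chased the dog".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # the text itself supplies the target

def predict_next(word):
    # Most frequently observed follower of `word` in the corpus.
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often here
```

Nothing here was ever annotated by a person; every training target was already sitting in the data, which is exactly the property that lets self-supervised systems consume internet-scale text and video.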
That's the whole dream of self-supervised learning: that it would be able to learn, on its own, that set of common-sense knowledge that's able to tell the difference. And then there are a lot of incredible uses of self-supervised learning, including the somewhat weirdly named self-play mechanism. That's the mechanism behind the reinforcement-learning successes of systems like AlphaZero that won at Go and at chess.
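The self-play idea can be sketched with a toy game: one agent plays both sides of a miniature Nim (take 1 or 2 stones; whoever takes the last stone wins) and learns a value table purely from the outcomes of its own games. This is a loose tabular caricature of the idea; systems like AlphaZero combine it with deep networks and tree search.

```python
import random

# Self-play in miniature: the agent is its own opponent, and its
# only training signal is who won each game it plays against itself.

random.seed(0)
value = {}  # value[pile]: estimated chance the player to move wins

def opponent_value(pile):
    # A pile of 0 means the previous mover just took the last stone
    # and won, so the player "to move" at 0 has lost.
    return 0.0 if pile == 0 else value.get(pile, 0.5)

def best_move(pile, explore=0.0):
    moves = [m for m in (1, 2) if m <= pile]
    if random.random() < explore:
        return random.choice(moves)  # occasional exploration
    # Leave the opponent the worst position we currently know of.
    return min(moves, key=lambda m: opponent_value(pile - m))

for game in range(20000):
    pile, history = 7, []
    while pile > 0:
        history.append(pile)
        pile -= best_move(pile, explore=0.2)
    # The player who made the final move won. Walk backwards through
    # the positions, crediting the mover at each one; the winning and
    # losing perspectives alternate ply by ply.
    result = 1.0
    for state in reversed(history):
        old = value.get(state, 0.5)
        value[state] = old + 0.1 * (result - old)
        result = 1.0 - result

print(best_move(7))  # the learned move from a pile of 7 stones
```

After enough self-play the agent discovers, with no human examples at all, that it should leave its opponent a multiple of three stones; the "supervision" was generated entirely by playing itself.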

The transcript continues for another 36 minutes in the full episode.
