#94 – Ilya Sutskever: Deep Learning Transcript — Lex Fridman Podcast

**Lex Fridman** (0:00)
The following is a conversation with Ilya Sutskever, co-founder and chief scientist of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and to me, one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence and life in general than Ilya, on and off the mic. This was an honor and a pleasure. This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological and financial burden of this crisis, I'm sending love your way. Stay strong. We're in this together. We'll beat this thing.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcast, support it on Patreon or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, invest in the stock market with as little as $1. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend Ascent of Money as a great book on this history. Both the book and audiobook are great. Debits and credits on ledgers started around 30,000 years ago. The US dollar created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money.
So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping advance robotics and STEM education for young people around the world. And now, here's my conversation with Ilya Sutskever.
You were one of the three authors, with Alex Krizhevsky and Geoff Hinton, of the famed AlexNet paper that is arguably the paper that marked the big catalytic moment that launched the deep learning revolution. At that time, take us back to that time, what was your intuition about neural networks, about the representational power of neural networks? And maybe you could mention, how did that evolve over the next few years, up to today, over the 10 years?

**Ilya Sutskever** (3:10)
Yeah, I can answer that question. At some point in about 2010 or 2011, I connected two facts in my mind.
Basically, the realization was this. At some point, we realized that we can train very large, I shouldn't say very, you know, tiny by today's standards, but large and deep neural networks end to end with backpropagation. At some point, different people obtained this result. I obtained this result. The first moment in which I realized that deep neural networks are powerful was when James Martens invented the Hessian-free optimizer in 2010, and he trained a 10-layer neural network end to end, without pre-training, from scratch. And when that happened, I thought, this is it. Because if you can train a big neural network, a big neural network can represent very complicated functions. Because if you have a neural network with 10 layers, it's as though you allow the human brain to run for some number of milliseconds. Neuron firings are slow, and so in maybe 100 milliseconds, your neurons only fire 10 times. So it's also kind of like 10 layers. And in 100 milliseconds, you can perfectly recognize any object. So I already had the idea then that we need to train a very big neural network on lots of supervised data, and then it must succeed because we can find the best neural network. And then there's also theory that if you have more data than parameters, you won't overfit. Today we know that actually this theory is very incomplete and you won't overfit even if you have less data than parameters. But definitely, if you have more data than parameters, you won't overfit.

**Lex Fridman** (4:50)
So the fact that neural networks were heavily overparameterized wasn't discouraging to you? So you were thinking about the theory that the number of parameters, the fact that there's a huge number of parameters is okay? Is it going to be okay?
