
Reiner Pope of MatX on accelerating AI with transformer-optimized chips

Cheeky Pint

February 26, 2026

Reiner Pope is the co-founder and CEO of MatX, which designs specialized chips for large language models. A former Google TPU architect, he joins John to discuss why the current generation of AI hardware is hitting a wall.
Speakers: John, Reiner Pope
**John** (0:01)
Reiner Pope is the co-founder and CEO of MatX. He's a former math whiz and Haskell programmer who became a TPU architect at Google. And now he's teamed up with Google's former chief chip architect to design a better chip for AI. So, a year ago, everyone was saying Google is canceled, you know, AI is going to eat their search, no one's going to search for things, and therefore the business won't do well. Obviously, that sentiment has really shifted, helped in part by, you know, Gemini 3 being really good, and also really fast, powered by the custom chip hardware Google has. You were inside Google for, I think, a lot of the foundational period, laying the groundwork for that stuff. What do people not appreciate about what Google did right to lay the groundwork for their current AI success?

**Reiner Pope** (0:52)
They started with the research, right? The transformer came from there. Pretty much anyone who's maybe, I don't know, over 30 and runs a large lab has been at Google Brain at some point. So, I think there was and has been a lot of talent there. TPUs are pretty good. I mean, we think there's better you can do, of course, but they at least had the opportunity to design the TPUs for neural nets, rather than for graphics applications like NVIDIA. And so the overall architecture: starting with a single core, doing what were at the time reasonably large systolic arrays, though by today's standards nowhere near as large. I think those were a lot of really good decisions.

**John** (1:35)
When did the TPU project start?

**Reiner Pope** (1:37)
TPU v1 was announced in 2016, I think. That was what actually led to the creation of all of those 2016-2017 startups: Cerebras, Groq, Graphcore, SambaNova, all of those. TPU v1, I think, was a really impressive project. It was done on a very short timeline, maybe, I don't know the full details, but about a year or a year and a half, with a skeleton team of 20 or 30 people. A really, really minimal viable product. More recent TPUs, and more recent AI chips in general, can't do that, because the market has moved and the table stakes are much higher. But for the first-generation product, they just built one big systolic array, stuck a memory next to it, and were done. It was a really simple, nice, elegant product.
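To make the "one big systolic array" idea concrete, here is a minimal Python sketch of the dataflow in a weight-stationary systolic matrix multiply, the general style of design TPU v1 used. It models only the arithmetic each processing element (PE) contributes, not the clock-by-clock timing or the actual TPU microarchitecture; the function name and shapes are illustrative.

```python
import numpy as np

def systolic_matmul(acts, weights):
    """Sketch of a weight-stationary systolic array computing acts @ weights.

    Each of the K x N processing elements (PEs) holds one weight fixed.
    Activation rows stream across the array, and partial sums flow
    through the K PEs of each column, one multiply-accumulate per PE.
    """
    M, K = acts.shape
    K2, N = weights.shape
    assert K == K2
    out = np.zeros((M, N))
    for m in range(M):              # each activation row streamed in
        for n in range(N):          # each column of PEs yields one output
            acc = 0.0
            for k in range(K):      # partial sum passes through K PEs
                acc += acts[m, k] * weights[k, n]   # one MAC per PE
            out[m, n] = acc
    return out

# Sanity check against NumPy's matmul
A = np.random.randn(4, 8)
W = np.random.randn(8, 3)
assert np.allclose(systolic_matmul(A, W), A @ W)
```

The point of the real hardware is that all K x N multiply-accumulates in the array fire on every clock; the loops here just spell out which product each PE contributes.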

**John** (2:26)
And obviously, TPU v1 predates the transformer. Is that just a coincidence that they happened at very similar times, or?

**Reiner Pope** (2:34)
Yeah, it was. Yeah, I mean, there was a period of maybe about four years of a lot of ML research, or neural net research, prior to transformers. So, what was popular? LSTMs and ConvNets and ResNet and Inception. The big thinking at the time was to adapt the TPU to be used for LSTMs.
It's a reasonable fit there. But yeah, I mean, I think there was just a huge flurry of activity. Why did it all happen then and not later? Probably just because people stopped publishing. I mean, 2022 was about the time when Google just completely stopped publishing its research. And so all the good papers are from before that, as a result.

**John** (3:21)
Right. But is there some hand-wavy story you can tell about parallelization where both transformers and TPUs are about really internalizing the importance of parallelization?

**Reiner Pope** (3:36)
So, I mean, definitely. I put it somewhat on people, actually.
So, I mean, it is just true: hardware is massively parallel. You've got tens of billions, hundreds of billions of transistors on your chip, and it takes maybe 100 clock cycles to get from one side of the chip to the other. So you can't do a sequential computation involving transistors on both sides of the chip. The hardware is just fundamentally parallel, and you have to take advantage of that. TPU v1 and all later TPUs naturally took advantage of that. Matrix multiply is really nice because it is so parallel. So I think on the hardware side, that's generally understood. I think most ML researchers, especially at the time, were not super deep in what hardware wants, what mechanical sympathy is. So I'll turn this over to you for that.
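A back-of-the-envelope way to see why matrix multiply suits massively parallel hardware, in the spirit of the point above: the arithmetic grows as the cube of the matrix size while the data only grows as the square, so larger matmuls give the chip ever more independent work per byte it has to move across those slow cross-chip distances. The fp16 assumption and the sizes below are illustrative.

```python
# Arithmetic intensity of an N x N matrix multiply, C = A @ B.
# FLOPs: 2 * N**3 (a multiply and an add per inner-product term).
# Bytes: 3 * N**2 * 2, reading A and B and writing C in fp16 (assumed).
for N in (128, 1024, 8192):
    flops = 2 * N**3
    bytes_moved = 3 * N**2 * 2
    print(f"N={N:5d}: {flops / bytes_moved:7.1f} FLOPs per byte moved")
```

At N=8192 that is over 2,700 FLOPs per byte, which is why a sea of multiply-accumulate units can stay busy: the work parallelizes across all N² outputs without shuttling fresh data from one side of the chip to the other at every step.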

**John** (4:29)
So what is the term, mechanical sympathy? I mean, it kind of makes sense.

**Reiner Pope** (4:31)
Yeah, it speaks for itself. It's like, I mean, I think about the poor machine and what does it want?

**John** (4:37)
I wanted it to want.
