Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

December 17, 2025

Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI.
Speakers: Sam Charrington, Amar, Aakanksha Chowdhery
**Sam Charrington** (0:00)
I'd like to thank our friends at Capital One for sponsoring today's episode. Capital One's tech team isn't just talking about multi-agentic AI, they already deployed one. It's called Chat Concierge and it's simplifying car shopping. Using self-reflection and layered reasoning with live API checks, it doesn't just help buyers find a car they love, it helps schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced, intuitive, and deployed. That's how they stack. That's technology at Capital One.

**Amar** (0:34)
This podcast is sponsored by Google. Hey folks, I'm Amar, product and design lead at Google DeepMind. We just launched a revamped vibe coding experience in AI Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever. Just describe your app, and Gemini will automatically wire up the right models and APIs for you. And if you need a spark, hit I'm feeling lucky, and we'll help you get started. Head to ai.studio slash build to create your first app.

**Aakanksha Chowdhery** (1:05)
For the longest time, we were measuring pre-training on static benchmarks. If we want these models to be useful as agents, they need to be able to interact with environments. And when we start caring about those agentic tasks, pre-training needs to be rethought from fundamentals. This is not just a post-training problem to achieve the set of capabilities that we want in the next generation of models. And the kind of benchmarks we need for measuring this kind of intelligence are sometimes not available today.

**Sam Charrington** (1:48)
All right, everyone, welcome to another episode of The TWIML AI Podcast. I'm your host, Sam Charrington. Today, I'm joined by Aakanksha Chowdhery. Aakanksha is a member of technical staff at Reflection. Before we get going, be sure to hit that subscribe button wherever you're listening to today's show. Aakanksha, welcome to the podcast.

**Aakanksha Chowdhery** (2:07)
Thank you, Sam.

**Sam Charrington** (2:08)
You have a really interesting background. You've trained some of the earliest large language models, including PaLM, Gemini 1.0, and Gemini 1.5. Tell us a little bit about those experiences.

**Aakanksha Chowdhery** (2:24)
I got into large language models at Google while building one of the distributed systems that led to the training of PaLM, which was our largest language model at the time. It had 540 billion parameters, and people stopped publishing the number of parameters after that. I wonder why. And that led me to be at the forefront of pre-training, solving one set of problems after another that come with scale, in the first two generations of PaLM models and then the first two generations of Gemini models. And I think the one thing that you learn when you do pre-training is that at scale, every problem magnifies and things go wrong at every possible part of the stack. So it's always fun and it's always exciting.

**Sam Charrington** (3:09)
And you said you were on the infrastructure side?

**Aakanksha Chowdhery** (3:11)
I work on the ML side, but I've done both. One thing that is super interesting about pre-training is that you have to be able to think across the stack. If you're going to train something for two or two and a half months, you need to be able to think about every single part of the system.

**Sam Charrington** (3:31)
Nice. And so tell us about Reflection. What is Reflection focused on?

**Aakanksha Chowdhery** (3:35)
So the mission of Reflection is to build frontier open intelligence for agentic capabilities. The company has been focused on building a post-training stack for agentic tasks. And with the most recent fundraise, we are doing training end-to-end. So we're building frontier open agentic models, which are both pre-trained and post-trained in-house.

**Sam Charrington** (3:59)
And that's really going to be a focus of what we're talking about today: some of the reasons why you think a different approach to pre-training is key. Is that kind of the way you think about it?

**Aakanksha Chowdhery** (4:13)
Well, the way I really put it is that for the longest time, we were measuring pre-training on static benchmarks.
For example, Elior is a popular one, or GSM8K, or math Olympiad problems like AIME and MATH, or you name it, but extremely static benchmarks. But if we want these models to be useful as agents, they need to be able to interact with environments and be useful to us in the workflows where we can use them. The simplest versions of that, which we are already starting to see today, are coding agents and deep research agents. The coding agents are extremely useful in the sense that you can put a coding agent to work helping you understand a large code base, or it can help you, for example, refactor and apply a fix across multiple files. They might not do it correctly. They're not perfect yet. With deep research agents, I think we moved away from just a search-bar interface to more like: here is what you want, and then putting a language model on the job of finding multiple articles that investigate those kinds of workflows. And you can imagine that any such goal-oriented task can be given to the language models, and the models can achieve these goals over multiple steps, as opposed to just being chatbots. So those are the kinds of agentic tasks that we start caring about. And when we start caring about those agentic tasks, pre-training needs to be rethought from fundamentals. This is not just a post-training problem to achieve the set of capabilities that we want in the next generation of models. And the kind of benchmarks we need for measuring this kind of intelligence are sometimes not available today. So I'm happy to talk about that as well.

38 more minutes of transcript below
