Nested Learning: Ali Behrouz on the Quest for Continual Learning & Illusion of AI Architectures

**Erik Torenberg** (0:00)
Hello, and welcome back to The Cognitive Revolution. Today, I'm excited to share a conversation with Ali Behrouz, grad student at Cornell, researcher at Google, and author of Nested Learning. This episode was recorded a few months back, and while I normally believe that AI content does not age well, this conversation with Ali is an exception. His work is some of the most inspired and potentially transformative that I've seen anywhere in the quest for new machine learning architectures that are capable of genuine continual learning. This, of course, is one of the most important capability advances on the horizon today. Arguably, it is the main gap between today's models and a digital AGI that would be capable of joining and contributing to human teams just as humans do. And Ali is advancing the frontier with an approach that is both biologically inspired and technically elegant. His blockbuster paper, Nested Learning, which has been touted as a harbinger of a possible paradigm shift by no less than Jeff Dean, develops a simple strategy that allows models to rapidly adapt to their current context on an ongoing basis while preserving core knowledge by updating different parts of the system at different frequencies, much like humans manage memory on multiple time scales, from working memory to long-term memory.
His latest work, Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories, which I actually heard about live for the first time on this recording and which has now finally become fully public, takes inspiration from how humans consolidate memories and learn from dreams while sleeping. It introduces a new offline mode in which models transfer new knowledge from their high-frequency update layers to their more slowly evolving layers via distillation, and also learn new abstractions and connections between concepts by generating and training on synthetic data derived from their recent experiences.
In addition to the details of these architectures, which, like so many AI innovations, I find both extremely exciting and a bit scary, we also discuss how scaling for performance may shift from stacking more layers to nesting more update frequencies, how Ali understands all components of machine learning systems as forms of associative memory that compress a given context flow, why this leads him to call deep learning architectures an illusion, and how he's operationalized this conceptual insight by developing expressive optimizers that learn update rules and are capable of outperforming both Adam and Muon. We also discuss how the attention mechanism can be understood as an infinite-frequency update module, and why Ali expects that attention layers will therefore remain fixtures of AI systems indefinitely. We cover the empirical results, showing that Ali's new architectures compete effectively with transformers on standard measures while also outperforming them on hard tasks, such as effectively recalling information from up to 10 million tokens of context and learning to translate multiple previously unseen languages at the same time.
Finally, we discuss why Ali sees continual learning as both an opportunity and a huge risk for privacy and alignment, how human-AI relationships might evolve, and why Ali is cautiously optimistic that models that evolve over time, based on our interactions with them, could both serve our individual needs more effectively and lead to a more diverse and hopefully stable AI ecosystem overall.
The bottom line for me is that, for all the debate and speculation about whether or not current architectures can scale to AGI and beyond, there is a very good chance that conceptual breakthroughs will render that question moot before we even manage to answer it. Transformers have clearly changed the world, but they aren't the end of history. And as tough as it is to keep up with AI developments, anyone who wants to get a handle on where things are going from here can't afford blind spots when it comes to new research directions like Ali's. And so, without further ado, I hope you enjoy this deep dive preview of AI systems that learn on an ongoing basis in increasingly human-like ways, with the brilliant Ali Behrouz.
The Cognitive Revolution is brought to you by Mercury, the fintech that more than 300,000 ambitious companies and individuals trust to run their finances. Over the last few months, I have made tremendous strides with my personal AI infrastructure. Today I've got high context instances of both Claude Code and OpenClaw running on a Mac Mini, and it's amazing what they can do. However, until getting started with Mercury, I didn't have a great way for them to pay for things. I didn't want to give them unrestricted access to my money, but my old bank didn't give me any other options. With Mercury, I can create as many virtual cards as I want, each with its own daily, weekly, or monthly spending limit, and I can lock any card to a single category of purchase or even a single merchant. Now, I have a card that my agent can use to buy our family's groceries, and only our groceries, and I can create another anytime I want to give an agent a random one-off project that might require making a purchase. This is honestly just the start of Mercury's AI-friendly offerings. Does your bank offer API keys, an MCP, or a CLI tool? If not, check out Mercury at mercury.com. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column NA, members FDIC.