Sara Hooker - Why US AI Act Compute Thresholds Are Misguided

Machine Learning Street Talk (MLST)

July 18, 2024

Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy.
Speakers: Tim Scarfe, Sara Hooker
**Tim Scarfe** (0:00)
Sara, it's amazing to have you back on MLST.

**Sara Hooker** (0:03)
It's so lovely to be here. It's been a year and a half or something since our last conversation.

**Tim Scarfe** (0:08)
Yes, it has. Yeah, because I think we met at NeurIPS, and then I came and filmed with you in the London office, which is really good. But fans of the show, of course, will know that our first interview was about your hardware lottery paper.

**Sara Hooker** (0:20)
Yeah.

**Tim Scarfe** (0:21)
And that was your first grumpy essay.

**Sara Hooker** (0:23)
That was a very grumpy essay.
You know, I lead Cohere for AI. It's a research lab, and we do a lot of fundamental research. A lot of my work is on efficiency, reliability, and building these models that scale to the next generation of models. So you can go to Cohere for AI and take a look at some of our work.

**Tim Scarfe** (0:42)
Sara Hooker is VP of Research at Cohere, and she leads Cohere for AI, a research lab which seeks to solve complex machine learning problems. Cohere for AI supports fundamental research, which explores the unknown. She leads a team of researchers and engineers working on making large language models more efficient, safe, and grounded.
In this conversation, Sara discusses her recent work on multilingual AI and the challenges of developing language models that work across many different languages. She provides insights into the limitations of current approaches like RLHF, especially for low-resource languages. Sara also talks about her recent paper critiquing the use of compute thresholds as an AI governance strategy, explaining why simple measures like FLOPs are inadequate for assessing AI capabilities and risks. Sara emphasizes the importance of understanding the relationship between compute, data, and model architectures.
She advocates for a more nuanced approach to AI development and governance, which considers the complexities of language, culture, and the representational long tail where all the low-frequency data lives, which is so often neglected in current models.
Sara's work aims to make AI more globally representative and equitable as these technologies become increasingly integrated into society. Enjoy the show.
Your most recent grumpy paper is called "On the Limitations of Compute Thresholds as a Governance Strategy." Can you give us the elevator pitch?

**Sara Hooker** (2:23)
So this paper has a very boring title.
And at face value, it's just about these kind of odd compute thresholds, not known to many people in the public, that have actually been widely adopted. They were adopted by the Executive Order on AI. They were adopted by the EU AI Act. And what's fascinating is that these are kind of the key policies that have come out on AI.
Why did I write a paper about this very, very deep topic of compute thresholds? Because it's at the heart of what our field is asking right now. Compute thresholds are based on the idea that models of a future size (so it doesn't apply to models in the world now) are going to trigger some difference in risk profile that deserves scrutiny. And this question, whether scale triggers a moment where models have properties that are fundamentally different from the models before them, has actually been very much at the core of our field for the last two decades. Because in the last two decades, we've had this philosophy of bigger is better. We scale data and we scale model size. So this essay is really about: is that true? As we stand and look back at the last decade, what do we know about the relationship between compute and risk? And what do we think is the feasibility of these compute thresholds actually mitigating risk? That was the starting point.

**Tim Scarfe** (4:01)
Yeah, so in the beginning, you were talking about how historically we have tried to estimate and control and respond to risk. Can you give us a couple of examples?

**Sara Hooker** (4:11)
Mostly as a society, we have tried to grapple with this idea that we want to proactively control our future for the better. And this is actually recent as well. So it's very typical of modern society that we have this notion of planning and anticipating risks and being able to mitigate them. There are examples where you and I do this every day, right? We put on sunscreen if we know we're going out in the sun. We avoid walking in dark areas. There are also areas where governments have done this, you know, even in this modern era of the last 300 or 400 years.
And it requires two things to do well. One is that you have to understand where risk comes from. So you have to understand what the lever of risk is. A good example of where that's failed is something like the Black Death, where, for example, a lot of the protocols at the time didn't account for rats being the main vector of the disease. And so because of that, many of the mitigation techniques were unsuccessful. But the second crucial aspect is that once you've identified the lever of risk, you have to form a proportionate response.
