**Yejin Choi** (0:00)
Even for open-ended questions, the models are not as diverse as we would have expected, to the point that even when you ask multiple times with higher temperature, the output may not vary as much. So there's intra-model homogeneity in the model output, and we also find inter-model homogeneity, meaning Llama, ChatGPT, and DeepSeek R1 all have strikingly similar behavior.
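To make the phenomenon concrete, here is a minimal sketch of the kind of probe Yejin describes: sample one open-ended prompt many times at high temperature and check how similar the completions are. The model, prompt, and token-overlap metric are illustrative assumptions for this sketch, not the actual experimental setup from her work.

```python
# Sketch: probe intra-model homogeneity by sampling the same open-ended
# prompt repeatedly at high temperature and measuring pairwise similarity.
# Model choice (gpt2) and the Jaccard metric are illustrative assumptions.
from itertools import combinations
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # any causal LM works

prompt = "Describe an unusual hobby someone might have."
outputs = generator(
    prompt,
    num_return_sequences=8,   # ask the same question multiple times
    do_sample=True,
    temperature=1.2,          # higher temperature should, in theory, add diversity
    max_new_tokens=40,
    pad_token_id=50256,       # GPT-2 has no pad token; reuse EOS to silence warning
)
completions = [o["generated_text"][len(prompt):] for o in outputs]

def jaccard(a: str, b: str) -> float:
    """Token-set overlap: 1.0 means identical vocabulary, 0.0 means disjoint."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

pairs = list(combinations(completions, 2))
avg_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print(f"Average pairwise similarity across {len(completions)} samples: {avg_sim:.2f}")
```

Low average similarity would indicate genuine diversity; the finding Yejin describes is that this kind of number stays surprisingly high even at elevated temperatures, and looks similar across different model families.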
**Sam Charrington** (0:45)
All right, everyone, welcome to another episode of The TWIML AI Podcast. I'm your host, Sam Charrington. Today, I'm joined by Yejin Choi. Yejin is professor and senior fellow at Stanford University in the Computer Science Department and Institute for Human-Centered AI, or HAI. Before we get going, be sure to take a moment to hit the subscribe button wherever you're listening to today's show. Yejin, welcome back to the podcast. It's been a while.
**Yejin Choi** (1:11)
Oh yeah, thanks for having me back.
**Sam Charrington** (1:14)
Absolutely, absolutely. I think we last spoke in the fall of 2021, which seems like ages ago in AI years. I would love to kind of jump in and have you bring us up to date on what you've been working on since then. And actually for folks who didn't catch that one, maybe start with a little bit about your background.
**Yejin Choi** (1:37)
The time when I was on your podcast, I was still maybe best known for working on common sense knowledge and reasoning, and back then I was also working on natural language generation quite a bit.
Of course, since then, a lot has happened. More recently, I've been excited about reasoning, especially making small language models reason better. So I'm broadly interested in large language models, small language models, large reasoning models, small reasoning models, and then how we could make models align better with pluralistic norms and values.
**Sam Charrington** (2:21)
Nice. Nice. What drives your interest in SLMs? Seems like a lot of the action is in large language models, and we're working hard to get the smaller ones up to the same level of performance. What's your particular interest driven by?
**Yejin Choi** (2:39)
Yeah. The mission really is democratizing general AI, so that it's not just the companies that can purchase a lot of GPUs who are able to create, adapt, and serve LLMs. People like myself and my colleagues who are academics cannot buy as many GPUs, so is there something really meaningful and fun that we could do, even with a smaller counterpart? And at the end of the day, I believe that fundamentally it should be feasible. It's only that the world has invested so much more into exploring what happens when you scale things up.
Whereas if we invested even a fraction of that, just a little bit more, I do think we could unlock a lot more exciting capabilities out of small language models. Part of my research is also driven by the desire to find really better ways of teaching intelligence to machines. Currently, it's just so data-centric. We can talk about that in more detail later in this podcast, but it's just so data-dependent, and that's pretty much the only way we know how to teach AI about human knowledge and intelligence. In the future, I don't know whether we will find the solution or not, but as academics, I feel we have to give it a try: to find an entirely better solution that is so much more data-efficient, able to learn so much more from much less data.
**Sam Charrington** (4:43)
When you think about how the space, the industry, evolves, and your comment about where all the investment has gone, why do you think that is? Do you feel like the investment has just quickly followed what works, without us taking the time to step back and identify all of the opportunities to optimize? Or do you think that there are particular impediments to smaller models that make them inherently more challenging?
**Yejin Choi** (5:22)
There's definitely a snowball effect, and then a herding effect: you see where the other ships are going, and you want to follow. Because it's a safe choice, especially now that raising funding is not as hard as it used to be for AI, and scaling is a guaranteed, proven way of increasing intelligence. So why not?
And in fact, I'm not against such effort. It's really interesting to see and watch how much intelligence scale can unlock. I appreciate that some people went all out to find the frontier of what happens with scale. Having said that, I do worry about everybody trying the same thing. I think it's very important that we try different ideas. Historically, whatever innovation happened with computers or phones, they were always very large at the beginning, and then over the course of time, people figured out how to make them smaller yet more powerful. The same thing will definitely happen with generative AI as well. In fact, there's already a lot of research effort that makes models smaller but more powerful. And I think we can do so much more, so much better, if we put more mind and effort into it.