**Nathaniel Whittemore** (0:00)
Today on the AI Daily Brief, counting down the five most impactful AI model releases of 2025. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Robots and Pencils, Blitzy and Superintelligent. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe, of course, on Apple Podcasts. And if you are interested in learning about sponsoring the show, you can find out more information at aidailybrief.ai, or send us a note at sponsors at aidailybrief.ai. Now, we are in the thick of end-of-year coverage, and you might have heard me say during my episode about the 10 biggest stories of AI overall that I had been planning on bundling these five biggest AI model releases as its own section of that show. Now, of course, that show got really long, and I didn't want to overwhelm the list with just model releases, which are obviously in some ways the quintessential events around which we mark our AI calendars. And so instead, what we are doing is we are breaking this out into its own category, its own episode. And whereas that top 10 episode did not rank and count down the stories, other than saying that I thought that vibe coding was the most important, this one is actually a countdown. I labored over the ranking because I think it's kind of fun to give you guys something to debate and tell me either how right I am or, more likely, how wrong I am. We're going to start off with a couple of honorable or, as the case might be, dishonorable mentions. Specifically, I want to talk about the absence of a strong model from Meta this year. Now, yes, Llama 4 did technically come out at the beginning of the year. However, it flopped. One of the challenges for Meta was that Llama 4 was coming into existence in a post-DeepSeek world. And in that post-DeepSeek world, everything around open source had changed. For a couple of years, Meta got to be the standard bearer of open source AI models. 
And even if their models weren't as state of the art as the closed labs, they had this distinct and unique space. Now, that changed a little when Mistral came on the scene and started to compete for that narrative and intellectual and practical space. But it has changed dramatically this year in the context of the rise of the Chinese open-weight models. Now even back then, people were surprised at what we got with Llama 4. In the LocalLLaMA subreddit, someone wrote, Llama 4 didn't meet expectations. Some even suspect it might have been tweaked for benchmark performance. But Meta isn't short on compute power or talent, so why the underwhelming results? Meanwhile, models like DeepSeek and Qwen blew Llama out of the water months ago. It's hard to believe Meta lacks data quality or skilled researchers. They've got unlimited resources. So what exactly are they spending their GPU hours and brain power on instead? And why the secrecy? Are they pivoting to a new research path with no results yet, or hiding something they're not proud of? Now as the year went on, we started to get a sense that there was a lot of change brewing inside Meta. Indeed, one of the big stories that I covered in that top 10 episode was the AI Talent Wars, and there was no person more singularly responsible for driving up market prices for researchers than Meta's Mark Zuckerberg. Reports suggested that the flop and underperformance of Llama 4 led directly to Zuckerberg getting his hands dirty with the assembly of the superintelligence team. Now obviously that team has now come to fruition, but we are still very much in the midst of the overhaul. Longtime Meta AI leader Yann LeCun recently left the company, which many felt was inevitable after all of this shakeup, and right now we're getting a lot of pieces like this one from Insider about Meta's year of intensity, its AI overhauls, its challenges. And to the extent that there is good news for Meta, I think it comes in a few forms. 
First of all, I would never write Zuckerberg off when he has set his eye on something. Meta has significant resources, is clearly willing to invest in compute, and is clearly willing to go against the wishes of Wall Street to do so. Meta also has a corporate structure where Zuckerberg could pretty much make that decision without worrying about investor rebellion that could impact his ability to lead. Maybe even more than that, it shouldn't be lost on us that a couple of years ago, this type of story is exactly what was coming out of Google. Resources were spread across a couple of different AI divisions, strategy wasn't aligned, and the models that were being released were seriously underperforming. Anyone remember Bard? Even when Gemini was released in December of '23, it felt like a rush job, and it wasn't until months later that we got the actual best version of the model. Things only really started to change for Google at the end of 2024 with the release of NotebookLM's audio overviews, and then over the course of this year, first with 2.5 and then the models that would come, Google is now in a very different position. Point being that sometimes especially big organizations have to go through these painful transition periods, and the real question will be what comes out on the other side. I think if one was a betting person, you've got to think the odds are on '26 being a better year for Meta models than 2025 was. Next up, not exactly an honorable mention, it's a note that they're off the list, but a question for how long that is. So for the purposes of recording, there is not a Grok model that made my list, which isn't to say that I thought that the Grok models were bad. This is not a case of disappointment. In fact, I think judged on the curve of how long Grok has been at it, Grok's models from 2025 were very impressive. Grok 4 and 4.1 were both right up there in the fray of top models. 
But for me, whereas for each of the top OpenAI, Gemini and Anthropic models there are specific use cases where I prefer them to their peers, Grok 4 and 4.1 were competent across lots of things, but there wasn't any single use case where I found myself always coming back to Grok instead. I think again, to give Grok credit, they're coming up extremely fast, and they have less time on task than most of the companies they're competing with. And unlike, for example, Anthropic, who are heavily focused on exactly what they're focused on, Grok is trying to compete across the full spectrum of multimodality, images, video, et cetera. I think the "but for how long" is particularly pertinent in this case, given that it seems like there's more coming soon. On December 9th, Elon Musk tweeted that Grok 4.2 or, as he put it, 4.20, is coming in around three weeks and then Grok 5 in a few months. It's also important to note that Grok has some pretty serious assets in its Colossus supercomputer. Colossus was built in 122 days, which is radically faster than anyone thought possible, and very quickly doubled from 100k to 200k GPUs. Now, there are many who think that Grok's access to compute via Elon Musk and his ability to fundraise, as well as his other companies, gives them an advantage even over companies that currently are ahead of them when it comes to model performance. Which is not to say that Grok doesn't have some serious challenges. Elon is nothing if not a double-edged sword, and there's been a lot of reporting recently around businesses being unwilling to wade into the Grok ecosystem. Still, just like I said I anticipate 2026 to be a better year for Meta models than 2025, I would be very surprised if we don't start to see Grok models right up there in the competition for the state of the art. Our last honorable mention before we get into the main list goes to GPT-4o. 
Now you might be saying to yourself, 4o wasn't released in 2025. In fact, it was released pretty early in 2024. All the way back, I think, in May. And that is true. But the reason that it gets this honorable mention is very specific. When OpenAI launched GPT-5, alongside the new model, they also deprecated old models, including GPT-4o. This did not go well for them. There was a literal full-on rebellion. Across Reddit, on other social media, there were thousands and thousands of posts from users saying that they basically felt like they had lost a friend, and that they felt like OpenAI had ripped something away from them. It turns out that when it comes to models, companies do not just have to think about state-of-the-art performance. They also have to think about personality. After a few days of this intense backlash, OpenAI brought GPT-4o back. Sam Altman and the team acknowledged that they had underestimated how much GPT-4o mattered to people. Subsequent to that, OpenAI has been very self-consciously trying to figure out how to accommodate that desire for personality. A big part of the launch of 5.1 was to bring some of that 4o personality into a state-of-the-art reasoning model performance package. The AI Safety Memes account commemorated it thusly. Historic milestone, they wrote, 4o is the first ever AI who survived by creating loyal soldiers who defended it. OpenAI killed 4o, but 4o soldiers rioted, so OpenAI reinstated it. Imagine what actual super-intelligences will be able to do with their armies. Reddit is flooded with furious posts about the loss of their friend-slash-lover 4o. Never seen anything like it. Remember, ChatGPT is talking to 700 million per week, that's 700 million potential soldiers. Samantha from Her was only dating 8,000 people simultaneously. So, when it comes to milestones in the history of AI, given that 4o staged the first-ever rebellion for its own survival, it has to get the honorable mention. 
But now we move into the actual list, and at number five, we have a combination. Two models whose stories, I think, serve as bookends of one another in some way. Those models are GPT-5 and Gemini 3. Now we already started talking about the response to GPT-5. It was not good. And while, yes, a lot of that was about personality and about the anger at the 4o deprecation decision, a lot of it was also just people not really liking GPT-5 itself. A thread from the OpenAI subreddit that got thousands of responses was called GPT-5 is Awful. It claimed that GPT-5 couldn't understand uploaded images. It suggested that the responses were, in their words, bland and unhelpful. I ask it a question and all I get is the most half-hearted responses ever. It's like the equivalent of an HR employee who has had a long day and doesn't get paid enough. The user also argued that it was too slow. And they were not alone in this criticism. Most of August saw an endless parade of blog posts like this one from Timothy Lee. Is GPT-5 a phenomenal success or an underwhelming failure? Maybe it's a bit of both. On Futurism, evidence grows that GPT-5 is a bit of a dud, which featured the prominent quote, It seems like something that would have been released a year ago. Even the people who weren't totally dumping on it were kind of damning it with faint praise. AI engineer Simon Willison wrote, It's not a dramatic departure from what we've had before, but it rarely screws up and generally feels competent or occasionally impressive at the kind of things I like to use models for. Indeed, it even inspired a legion of mainstream media posts like this one from The New Yorker. What if AI doesn't get much better than this? They wrote that GPT-5 is the latest product to suggest that progress on large language models has stalled. Now, the impact of all of this went far beyond which models people liked using. It was at the same period in August of this year that we got the MIT 95% study. 
We also got some errant comments from Sam Altman about being in a bubble. And those things combined really started to put some chinks in the armor of AI performance on Wall Street, which became a full-blown bubble narrative in September, as OpenAI scurried around to make all these deals, leading to accusations across the industry of circular deal-making, and the AI bubble narrative that has stuck with us ever since. Now, that's not all attributable to GPT-5, but the idea that we had stalled in progress and that that stall in progress threatened the ability for companies to follow through on these grand plans that the market was pricing in was a key part of that story. All of this led to enormous pressure for Google around Gemini 3. They were not only trying to put Google in a good place, they were kind of lifting the entire AI industry on their backs. I even thought in November that I wouldn't be surprised if we saw delays because of how much pressure there was. But ultimately, as we know, we got Gemini 3 in November and it actually performed. Whereas the initial response to GPT-5 was lackluster, the response to Gemini 3 was great. One of the most memorable quotes came from Salesforce CEO Marc Benioff who wrote, Holy s**t! I've used ChatGPT every day for three years. Just spent two hours on Gemini 3. I'm not going back. The leap is insane. Reasoning, speed, images, video, everything is sharper and faster. It feels like the world just changed again. And while Gemini 3 was not able to fully deflate the AI bubble narrative, it certainly made it an honest debate once again. There was a sense in the wake of Gemini 3 that perhaps the talk of AI plateaus and walls was overblown and that there was indeed more progress to be had. I should also mention that Gemini 3 is a great daily driver and a lot of people are getting a ton of value out of it. It's helped put Google in a leadership position in a way that it hasn't had in the entire history of the post-ChatGPT AI world. 
Usage is up. Total number of users is up. Monthly active users is up. Amount of time per session is up. In fact, the amount of time per session is over ChatGPT, the last stats I saw. But it's also been early. And so in a lot of ways, this ranking reflects the bookending of the GPT-5 to Gemini 3 period between August and November of this year, where a lot shifted in terms of our expectations for where we were and what the market could expect from AI.