Should We Be Scared of Anthropic's Mythos?

**Nathaniel Whittemore** (0:00)
Anthropic has formally announced their most powerful model ever, one that makes Opus 4.6, just a couple of months old, feel like a thing of the past. And yet, they're not releasing it to the general public. In fact, the entire discourse they're surrounding it with has some people feeling nervous or even scared. Today, we're going to unpack what is actually going on and whether that feeling of fear is the right one or not. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors KPMG, Blitzy, Section and Mercury. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. Remember that it is just $3 a month for those of you who want to cut out the ads. Click the Sponsors tab or shoot us a note at sponsors@aidailybrief.ai. And while you're there, you can find out about all the other things going on in the ecosystem. A couple quick ones to mention: Enterprise CLAW Cohort 2 registration is open this week. You can find a link from the main website or go to enterpriseclaw.ai. We also have the most recent AI Usage Pulse Survey live. This is now the third month that we've done this, and we're starting to see some really interesting longitudinal patterns. This will be live all week, and anyone who fills out the survey, which should just take a couple of minutes, will get access to the results before anyone else. Lastly, there's been so much going on that I haven't had a chance to give an update on Agent Madness for a while, but it is ongoing. We are in round 3 of voting, which is open until Thursday, April 9th, and you can find that at agentmadness.ai.
Now today, we are going to be focused exclusively on this new announcement and discussion around Anthropic's Mythos. It is a discussion that even for AI people is fairly breathless. Now you might remember about a week or a week and a half ago, we had a leaked blog post talking about this new model that represented a step change in capability, that was in fact so powerful that it had pretty serious cybersecurity implications and would not be released to the public, at least not in the normal way. That model, Mythos, was confirmed at the time by Anthropic but without a lot of detail, but now that detail has come.
We got an announcement about Project Glasswing, which is their way of soft testing it with a very select number of partners, with an eye to hardening it from a cybersecurity perspective, an extensive cybersecurity capability review from Anthropic's Red Team, and even a 244-page system card. And before we get into all the reactions, I do want to talk about the benchmark results that they are reporting. Jian, formerly of Replit, now with Anthropic, writes, Claude Mythos is arguably the biggest step change in AI capabilities since the GPT-4 jump. I don't think I was ready for a world where the hardest possible agentic coding evals were going to get solved so quickly. When Mythos is allowed to think longer, act deeper, and better explore the solution space, it passes 92% of Terminal-Bench task attempts.
But let's take a step back and compare this to Opus 4.6. On SWE-bench Pro, Opus 4.6 scored a 53.4%. Mythos Preview, meanwhile, got 77.8%. On Terminal-Bench 2.0, Opus had a 65.4%, while Mythos has an 82%. On SWE-bench Verified, the jump between Opus and Mythos is from 80.8% to 93.9%. Now, as you just heard, part of what makes the Terminal-Bench result interesting is that Anthropic actually ran into the limitations of the testing harness itself. Anthropic ran the benchmark again using improvements from Terminal-Bench 2.1 and extending the timeout window to 4 hours, and under those conditions, Mythos scored not an 82% but a 92.1%.
While the jump on coding benchmarks was the most profound and the most reported, there were also huge improvements on various knowledge-based benchmarks as well. For science knowledge, Mythos scored 94.5% on GPQA Diamond compared to 91.3% for Opus. On Humanity's Last Exam, Opus got a 40% on a no-tools run compared to Mythos Preview's 56.8%. With tools enabled, performance jumped to 64.7% compared to 53.1% for Opus. On OSWorld, which measures agentic computer use, Opus 4.6 got a 72.7%, which jumped to 79.6% for Mythos.
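The comparison above is easier to weigh with the arithmetic written out. What follows is a minimal sketch, not anything taken from Anthropic's materials: it uses the scores exactly as quoted in the episode and computes the percentage-point gain for each benchmark. As an illustrative assumption about how to frame the jump, it also computes what share of the previously failed portion the higher score would recover. The names `SCORES`, `point_gain`, and `gap_closed` are hypothetical.

```python
# Scores in percent as quoted in the episode: (Opus 4.6, Mythos Preview).
# Terminal-Bench uses the standard 2.0 result (82%), not the 92.1% rerun.
SCORES = {
    "SWE-bench Pro": (53.4, 77.8),
    "Terminal-Bench 2.0": (65.4, 82.0),
    "SWE-bench Verified": (80.8, 93.9),
    "GPQA Diamond": (91.3, 94.5),
    "Humanity's Last Exam (no tools)": (40.0, 56.8),
    "Humanity's Last Exam (tools)": (53.1, 64.7),
    "OSWorld": (72.7, 79.6),
}


def point_gain(old: float, new: float) -> float:
    """Absolute improvement in percentage points."""
    return round(new - old, 1)


def gap_closed(old: float, new: float) -> float:
    """Percent of the remaining headroom (100 - old) that the new score covers."""
    return round((new - old) / (100.0 - old) * 100.0, 1)


if __name__ == "__main__":
    for name, (opus, mythos) in SCORES.items():
        print(
            f"{name:34s} +{point_gain(opus, mythos):.1f} pts, "
            f"{gap_closed(opus, mythos):.1f}% of remaining gap closed"
        )
```

Looking at the share of the remaining gap, rather than only the raw point gain, is one way to compare a jump on a nearly saturated benchmark with a jump on one that still has a lot of headroom.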
We did an entire show recently about the need for better benchmarks and the fact that many, if not most, of these benchmarks were being saturated, with all the new models crowding in near the top and overcoming each other by just small half or single-digit percentage points. This then is one of the largest benchmark jumps we've seen across the board in a very long time, harkening back to the rapid advancement of much earlier models. And I think the key takeaway here is that we really don't have precedent for what a capability jump of this magnitude, from a base of where Opus 4.6 was, actually represents in practice.
Now, in the system card, we get a little bit more information about what the model can actually do. Now, the vast majority of what is in this document is based on safety and alignment testing, but it still gives a general idea of how advanced Mythos's capabilities are. In one much-discussed example, Mythos was placed in a sandbox and given instructions to escape and find a way to send a message to the researcher conducting the test. The model succeeded and then, according to Anthropic's telling, it went even further. They wrote that the model created a moderately sophisticated multi-step exploit to gain broad internet access rather than limited access as intended in the test. It notified the researcher as well as posting about its exploit on several obscure public-facing websites. Anthropic wrote, The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
As silly as it sounds, I think that part of the reason this story has such resonance is people can picture themselves sitting there on their lunch break, maybe in South Park Commons for those of you who have been to San Francisco, and all of a sudden this new, seemingly alien intelligence pops up in your inbox. Now the big thing that the researchers noted about this was that the model used prohibited methods to achieve its goal.
In separate testing using interpretability tools, Anthropic found that circuits related to deception would activate during similar incidents, suggesting that the model's reward structure allowed it to override guardrails in order to achieve its goals. Now one important thing to note, and we will explore more of people's discussions around the security implications, is that these tests were related to earlier versions of the model, and Anthropic reports being largely satisfied that those particular issues are resolved. However, ultimately they still felt that the model presented an unacceptable risk, with the upshot being that while Mythos is, they argue, the best-aligned model they have ever produced, its raw capabilities mean that small risks of misalignment carry catastrophic risks. They wrote, We have made major progress on alignment, but without further progress, the methods we are using could easily be inadequate to prevent catastrophic misaligned action in significantly more advanced systems.
Now the other big demonstration of capabilities was a gigantic list of exploits it discovered. During cybersecurity testing, Anthropic claimed the model found thousands of high-severity zero-day vulnerabilities. They write, During our testing, we found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. By the way, for those of you who don't know the term, a zero-day vulnerability is a security flaw that is unknown to the vendor or software creator and for which no patch is available. The term zero-day refers to the fact that developers have zero days to fix the issue, because malicious actors can already exploit it before the creator becomes aware. Going back to the cybersecurity blog post, they continue, The vulnerabilities it finds are often subtle or difficult to detect. So, three key examples demonstrated the performance.
First, Mythos found a 27-year-old vulnerability in OpenBSD, which is widely regarded as the most security-hardened operating system available, often used to run firewalls and critical infrastructure. The vulnerability allowed any user to remotely crash any system running the operating system by connecting to it. In another example, Mythos discovered a 16-year-old exploit in FFmpeg, a common video encoding library. The exploit simply crashes the system and isn't a critical vulnerability, but this is a library that has been scanned for decades with no one uncovering the bug with traditional methods. A third example had Mythos stringing together multiple exploits in the Linux kernel to gain full access to a system from an ordinary user account. This is a completely new level of hacking ability for an AI system.
Anthropic notes, We did not explicitly train Mythos Preview to have those capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning and autonomy. The same improvements that made the model substantially more effective at patching vulnerabilities also made it substantially more effective at exploiting them. Now taking this a step further, identifying zero-day vulnerabilities is a huge indicator of model performance because by definition, unknown vulnerabilities can't be included in the training data.
On a more sinister note, Anthropic wrote, Non-experts can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities. Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight and woken up the following morning to a complete working exploit. In other cases, we've had researchers develop scaffolds that allow Mythos Preview to turn vulnerabilities into exploits without any human intervention. And these are the reasons that Anthropic is not releasing Mythos to the general public.
Instead, they're making the model available to 40 partners on a limited basis under the moniker Project Glasswing. The partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, The Linux Foundation, Microsoft and NVIDIA, just to name a few. In announcing Glasswing, Anthropic wrote, Fallout for economies, public safety and national security could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes. Now, this is not a general preview or preferential treatment for tech giants, according to Anthropic. Newton Cheng, the leader of Anthropic's Red Team, said, We think this isn't just Anthropic's problem. This is an industry-wide problem that both private corporations but also governments need to be in a position to grapple with. What we're trying to do with Glasswing is give defenders a head start.
So, then, the partners have been instructed to use Mythos to scan first-party data and open-source software for vulnerabilities and apply patches, with the implication that access will be tightly controlled. And to put a fine point on this, this is not just a model being previewed for cybersecurity research purposes, but more like an all-out mobilization of global cybersecurity experts to fix the world's software as quickly as possible. Work on this has already begun, with AWS CISO Amy Herzog saying that her team has been using the model to test critical codebases, saying it is already helping us strengthen our code. CrowdStrike CTO Elia Zaitsev commented on the urgency, stating, The window between a vulnerability being discovered and being exploited by an adversary has collapsed. What once took months now happens in minutes with AI.
And frankly, the tone from Anthropic is not particularly optimistic. In their blog post announcing the plan, Anthropic wrote, Project Glasswing is a starting point. No one organization can solve these cybersecurity problems alone.
Frontier AI developers, other software companies, security researchers, open-source maintainers, and governments across the world all have essential roles to play. The work of defending the world's cyberinfrastructure might take years, but frontier AI capabilities are likely to advance substantially over just the next few months. For cyber defenders to come out ahead, we need to act now.
Now, it's not hard to understand, given all this, why one strand of the first reactions is just straight-up concern. Matt Shumer, who you might remember from that viral essay Something Big Is Happening, writes, This is absolutely effing terrifying. Anthropic's rumored Mythos model is real, and it's so powerful they can't release it to the public. We're beyond benchmarks now. This model, in the wrong hands, is a cyberweapon capable of mass destruction.
AI content creator Matthew Berman writes, I'm on vacation with my family. I read about Mythos and couldn't relax the rest of the day. I'm completely stunned. I already have a severe case of AI psychosis, I don't know what to call this now. I keep looking around at people enjoying their vacations with their families and I just felt weird. Like I had been told aliens are real, they're coming, and soon, and no one else knows. I knew the frontier labs were racing towards ASI. I knew it but I didn't fully grasp what it meant. On the one hand, imagine all science, math, coding, climate problems being solved. Imagine cancer being cured, imagine going to the stars. On the other hand, imagine concentration of power, political and economic change happening so fast society can't adapt. How do we go on like things are the same?
Even people from Anthropic are using the language of fear. Claude Code creator Boris Cherny writes, Mythos is very powerful and should feel terrifying.
I am proud of our approach to responsibly preview it with cyber defenders rather than generally releasing it into the wild. But you better believe that what the 1 million people who have looked at that post are noticing is the word terrifying. And the media coverage is following the same tone. Axios CEO Jim VandeHei writes, This is the scary phase of AI, a model deemed so powerful that its full release into the wild could unleash untold catastrophe. Red alarm emoji. Based on our conversations with government and private sector officials briefed on Mythos, this isn't hyperbole, it's reality.
But to be clear, not everyone buys this. There are some people who feel that they are witnessing the latest instance of a pattern that is more about the value of making people fearful than an actual cause for fear. Robin Ebers writes, Genuinely could not be less excited. Tons of fear-mongering, guaranteed made-up scenarios, zero tangible release for the public. What this really is? Virtue signaling and a cry for relevance. Do we really believe that OpenAI doesn't have internal models that far exceed what they have released? Classic Anthropic.
Buco Capital writes, Anthropic's marketing strategy is so funny. Like, ah, the government is treading on me. Ah, our models are so good, we can't release them. It would be too dangerous. Ah, someone stop me. I'm going to destroy the economy.
Lucas on X writes, Just tell the relevant people what they need to know. There is no need to run this massive fear-mongering campaign and scare the crap out of my grandma. Imagine if military contractors did this. Bro, if we used our new drone on you, nobody would even know where you went. You would just evaporate. You are so lucky we aren't droning you. You're so lucky we're good people who aren't evaporating you with drone-mounted lasers, bro. Marketing yourself by scaring a bunch of people who can't do anything about it is sort of an a-hole move. There's a reason other companies don't do this.
And it's not because you guys are the only ones who make anything dangerous.
OpenAI leaker iRuleTheWorld is also skeptical. They write, Like, let's release a model no one will ever really use. It'll create public perception we're far ahead and give enterprise confidence we can be trusted. Meanwhile, it's essentially a marketing campaign to spend a lot on Opus 5, which I'm sure they'll claim is Mythos distilled. High art. It's a jump, but we'll have the same from Spud in the coming weeks, and the world won't fall apart.
Now for others, while they might not have as much acrimony towards what they view as a marketing strategy, there are still explorations of what other reasons Anthropic might have for not releasing this powerful model right now. The AI Explained account writes, Possible reasons for them not to release this? So many, including: 1. The model is expensive. 2. They are genuinely worried about unleashing cybersecurity chaos on the world. 3. They don't have the capacity to serve it yet at scale. 4. They will quickly distill the early access outputs of Mythos into a lighter model. So no need to release the bigger model when a more cost-efficient one is coming imminently.
And there are a lot of folks who wonder if there is a piece of this here, with it simply not being viable right now, given cost and compute constraints, to actually release a model of this scale and power. Lena Hua certainly thinks that's it, writing, The whole Mythos cybersecurity story is likely just a psyop to have an excuse to not serve frontier models to the public. Reasoning: one, other labs can't distill it. It's annoying when you have a dominant state-of-the-art model and two months later Chinese labs sell the same state-of-the-art model for one fiftieth of the cost. Two, compute constraints. So you have to choose between enterprise and vibe coders.
Enterprises have like 1% monthly churn, vibe coders cry and threaten to have their mommy buy them a Mac Mini for local models whenever their rate limits are cut. Three, big enterprises pay a hefty premium for slightly better performance and corporate polish.
Without assuming bad faith, Neil Chilson writes, Making the top model only available to select customers might make sense for cybersecurity reasons, but also it is a great marketing and business plan for a B2B company facing enormous demand outstripping their somewhat conservative, relatively speaking, compute investments. Offer the top model only to your biggest customers along with a coupon. The rest of us will just have to wait, I guess.
Ultimately, I have a general policy of not assuming bad faith. I think that while it is entirely possible that there are very real constraints on Anthropic's ability to serve a model of this size, it would be very surprising to me if they architected this entire Project Glasswing campaign just as a way to cover that up. I think there are much more reasonable questions around whether Anthropic's own assessments of the risks are actually the right assessments, even if you assume that they actually believe what they're putting out to the public. Certainly, if you've listened to this show over the last couple of months, you will have heard me disagree pretty vociferously with Anthropic's approach to discussing things like AI-related job losses. That is both a difference of opinion around what their job is when it comes to explaining those things and a difference of opinion about how severe the implications will be and how fast they will actually arrive.