Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

How I AI

February 11, 2026

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components.
Speakers: Claire Vo
**Claire Vo** (0:04)
Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we're going to bring you up to date on all the new coding model releases from OpenAI and Anthropic. In case you missed it, last week OpenAI released Codex, their desktop app for AI engineering, and the new model GPT-5.3 Codex (try saying that five times fast), and Anthropic released their response, Opus 4.6 and Opus 4.6 Fast. If you're new here, you may not know this, but when these new models come out, I put them through their paces.
I test them side by side on the same task, and I'm going to give you my opinion about where they do well, where they fall apart, and which one goes where in my AI engineering stack. Spoiler alert: I've shipped more code in the last five days than I think I have in the last month. So I think these are pretty fabulous models, but they do have their quirks, they do have their strengths, and sometimes they go off the rails. Let's get to it.
This episode is brought to you by WorkOS. AI has already changed how we work. Tools are helping teams write better code, analyze customer data, and even handle support tickets automatically. But there's a catch. These tools only work well when they have deep access to company systems. Your copilot needs to see your entire codebase. Your chatbot needs to search across internal docs. And for enterprise buyers, that raises serious security concerns. That's why these apps face intense IT scrutiny from day one. To pass, they need secure authentication, access controls, audit logs, the whole suite of enterprise features. Building all that from scratch, it's a massive lift. That's where WorkOS comes in. WorkOS gets you drop-in APIs for enterprise features, so your app can become enterprise-ready and scale upmarket faster. Think of it like Stripe for enterprise features. OpenAI, Perplexity, and Cursor are already using WorkOS to move faster and meet enterprise demands. Join them and hundreds of other industry leaders at workos.com.
Start building today.
Okay, to start, when I'm evaluating new models I like to pick a task that's pretty ambitious, something I definitely wouldn't want to do by hand, and that is consistent enough that I can actually compare the pros and cons of each model side by side. And I picked a task that I choose often when comparing these models, which is redesigning my marketing site. I think all these models are pretty good at one-shotting a landing page or a marketing page, a simple app. I don't feel like that's a practical evaluation for these new models. I like to take a codebase that's relatively complex, or at least established, and compare side by side how these models work inside that codebase. So I took my ChatPRD homepage, the marketing site. It's got lots of pages. It's got a blog. It's got the How I AI workflows on there. It's not a simple app, even though it's just kind of a content front end. And I want to bring it up to my 2026 ambitions, which are all about the enterprise. So while this website looks great, it's cute, it's got nice colors, it's definitely more focused on the PLG individual-user workflow. And I want to uplevel this as we sell more to enterprise customers. So I'm going to have these models duke it out and see which one does the better job. And I'm going to test these in the order they came out. So the first thing that came out in our very busy week last week was Codex. Now, Codex, as I said, is OpenAI's desktop app for coding. And before we get into it, I want to show off some of the things that I think make Codex unique. First of all, Codex is focused around Git primitives. Now, if you're not technical or you're a new software engineer, you've probably run into some Git concepts as you've gotten started coding. But I just want to walk through a couple of things that might be useful for you to know. The first thing is the idea of a Git repository.
That is basically a whole codebase that represents an app or a project. Git repositories are represented over here in Codex as projects. You can see I have different repositories here that I'm working on, including my ChatPRD website, the WWW website. Then in your repo, you can start working on new types of code. And there are kind of two ways you can take code and make it contained so that when you edit it, it doesn't break your production website. The first way, which I use a lot, is branches. Branches are, as the name says, little branches of your code that you can make changes to, commit, and then ultimately decide to merge into production. There's also the concept of worktrees. These are full copies of your codebase that you would use, or an agent would use, to make changes. And one of the benefits of worktrees versus branches is that you can have many of them going at the same time on the same machine. And so if you're working with a lot of agents, you could give each agent its own worktree to work on, and they could do a lot of work in parallel without running into each other or causing issues. If you want to learn more about worktrees, definitely watch our episode with Alex from OpenAI on Codex, the terminal app, where he goes through how he uses worktrees on a daily basis to kick off his agentic work. And then up in the top right, you can see we have a Git diff panel. A diff is, again, the difference between what you had and what you have now. You'll see red is code that was removed, green is code that was added. You can see up here the count of lines changed, either added or removed. And then you can create pull requests from Codex. Pull requests are kind of a signal to your team that says, this code that I'm working on is ready to be part of the main production branch. Can you pull it in?
I'm requesting it, and often that's where your CI/CD pipeline, your pre-deploy or pre-production checks go, and where your team, with their human eyes, tends to look at your code. And you can see here, as I'm talking through this, Codex has put these concepts front and center. And I think that's because they're trying to appeal to two audiences. One, they're trying to appeal to, you know, the let-the-tokens-go, highly empowered, use-all-the-agents software engineers who are doing a lot of things at once on their local machine and need to be able to benefit from these concepts of Git, worktrees, local and cloud agents, all that kind of stuff. The second is, I think this is actually a really good framework for folks who are less technical to learn the concepts of Git. I have always said you should invest in the GitHub Desktop experience. It is a version of this. It's what I use all the time to manage my work across branches and across files. I could work in the command line tool for GitHub. I just think it's nice to be able to see your changes and really know what's going on. And so Codex has brought some of these visual concepts, UI concepts of Git, into the Codex app. So it's nice if you're learning. The second thing that you'll see in Codex that is a little new and unique compared to other apps is the concept of bringing skills up as a first-class citizen. So if you are new, skills are sort of a packaged set of prompts, instructions, reference files, and code that can be called by an agent to consistently execute a task over time. If you want to be really cheap about it, it's like a bundled prompt. And you can see here that OpenAI and Codex have given skills a home, and they've given them icons, and they've given them buttons. And I have to say, I love this. If you watched my earlier episode when skills first came out, I was so exasperated that skills were like a zip file that you had to upload somewhere or put in your repository.
This just makes it a much more visual experience to add skills to your codebase or to your system and refer to them over time. I also like that OpenAI shipped a bunch of recommended skills that a lot of people could benefit from. You can get your mind wrapped around what skills would benefit your AI work. The final thing that I think OpenAI put front and center in Codex that's interesting is this concept of automations. Automations are basically tasks that can run on a schedule. You can see here, when you create a new automation, you give it a name, you say what project it needs to run on, you basically write a prompt (it's not that fancy), and then you give it a schedule. Again, like skills, OpenAI has shipped a bunch of out-of-the-box automations. Now, my reaction here was, I'm already doing a lot of this stuff. You know, I'm a little ahead of the curve when it comes to some of the automations around my codebase. So I've solved these problems, but I think everybody should solve these problems. So if you're looking for inspiration on what kind of automations would benefit your codebase, the recommended automations in Codex are a really good place to start. But let's get to actually writing code. Now, I have to give one caveat, which is that I ran this process using GPT-5.2 Codex, which was the recommended model when this app came out. Now, very quickly, they came out with 5.3, and we'll see that towards the end of the episode. But I do want to call out that this is a slightly older version of the model, though I think the family of models, given my experience, has very similar output. So I would say I would probably get the same experience with 5.3.
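
As a rough illustration of the diff panel described earlier in the transcript (red for removed lines, green for added lines, plus a count of lines changed), here is a minimal Python sketch. It uses the standard library's `difflib` rather than Git itself, and the function name `count_changes` and the sample file contents are made up for this example; it is not how Codex computes its numbers.

```python
import difflib


def count_changes(before: str, after: str) -> tuple[int, int]:
    """Return (added, removed) line counts between two versions of a file."""
    added = removed = 0
    # ndiff prefixes each output line with a two-character code:
    # "+ " for lines only in `after`, "- " for lines only in `before`,
    # "  " for unchanged lines, and "? " for intraline hint rows.
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added, removed


# Hypothetical "before" and "after" versions of a small config file.
before = "title = 'Home'\ncolor = 'pink'\nfooter = True\n"
after = "title = 'Home'\ncolor = 'navy'\nbadge = 'Enterprise'\nfooter = True\n"

added, removed = count_changes(before, after)
print(f"+{added} -{removed}")
```

A changed line shows up as one removal plus one addition, which is also how Git-style diff counters typically report an edit to an existing line.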
