4 Reasons to Use GPT Image 1.5 Over Nano Banana Pro

**Nathaniel Whittemore** (0:00)
Today on the AI Daily Brief, OpenAI has released a new image generation model. We are gathering all of the first responses as well as talking about four areas where I think you may prefer it even over Nano Banana Pro. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in. A little teaser here. As you know, we've got the early readout results of the AI ROI Benchmarking Survey coming. And just in general, if you're interested in that sort of data and research, might I point you to aidbintel.com.
We're gonna have a lot more interesting things in these domains coming next year, and you can sign up to get notified as we share more of that information. Now, one more note before we dive in, as is usually the case with a new model release, this episode got long and consumed the entirety of the space that we have. At this point, we're getting due for an extended headline, so that will be coming soon, but for now, enjoy this look at ChatGPT images. Another day, another new model. Look, this competition between the big labs may be stressful for the people working there, but for us consumers, it means nothing but more choice. Today, we are talking about OpenAI's latest image generation model and the new house they put it in, which they are calling ChatGPT images. Now, overall, this is one that I kind of expected. You might remember in the December prediction episode, even before I think Sam Altman had declared Code Red, or at least before we knew about it, my best guess for a response to Gemini 3 and Nano Banana Pro was an OpenAI image model. It had just been a really long time since we got an update on that. It was clearly an area where they were pretty far behind. And it seemed like based on the fact that it had been so long since we got an update and knowing the speed at which OpenAI delivers, they had to be pretty close, one would think, to being able to release a new model. Now I didn't expect a full 5.2 release, and that's obviously the first output of Code Red. But yesterday, on Tuesday, OpenAI dropped their new ChatGPT images. As benefits, they point to stronger instruction following, precise editing, detail preservation, and a big speed boost as compared to before. So let's talk a little bit more about what OpenAI points to as the benefits here. A lot of this is about feature parity with Nano Banana Pro. Remember, the real value of Nano Banana Pro was not just that it was an improvement in terms of raw generation capability. 
It was about the controls that the user had over it. Whereas in the past, to get exactly what you wanted out of a generation, you'd just have to kind of prompt it over and over and over again and pick the one that was closest, Nano Banana Pro allowed for more precise edits. That capability has now come to ChatGPT images as well. They write, the model adheres to your intent more reliably down to the small details, changing only what you ask for while keeping elements like lighting, composition, and people's appearance consistent across inputs, outputs, and subsequent edits. Interestingly, they pointed to some pretty consumer-centric use cases for that, which is a theme we'll come back to throughout this episode. They continue, this unlocks results that match your intent, more believable clothing and hairstyle try-ons, alongside stylistic filters and conceptual transformations that retain the essence of the original image. Another capability they point to is adding, subtracting, combining, blending, and transposing. For example, taking a set of inputs and turning it into a single composition. Another capability that they're really hammering is what they call creative transformations. Basically taking one image and turning it into a different style preset, a movie poster, or turning someone into an 80s fitness instructor, taking someone's photo and turning it into an ornament, etc. Once again, and as we'll come back to, I actually think the fact that they're highlighting this says a lot about who they are intending this product for. Other benefits they point to include better instruction following, up to and including much more precise prompting, and they also point to much better text rendering. Now, this was obviously one of the biggest changes that we got with Nano Banana: in addition to just being able to have text, with Nano Banana and then Nano Banana Pro you could get a ton of high-fidelity text, opening up new possibilities for things like infographics.
One final interesting thing from their announcement post is that while in most areas the model improved, they actually did find some regressions as well. For example, they write, the ability to generate some specific art styles has regressed from the previous version. The example they give is draw me like I'm in a dark fantasy anime, with the new version completely 100% not being that at all. There are other limitations as well. For example, when there's a picture with a lot of different faces in it, keeping all those faces consistent between generations can be difficult. Overall, they claim a big improvement but still a lot more opportunity ahead.
So what were people's first impressions? My sense is that people were kind of prepared to be somewhat underwhelmed. I'm not exactly sure what the reason for that is. Maybe it's a concern that because this was part of that Code Red, this and basically any other model that they might release would be a rush job. But for a lot of people, even though they were prepared to be underwhelmed, they were, I would put it, kind of whelmed. Justine Moore from a16z writes, In early tests, this is a big step up in maintaining consistency of characters and objects from uploaded images. In other words, your face still looks like you. It may be a real competitor to Nano Banana Pro. Simon Smith from Klick Health wrote, I wasn't expecting OpenAI's new image generator to be comparable to Nano Banana Pro, so I ran it head-to-head on prompts I tried with NBP. Surprisingly, it did as well or better. But it has a different personality, at least via ChatGPT. Less whimsical, more professional. So here are a couple of the examples he gave. Research when prominent people, especially the leaders of big AI labs and forecasters, think we'll get AGI. Then illustrate this on a timeline and put the faces of the people on the timeline on the years when they think we'll have AGI. Give this a fun kind of cartoony but not too silly feel.
Now a couple of things. First, I think this is a good test to see how well integrated image generation is with the rest of the model. In other words, this requires not just image generation, but also reasoning and research. And the second thing that this brings up is that inherently the challenge with all of this episode, and by the way, this is a good one to watch if you're just listening, is that to some extent quality is going to be subjective. Although in this case, I certainly see why he prefers the ChatGPT images version as opposed to Nano Banana. He tried creating a cell cutout diagram, which again is a little bit in the eye of the beholder, but certainly holds its own, alongside a skeleton anatomy chart and a prompt that said, search up today's top headlines and then give them to me in the style of an old newspaper. Now the two models in this case took the prompt in very different directions, and I actually prefer, aesthetically, Nano Banana Pro's, but overall Simon says, I was prepared to be disappointed and I'm not. That's saying something because Nano Banana Pro is amazing. I need more time to play around with the new image generator, but my first impressions are positive. He then came back and said, slides, however, may be a weakness of GPT Image 1.5, before very quickly returning and saying, okay, I take it back. GPT Image 1.5 can do gorgeous slides. You just need to prompt it. I gave it the same template in the above example, but used GPT 5.2 Thinking instead of Instant and a broader prompt. He did point out, however, that there are real limitations to the aspect ratios that you can get with GPT Image, which has always been an issue for ChatGPT images. Still, all of this added up for Simon to him actually thinking that GPT Image 1.5 has beaten Nano Banana Pro on his personal scorecard.
And it wasn't just Simon. LMArena tweets, Image Arena shakeup! OpenAI's GPT Image 1.5 is number one in text-to-image. ChatGPT Image Latest is number one on image edit.
GPT Image 1.5 holds a commanding 29-point lead on text-to-image while maintaining a narrow 3-point edge over Nano Banana Pro on image edit. Now, they do say that these scores are preliminary and we'll see where they settle, but still I think this would surprise a lot of people. Artificial Analysis found something similar. They wrote, on both text-to-image and image editing, GPT Image 1.5 again surpassed Nano Banana Pro on their tests. They gave a couple of different text-to-image generation examples, a couple of editing examples like changing a car's color and inserting a family of ducks crossing a railroad, ultimately again ranking it number one. Now, there are a million examples out there if you want to go see direct head-to-heads on ChatGPT versus Nano Banana Pro. And my strong suspicion is that if you don't have a particular horse in the race or a set of biases that you're bringing in to start, you're likely to find some where you prefer ChatGPT and some where you prefer Nano Banana Pro. For myself, outside of just exploring a bunch of things that I thought were interesting, I ran a couple of tests. For instruction following with multiple constraints, I asked for one person standing and pointing at a screen, two people are seated. The screen shows abstract charts with no readable text, the room is modern and minimalist, the color palette is black, white and light gray only, no windows, no plants, no logos. In that case, both Nano Banana Pro and GPT Images were able to do it equally competently. On a test of photorealism, I asked for a photorealistic image of a hand holding a clear glass coffee mug filled halfway with black coffee. The hand has to have all five fingers and have them all visible. The glass has to show realistic reflections and refraction. The coffee surface needs to be flat and level, natural indoor lighting and a neutral background. Again, in both cases, the models were pretty equally competent. 
Getting into more stylistic and aesthetic territory, I asked for a 1950s retro-futurist style illustration with flat, bold shapes, a limited color palette of teal, cream, and muted orange, clean lines, and an optimistic mid-century modern aesthetic. Once again, they were both competent and ultimately, the preference here is going to be in the eye of the beholder. One of the challenges that this shows is that a single stylistic prompt can mean different things. These are both examples of 1950s retro-futurism, but one is a little more Jetsons and the other is a little more abstract. When we created a character and then put them in a different setting, both models had no problem keeping the character consistent from one to the next. And of course, on YouTube thumbnails, a very common use case for me, frankly, they were both pretty garbo, although I know for a fact that I could improve that with different prompting. As you can probably tell, across my tests what I found was pretty meaningful parity: not necessarily a clear or huge improvement over Nano Banana Pro, but clearly a huge improvement from where OpenAI's image generation model was before this.
However, it's not hard to find people who feel the opposite if you go check out Twitter/X. There were many people who were just kind of generically underwhelmed. AI News by Smol AI said, shipping anything is hard, so we rarely call out misses, and OpenAI rarely misses, but this was clearly a miss. OpenAI Image 1.5 claims to beat Nano Banana Pro number one across all arenas, but completely fails vibe checks. The Ahiko did a test and found that character face accuracy was kind of lacking. Brand designer Darius Hrkova gave a base input image as well as a product package and asked both models to make the girl in the input image hold the bottle and said while it's better than before, ChatGPT didn't get the scale and changed the product and the light, and if I ask it to make some edits, it reworks the whole image.
We'll keep testing, but for now it's one to zero for Google. David Shapiro provided a bunch of images of himself and asked both models to create a YouTube thumbnail, which in this case, undeniably, Nano Banana smashed compared to ChatGPT. Some people were even quite flabbergasted with the LMArena and Artificial Analysis results. IamEmily2050 reshared Artificial Analysis's post and said, What a joke. I'm not going into the conspiracy side, but this is really not looking good for Artificial Analysis. When someone said, How? That can't be right, Emily responds, OpenAI gamed the benchmarks or paid them to say so. Which, holding aside the substance of that argument, I think reflects people's skepticism. The X comments on both Artificial Analysis's post and the LMArena post also show just tons of skepticism.
So what to make of all of this? I think Peter Gostev from LMArena is directionally correct when he writes, My anecdotal impression of GPT 1.5 vs. Nano Banana Pro is that they are pretty neck and neck overall. I find GPT a lot easier to prompt. With Nano Banana, you often had to iterate several times before getting a good result, while with GPT you typically get what you ask for. But I think Nano Banana has slightly nicer taste, e.g. for infographics, slides, Google has the advantage. I found GPT style quite heavy. The important point, and the part I'm calling directionally correct, is the pretty neck and neck overall. Jimmy Apples had an even simpler version of the same statement, Big upgrade over the previous model. It's not as smart as Banana but it's going to be subjective on what you like on style vs. style. Personally, it really hits the image in my head I have for this prompt. Just use what you prefer. I'll be using both. And that is exactly what my overall conclusion is. Before this, Nano Banana was undeniably and very clearly better than anything OpenAI had going on with image generation. Now, it is not so clearly better. At least not in all cases.
What that means practically is that for really high quality image generation, on Tuesday morning you had one option and now in a lot of cases you're going to have two. Now one interesting point that swyx made is that we may also be seeing the limits of how far we can go in image generation with current methods. He writes, I think today's Image 1.5 launch illustrates one of the reasons why people are betting so hard on explicit world models. For the next level in realism, we're going to have to teach the models to see the world as we live it, not through occasional snapshots. He pointed to a post on r/ChatGPT that said the new image gen is nuts. Someone responded, however, yes, but also the details are a little off. Why is one leg bare and the other covered by pants? What kind of car has a vanity table behind the front seat? Where is the passenger seat? Maybe it's covered by her, but a lot of the background and context still seems off. Still, the people at least look human and not like plastic anymore.
So as we round out, let's ask, is there anything that image gen, I think, does distinctly better than Nano Banana right now? My answer is no: there's no one use case where, in every test that I tried, image gen crushed Nano Banana Pro or anything like that. But there are four areas right now, with a fifth potential bonus area in the future, where I think image gen may be a desirable alternative to what Nano Banana can do.
First up, let's talk infographics. One of the incredible things about Nano Banana Pro when it was released is that all of a sudden this new capability of making infographics from text came online. I'm sure that you have seen a ton of these floating around the internet, and indeed, that ubiquity and commonality of style is exactly why I think in some cases, you might want to use ChatGPT images instead of Nano Banana to make your infographics.
For the simple reason that they don't look like a Nano Banana infographic, which already has a particular flavor and style that people can spot from a mile away. I dumped in a recent episode transcript to get an infographic based on it, and both models were able to do this, although they each had their own quirks. As it often does, Nano Banana's first iteration gave a bunch of citation references, even though those are completely useless and wasted space on a visual infographic like this, whereas ChatGPT images just had a few little mistakes here and there. For example, in the three biggest barriers to agentic AI section, it only has two barriers. There were also some random spelling mistakes like bigger being spelled B-I-G-E-R. Now, perhaps the better approach than using ChatGPT images is just to try to prompt your way out of the standard look of Nano Banana Pro. But my point here is that you at least now have a competent visual alternative. I might add to this use case things that need really high text fidelity. That was one of the things that OpenAI called out in their announcement post. And I did some tests around that as well. I asked for an over-the-shoulder shot of Abraham Lincoln sitting at his desk writing the Gettysburg Address, make the entire address readable. In this case, though, I found both models able to do it. So once again, we're back in stylistic preference territory.