Google's NEW Gemma 4 Just Changed AI Forever...

**Julian Goldie** (0:00)
Today, I'm going to run you through Gemma 4 from Google. This new model that just dropped today is Gemma 12B. It's a free local model you can run, and it can plug into all your AI agents. You can see some cool stuff that was actually built with it right here. What's interesting about this is that it's local, it's free, it's private, and you can plug it into whatever you want. So, for example, you can get it for free, and then you can plug it into something like Ollama, or you can plug it into Claude or Codex or that sort of thing. It's actually designed to be a super lightweight model, so it's not going to be the most powerful thing you've ever used, just to be 100% honest with you. But you can build stuff with it. As you've seen today, I've already built out games, tools, and very visual stuff as well, which is pretty cool.
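To make the "plug it into something like Ollama" idea concrete, here is a minimal sketch of sending a prompt to a model served by a local Ollama install. It assumes Ollama is running on its default port and that the model has already been pulled. The model tag `gemma4:12b` and the helper names `build_request` and `ask_local_model` are placeholders for illustration, not something shown in the video.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical tag: run `ollama list` to see the real name on your machine.
MODEL_TAG = "gemma4:12b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build the POST request for Ollama's generate endpoint (no network call yet)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, something like `ask_local_model("Write a snake game in HTML")` would return the model's reply as a string. Nothing leaves your machine, which is the "private" part.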
It seems to be super fast. And also, you can actually get a free API for this, so even if you're not interested in running local models, or maybe you don't have the right setup, I'm going to show you exactly how you can use Gemma 4 for free as an API, and then you can also plug that into your AI agents or whatever you want to use.
Now, this is the new announcement that just dropped today. This is basically a local model designed to bring high-performance, multimodal intelligence directly to your laptop. It's also designed to be mobile first, so it can be pretty efficient for running on mobile, and it's got advanced reasoning. One of the interesting things about this is its performance on benchmarks. What that means essentially is that despite it being quite a small model (for example, the context window is only 256K and it's a 12B model), it's actually operating at similar benchmarks to models twice its size, which is quite remarkable in itself.
So there's five key things that make this model unique. Number one is its novel unified architecture. So it doesn't have multimodal encoders, right? It's actually all in one single LM. Also, its advanced reasoning performance, so it's pretty good at advanced reasoning. You can use it on a small laptop, so even a 16 gigabyte VRAM laptop, you can use this on. It's open source, and it's released under an Apache 2 license. And additionally, it comes equipped with multi-token prediction to reduce.
Now you can see the benchmarks right here in terms of how it performs versus other models. So this 27B and this Gemma 4B... bear in mind, Gemma 4 itself has been downloaded like 150 million times. So it is a very proven model that a lot of people are using right now, which is pretty cool. And so let me talk you through why it is unique, what it means, how it works, et cetera. So, Gemma 4 12B fully explained. Number one, the size of its brain.
So 12B, the B means billion. An AI's brain size is counted in terms of how many tiny knobs, I guess you could call them, it has to learn with. So Gemma 4 has 12 billion of them. That's small enough to fit on a 16 gigabyte laptop (there's an even lighter 8 gigabyte version too), but big enough to genuinely be useful.
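As a rough back-of-the-envelope check on the 16 gigabyte claim, here is a minimal sketch of how a parameter count translates into memory for the weights alone. The precisions used (16, 8, and 4 bits per parameter) are common choices and are an assumption here, not something stated in the video, and `weight_memory_gb` is a made-up helper name. Real usage is higher once you add the context cache and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


PARAMS_12B = 12e9  # "12B" means 12 billion parameters

# Compare a few common weight precisions (assumed, not from the video).
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS_12B, bits):.0f} GB")
```

Under these assumptions, 16-bit weights come to about 24 GB, which would not fit in 16 GB, while 8-bit (about 12 GB) or 4-bit (about 6 GB) would. That suggests the laptop figure relies on a reduced-precision version of the model.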
So you want to think of it as a compact but useful AI brain, rather than an oversized one, and a brain that you can plug into your AI agents. So let me give an example. We've already been testing it with Hermes Agent, and it created some pretty cool stuff. It created some nice websites. As you can see, you could, for example, create these cool tools, these cool visualizations. You can see how it's generated this report here. It created some nice keyword research as well. And you can see it's beautifully designed. You can get quite a lot out of it if you have good skills in place. And then for general projects, you can see some examples of stuff we've created right here. It can create these visual apps, which are really cool.
It created a color palette, a pomodoro timer, and a snake game. This was another sort of brick breaker game as well. Additionally, this is a reflex game that it generated, which measures your reflexes, and a wallpaper, which was pretty interesting too. And then we tested it on some general content and responses and that sort of thing. One interesting point here: it is multilingual, so it can speak multiple languages too.
