**Eric Siu** (0:00)
I was spending over $5,000 a month on AI agents. I did a full cost audit and cut it to $800 without losing a single capability. We're talking 84% reduction and the same output. The biggest waste was something I never would have caught without actually looking at the numbers. I'm gonna show you exactly what I found, what I cut, and the framework I used to do it.
So clearly you can see on my screen now, the token costs are coming down significantly versus what they were before, right? So the easiest way to see how much you can save with AI is to just ask AI. So this is my OpenClaw instance. I'm working with it in Telegram. And I just asked, I was like, hey, tell me how much money I save for each of these. And like you did with number one, I said, give me some sexy points for each of these, okay? One of the big things that you can do, if you're paying for ChatGPT: so I pay for Claude, I pay for Claude Max, which is 200 bucks a month, and I happen to pay 200 bucks a month for ChatGPT as well, just so you all know. And so you want to have both pro versions, I think, so then you can take advantage. And guess what? OpenClaw is now owned by OpenAI. With OpenAI, you have the ability to plug in your OpenClaw with OAuth. That means that you're not running on the API token. So I am testing this right now on the different agents that we have. So I have Phoenix and I have Oracle. Phoenix is my autonomous agent, and Oracle is my SEO agent, okay? Just by doing that, I'm saving $1,000 to $1,700 a month. I am now considering putting some of my other agents onto ChatGPT 5.4. So there are trade-offs to all this. I would say ChatGPT 5.4 is better at coding. Now, when it comes to creative thinking or creative writing, Opus is still better at that. Okay, so you have to figure out what works for you. I still have my primary agent, my chief of staff, Alfred, on Opus, okay? But I'm looking at slowly moving everything over to ChatGPT 5.4, because then I can run on OAuth, and then I save a lot more on tokens. That's a big one. Now, the second thing is, when you are looking at the different models that you're running, we have different cron jobs running with my OpenClaw.
So for business purposes, I might have a cron job that is running every day or so to check in on new sales leads that we should be reaching out to or deals that we should be reviving. Or every week, maybe I'm getting ideas from competitive YouTube channels on packaging I should be using. There's a lot of these cron jobs, chronological jobs that you set, and you want them to run well. But here's the thing. If you're running a cron job on one of the latest, most expensive models, that's obviously going to cost you more money. Now, in this case, if I switch from Opus to Sonnet, that's actually saving me $630 a month. That means like $7,500 a year, something like that, right? We went from it costing $2.50 and getting it down to $0.40 per run. So over time, we're talking about 84% less spend, and that's basically, you're talking about firing expenses, and you want to be looking at getting more efficiency while cutting your expenses. So let me go a little more into detail here and share an example. So with Opus and Sonnet, we actually had a recruiting cron job that was running, I think, every 30 minutes or so, and it was running on the most expensive model, and that was costing us a lot of money. And this estimated that we had saved at least $1,000 or maybe even $2,000 a month from that, okay? So it's already saving a ton of money. So you have to think about using the right models for the right situations, and that's how you're going to save a lot of money. By the way, if you're watching this right now, okay, a lot of this is from a business standpoint, but that's how you should be thinking about it. So let me give you kind of a midway takeaway here.
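The per-run figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the numbers quoted in the transcript ($2.50 and $0.40 per run, $630 a month saved). The function names are illustrative, and the implied run count is a derived estimate, not something stated in the video.

```python
# Figures quoted in the transcript.
OPUS_COST_PER_RUN = 2.50    # dollars per cron run on Opus
SONNET_COST_PER_RUN = 0.40  # dollars per cron run on Sonnet
MONTHLY_SAVINGS = 630.0     # dollars saved per month by switching


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


def implied_runs_per_month(monthly_savings: float, before: float, after: float) -> float:
    """Runs per month needed for the per-run saving to add up to `monthly_savings`."""
    return monthly_savings / (before - after)


reduction = percent_reduction(OPUS_COST_PER_RUN, SONNET_COST_PER_RUN)
runs = implied_runs_per_month(MONTHLY_SAVINGS, OPUS_COST_PER_RUN, SONNET_COST_PER_RUN)
annual_savings = MONTHLY_SAVINGS * 12

print(f"Per-run reduction: {reduction:.0f}%")
print(f"Implied runs per month: {runs:.0f}")
print(f"Annual savings: ${annual_savings:,.0f}")
```

The 84% figure checks out, $630 a month works out to $7,560 a year, and the saving implies roughly 300 runs a month, or about ten a day.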
I think if you want to save on these tokens, if you're running this for business purposes, you're trying to grow your business, you're trying to build agents to automate a lot of the work away, you should be paying, again, for ChatGPT, the $200 version, and you should be paying for Claude, the $200 version. That way, you're able to use their pro versions and get the most bang for your buck. An example here is this. When I use Cursor and Claude Code, when I'm using it for deeper product builds, like a dashboard, for example, or maybe I'm looking to build a mobile app, that's a deeper build that requires more engagement from my side. That's where I would be using Claude Code Max, or more so Claude Max, the $200 a month plan, so I could get that $5,000 in value on the tokens there. I could do that. That's why I think it's helpful to actually have both. And then obviously with Gemini, they have the best image models out there with Nano Banana Pro, right? So you have to understand what you're trying to accomplish, and then you can figure out what stack works the best for you. And then I would say every month or so, you should just probably run this. Like this, me trying to save money, me auditing my crons, this is what you should be doing, okay? Run this like clockwork every single month. In fact, take a screenshot of what I have on my screen right now. Just go to whatever you're using right now. Maybe you're using Claude Code, okay? Maybe you're using one of the Claude. Say, hey, I want to save money on my stuff right now. I want to save money on my bills. How do I do this? And just let it ask you clarifying questions on how you can do it, and then you just start saving money, okay? It's the concept that matters. Now, on to number three here, Sonnet by default across the fleet. Okay, so this will save us $300 to $500 a month or so. Sonnet is part of Claude. It's a lower version, okay?
And the latest version of Sonnet is already as good as the last version of Opus. That's pretty damn good. And these token costs are going to continue to decrease. By the way, if you want to learn how to decrease your costs by maybe 95%, stay till the end. So number four over here, Cron Audit. So every week or so, we have my machine automatically running a Cron Audit, which is killing nine dead jobs and cutting compaction frequency in half. And so this will save minimal money. And then I like tracking my savings from a cost savings standpoint. And then I like measuring against the screenshot that I had from the beginning of this video. Overall, I mean, this is being conservative. We're talking about saving three, four grand a month or so, okay? Twenty-four to thirty-five grand a year. I think we're going to be using more tokens. So I don't know if the savings are going to be realized or we're just going to end up buying more tokens, okay? So I actually asked one more question over here, just to give you guys a sense. I was like, hey, because I'm going to go record a video right now, guys. Can you give me more sexy numbers we can talk about? So two more worth mentioning for you all. Number six, self-healing Cron Doctor. So this runs four times a day. So it catches broken crons before they burn tokens retrying and failing. My cron reliability went from 50 to 85%. Every failed run is wasted money. This is the janitor that pays for itself. You're damn right it does, okay? Browser uses a cloud API instead of local. So we're running LinkedIn recruiting, Apollo pulls, and web scraping through a cloud API instead of spinning up local browser automation: pennies per task versus running headless Chrome 24/7. Quick break. If you want to run personalized LinkedIn ads and have personalized landing pages to convert your customers at a much higher rate on LinkedIn, check out Karrot. That's K-A-R-R-O-T dot A-I.
Karrot allows you to do things that LinkedIn ads does not allow you to do right now. Check it out. There are publicly traded companies, such as SEMrush and Sitecore, that are using this right now. And you can use this to get ahead, because if you don't use this, it will take you weeks, if not even a little longer than that, to make these ads. Because again, you cannot do this on LinkedIn. So again, go to www.karrot.ai, and we'll see you on the other side. All right, so if you guys want to save 95% on costs when it comes to AI, you got to look at something like this. Okay. Am I saying you need to buy a Mac Studio? Not really, but kind of. So I think you need to have local infrastructure. Okay. What that means is, so I have a Mac mini next to me, and I want to buy an army of these Mac minis. So my OpenClaw is running on that locally, because I want it to have its own password vaults. I want it to have its own emails. I want it to be its own being. That's what I want it to be. Okay. When you think about your token costs going higher and higher, it does make sense to start to think about what are some open source models that you could use. Can you use Kimi, which is a Chinese open source model, which is pretty good? And they're all getting really good. And so if you can use Kimi locally, or something else locally, and you have a Mac Studio that's powering your infrastructure, that's great. Or you can go to NVIDIA and buy one of their GPUs, whatever it is that you want to do. I just like the design of this. And by the way, these Mac Studios, I was trying to buy a bunch of these a few weeks ago. Let me just tell you this. It's not as available as you think it is. So as of right now, as of this recording, before the M6 chips come out, you actually need the M3 Ultra, which is, by the way, older than the M4, but you need the M3 Ultra because you need the unified memory. So guess what? Here's the problem.
If I click on this, you can get 256 gigabytes of memory, but a few weeks ago, they had 512 gigabytes. You need 512, at least in my opinion, because when I look at these local models, they are memory hogs. And so ideally, maybe you're just buying two of these, which is going to cost you a lot of money. I'd rather just buy a 512, or multiple 512s, and have local infrastructure. Maybe, you know, I might have separate computers for different clients. Let's say they're enterprise clients and they're just running on those machines. But if you do that, because you're running on local infrastructure, what you're ultimately paying for is your internet bill and your electricity. That's what you're paying for.
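One way to weigh the local-infrastructure argument is a simple break-even calculation. The sketch below is in Python and rests on assumptions: the $5,000 monthly spend comes from the opening of the video, and the 95% figure is the speaker's claim. Applying that 95% to the $5,000 baseline is an assumption, and `HARDWARE_COST` is a purely hypothetical placeholder, since the transcript gives no hardware price.

```python
import math
from typing import Optional


def break_even_months(hardware_cost: float,
                      monthly_api_spend: float,
                      monthly_local_cost: float) -> Optional[int]:
    """Whole months until the hardware pays for itself, or None if it never does."""
    monthly_saving = monthly_api_spend - monthly_local_cost
    if monthly_saving <= 0:
        return None  # running locally costs at least as much as the API
    return math.ceil(hardware_cost / monthly_saving)


MONTHLY_API_SPEND = 5000.0  # pre-audit spend quoted at the start of the video
CLAIMED_SAVINGS_RATE = 0.95  # the speaker's "save 95%" claim
HARDWARE_COST = 10_000.0     # HYPOTHETICAL price, not from the transcript

# Assumption: what remains after the claimed savings is the internet and electricity bill.
monthly_local_cost = MONTHLY_API_SPEND * (1 - CLAIMED_SAVINGS_RATE)

months = break_even_months(HARDWARE_COST, MONTHLY_API_SPEND, monthly_local_cost)
print(f"Break-even after {months} month(s)")
```

Swap in your own hardware quote and current API bill. If the local running cost is not lower than the API spend, the function returns `None` rather than a misleading number.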