Hermes + OpenClaw: NEW AI Voice Agent is INSANE! artwork

Hermes + OpenClaw: NEW AI Voice Agent is INSANE!

AI News Today | Julian Goldie Podcast

June 4, 2026

Stop Typing! Build AI Voice Agents with MiniMax M3 & OpenClaw Learn how to build and interact with real-time AI voice agents using the new MiniMax M3 update for OpenClaw and Hermes. This hands-free workflow allows you to automate tasks and research in real-time without ever touching your keyboard.
Speakers: Julian Goldie
**Julian Goldie** (0:00)
With the new MiniMax M3 update, you can now build a voice agent directly into OpenClaw and Hermes. Let me show you exactly how this works. So if we click this, we can start the voice chat and speak to our agent directly. Hey, how's it going?

**SPEAKER_2** (0:11)
Well, ready to help you with whatever you need today.

**Julian Goldie** (0:13)
What do you think we could automate that would be quick and easy related to SEO?

**SPEAKER_2** (0:16)
Weekly rank tracking and broken link checks with a simple script or tool like Screaming Frog.

**Julian Goldie** (0:21)
What's the funniest joke you know?

**SPEAKER_2** (0:22)
You told my computer I needed a break, and it said, no problem, I'll go to sleep mode.

**Julian Goldie** (0:27)
What's the most interesting fact you can tell me right now?

**SPEAKER_2** (0:30)
Octopuses have three hearts, and two of them stop beating when they swim.

**Julian Goldie** (0:33)
Teach me some Swahili.

**SPEAKER_2** (0:35)
Hello means hello, thank means thank you, and Habari asks how you are. Try Jambo, Habari.

**Julian Goldie** (0:40)
So you get the point. Basically, you can build out a voice agent into your Hermes and your OpenClaw. We can do the same inside Hermes here. You can see we just tap to it and then speak, and then there's multiple different voices.

**SPEAKER_2** (0:53)
You're welcome. Let me know if you want to learn more.

**Julian Goldie** (0:55)
Then there's multiple different voices that we can plug in here as well. I don't know why I kept going there, but there we go. Now this is actually based on a question from Paul inside the AI Project Boarding. He was asking, how can we speak to OpenClaw?
How can we have a chat with your AI agents?
How can you speak to them in real time, et cetera? And this is a powerful way to do it. So essentially what we're doing here is using a framework I call Voice Agent Mastery. And this means that you stop typing to AI and you can talk to it. So imagine picking up a phone, saying what you need out loud, and your AI answering back in a real voice, right? So you don't need to type, you don't need to click, you're hands free, and that's a voice agent. And it can run inside an agent operating system like I've just shown you. So you can speak, it hears you, MiniMax thinks, MiniMax talks back, turn after turn, like a real conversation. And so this means number one, you can interact faster with your AI agents. And also like, let's say for example, you're on the go, you don't have time to type, this is a faster way to just go back and forth with them and actually operate them. It's really the difference between like texting your AI, which is, you know, just using your keyboard and then actually calling your AI agent as well. And the reason this works is because basically you're interacting with MiniMax M3. MiniMax M3 is the most agentic model I've used with Hermes agent so far. It's really fun to use, easy, super smooth. It can also do a bunch of other cool stuff. So for example, if we go inside the studio here, you can see that we can generate images, we can generate videos, we can generate voice, etc. And the same inside OpenClaw, we can generate voice, video, we can talk to it, we can use images. You could also use GROK as a voice as well. So for example, if you use GROK and you plug this in too, then you can actually use Twitter search and you can search Twitter in real time, which is pretty cool because if you've got AI agents and you're for example, like researching the latest AI automation news each day, or you could put that on the schedule and get that running directly too. You've got four different voice modes as well. So if we test out the Hermes one here, let's try the deep mode. Hello there, are you listening? Yep, I'm here and listening. Go ahead, what's on your mind? So that's pretty cool. And then also when you're doing this, you can also save voice notes inside your workspace. So for example, if you were generating like podcasts, you wanted a place to store them and come back to them later, or you could have them inside the workspace. And then you can switch between like the studio, the talk mode and the workspace. And that's pretty powerful too.
So the interesting thing here as well is like you could operate this from your phone. You know, you could have Hermes agent inside Telegram, maybe you're at the gym or whatever.

5 more minutes of transcript below

Feed this to your agent

Try it now — copy, paste, done:

curl -H "x-api-key: pt_demo" \
  https://spoken.md/transcripts/1000651996090

Works with Claude, ChatGPT, Cursor, and any agent that makes HTTP calls.

From $0.10 per transcript. No subscription. Credits never expire.

Using your own key:

curl -H "x-api-key: YOUR_KEY" \
  https://spoken.md/transcripts/1000771240057