**Claire Vo** (0:00)
Today, I am doing a very strange episode where I'm going to create a video avatar of myself, and in about 15 minutes, get to a full minute-long video starring none other than your favorite podcast host, Claire Vo. Let's get to it.
This episode is brought to you by Merge. Building an AI product is one thing. The hard part is everything around it: connecting to the tools your team and customers rely on, letting agents take action with the right permissions, and keeping everything reliable and cost efficient once you're in production. Most teams end up piecing that together themselves. So instead of building the products you actually care about, you get pulled into integrations, permissions, routing, and all the infrastructure underneath. Merge is the infrastructure layer for production AI. It connects to thousands of tools, gives agents secure ways to act inside them, and optimizes model routing and spend, without you building or owning any of it. OpenAI, Dropbox, and Ramp already use Merge to move fast and build AI right. Visit merge.dev slash How I AI to start building for free.
This episode of How I AI is going to be an adventure because, I'm going to be honest, I'm not 100 percent sure this is going to work. I'm going to return to a product I covered very briefly a couple of weeks ago called Google Flow and the new Gemini Omni video generation model. And I'm going to try really hard to create an AI avatar of myself that we can animate, or I guess, cinematically create using AI. So this is Google Flow, and one of the features of Google Flow and the Omni model is you are supposed to be able to create an avatar of yourself.
Now, we tried this the day it came out. It did not work, but we're going to give it another try and see if we can get a full-featured avatar of myself that we can then go and build consistent character videos off of. So I'm going to select up here. I'm going to create an avatar. We're going to click get started.
I'm going to scan this QR code. I have my phone here. I've done this before, so hopefully it'll be fast. Okay, I'm going to put the mic away just for one second. I'm going to allow access to my camera and we're just going to take some photos. Okay, ready? Start.
17, 81, 49, 20, 25, 22 Okay, now it's having me turn my head.
So I turn my head that way, give me a check mark, turn my head the other way. It's giving me a check mark.
And it says we're done. Now, it said we were done last time we tried this. So we're going to see. It's going to take a couple of minutes, and then we will come back and see if I can actually use this avatar of myself. Okay, so look at this beauty. There's this fisheye lens version of me that is now an avatar. So I supposedly can use this. And let's use it to create a hype video for the How I AI podcast. So I'm going to go in here and say, "Help me create a storyboard for a hype video for the How I AI podcast. I already have a character named Me we can reference. Help me come up with a few scenes that would make this great. This is a podcast by Claire about the best ways to use AI at work and in life!" Exclamation mark.
Okay, so what I love about Flow, or what is pitched to me about Flow, is that it's not just a video generation tool, it's actually a whole creative suite. And so ideally, it's going to be able to help me not only animate or video generate this avatar of myself, it's also going to help me actually brainstorm what this overall video could be. And I'm creative, but I'm not video creative, so I'm excited to see what it looks like. "So how do you imagine Claire? Is she in a modern studio or perhaps a bright, airy home office? Should it feel high tech and sleek or more grounded and lifestyle focused? And are we going for high energy and fast-paced or thoughtful and inspiring?" So I'm going to say she is in a dark home office, dark green walls with books about AI and fun posters, lighting around. This should be a more authentic lifestyle version, but it's high tech and about coding.
Have a hacker vibe to it. Okay, well, a bunch of typos, but we'll see what this does. And what I love about these video models and these new tools, again, usually here on How I AI, we talk about coding, we talk about website generation, we talk about PRDs and work product. But what I really appreciate about these new generative AI models, in particular these multimodal ones, image and video, is it unlocks for me an ability to generate, to create something that I would have never been able to do before. So I would have never been able to solo produce a hype video for my podcast. I would have a hard time brainstorming it. I wouldn't know how to frame it. I wouldn't know how to block it. But now I have this AI producer here that can help me with this effort.
So let's see what the frames are. It's about seven frames. It's going to be an extreme close-up of me typing on a mechanical keyboard, totally on brand. Then there's going to be a wide shot of the office. Then it's going to reveal me in my ergonomic chair. Spoiler alert, I am not actually in an ergonomic chair. I'm going to spin around. That's going to be funny. And it's going to give me a digital heads-up display, which is also ridiculous. But let's let it happen. Then it's going to do a very, what I'm presuming to be a very cheesy AI montage, a lifestyle moment, a call to action. Going to hit you with the podcast microphone. And then it's going to say How I AI.
If this looks good, I'm going to say, "This is great. Generate the storyboard. I already have the character @Me." And so I'm going to send that. We're going to see what it comes up with. I've noticed that it has a hard time referencing the Me character in some early tests. So let's see what it comes up with. I'm presuming it's going to take a couple of minutes. So we will take a mini break and then come back to see what it looks like. Okay, it looks like it's generating a grid for the storyboard. It can't use the avatar.
So I think it's going to do it without the character reference. It'll be really interesting to see what it comes up with. But then as soon as it's ready, I'm going to go ahead and generate at least a couple of these storyboard scenes one by one and we can see how well it does with my avatar. Oh, I mean, this is delightful. Look at this glowy mechanical keyboard. Look at how I am hacking on three keyboards. I'm gonna make little eyes at you with my fake glasses, my very trendy glasses. There's going to be me dragging and dropping a file that probably says like AI.md. I'm gonna smile and then I'm gonna speak into the podcast microphone. This looks great.
So what I think I'm gonna do is I'm gonna paste in this first frame of the video that the agent came up with. And instead of saying Claire, I'm just gonna @-mention this avatar that it gave me so that we can see if it generates this video with me as the character. And so I think I've replaced my name here. I've given details on camera, on lighting, on everything. I press enter. Let's see what it creates with my avatar. I have no idea what we're gonna get into, and hopefully it won't be terrifying. Okay, I'm already nervous. What is surprising to me, that I didn't actually expect, is it does have my posters and my books in the background here. I guess because they're behind me when I took the photo, it's taking advantage of that. And I'm gonna share my audio as well, and we're gonna see how this video worked.
Okay, I got that wrong. I actually generated images instead of videos. Totally messed up, did not click the right thing down here in the bottom right. I had image generation selected instead of video generation. So again, I'm gonna paste that walkthrough of the scene here. I'm gonna replace my name with the Me avatar. It's gonna have my fingers flying across that mechanical keyboard. It's going to be so cool. I'm gonna go ahead and press send, and we're gonna see how long it takes to generate a video.
Now, something you'll notice every time you generate a video is that they generate two versions of it. It used to work like this in Veo 2, and Veo 3 as well, so I'm not surprised they do this. It's gonna take a couple of minutes. The image took a couple of seconds. These are probably gonna take a couple of minutes. So we'll come back, and hopefully we will have our first video with Claire's face in it. And while we're waiting, I'm gonna queue up one or two other scenes and see if we can get ones going with my actual face in it, because some of these had the back of my head as opposed to my face. And I think we want to see what my face avatar looks like. So we'll pick frame three and see if we can get that going as well. Okay, the first video generated. Now we have blue nail polish. I still like it. Okay, let's see.