Spoken vs Deepgram for podcasts

Deepgram is a speech-to-text API. Spoken is a podcast transcript retrieval API. For published podcasts, retrieval is cheaper, faster, and ships with real speaker names.

Last updated June 2026

Use Deepgram when you have your own audio. Use Spoken when you want a published podcast as Markdown. Deepgram's Nova-2 transcribes audio at roughly $0.46 per hour and labels speakers by diarization index (Speaker 0, Speaker 1). Spoken returns the existing transcript for any published podcast episode for a flat $0.08–$0.15, with real speaker names already attached and no audio file required.

Side-by-side comparison

	Deepgram (Nova-2 + diarization)	Spoken
Category	Speech-to-text API	Transcript retrieval API
Input	Audio file URL or upload	Search query or episode ID
Speaker labels	"Speaker 0", "Speaker 1" (diarization)	Real names: "Lex Fridman", "Sam Altman"
Cost for a 1-hour episode	~$0.46 + audio handling + LLM pass for naming	$0.08–$0.15 flat per episode, names included
Output format	JSON transcript with word timings	Markdown with speaker bold + timestamps
Time to first transcript	Download audio + processing time	Under 30 seconds, single fetch
Streaming/real-time	Yes	No (published podcasts only)
Works on your own audio	Yes (any file)	No
Works on published podcasts	Yes, after you fetch and host the audio	Yes, in one API call

The hidden cost of using Deepgram for podcasts

Deepgram's listed price is per minute of audio. For published podcasts, the real cost includes everything around that:

Audio retrieval — locating the .mp3 file, downloading 50–100 MB per episode
Storage and ingress — your bandwidth and S3 (or equivalent) charges
Diarization post-processing — "Speaker 0" and "Speaker 1" aren't useful for an LLM. You need a second pass to identify who's actually speaking
Latency — minutes per episode versus seconds with a retrieval API

For a 1-hour podcast, Deepgram's $0.46 listed price ends up closer to $1.00 or more once you add the surrounding work. Spoken delivers a speaker-named Markdown transcript for $0.08–$0.15 for the full episode, regardless of length.
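To make the arithmetic concrete, here is a minimal sketch that uses only the figures quoted on this page. The $1.00 effective cost is the page's rough estimate rather than a measured number, and the function names are illustrative. The ratio depends on which Spoken pack you buy and on how much overhead you assume for the Deepgram pipeline.

```python
# Figures quoted on this page. The $1.00 "effective" cost is a rough estimate
# for a 1-hour episode (audio handling + naming pass), not a measured number.
DEEPGRAM_LISTED_PER_HOUR = 0.46
DEEPGRAM_EFFECTIVE_PER_HOUR = 1.00
SPOKEN_PER_EPISODE = {"2,000-pack": 0.08, "100-pack": 0.15}


def deepgram_cost(hours: float, per_hour: float) -> float:
    """Deepgram is billed by audio length, so cost scales with hours."""
    return hours * per_hour


def cost_ratios(hours: float, per_hour: float) -> dict:
    """Deepgram cost divided by Spoken's flat per-episode price, per pack."""
    dg = deepgram_cost(hours, per_hour)
    return {pack: dg / price for pack, price in SPOKEN_PER_EPISODE.items()}


for label, rate in [("listed", DEEPGRAM_LISTED_PER_HOUR),
                    ("effective", DEEPGRAM_EFFECTIVE_PER_HOUR)]:
    for pack, ratio in cost_ratios(1.0, rate).items():
        print(f"Deepgram ({label}) vs Spoken {pack}: {ratio:.1f}x")
```

Because Spoken's price is flat per episode, the gap widens for longer episodes: pass a larger `hours` value to see the effect.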

Where each one wins

Pick Deepgram for

Your own audio files: meetings, sales calls, internal recordings
Real-time / streaming transcription
Custom vocabulary and model fine-tuning
Voice agents and live captioning
45+ language coverage with one provider

Pick Spoken for

Any published podcast episode as Markdown
Podcast summarizers, RAG over podcasts, research agents
Real speaker names instead of diarization labels
No audio hosting and no pipeline to run
Flat per-episode pricing, no per-minute billing

What this looks like in practice

With Deepgram (the pipeline you have to build)

# 1. Find and download the audio file
# 2. Upload to your storage
# 3. Submit to Deepgram with diarization
curl -X POST "https://api.deepgram.com/v1/listen?diarize=true&model=nova-2" \
  -H "Authorization: Token YOUR_KEY" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @episode.mp3
# 4. Parse JSON, group by speaker, map "Speaker 0/1" to real names with an LLM
# 5. Format as Markdown for your agent
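
Steps 4 and 5 are where most of the custom code ends up. The sketch below shows one way to do them, assuming the diarized response exposes a word list at results.channels[0].alternatives[0].words with "word", "start", and "speaker" fields (and "punctuated_word" when formatting is enabled). The names mapping is a hypothetical input: it stands in for whatever your LLM pass decides about who each speaker index is.

```python
from itertools import groupby


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def deepgram_to_markdown(response: dict, names: dict) -> str:
    """Group consecutive words by speaker index and emit one Markdown turn each.

    `names` maps a speaker index to a display name (hypothetical output of a
    separate naming pass). Unmapped indices fall back to "Speaker N".
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    # groupby only merges adjacent items, which is what a speaker turn is.
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", 0)):
        group = list(group)
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        label = names.get(speaker, f"Speaker {speaker}")
        stamp = format_timestamp(group[0]["start"])
        turns.append(f"**{label}** [{stamp}]: {text}")
    return "\n\n".join(turns)


# Tiny hand-written sample in the assumed response shape.
sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "welcome", "punctuated_word": "Welcome", "start": 0.5, "speaker": 0},
    {"word": "back", "punctuated_word": "back.", "start": 0.9, "speaker": 0},
    {"word": "thanks", "punctuated_word": "Thanks.", "start": 61.2, "speaker": 1},
]}]}]}}

print(deepgram_to_markdown(sample, {0: "Host", 1: "Guest"}))
```

The naming pass itself is not shown, since it depends on which model you use and how much episode context you give it.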

With Spoken

curl -H "x-api-key: pt_demo" https://spoken.md/transcripts/1000651996090
# Done. Markdown response with real names and timestamps.
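
The same call from Python, as a minimal sketch using only the standard library. The URL and the x-api-key header come from the curl command above. The function names, the 30-second timeout, and the assumption that the body is UTF-8 Markdown text are illustrative.

```python
import urllib.request

SPOKEN_BASE_URL = "https://spoken.md/transcripts"
DEMO_API_KEY = "pt_demo"  # demo key mentioned on this page


def build_request(episode_id: str, api_key: str = DEMO_API_KEY) -> urllib.request.Request:
    """Build the GET request for one episode without sending it."""
    return urllib.request.Request(
        f"{SPOKEN_BASE_URL}/{episode_id}",
        headers={"x-api-key": api_key},
    )


def fetch_transcript(episode_id: str, api_key: str = DEMO_API_KEY,
                     timeout: float = 30.0) -> str:
    """Send the request and return the response body as text.

    Assumes the body is UTF-8 encoded Markdown.
    """
    request = build_request(episode_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_transcript("1000651996090") performs the same request as the curl example.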

FAQ

Is Spoken a Deepgram competitor?

Only for the specific use case of getting transcripts of published podcasts. Deepgram is a general speech-to-text platform; Spoken is purpose-built for podcast transcript retrieval. They're complementary, not equivalent.

Is Spoken cheaper than Deepgram?

For published podcasts, yes — typically 5–10x cheaper once you account for audio handling and the LLM pass needed to convert Deepgram's diarization labels into real speaker names. For your own audio, Deepgram is the right tool.

Does Deepgram give real speaker names?

No. Deepgram's diarization outputs "Speaker 0", "Speaker 1", etc. You need to map those to real people using context — typically with another model call. Spoken does this mapping for podcasts as part of the response.

Can I switch between Deepgram and Spoken in the same project?

Yes. Many teams use Spoken for podcasts and Deepgram for user-uploaded audio in the same product. The interfaces are different but neither locks you in.

What does Spoken cost per podcast?

$0.15 each in 100-packs, $0.10 each in 500-packs, $0.08 each in 2,000-packs. Errors are never charged. Credits never expire. Repeat fetches of the same episode are free.
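
For budgeting, the pack prices above work out as follows. This is a small sketch with illustrative helper names. It assumes you buy a single pack that covers your volume, and it counts only unique episode IDs because repeat fetches are free.

```python
from typing import Optional

# Pack size -> price per transcript in cents, as listed above.
PACK_PRICE_CENTS = {100: 15, 500: 10, 2000: 8}


def pack_total_cents(pack_size: int) -> int:
    """Total price of one pack, in cents."""
    return pack_size * PACK_PRICE_CENTS[pack_size]


def billable_fetches(episode_ids: list) -> int:
    """Repeat fetches of the same episode are free, so count unique IDs only."""
    return len(set(episode_ids))


def smallest_covering_pack(fetches: int) -> Optional[int]:
    """Smallest single pack that covers the volume, or None if none does."""
    for size in sorted(PACK_PRICE_CENTS):
        if size >= fetches:
            return size
    return None


for size in sorted(PACK_PRICE_CENTS):
    print(f"{size}-pack: ${pack_total_cents(size) / 100:.2f} total")
```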

TL;DR: Deepgram is the right choice for your own audio. For published podcasts, Spoken is purpose-built — one API call, real speaker names, and 5–10x cheaper per episode.

Try Spoken with no signup — use API key pt_demo on any endpoint.

$0.10 per transcript in 500-packs. Credits never expire.