Deepgram is a speech-to-text API. Spoken is a podcast transcript retrieval API. For published podcasts, retrieval is cheaper, faster, and ships with real speaker names.
Use Deepgram when you have your own audio. Use Spoken when you want a published podcast as Markdown. Deepgram's Nova-2 transcribes audio at roughly $0.46 per hour and labels speakers by diarization index ("Speaker 0", "Speaker 1"). Spoken returns the existing transcript for any published podcast episode for a flat $0.08–$0.15, with real speaker names already attached and no audio file required.
| | Deepgram (Nova-2 + diarization) | Spoken |
|---|---|---|
| Category | Speech-to-text API | Transcript retrieval API |
| Input | Audio file URL or upload | Search query or episode ID |
| Speaker labels | "Speaker 0", "Speaker 1" (diarization) | Real names: "Lex Fridman", "Sam Altman" |
| Cost per podcast hour | ~$0.46/hr + audio handling + LLM pass for naming | $0.08–$0.15 per episode, names included |
| Output format | JSON transcript with word timings | Markdown with bolded speaker names + timestamps |
| Time to first transcript | Download audio + processing time | Under 30 seconds, single fetch |
| Streaming/real-time | Yes | No (published podcasts only) |
| Works on your own audio | Yes (any file) | No |
| Works on published podcasts | Yes, after you fetch and host the audio | Yes, in one API call |
Deepgram's listed price is per minute of audio. For published podcasts, the real cost includes everything around that: finding and downloading the audio, hosting it, and an LLM pass to turn diarization labels into real speaker names.
For a 1-hour podcast, Deepgram's $0.46/hr becomes closer to $1.00+ once you add the surrounding work. Spoken delivers the same output for $0.08–$0.15 for the full episode.
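The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration using the figures quoted above. The $1.00 effective cost is the page's own estimate, and scaling it linearly with episode length is an assumption made here, not a published rate.

```python
# Figures quoted in the text (USD). The effective cost is an estimate, not a list price.
DEEPGRAM_LIST_PER_HOUR = 0.46       # Nova-2 + diarization
DEEPGRAM_EFFECTIVE_PER_HOUR = 1.00  # "$1.00+" once audio handling and the LLM pass are added
SPOKEN_PER_EPISODE = (0.08, 0.15)   # flat per-episode range, independent of length


def cost_ratio(deepgram_cost: float, spoken_cost: float) -> float:
    """How many times more the Deepgram pipeline costs than one Spoken fetch."""
    return deepgram_cost / spoken_cost


def compare(hours: float = 1.0) -> dict:
    """Compare both options for one episode of the given length.

    Assumes the effective Deepgram cost scales linearly with duration.
    """
    effective = DEEPGRAM_EFFECTIVE_PER_HOUR * hours
    spoken_low, spoken_high = SPOKEN_PER_EPISODE
    return {
        "deepgram_list": round(DEEPGRAM_LIST_PER_HOUR * hours, 2),
        "deepgram_effective": round(effective, 2),
        "ratio_at_spoken_high_price": round(cost_ratio(effective, spoken_high), 1),
        "ratio_at_spoken_low_price": round(cost_ratio(effective, spoken_low), 1),
    }


if __name__ == "__main__":
    print(compare(1.0))
```

Because Spoken's price is flat per episode, the ratio grows with episode length under this assumption; a one-hour episode is the reference case used on this page.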
# Deepgram: five steps
# 1. Find and download the audio file
# 2. Upload to your storage
# 3. Submit to Deepgram with diarization
curl -X POST "https://api.deepgram.com/v1/listen?diarize=true&model=nova-2" \
  -H "Authorization: Token YOUR_KEY" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @episode.mp3
# 4. Parse JSON, group by speaker, map "Speaker 0/1" to real names with an LLM
# 5. Format as Markdown for your agent

# Spoken: one request
curl -H "x-api-key: pt_demo" https://spoken.md/transcripts/1000651996090
# Done. Markdown response with real names and timestamps.
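Steps 4 and 5 of the Deepgram path are where most of the glue code lives. The sketch below shows one way to do them, assuming the diarized response exposes per-word `speaker`, `start`, and `word` fields under `results.channels[0].alternatives[0].words`. The function names, the Markdown layout, and the hand-written `names` map (standing in for the LLM naming pass) are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def fmt_ts(seconds: float) -> str:
    """Format a second offset as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def words_to_markdown(response: dict, names: Dict[int, str]) -> str:
    """Group consecutive words by speaker and render each turn as Markdown.

    `names` maps diarization indexes to real names. In practice it would come
    from a separate model call; here it is supplied by the caller.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]

    turns: List[dict] = []
    for w in words:
        speaker = w.get("speaker", 0)
        # `punctuated_word` is only present when punctuation is enabled.
        text = w.get("punctuated_word", w["word"])
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"].append(text)  # same speaker: extend the turn
        else:
            turns.append({"speaker": speaker, "start": w["start"], "text": [text]})

    lines = []
    for t in turns:
        # Fall back to the raw diarization label when no name is known.
        label = names.get(t["speaker"], f"Speaker {t['speaker']}")
        lines.append(f"**{label}** [{fmt_ts(t['start'])}]: {' '.join(t['text'])}")
    return "\n\n".join(lines)


if __name__ == "__main__":
    sample = {"results": {"channels": [{"alternatives": [{"words": [
        {"word": "welcome", "punctuated_word": "Welcome", "start": 0.5, "speaker": 0},
        {"word": "back", "punctuated_word": "back.", "start": 0.9, "speaker": 0},
        {"word": "thanks", "punctuated_word": "Thanks.", "start": 65.0, "speaker": 1},
    ]}]}]}}
    print(words_to_markdown(sample, {0: "Host"}))
```

Any speaker missing from `names` keeps its numeric label, which makes gaps in the naming pass easy to spot in the output.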
Spoken replaces Deepgram only for one specific use case: getting transcripts of published podcasts. Deepgram is a general speech-to-text platform; Spoken is purpose-built for podcast transcript retrieval. They're complementary, not equivalent.
For published podcasts, Spoken is typically 5–10x cheaper once you account for audio handling and the LLM pass needed to convert Deepgram's diarization labels into real speaker names. For your own audio, Deepgram is the right tool.
Deepgram does not identify speakers by name. Its diarization outputs "Speaker 0", "Speaker 1", etc., and you need to map those to real people using context, typically with another model call. Spoken does this mapping for podcasts as part of the response.
You can use both. Many teams use Spoken for podcasts and Deepgram for user-uploaded audio in the same product. The interfaces are different, but neither locks you in.
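A product that uses both services mostly needs a small routing layer. The following is a minimal sketch of that idea, reusing the two endpoints shown in the curl examples. The `source` dictionary shape, the `build_request` helper, and the returned request description are hypothetical; the function only builds the request and sends nothing.

```python
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?diarize=true&model=nova-2"
SPOKEN_URL = "https://spoken.md/transcripts/{episode_id}"


def build_request(source: dict, deepgram_key: str = "YOUR_KEY",
                  spoken_key: str = "pt_demo") -> dict:
    """Describe the HTTP request for a transcript, routed by source kind.

    `source` is a hypothetical shape:
      {"kind": "podcast", "episode_id": "..."}  -> Spoken retrieval
      {"kind": "upload", "path": "..."}         -> Deepgram transcription
    """
    if source["kind"] == "podcast":
        return {
            "method": "GET",
            "url": SPOKEN_URL.format(episode_id=source["episode_id"]),
            "headers": {"x-api-key": spoken_key},
        }
    if source["kind"] == "upload":
        return {
            "method": "POST",
            "url": DEEPGRAM_URL,
            "headers": {
                "Authorization": f"Token {deepgram_key}",
                # Assumes MP3 unless the caller says otherwise.
                "Content-Type": source.get("content_type", "audio/mpeg"),
            },
            "body_path": source["path"],
        }
    raise ValueError(f"unknown source kind: {source['kind']!r}")


if __name__ == "__main__":
    print(build_request({"kind": "podcast", "episode_id": "1000651996090"}))
    print(build_request({"kind": "upload", "path": "episode.mp3"}))
```

Keeping the routing decision in one function means the rest of the product can treat both paths as "give me a transcript" and only the response handling differs.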
Spoken credits cost $0.15 each in 100-packs, $0.10 each in 500-packs, and $0.08 each in 2,000-packs. Errors are never charged, credits never expire, and repeat fetches of the same episode are free.
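To see how the pack tiers play out for a given volume, the sketch below searches for the cheapest combination of packs that covers a number of transcripts. It uses only the three prices listed above. Treating packs as freely combinable is an assumption made here, and the function name is hypothetical. Prices are kept in integer cents to avoid floating-point rounding.

```python
import math
from itertools import product

# Price per credit in cents, keyed by pack size (from the pricing above).
PACKS_CENTS = {100: 15, 500: 10, 2000: 8}


def cheapest_purchase(needed: int):
    """Return (total_cents, {pack_size: count}) for the cheapest pack mix.

    Brute-forces every combination up to the point where one pack size alone
    would cover the need. Assumes packs can be mixed freely.
    """
    sizes = list(PACKS_CENTS)
    max_counts = [math.ceil(needed / size) for size in sizes]

    best = None
    for counts in product(*(range(m + 1) for m in max_counts)):
        credits = sum(c * s for c, s in zip(counts, sizes))
        if credits < needed:
            continue  # this mix does not cover the requested volume
        cost = sum(c * s * PACKS_CENTS[s] for c, s in zip(counts, sizes))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(sizes, counts)))
    return best


if __name__ == "__main__":
    for n in (120, 400, 2000):
        cents, mix = cheapest_purchase(n)
        print(n, f"${cents / 100:.2f}", mix)
```

Since credits never expire, buying a larger pack than strictly needed is not wasted spend, which is why the search allows overshooting the requested volume.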
TL;DR: Deepgram is the right choice for your own audio. For published podcasts, Spoken is purpose-built — one API call, real speaker names, and 5–10x cheaper per episode.
Try Spoken with no signup — use API key pt_demo on any endpoint.
$0.08–$0.15 per transcript, depending on pack size. Credits never expire.