Fetch any episode as Markdown, drop it into an LLM, get a summary back. The whole project is two API calls and a prompt.
The hard part of building a podcast summarizer used to be getting a clean transcript with speaker names. With Spoken, that step is one API call. You search by topic or paste a URL, fetch the transcript as Markdown, and pass it to your LLM of choice with a summary prompt. A typical one-hour episode fits in one context window — no chunking, no diarization, no audio handling.
import requests
from anthropic import Anthropic

API_KEY = "YOUR_SPOKEN_KEY"
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_podcast(query):
    # 1. Find the episode
    r = requests.get("https://spoken.md/search",
                     params={"q": query},  # lets requests URL-encode the query
                     headers={"x-api-key": API_KEY})
    results = r.json()["results"]
    if not results:
        return "No matching episodes found."
    episode = results[0]  # take the top search hit

    # 2. Fetch the transcript (returned as Markdown text)
    transcript = requests.get(
        f"https://spoken.md/transcripts/{episode['id']}",
        headers={"x-api-key": API_KEY},
    ).text

    # 3. Summarize
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Summarize this podcast episode in 5 bullet points, "
                       f"attributing key points to specific speakers:\n\n{transcript}"
        }],
    )
    return f"# {episode['title']}\n{episode['podcast']}\n\n{msg.content[0].text}"

print(summarize_podcast("huberman sleep"))
That's the whole thing. Real speaker names land in the prompt as **Andrew Huberman** (0:00), so the LLM can attribute claims correctly without prompt engineering tricks.
Without Spoken, the same summarizer has to handle the audio itself, transcribe it, and run diarization to recover who said what. For a side project, that's a weekend of plumbing. For a production product, it's ongoing infrastructure to maintain.
Prompt: "Summarize this podcast in 5 bullets, attributing claims to speakers."
Output:
• **Andrew Huberman** opens by framing the episode around the
neuroscience of sleep architecture and its impact on next-day cognition.
• **Matt Walker** explains that REM and deep sleep play distinct roles —
deep sleep consolidates declarative memory; REM consolidates procedural.
• They discuss the 90-minute ultradian cycle and why waking inside a
cycle (vs at the end of one) produces worse subjective grogginess.
• Walker recommends consistency of sleep timing over total hours as
the single most impactful intervention for most adults.
• Closing segment covers caffeine's 5–6 hour half-life and Walker's
suggestion to cut off intake by 2 PM.
A one-hour podcast is roughly 8,000–15,000 tokens. With Spoken at the 500-pack rate and Claude Haiku as the summarizer:
At the 2,000-pack rate the transcript fetch drops to $0.08, so the all-in cost is around $0.09–$0.10 per episode summarized.
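The arithmetic behind that range can be sketched as follows. The $0.08 fetch price and the 8,000–15,000 token range come from the text above; the Claude Haiku per-token prices are assumptions based on published list pricing and should be checked against the current rate card. `cost_per_episode` is a hypothetical helper, not part of any API.

```python
# Assumed Claude Haiku list prices in USD per million tokens (verify before relying on them).
HAIKU_INPUT_PER_MTOK = 1.00
HAIKU_OUTPUT_PER_MTOK = 5.00

def cost_per_episode(transcript_tokens, output_tokens=1024, fetch_cost=0.08):
    """Estimated all-in USD cost to fetch and summarize one episode."""
    llm_input = transcript_tokens / 1_000_000 * HAIKU_INPUT_PER_MTOK
    llm_output = output_tokens / 1_000_000 * HAIKU_OUTPUT_PER_MTOK
    return fetch_cost + llm_input + llm_output

# Low and high ends of the one-hour token range quoted above.
for tokens in (8_000, 15_000):
    print(f"{tokens:>6} tokens: ${cost_per_episode(tokens):.3f}")
```

Under these assumptions the transcript fetch dominates the bill; the LLM call adds roughly one to two cents.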
A full episode fits in a single context window in almost every case. A one-hour podcast is roughly 8,000–15,000 tokens, and Claude Sonnet/Opus (200K context), GPT-4o (128K context), and Gemini (1M context) all hold a full episode comfortably, with room left for the prompt and output.
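For unusually long episodes, a quick pre-flight check can confirm the fit before calling the model. This is a minimal sketch: the context sizes are the ones quoted above, while the 4-characters-per-token ratio is a rough heuristic assumed here (real tokenizers vary), and `estimate_tokens`, `fits_in_context`, and the `reserved` budget are hypothetical names.

```python
# Context window sizes as quoted in the text above.
CONTEXT_WINDOWS = {
    "Claude Sonnet/Opus": 200_000,
    "GPT-4o": 128_000,
    "Gemini": 1_000_000,
}

def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; assumes about 4 characters per token."""
    return len(text) // chars_per_token + 1

def fits_in_context(text, window, reserved=2_000):
    """True if the transcript plus a reserved prompt/output budget fits the window."""
    return estimate_tokens(text) + reserved <= window

transcript = "word " * 12_000  # stand-in for a fetched transcript
for model, window in CONTEXT_WINDOWS.items():
    print(model, fits_in_context(transcript, window))
```

If the check fails, switching to a larger-context model is usually simpler than adding a chunking step.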
For straight summarization, Claude Haiku and GPT-4o-mini both produce strong results at low cost. For nuanced extraction or multi-step reasoning, step up to Claude Sonnet or GPT-4o.
Summarizing a whole show in one run works too. Use /search with the podcast name to enumerate episodes, then loop through and fetch each. Repeat fetches are free, so re-running later with a different prompt adds no transcript cost.
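A batch loop along those lines might look like the sketch below. It assumes /search returns every matching episode in a single `results` list with `id` and `title` fields, as in the example earlier on this page; whether the endpoint paginates is not covered here. `summarize_show`, `summarize`, and `http_get` are hypothetical names, and `http_get` exists only so the loop can be exercised with a stub instead of the network.

```python
def summarize_show(podcast_name, summarize, api_key="pt_demo", http_get=None):
    """Fetch each episode found for `podcast_name` and map title -> summary.

    `summarize` is any callable that turns transcript Markdown into a summary,
    for example a wrapper around an LLM call.
    """
    if http_get is None:
        import requests  # imported lazily so a stub can replace it in tests
        http_get = requests.get

    headers = {"x-api-key": api_key}
    search = http_get("https://spoken.md/search",
                      params={"q": podcast_name}, headers=headers)
    search.raise_for_status()

    summaries = {}
    for episode in search.json()["results"]:
        resp = http_get(f"https://spoken.md/transcripts/{episode['id']}",
                        headers=headers)
        resp.raise_for_status()
        summaries[episode["title"]] = summarize(resp.text)
    return summaries
```

Calling `summarize_show("huberman lab", summarize=my_llm_summary)` would return a dictionary keyed by episode title, where `my_llm_summary` is whatever function wraps your LLM call. Keeping the summarizer as a parameter makes it cheap to re-run the same show with a different prompt.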
Timestamps appear once per speaker turn, in (H:MM) or (H:MM:SS) form. The LLM can quote them directly when citing specific moments.
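If you want the speaker turns outside the LLM as well, for example to build a table of contents, the markers can be pulled out with a regular expression. This sketch assumes every turn begins with a marker shaped like `**Andrew Huberman** (0:00)`, as in the example earlier on this page; the sample transcript text is made up for illustration, and `speaker_turns` is a hypothetical helper.

```python
import re

# Matches "**Speaker Name** (0:00)" or "**Speaker Name** (1:02:15)".
TURN_MARKER = re.compile(
    r"\*\*(?P<speaker>[^*]+)\*\*\s*\((?P<ts>\d+:\d{2}(?::\d{2})?)\)"
)

def speaker_turns(markdown):
    """Return (speaker, timestamp) pairs in the order they appear."""
    return [(m["speaker"], m["ts"]) for m in TURN_MARKER.finditer(markdown)]

# Made-up sample in the assumed transcript shape.
sample = (
    "**Andrew Huberman** (0:00)\nWelcome to the show.\n\n"
    "**Matt Walker** (1:02:15)\nThanks for having me.\n"
)
print(speaker_turns(sample))
# → [('Andrew Huberman', '0:00'), ('Matt Walker', '1:02:15')]
```

The timestamps are kept as strings so they can be quoted verbatim, exactly as they appear in the transcript.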
You can try the summarizer before paying. Use the demo key pt_demo with any endpoint; no signup is needed. The demo key returns a full transcript, so you can run the summarizer end-to-end before purchasing credits.
TL;DR: A working podcast summarizer is now a two-API-call project. Spoken returns clean Markdown with real speaker names; pass it to any LLM with a summary prompt. ~$0.10 per episode end-to-end.
Try the summarizer with no signup — use API key pt_demo on any endpoint.
$0.10 per transcript. Credits never expire.