- Dialogue — speaker-attributed lines with sentence-level timestamps. The default. Best for chat-style UIs, LLM context windows, and most reading.
- Words — every word individually timestamped. Best for karaoke-style highlighting, precision audio editing, and word-aligned search.
- Mentions — lines surrounding mentions of a specific entity, with is_mention flags. Best for “what did they say about X?” workflows.
Available to MCP agents via particle_podcast_get_episode with include: ["transcript"] (and optional transcript_format, transcript_speaker, transcript_start, transcript_end). For mention-style windows, use particle_podcast_find_mentions.
Dialogue transcript
Response (truncated)
Use /segments to see which time ranges are tagged AD if you want to skip them. See segments.
Output formats
Use the format query parameter:
- Dialogue (default)
- Plain text
- SRT subtitles
Structured JSON with speaker attribution, roles, and timestamps per line.
Best for: building conversation UIs, speaker analysis, programmatic processing.
Filter by speaker or time range
Extract everything one person said:
Word-level transcript
For per-word timing (karaoke-style highlighting, precision editing, word-aligned search):
Response (truncated)
Query parameters
| Param | Default | Description |
|---|---|---|
| start | 0 | Start time in seconds for time-range clipping. |
| end | end of episode | End time in seconds for time-range clipping. |
| limit | unset (return all) | Max words per page (1–5000). Omit to return every matching word. |
| cursor | none | Opaque cursor taken from the cursor field of a previous response. |
| exclude_spacing | false | Drop type:"spacing" tokens. See below. |
Spacing tokens
Roughly half of the entries in a typical word transcript have type:"spacing" and text:" ". These represent the silence between spoken words, with their own start_seconds/end_seconds. They’re useful when timing matters: caption/subtitle UIs that need precise pause durations, karaoke-style word highlighting against playback, or reconstructing inter-word silences for audio alignment. NLP, search, and LLM consumers should pass exclude_spacing=true to skip them, which roughly halves the payload.
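If you already hold a full word transcript, the same split can be made client-side. The sketch below is illustrative only: it relies on the type, text, start_seconds, and end_seconds fields described above, while the "word" type value and the sample data are assumptions, not taken from a real response.

```python
from typing import Iterable


def strip_spacing(tokens: Iterable[dict]) -> list[dict]:
    """Keep only spoken tokens, dropping entries whose type is "spacing"."""
    return [t for t in tokens if t.get("type") != "spacing"]


def pauses(tokens: Iterable[dict], min_seconds: float = 0.5) -> list[tuple[float, float]]:
    """Return (start_seconds, duration) for spacing tokens at least min_seconds long."""
    found = []
    for t in tokens:
        if t.get("type") == "spacing":
            duration = t["end_seconds"] - t["start_seconds"]
            if duration >= min_seconds:
                found.append((t["start_seconds"], duration))
    return found


# Hypothetical sample; the "word" type value is an assumption.
sample = [
    {"type": "word", "text": "Hello", "start_seconds": 0.0, "end_seconds": 0.5},
    {"type": "spacing", "text": " ", "start_seconds": 0.5, "end_seconds": 1.25},
    {"type": "word", "text": "world", "start_seconds": 1.25, "end_seconds": 1.75},
]

spoken = strip_spacing(sample)   # two word tokens remain
long_pauses = pauses(sample)     # one pause, starting at 0.5 seconds
```

Passing exclude_spacing=true is still preferable when you never need the pauses, since it shrinks the response before it reaches you.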
Pagination
When you pass limit, the response includes a cursor and has_more: true until the last page:
The speaker field is the identified speaker name (e.g. Scott Galloway). When a speaker hasn’t been resolved to a name, the raw STT label (speaker_0, speaker_1, …) is returned instead.
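The cursor loop described under Pagination might look like the following sketch. The HTTP call is abstracted behind a caller-supplied fetch_page function, because the endpoint URL and authentication are not shown here. The cursor and has_more fields come from the text above; the "words" key and the fake pages are assumptions for illustration.

```python
from typing import Callable, Iterator, Optional


def iter_words(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every word token, following cursors until has_more is false."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # "words" is an assumed key name for the token list.
        yield from page.get("words", [])
        if not page.get("has_more"):
            break
        cursor = page.get("cursor")
        if cursor is None:
            break  # defensive: has_more without a cursor would loop forever


def fake_fetch(cursor: Optional[str]) -> dict:
    """Stand-in for a real request; returns two hypothetical pages."""
    pages = {
        None: {"words": [{"text": "a"}], "cursor": "p2", "has_more": True},
        "p2": {"words": [{"text": "b"}], "cursor": None, "has_more": False},
    }
    return pages[cursor]


all_words = list(iter_words(fake_fetch))
```

In real use, fetch_page would issue the request with your chosen limit plus the cursor argument and return the parsed JSON body.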
Transcript mentions
The most useful transcript view when you care about a specific person, company, or topic. Returns the dialogue lines around every mention of an entity in one episode, with is_mention: true on the lines that actually contain the mention and surrounding lines for context.
Response (truncated)
Use start_seconds / end_seconds to deep-link into the audio, or feed the line text into an LLM with full speaker context.
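As a rough sketch of both uses: the helpers below turn one window's lines into a prompt-ready block and build a time-offset link. The per-line keys (speaker, text, start_seconds, is_mention) follow the fields named on this page, but the exact response shape is an assumption. The audio URL is hypothetical, and the #t= suffix is the W3C Media Fragments convention, which your player may or may not honor.

```python
def format_window(lines: list[dict]) -> str:
    """Render dialogue lines as text, marking lines that contain the mention."""
    rendered = []
    for line in lines:
        marker = ">>" if line.get("is_mention") else "  "
        rendered.append(
            f'{marker} [{line["start_seconds"]:.1f}s] {line["speaker"]}: {line["text"]}'
        )
    return "\n".join(rendered)


def deep_link(audio_url: str, start_seconds: float) -> str:
    """Append a Media Fragments time offset to an audio URL."""
    return f"{audio_url}#t={start_seconds:.3f}"


# Hypothetical window; values are illustrative, not from a real episode.
window = [
    {"speaker": "speaker_0", "text": "Before.", "start_seconds": 10.0, "is_mention": False},
    {"speaker": "Scott Galloway", "text": "The mention.", "start_seconds": 12.5, "is_mention": True},
]

prompt_block = format_window(window)
link = deep_link("https://example.com/episode.mp3", window[1]["start_seconds"])
```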
Chunking semantics
The endpoint groups transcript lines into context windows:
- context_lines (default 2, max 20) is the radius of each window. A mention on line i produces the closed range [i - context_lines, i + context_lines], clamped at the transcript boundaries, so a single mention yields up to 2 * context_lines + 1 lines.
- Adjacent or overlapping windows merge into a single window. Two mentions within context_lines of each other produce one entry with multiple is_mention: true lines, not two entries.
- total_mention_count is the unfiltered count of dialogue lines containing at least one mention of the entity in the episode. It is independent of pagination, and equals the sum of is_mention=true flags across the un-paginated mentions[].
- Mention matching is a case-sensitive substring match against mention_variants (the entity’s canonical name plus any annotated aliases).
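The windowing and merge rules above can be reproduced with a few lines of Python. This is a sketch of the described behavior, not the service's implementation; it assumes 0-based line indices, and the function name is made up for illustration.

```python
def mention_windows(
    mention_lines: list[int], total_lines: int, context_lines: int = 2
) -> list[tuple[int, int]]:
    """Return merged closed ranges (start, end) of line indices around mentions."""
    windows: list[tuple[int, int]] = []
    for i in sorted(set(mention_lines)):
        # Closed range around the mention, clamped at the transcript boundaries.
        start = max(0, i - context_lines)
        end = min(total_lines - 1, i + context_lines)
        # Merge when this window overlaps or directly follows the previous one.
        if windows and start <= windows[-1][1] + 1:
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


close_pair = mention_windows([5, 7], total_lines=20)   # one merged window
adjacent = mention_windows([2, 7], total_lines=20)     # windows touch, so they merge
far_apart = mention_windows([0, 10], total_lines=20)   # two separate windows
```

With the default radius of 2, mentions on lines 5 and 7 share one window spanning lines 3 through 9, which is why a single entry can carry several is_mention: true lines.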
Pagination
| Caller shape | Pagination axis | Cursor refers to |
|---|---|---|
| entity_id provided | mentions[] (windows) | offset into windows for that entity |
| entity_id omitted | entities[] | offset into entity list |
When entity_id is set, exactly one entity is returned and its mentions[] is paged. When entity_id is omitted, each entity in the page returns all of its mentions; deep-paging through a single entity’s mentions requires re-issuing the request with that entity_id.
limit defaults to 25 and is capped at 100. Pass the cursor from a previous response to fetch the next page; has_more indicates whether more results exist.
Segment and clip transcripts
Segments and clips have their own transcript endpoints scoped to that time range.
These endpoints accept the format query parameter (dialogue, text, or srt).
Related
- Episodes — discovery and sub-resources
- Segments & clips — get a clip ID, then its transcript
- Knowledge graph → entities — pick the entity to mention-search