by matusvojtek
tubescribe: OpenClaw Skill
tubescribe is an OpenClaw Skills integration for writing workflows. YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.
Skill Snapshot
| Field | Value |
|---|---|
| name | tubescribe |
| description | YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video. OpenClaw Skills integration. |
| owner | matusvojtek |
| repository | matusvojtek/tubescribe |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @matusvojtek/tubescribe |
| last updated | Feb 7, 2026 |

name: tubescribe
description: "YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video."
TubeScribe
Turn any YouTube video into a polished document + audio summary in seconds.
Drop a YouTube link and get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
100% Free & Local
- No subscription: runs entirely on your machine
- No API keys required: works out of the box
- No data leaves your computer: your content stays private
- No usage limits: summarize as many videos as you want
Features
- Smart Speaker Detection: Automatically identifies who's talking in interviews, podcasts, and conversations
- Clickable Timestamps: Every quote links directly to that moment in the video
- Clean Documents: Export as HTML, DOCX, or Markdown
- Audio Summaries: Listen to the key points (MP3/WAV)
- Zero Config: Works out of the box, with upgrades available for power users
Works With Any Video
- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions
Quick Start
When the user sends a YouTube URL, run the full pipeline automatically:
```bash
# 1. Extract transcript
python skills/tubescribe/scripts/tubescribe.py "YOUTUBE_URL"
```
This creates:
- `/tmp/tubescribe_{video_id}_source.json`: metadata + transcript
- `/tmp/tubescribe_{video_id}_output.md`: target path for the formatted output
Then process the transcript with a sub-agent (see the full workflow below).
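Both paths are keyed by `{video_id}`. This page does not document how `tubescribe.py` derives that ID. The sketch below is an assumption that handles the two URL shapes used on this page (`youtube.com/watch?v=...` and `youtu.be/...`) with only the standard library. The helper names `video_id_from_url` and `tubescribe_paths` are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Extract the video ID from a watch URL or a youtu.be short link (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: https://youtu.be/ID
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        # Watch URLs carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        raise ValueError(f"not a YouTube URL: {url}")
    if not video_id:
        raise ValueError(f"no video ID found in: {url}")
    return video_id


def tubescribe_paths(url: str) -> tuple[str, str]:
    """Return the (source JSON, output markdown) paths for a URL."""
    vid = video_id_from_url(url)
    return f"/tmp/tubescribe_{vid}_source.json", f"/tmp/tubescribe_{vid}_output.md"


print(tubescribe_paths("https://youtube.com/watch?v=VIDEO_ID"))
# ('/tmp/tubescribe_VIDEO_ID_source.json', '/tmp/tubescribe_VIDEO_ID_output.md')
```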
First-Time Setup
Run setup to check dependencies and configure defaults:
```bash
python skills/tubescribe/scripts/setup.py
```
This checks for the `summarize` CLI, pandoc or python-docx, ffmpeg, and Kokoro TTS.
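The sketch below shows what such a check can look like. It is not the contents of `setup.py`. It assumes python-docx is importable as `docx` and Kokoro as `kokoro`. The function name `check_dependencies` is hypothetical.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report which required and optional tools are available (illustrative only)."""
    return {
        # Required: the summarize CLI must be on PATH.
        "summarize": shutil.which("summarize") is not None,
        # DOCX output: either pandoc on PATH or the python-docx package (module "docx").
        "docx": shutil.which("pandoc") is not None
        or importlib.util.find_spec("docx") is not None,
        # MP3 audio needs ffmpeg; without it audio falls back to WAV.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Kokoro TTS, assumed here to be installed as the "kokoro" Python package.
        "kokoro": importlib.util.find_spec("kokoro") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10} {'ok' if ok else 'missing'}")
```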
Full Workflow
Step 1: Extract Transcript
```bash
python skills/tubescribe/scripts/tubescribe.py "https://youtube.com/watch?v=VIDEO_ID"
```
Step 2: Process with Sub-Agent
Spawn a sub-agent to analyze and format:
```
sessions_spawn(
    task="""Read /tmp/tubescribe_{video_id}_source.json and create formatted output.
**Output to:** /tmp/tubescribe_{video_id}_output.md
**Format:**
1. # Title (from metadata)
2. ## Participants - identify speakers from context
3. ## Summary - 3-5 paragraphs covering main topics
4. ## Key Quotes - 5 best quotes with timestamps [[MM:SS]](https://youtu.be/{video_id}?t=SECONDS)
5. ## Full Transcript - ALL segments with:
   - Speaker labels (**Name:** ) when identifiable
   - Clickable timestamps: [[0:42]](https://youtu.be/{video_id}?t=42)
   - Convert MM:SS to seconds for links
**Speaker Detection:**
- Use context clues (questions vs answers, explicit names, speaking patterns)
- For single-speaker videos, use narrator label or skip speaker labels
- For interviews: host asks questions, guest gives longer answers
""",
    label="tubescribe",
    runTimeoutSeconds=600,
    cleanup="delete"
)
```
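The prompt asks the sub-agent to convert `MM:SS` to seconds for each link. If you prefer to build or check those links deterministically, the arithmetic is small. This is a sketch: the `H:MM:SS` form and both helper names are assumptions beyond the prompt above.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert "MM:SS" (or, as an assumed extension, "H:MM:SS") to whole seconds."""
    parts = stamp.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected timestamp: {stamp}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_link(video_id: str, stamp: str) -> str:
    """Format a clickable timestamp in the [[MM:SS]](https://youtu.be/ID?t=SECONDS) style."""
    return f"[[{stamp}]](https://youtu.be/{video_id}?t={timestamp_to_seconds(stamp)})"


print(timestamp_link("VIDEO_ID", "0:42"))     # [[0:42]](https://youtu.be/VIDEO_ID?t=42)
print(timestamp_link("VIDEO_ID", "1:02:03"))  # [[1:02:03]](https://youtu.be/VIDEO_ID?t=3723)
```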
Step 3: Create Document
Convert markdown to final format:
```bash
# HTML (no dependencies beyond Python)
python skills/tubescribe/scripts/html_writer.py /tmp/tubescribe_{video_id}_output.md output.html
# DOCX with pandoc (best formatting)
pandoc /tmp/tubescribe_{video_id}_output.md -o output.docx
# Markdown (just copy the file)
cp /tmp/tubescribe_{video_id}_output.md output.md
```
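If you drive Step 3 from the `document.format` setting, the three commands can be selected programmatically. This is a minimal sketch that only reproduces the commands listed above. The helper names `export_command` and `export` are hypothetical, and it runs `html_writer.py` with the current interpreter instead of a bare `python`.

```python
import subprocess
import sys
from typing import List


def export_command(video_id: str, fmt: str, dest: str) -> List[str]:
    """Build the Step 3 command for one document format (hypothetical helper)."""
    src = f"/tmp/tubescribe_{video_id}_output.md"
    if fmt == "html":
        # Use the running interpreter so the command works without a bare `python` on PATH.
        return [sys.executable, "skills/tubescribe/scripts/html_writer.py", src, dest]
    if fmt == "docx":
        return ["pandoc", src, "-o", dest]
    if fmt == "md":
        return ["cp", src, dest]
    raise ValueError(f"unsupported document.format: {fmt}")


def export(video_id: str, fmt: str, dest: str) -> None:
    """Run the selected command, raising CalledProcessError if it fails."""
    subprocess.run(export_command(video_id, fmt, dest), check=True)
```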
Step 4: Generate Audio Summary (Optional)
Extract the Summary section and generate TTS:
```bash
# Read summary from output markdown
# Generate audio using Kokoro (preferred) or built-in TTS
# Save to {output_dir}/{title}_summary.wav or .mp3
```
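Step 4 is described only in comments. The sketch below fills it in with the `builtin` engine (macOS `say`) and does not use Kokoro, whose API is not covered on this page. It assumes `say -o` writes AIFF and that ffmpeg, when present, converts to the requested MP3 or WAV. Without ffmpeg, this sketch keeps the AIFF file. The function names are hypothetical.

```python
import os
import re
import shutil
import subprocess
import tempfile


def extract_summary(markdown: str) -> str:
    """Return the text under "## Summary", up to the next "## " heading or end of file."""
    match = re.search(r"^## Summary\s*$(.*?)(?=^## |\Z)", markdown, re.S | re.M)
    return match.group(1).strip() if match else ""


def speak_builtin(text: str, out_base: str, audio_format: str = "mp3") -> str:
    """Render text with macOS `say`, converting with ffmpeg when it is installed.

    Returns the path of the audio file that was written.
    """
    aiff = out_base + ".aiff"
    # Write the text to a temp file and let `say -f` read it (avoids argv length limits).
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as fh:
        fh.write(text)
        text_path = fh.name
    try:
        subprocess.run(["say", "-f", text_path, "-o", aiff], check=True)
    finally:
        os.remove(text_path)
    if shutil.which("ffmpeg") is None:
        return aiff  # no converter available; this sketch keeps the AIFF
    target = f"{out_base}.{audio_format}"
    subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", aiff, target], check=True)
    os.remove(aiff)
    return target
```

For example, read `/tmp/tubescribe_{video_id}_output.md` and pass the result of `extract_summary` to `speak_builtin` along with `{output_dir}/{title}_summary` as the base path.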
Step 5: Open Results
```bash
open output.html  # or .docx or .md
open -a "QuickTime Player" output_summary.wav
```
Configuration
Config file: `~/.tubescribe/config.json`
```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true
  },
  "document": {
    "format": "docx"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "kokoro"
  }
}
```
Options:
- `output.folder`: Where to save files (default: `~/Documents/TubeScribe`)
- `document.format`: `html` (default, no deps), `docx` (with pandoc/python-docx), `md` (raw markdown)
- `audio.format`: `mp3` (with ffmpeg), `wav` (default without ffmpeg)
- `audio.tts_engine`: `builtin` (macOS say), `kokoro` (high quality)
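A config file only needs the keys you want to change, so a loader has to overlay it on defaults. The sketch below is one way to do that. The `folder`, `document.format`, and `audio.format` defaults come from the options list. `open_folder_after`, `audio.enabled`, and `tts_engine: "builtin"` are guesses, because their defaults are not documented here. `merge` and `load_config` are hypothetical names.

```python
import json
import os
import shutil
from copy import deepcopy

# Defaults from the option list; open_folder_after, audio.enabled and
# tts_engine are assumed values, not documented defaults.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "document": {"format": "html"},
    "audio": {
        "enabled": True,
        "format": "mp3" if shutil.which("ffmpeg") else "wav",
        "tts_engine": "builtin",
    },
}


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay user settings onto a copy of the defaults."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_config(path: str = "~/.tubescribe/config.json") -> dict:
    """Read the config file if it exists and fill in any missing keys."""
    full = os.path.expanduser(path)
    user = {}
    if os.path.exists(full):
        with open(full, encoding="utf-8") as fh:
            user = json.load(fh)
    config = merge(DEFAULTS, user)
    config["output"]["folder"] = os.path.expanduser(config["output"]["folder"])
    return config
```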
Output Structure
```
~/Documents/TubeScribe/
├── {Video Title}.html          # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3   # Audio summary (or .wav)
```
After generation, TubeScribe opens the output folder (not the individual files) so you can access everything in one place.
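Titles can contain characters that are not valid in file names, so `{Video Title}` has to be cleaned before it becomes a path. The sketch below shows that step and the "open the folder" behavior. The character set it replaces and both helper names are assumptions, not the skill's actual rules. `open` is the macOS command already used in Step 5.

```python
import os
import re
import subprocess
from typing import Tuple


def output_paths(folder: str, title: str, doc_format: str, audio_format: str) -> Tuple[str, str]:
    """Return (document path, audio summary path) for a video title (hypothetical helper)."""
    # Replace runs of characters that are unsafe in file names on common filesystems.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", title).strip() or "untitled"
    base = os.path.join(os.path.expanduser(folder), safe)
    return f"{base}.{doc_format}", f"{base}_summary.{audio_format}"


def open_folder(folder: str) -> None:
    """Open the output folder (not individual files) with the macOS `open` command."""
    subprocess.run(["open", os.path.expanduser(folder)], check=True)
```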
Dependencies
Required:
- `summarize` CLI: `brew install steipete/tap/summarize`
- Python 3.8+

Optional (better quality):
- `pandoc`: DOCX output, `brew install pandoc`
- `ffmpeg`: MP3 audio, `brew install ffmpeg`
- Kokoro TTS: high-quality voices, see https://github.com/hexgrad/kokoro
Tips
- For long videos (>30 min), increase the sub-agent timeout (`runTimeoutSeconds`) to 900 s
- Speaker detection works best with clear interview/podcast formats
- Single-speaker videos (tutorials, lectures) skip speaker labels automatically
- Timestamps link directly to YouTube at that moment
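The timeout tip reduces to a simple rule. This sketch assumes the video duration in seconds is already known, for example from the source metadata. The function name is hypothetical.

```python
def sub_agent_timeout(duration_seconds: int) -> int:
    """Pick runTimeoutSeconds: 600 s by default, 900 s for videos longer than 30 minutes."""
    return 900 if duration_seconds > 30 * 60 else 600


print(sub_agent_timeout(45 * 60))  # 900
```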
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
FAQ
How do I install tubescribe?
Run `openclaw add @matusvojtek/tubescribe` in your terminal. This installs tubescribe into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/matusvojtek/tubescribe. Review commits and README documentation before installing.
