story-video – OpenClaw Skill

Name: story-video
Author: snail3d

story-video is an OpenClaw Skills integration for coding workflows. Convert narrated stories (audio + text) into YouTube Shorts videos (9:16 portrait) with synced subtitles, dynamic background images matched to story content, and professional subtitle effects.

9.7k stars | 1.4k forks | Security L1

Updated Feb 7, 2026 | Created Feb 7, 2026 | coding

Skill Snapshot

name	story-video
description	Convert narrated stories (audio + text) into YouTube Shorts videos (9:16 portrait) with synced subtitles, dynamic background images matched to story content, and professional subtitle effects. OpenClaw Skills integration.
owner	snail3d
repository	snail3d/clawforgod (path: story-video-skill)
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @snail3d/clawforgod:story-video-skill
last updated	Feb 7, 2026

Maintainer

snail3d

Maintains story-video in the OpenClaw Skills directory.

File Explorer

10 files

story-video-skill/
  scripts/
    generate_animated_video.sh    4.4 KB
    generate_branded_video.sh     4.5 KB
    generate_final_video.sh       3.0 KB
    generate_video_standalone.sh  1.8 KB
    generate_video.py             10.2 KB
    search_images.py              4.7 KB
    subtitle_renderer.py          8.5 KB
    transcribe_audio.py           2.0 KB
  SKILL.md                        8.5 KB

SKILL.md

name: story-video
description: Convert narrated stories (audio + text) into YouTube Shorts videos (9:16 portrait) with synced subtitles, dynamic background images matched to story content, and professional subtitle effects.

Story-to-Video Skill

Convert bedtime stories, narrations, or any spoken content into engaging YouTube Shorts videos with:

Synced subtitles - Words highlighted in real-time as spoken
Dynamic backgrounds - Images searched and selected based on story content/section
YouTube Shorts format - 9:16 portrait video optimized for mobile
Professional styling - Centered, animated subtitle effects

Quick Start

Input Requirements

Audio file - MP3/WAV with narration (e.g., from ElevenLabs TTS)
Full text transcript - Complete story/narration text
Story sections (optional) - If available, define sections for targeted background images

Basic Workflow

# 1. Transcribe audio to get word timing (automatic)
story-video transcribe --audio story.mp3 --output story.json

# 2. Generate video with auto-searched backgrounds
story-video generate \
  --audio story.mp3 \
  --text "Once upon a time..." \
  --title "Bedtime Story" \
  --output story.mp4

Output: story.mp4 (9:16 portrait, YouTube Shorts ready)

Advanced: Custom Sections & Backgrounds

# Create a config with sections and suggested image searches
story-video generate \
  --audio story.mp3 \
  --text full_text.txt \
  --config story-config.json \
  --output story.mp4

story-config.json:

{
  "title": "The Snail Designer",
  "sections": [
    {
      "start_time": 0,
      "end_time": 15,
      "text": "Once upon a time, in the beautiful city of El Paso...",
      "search_query": "El Paso desert sunset"
    },
    {
      "start_time": 15,
      "end_time": 35,
      "text": "...a gentle snail named Snail was a designer.",
      "search_query": "3D design workshop creative tools"
    }
  ]
}
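Before passing a config like this to the generator, it can help to check that the sections are ordered and do not overlap. The sketch below is only an illustration: `validate_sections` is a hypothetical helper, and the rules it enforces (each section ends after it starts, sections do not overlap) are assumptions, not documented requirements of the skill.

```python
def validate_sections(sections):
    """Return a list of problems found; an empty list means the sections look consistent."""
    problems = []
    prev_end = None
    for i, section in enumerate(sections):
        # Assumed rule: a section must have positive duration.
        if section["end_time"] <= section["start_time"]:
            problems.append(f"section {i}: end_time must be greater than start_time")
        # Assumed rule: a section must not start before the previous one ends.
        if prev_end is not None and section["start_time"] < prev_end:
            problems.append(f"section {i}: overlaps the previous section")
        prev_end = section["end_time"]
    return problems


config = {
    "title": "The Snail Designer",
    "sections": [
        {"start_time": 0, "end_time": 15, "search_query": "El Paso desert sunset"},
        {"start_time": 15, "end_time": 35, "search_query": "3D design workshop creative tools"},
    ],
}
issues = validate_sections(config["sections"])
```

For the config above `issues` is empty, so the two sections cover 0-35 s without gaps or overlap.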

How It Works

1. Audio Transcription + Timing

Uses Groq Whisper (or local speech-to-text) to get word-level timing
Outputs JSON with {word, start_ms, end_ms} for each word
Enables precise subtitle sync
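
To illustrate how word-level timing enables subtitle sync, here is a minimal sketch that finds the word being spoken at a given playback time. `active_word_index` is a hypothetical helper, not part of the bundled scripts; it assumes the words are sorted by `start_ms` and use the `{word, start_ms, end_ms}` shape described above.

```python
from bisect import bisect_right


def active_word_index(words, t_ms):
    """Return the index of the word spoken at t_ms, or None during silence."""
    starts = [w["start_ms"] for w in words]
    # Last word whose start is at or before t_ms.
    i = bisect_right(starts, t_ms) - 1
    if i >= 0 and t_ms < words[i]["end_ms"]:
        return i
    return None


words = [
    {"word": "Once", "start_ms": 0, "end_ms": 250},
    {"word": "upon", "start_ms": 250, "end_ms": 450},
    {"word": "a", "start_ms": 500, "end_ms": 560},
]
current = active_word_index(words, 300)   # index of "upon"
pause = active_word_index(words, 470)     # None: gap between "upon" and "a"
```

A renderer could call this once per frame and highlight only the returned word.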

2. Section Detection

Divides audio into chunks (10-30s sections)
Generates targeted image search queries from text content
Searches Unsplash/Pexels for relevant high-quality images
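
One way to picture the chunking step is the sketch below. It is not the skill's actual algorithm, which is not shown here: `split_into_sections` is a hypothetical helper that closes a section at the first sentence end after 10 s, or forces a break at 30 s. The final section may be shorter than the minimum.

```python
def split_into_sections(words, min_ms=10_000, max_ms=30_000):
    """Group timed words into sections of roughly min_ms to max_ms each."""
    groups, current = [], []
    for w in words:
        current.append(w)
        length = w["end_ms"] - current[0]["start_ms"]
        ends_sentence = w["word"].rstrip("\"'").endswith((".", "!", "?"))
        # Break at a sentence end once long enough, or when the cap is reached.
        if length >= max_ms or (length >= min_ms and ends_sentence):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        {
            "start_time": g[0]["start_ms"] / 1000,
            "end_time": g[-1]["end_ms"] / 1000,
            "text": " ".join(w["word"] for w in g),
        }
        for g in groups
    ]


# Synthetic input: 40 one-second words, with sentences ending at words 11 and 25.
demo = [
    {"word": f"w{i}" + ("." if i in (11, 25) else ""),
     "start_ms": i * 1000, "end_ms": (i + 1) * 1000}
    for i in range(40)
]
sections = split_into_sections(demo)
```

The output uses the same `start_time`/`end_time` keys (in seconds) as the config sections, so a `search_query` could be attached to each entry afterwards.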

3. Video Composition

Creates 9:16 canvas (1080x1920 pixels)
Layers background image (center-cropped, subtle zoom)
Renders subtitles centered, synchronized to audio
Applies subtitle effects:
- Fade in/out as words appear/disappear
- Color highlight - Current word in bright color, context in white
- Scale animation - Current word slightly larger
- Drop shadow - Professional readability on any background
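
The center-crop step can be sketched with plain integer arithmetic. `center_crop_box` is a hypothetical helper; it returns a `(left, top, right, bottom)` box, which is the box format Pillow's `Image.crop` accepts, so the cropped region could then be resized to 1080x1920.

```python
def center_crop_box(width, height, target_w=1080, target_h=1920):
    """Return (left, top, right, bottom) of the largest centered 9:16 region."""
    if width * target_h > height * target_w:
        # Source is wider than 9:16: keep full height, trim the sides.
        crop_w, crop_h = height * target_w // target_h, height
    else:
        # Source is 9:16 or taller: keep full width, trim top and bottom.
        crop_w, crop_h = width, width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


landscape = center_crop_box(3840, 2160)   # typical landscape stock photo
tall = center_crop_box(1080, 2400)        # taller than 9:16
```

Integer division avoids float rounding surprises; at worst the crop is one pixel narrower than the exact ratio.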

4. Video Export

Combines audio + video layers
H.264 codec, optimized bitrate for YouTube
Metadata tags for YouTube Shorts (aspect ratio, duration)
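
The exact encoder settings used by the bundled scripts are not shown here. As a rough sketch of the export step, the function below builds an ffmpeg argument list that muxes a silent video with the narration using H.264 and AAC. The file names, the CRF value, and the audio bitrate are assumptions chosen for illustration.

```python
def build_export_command(video_in, audio_in, output, crf=20, audio_bitrate="192k"):
    """Build an ffmpeg argument list that combines video and audio into an MP4."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,                     # input 0: rendered frames (no audio)
        "-i", audio_in,                     # input 1: narration
        "-map", "0:v:0", "-map", "1:a:0",   # video from input 0, audio from input 1
        "-c:v", "libx264", "-preset", "medium", "-crf", str(crf),
        "-pix_fmt", "yuv420p",              # widely compatible pixel format
        "-c:a", "aac", "-b:a", audio_bitrate,
        "-movflags", "+faststart",          # move the index to the front for streaming
        "-shortest",                        # stop at the shorter of the two inputs
        output,
    ]


cmd = build_export_command("frames.mp4", "story.mp3", "story.mp4")
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a string avoids shell quoting problems with file names.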

Configuration Options

Subtitle Styling

{
  "subtitles": {
    "font": "Inter",
    "size": 48,
    "color_current": "#FFD700",
    "color_context": "#FFFFFF",
    "shadow": true,
    "shadow_blur": 8,
    "shadow_color": "#000000",
    "shadow_offset_y": 3,
    "animation_type": "fade_scale",
    "animation_duration_ms": 200
  }
}
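
How `fade_scale` and `animation_duration_ms` interact is not spelled out, so the following is one plausible reading rather than the renderer's actual behavior: a word ramps up in opacity and size over the animation duration, then ramps back down before it ends. `fade_scale` as a function and the `max_scale` parameter are hypothetical.

```python
def fade_scale(t_ms, start_ms, end_ms, duration_ms=200, max_scale=1.1):
    """Return (opacity, scale) for a word at time t_ms."""
    if t_ms < start_ms or t_ms >= end_ms:
        return (0.0, 1.0)
    # For very short words, shrink the ramp so fade-in and fade-out do not overlap.
    ramp = min(duration_ms, (end_ms - start_ms) / 2)
    fade_in = (t_ms - start_ms) / ramp
    fade_out = (end_ms - t_ms) / ramp
    progress = max(0.0, min(1.0, fade_in, fade_out))
    return (progress, 1.0 + (max_scale - 1.0) * progress)


halfway_in = fade_scale(100, 0, 1000)    # mid fade-in
fully_shown = fade_scale(500, 0, 1000)   # fully visible and scaled
after_word = fade_scale(1200, 0, 1000)   # word no longer active
```

Tying opacity and scale to the same progress value keeps the two effects in step with a single duration setting.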

Background Options

{
  "background": {
    "source": "unsplash",
    "fallback_color": "#1a1a1a",
    "zoom_effect": "subtle",
    "zoom_speed": 0.3,
    "fade_between_sections": true,
    "fade_duration_ms": 500
  }
}
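
As an illustration of `fade_between_sections` and `fade_duration_ms`, the sketch below computes how much of the incoming background to show near a section boundary. Centering the fade on the boundary is an assumption, and `crossfade_weight` is a hypothetical helper.

```python
def crossfade_weight(t_ms, boundary_ms, fade_duration_ms=500):
    """Return the weight (0.0 to 1.0) of the incoming image at time t_ms."""
    if fade_duration_ms <= 0:
        # No fade configured: hard cut at the boundary.
        return 1.0 if t_ms >= boundary_ms else 0.0
    fade_start = boundary_ms - fade_duration_ms / 2
    weight = (t_ms - fade_start) / fade_duration_ms
    return max(0.0, min(1.0, weight))


mid_fade = crossfade_weight(15_000, 15_000)   # 0.5: both images equally visible
```

The returned weight could serve as the `alpha` argument to Pillow's `Image.blend(outgoing, incoming, alpha)`, provided both images have the same size and mode.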

Commands

`story-video transcribe`

Generate word-level timing from audio.

story-video transcribe --audio input.mp3 --output timing.json

Options:

--audio (required) - Audio file path
--output (required) - JSON output with timings
--engine (optional) - groq, google, or local (default: groq)

Output format:

{
  "duration_ms": 45000,
  "words": [
    {"word": "Once", "start_ms": 0, "end_ms": 250},
    {"word": "upon", "start_ms": 250, "end_ms": 450},
    ...
  ]
}
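
A timing file in this format can be turned into caption lines with a few lines of Python. The sketch below is illustrative only; `group_into_lines` and the words-per-line limit are assumptions, not part of the skill.

```python
import json


def group_into_lines(words, max_words=4):
    """Group word timings into caption lines of at most max_words words."""
    lines = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        lines.append({
            "text": " ".join(w["word"] for w in chunk),
            "start_ms": chunk[0]["start_ms"],
            "end_ms": chunk[-1]["end_ms"],
        })
    return lines


# Inline sample in the documented format; a real run would read timing.json.
timing = json.loads("""
{
  "duration_ms": 900,
  "words": [
    {"word": "Once", "start_ms": 0, "end_ms": 250},
    {"word": "upon", "start_ms": 250, "end_ms": 450},
    {"word": "a", "start_ms": 450, "end_ms": 520},
    {"word": "time", "start_ms": 520, "end_ms": 900}
  ]
}
""")
caption_lines = group_into_lines(timing["words"], max_words=2)
```

Each line keeps the start of its first word and the end of its last word, so it can be shown for exactly the span in which its words are spoken.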

`story-video generate`

Create video from audio + text.

story-video generate \
  --audio input.mp3 \
  --text "Story text..." \
  --output output.mp4

Options:

--audio (required) - MP3/WAV file
--text (required) - Full transcript text
--output (required) - MP4 output path
--config (optional) - JSON config file (sections, styling, etc.)
--title (optional) - Video title (for metadata)
--subtitle-style (optional) - Preset: minimal, bold, elegant, neon (default: bold)
--background-source (optional) - unsplash, pexels, local_dir (default: unsplash)

`story-video style-preset`

List or preview the available subtitle style presets.

story-video style-preset list
story-video style-preset preview bold

Presets:

minimal - Small, centered, subtle animation
bold - Large, bright highlight, dynamic animation
elegant - Serif font, refined colors, smooth fade
neon - Bright colors, glow effect, fast animation

Image Search Strategy

The skill auto-generates search queries based on story content:

Noun extraction - Identifies key nouns (snail, designer, El Paso, daughters)
Context keywords - Adds context (sunset, desert, workshop, family)
Search execution - Finds relevant images from Unsplash
Quality filter - Prefers high-res, professional photos
Caching - Saves images locally to avoid repeated searches

Example:

Text: "snail named Snail who was a three-dimensional designer"
→ Search: "3D design workshop creative snail"
→ Results: [image1, image2, image3]
→ Select: Best match for this section
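
The query-building idea can be approximated without an NLP library. The sketch below swaps real noun extraction (which would need a part-of-speech tagger) for a simple stop-word filter, so it is a stand-in for the technique, not the skill's implementation. `build_search_query` and the stop-word list are hypothetical.

```python
import re

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {
    "a", "an", "the", "who", "was", "were", "is", "in", "of", "and",
    "named", "to", "on", "at", "with", "that",
}


def build_search_query(text, context=(), max_terms=5):
    """Build an image search query from keywords in text plus context terms."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    terms = []
    for term in [*tokens, *context]:
        # Keep the first occurrence of each non-stop-word term.
        if term not in STOP_WORDS and term not in terms:
            terms.append(term)
    return " ".join(terms[:max_terms])


query = build_search_query(
    "snail named Snail who was a three-dimensional designer",
    context=("workshop",),
)
```

For the example sentence this yields `snail three-dimensional designer workshop`. A production version would likely also normalize phrases such as "three-dimensional" to "3D", as the example above does.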

Requirements

System Dependencies

ffmpeg - Video composition (brew install ffmpeg)
python3 - Image processing (PIL/Pillow)

API Keys

Groq API - Audio transcription (set GROQ_API_KEY)
Unsplash API (optional) - Image search (set UNSPLASH_API_KEY for more requests)
ElevenLabs API (optional) - If generating TTS from text first

Python Libraries

ffmpeg-python
pydub
pillow
requests

Workflow Examples

Example 1: Bedtime Story from TTS

# 1. Generate audio (your voice) via ElevenLabs
tts "Once upon a time..." --voice hjX6Urz6dBwVkFdr87DB --output story.mp3

# 2. Convert to video
story-video generate \
  --audio story.mp3 \
  --text "Once upon a time..." \
  --subtitle-style bold \
  --background-source unsplash \
  --output story-video.mp4

# 3. Upload to YouTube Shorts
# (9:16 format is ready!)

Example 2: Existing Audio with Custom Sections

# Create config with specific sections and background queries
cat > config.json << EOF
{
  "title": "The Snail",
  "sections": [
    {
      "start_time": 0,
      "end_time": 20,
      "search_query": "El Paso desert landscape"
    },
    {
      "start_time": 20,
      "end_time": 45,
      "search_query": "3D design studio workspace"
    }
  ]
}
EOF

# Generate video with custom sections
story-video generate \
  --audio narration.mp3 \
  --text transcript.txt \
  --config config.json \
  --output output.mp4

Example 3: Multiple Stories as Shorts Series

# Generate videos for each story
mkdir -p videos
for story in stories/*.txt; do
  audio="${story%.txt}.mp3"
  # Quote "$story" so file names containing spaces are handled correctly
  output="videos/$(basename "$story" .txt).mp4"

  story-video generate \
    --audio "$audio" \
    --text "$story" \
    --subtitle-style elegant \
    --output "$output"
done

# All ready for YouTube Shorts series
ls -lh videos/*.mp4

Troubleshooting

Video is too fast/slow

Adjust the audio speed before generating. For example, ffmpeg -i story.mp3 -filter:a "atempo=0.9" story-slow.mp3 slows the narration to 90% of its original speed.

Background images not matching content

Customize search queries in config.json sections[].search_query field.

Subtitle readability on bright backgrounds

Switch to --subtitle-style elegant (adds stronger shadow) or use the shadow config option.

ffmpeg not found

Install: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Debian/Ubuntu)

API rate limits

Groq: Free tier has rate limits; use local Whisper if needed
Unsplash: Free tier is 50 requests/hour; cache images locally
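
Local caching to stay under the image API limits could look like the sketch below. The cache layout (a hashed file name per normalized query) is an assumption, and `download` stands in for whatever function actually fetches image bytes. Neither is taken from `search_images.py`.

```python
import hashlib
from pathlib import Path


def cache_path(query, cache_dir="image_cache"):
    """Map a search query to a stable file path inside cache_dir."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.jpg"


def fetch_with_cache(query, download, cache_dir="image_cache"):
    """Return the cached image path, calling download(query) only on a miss."""
    path = cache_path(query, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(query))
    return path
```

Normalizing case and whitespace before hashing means "El Paso desert" and "el paso  desert" share one cache entry, so repeated runs over the same story make no further API requests.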

Bundled Resources

scripts/generate_video.py - Main video composition logic
scripts/transcribe_audio.py - Word-level timing extraction
scripts/search_images.py - Unsplash/Pexels image search
scripts/subtitle_renderer.py - Animated subtitle rendering
references/ffmpeg_settings.md - FFmpeg optimization for YouTube Shorts
references/subtitle_effects.md - Available animation effects and customization
assets/fonts/ - Default fonts (Inter, Serif fallback)

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

FAQ

How do I install story-video?

Run openclaw add @snail3d/clawforgod:story-video-skill in your terminal. This installs story-video into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/snail3d/clawforgod. Review commits and README documentation before installing.