clawvox – OpenClaw Skill

Name: clawvox
Author: abhishek-official1

clawvox is an OpenClaw skill, listed under coding workflows, that adds an ElevenLabs voice studio to OpenClaw. It can generate speech, transcribe audio, clone voices, create sound effects, and more.

7.0k stars · 3.4k forks · Security L1

Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name	clawvox
description	ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more. OpenClaw Skills integration.
owner	abhishek-official1
repository	abhishek-official1/clawvox
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @abhishek-official1/clawvox
last updated	Feb 7, 2026

Maintainer

abhishek-official1

Maintains clawvox in the OpenClaw Skills directory.

File Explorer

15 files

bin
elevenlabs.md	6.4 KB
scripts
clone.sh	3.9 KB
common.sh	5.7 KB
dub.sh	8.4 KB
isolate.sh	2.8 KB
sfx.sh	3.6 KB
speak.sh	5.6 KB
transcribe.sh	4.3 KB
voices.sh	9.3 KB
_meta.json	307 B
README.md	6.8 KB
SKILL.md	9.1 KB
test.sh	8.6 KB

SKILL.md

name: clawvox
description: ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.
homepage: https://elevenlabs.io/developers
metadata: { "openclaw": { "emoji": "🎙️", "skillKey": "clawvox", "requires": { "bins": ["curl", "jq"], "env": ["ELEVENLABS_API_KEY"] }, "primaryEnv": "ELEVENLABS_API_KEY" } }

ClawVox

Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.

Quick Reference

Action	Command	Description
Speak	`{baseDir}/scripts/speak.sh 'text'`	Convert text to speech
Transcribe	`{baseDir}/scripts/transcribe.sh audio.mp3`	Speech to text
Clone	`{baseDir}/scripts/clone.sh --name "Voice" sample.mp3`	Clone a voice
SFX	`{baseDir}/scripts/sfx.sh "thunder storm"`	Generate sound effects
Voices	`{baseDir}/scripts/voices.sh list`	List available voices
Dub	`{baseDir}/scripts/dub.sh --target es audio.mp3`	Translate audio
Isolate	`{baseDir}/scripts/isolate.sh audio.mp3`	Remove background noise

Setup

Get your API key from elevenlabs.io/app/settings/api-keys
Configure in ~/.openclaw/openclaw.json:

{
  skills: {
    entries: {
      "clawvox": {
        apiKey: "YOUR_ELEVENLABS_API_KEY",
        config: {
          defaultVoice: "Rachel",
          defaultModel: "eleven_turbo_v2_5",
          outputDir: "~/.openclaw/audio"
        }
      }
    }
  }
}

Or set the environment variable:

export ELEVENLABS_API_KEY="your_api_key_here"
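
The skill metadata lists `curl`, `jq`, and `ELEVENLABS_API_KEY` as requirements. A minimal preflight sketch follows; `check_requirements` is a hypothetical helper for illustration and is not part of the clawvox scripts:

```shell
# Hypothetical helper (not part of clawvox): prints one line per missing
# requirement and returns non-zero if anything is missing.
check_requirements() {
  local missing=0 bin
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing binary: $bin"
      missing=1
    fi
  done
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "missing env: ELEVENLABS_API_KEY"
    missing=1
  fi
  return "$missing"
}

# Example: check_requirements curl jq || echo "fix the items above first"
```

This only checks that the variable is non-empty. It does not verify that the key is valid or that it is configured in openclaw.json.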

Voice Generation (TTS)

Basic Text-to-Speech

# Quick speak with default voice (Rachel)
{baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'

# Specify voice by name
{baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'

# Save to file
{baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'

# Use specific model
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'

# Adjust voice settings
{baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'

# Adjust speed
{baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'

# Use multilingual model for other languages
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'

Voice Models

Model	Latency	Languages	Best For
`eleven_flash_v2_5`	~75ms	32	Real-time, streaming
`eleven_turbo_v2_5`	~250ms	32	Balanced quality/speed
`eleven_multilingual_v2`	~500ms	29	Long-form, highest quality

Available Voices

Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will

Long-Form Content

# Generate audio from text file
{baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3

Speech-to-Text (Transcription)

Basic Transcription

# Transcribe audio file
{baseDir}/scripts/transcribe.sh recording.mp3

# Save to file
{baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3

# Transcribe with language hint
{baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3

# Include timestamps
{baseDir}/scripts/transcribe.sh --timestamps podcast.mp3

Supported Formats

MP3, MP4, MPEG, MPGA, M4A, WAV, WebM
Maximum file size: 100MB
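
The format and size limits above can be checked locally before uploading. The sketch below assumes that "100MB" means 100 * 1024 * 1024 bytes and uses a hypothetical helper named `check_transcribe_input`; transcribe.sh may perform its own validation differently.

```shell
# Hypothetical pre-upload check based on the limits listed above.
# Assumption: "100MB" means 100 * 1024 * 1024 bytes.
MAX_TRANSCRIBE_BYTES=$((100 * 1024 * 1024))

check_transcribe_input() {
  local file="$1" ext size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Lower-case the extension so "Talk.MP3" is accepted too.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac

  # wc -c reports the size in bytes; strip padding some platforms add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_TRANSCRIBE_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}

# Example: check_transcribe_input podcast.mp3 && {baseDir}/scripts/transcribe.sh podcast.mp3
```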

Voice Cloning

Instant Voice Clone

# Clone from single sample (minimum 30 seconds recommended)
{baseDir}/scripts/clone.sh --name MyVoice recording.mp3

# Clone with description
{baseDir}/scripts/clone.sh --name BusinessVoice \
  --description 'Professional male voice' \
  sample.mp3

# Clone with labels
{baseDir}/scripts/clone.sh --name MyVoice \
  --labels '{"gender":"male","age":"adult"}' \
  sample.mp3

# Remove background noise during cloning
{baseDir}/scripts/clone.sh --name CleanVoice \
  --remove-bg-noise \
  sample.mp3

# Test cloned voice
{baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'

Voice Library Management

# List all available voices
{baseDir}/scripts/voices.sh list

# Get voice details
{baseDir}/scripts/voices.sh info --name Rachel
{baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM

# Search voices (filter output with grep)
{baseDir}/scripts/voices.sh list | grep -i "female"

# Filter by category
{baseDir}/scripts/voices.sh list --category premade
{baseDir}/scripts/voices.sh list --category cloned

# Download voice preview
{baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3

# Delete custom voice
{baseDir}/scripts/voices.sh delete --id "voice_id"

Sound Effects

# Generate sound effect
{baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'

# With duration
{baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'

# With prompt influence (higher = output follows the prompt more closely)
{baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'

# Save to file
{baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'

Note: Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
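
To see what the range and rounding rule mean for a given value, here is a small sketch. `normalize_sfx_duration` is a hypothetical helper; whether sfx.sh clamps out-of-range values or rejects them is not stated here.

```shell
# Hypothetical helper: clamp a duration to 0.5-22 s and round it to the
# nearest 0.5 s, mirroring the note above.
normalize_sfx_duration() {
  LC_ALL=C awk -v d="$1" 'BEGIN {
    r = int(d * 2 + 0.5) / 2      # round to the nearest half second
    if (r < 0.5) r = 0.5          # clamp to the lower bound
    if (r > 22)  r = 22           # clamp to the upper bound
    printf "%.1f\n", r
  }'
}
```

For example, `normalize_sfx_duration 5.3` prints `5.5`, and `normalize_sfx_duration 30` prints `22.0`.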

Voice Isolation

# Remove background noise and isolate voice
{baseDir}/scripts/isolate.sh noisy_recording.mp3

# Save to specific file
{baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3

# Don't tag audio events
{baseDir}/scripts/isolate.sh --no-audio-events recording.mp3

Requirements:

Minimum duration: 4.6 seconds
Supported formats: MP3, WAV, M4A, OGG, FLAC

Dubbing (Multi-Language Translation)

# Dub audio to Spanish
{baseDir}/scripts/dub.sh --target es audio.mp3

# Dub with source language specified
{baseDir}/scripts/dub.sh --source en --target ja video.mp4

# Check dubbing status
{baseDir}/scripts/dub.sh --status --id "dubbing_id"

# Download dubbed audio
{baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3

Supported languages: en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th
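
Because status and download are separate commands, dubbing appears to run asynchronously. A polling loop could look like the sketch below. `poll_until` is a hypothetical helper, and the example assumes that the status output contains the word "dubbed" once the job has finished; check the actual output of dub.sh before relying on that.

```shell
# Hypothetical helper: run a command repeatedly until its output matches
# a pattern, or give up after a fixed number of attempts.
# Usage: poll_until PATTERN MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
poll_until() {
  local pattern="$1" max_tries="$2" delay="$3" try=1
  shift 3
  while [ "$try" -le "$max_tries" ]; do
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes the status output contains "dubbed" when the job is done):
# poll_until dubbed 30 10 {baseDir}/scripts/dub.sh --status --id "dubbing_id" \
#   && {baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
```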

API Usage Examples

All scripts use curl under the hood, so you can also call the API directly:

# Direct TTS API call
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
  --output speech.mp3
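
Hand-written JSON breaks as soon as the text contains quotes or newlines. Since jq is already a requirement, one option is to let it build the request body. `build_tts_payload` below is a hypothetical helper, not something the skill ships:

```shell
# Hypothetical helper: build the TTS request body with jq so that quotes,
# exclamation marks, and newlines in the text are escaped correctly.
build_tts_payload() {
  local text="$1" model="${2:-eleven_turbo_v2_5}"
  jq -n --arg text "$text" --arg model "$model" \
    '{text: $text, model_id: $model}'
}

# Example (VOICE_ID is a placeholder, as in the call above):
# build_tts_payload 'She said "hi"!' |
#   curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
#     -H "xi-api-key: $ELEVENLABS_API_KEY" \
#     -H "Content-Type: application/json" \
#     --data-binary @- --output speech.mp3
```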

Error Handling

The scripts translate common HTTP status codes into error messages:

401: Authentication failed - Check your API key
403: Permission denied - Your API key may not have access
429: Rate limit exceeded - Wait before trying again
500/502/503: ElevenLabs API issues - Try again later
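
When calling the API directly, the same mapping can be reproduced in a few lines. `describe_http_status` is a hypothetical helper that mirrors the list above and is not taken from the skill's common.sh. The example call assumes the `/v1/voices` endpoint.

```shell
# Hypothetical helper: map an HTTP status code to the hints listed above.
describe_http_status() {
  case "$1" in
    2??)         echo "OK" ;;
    401)         echo "Authentication failed - check your API key" ;;
    403)         echo "Permission denied - your API key may not have access" ;;
    429)         echo "Rate limit exceeded - wait before trying again" ;;
    500|502|503) echo "ElevenLabs API issue - try again later" ;;
    *)           echo "Unexpected HTTP status: $1" ;;
  esac
}

# Example: capture only the status code with curl's -w option.
# status=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices)
# describe_http_status "$status"
```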

Testing

Run the test suite to verify everything works:

{baseDir}/test.sh YOUR_API_KEY

Or with environment variable:

export ELEVENLABS_API_KEY="your_key"
{baseDir}/test.sh

Troubleshooting

Common Issues

"exec host not allowed (requested gateway)"
- The skill needs to run commands in a sandbox environment
- Configure OpenClaw to use sandbox: tools.exec.host: "sandbox"
- Or enable sandboxing in your OpenClaw config
- Alternative: Configure exec approvals for gateway host (see OpenClaw docs)

Parse errors with quotes or exclamation marks
- Use single quotes instead of double quotes: 'Hello world' not "Hello world!"
- Avoid exclamation marks (!) in text when using double quotes
- For complex text, use the --input option with a file

"ELEVENLABS_API_KEY not set"
- Ensure ELEVENLABS_API_KEY is set or configured in openclaw.json
- Check that the API key is at least 20 characters long

"jq is required but not installed"
- Install jq: apt-get install jq (Linux) or brew install jq (macOS)

"Rate limited"
- Check your ElevenLabs plan quota at elevenlabs.io/app/usage
- Free tier: ~10,000 characters/month

"Voice not found"
- Use {baseDir}/scripts/voices.sh list to see available voices
- Check if the voice ID is correct

"Dubbing failed"
- Ensure source audio is clear and audible
- Check supported language codes

"File too large"
- Transcription: 100MB max
- Dubbing: 500MB max
- Voice cloning: 50MB per file
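
For transient failures such as rate limiting, a generic retry wrapper with exponential backoff is one option. This is a sketch under two assumptions: the scripts exit non-zero on failure, and `retry_with_backoff` is a hypothetical helper name. It retries on any failure, so it will not help with an exhausted monthly quota or a bad API key.

```shell
# Hypothetical helper: retry a command, doubling the delay after each failure.
# Usage: retry_with_backoff MAX_TRIES FIRST_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_tries="$1" delay="$2" try=1
  shift 2
  while true; do
    "$@" && return 0                       # success: stop retrying
    [ "$try" -ge "$max_tries" ] && return 1 # out of attempts: give up
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Example: up to 4 attempts, waiting 2, 4, then 8 seconds between them.
# retry_with_backoff 4 2 {baseDir}/scripts/speak.sh 'Hello again'
```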

Debug Mode

# Enable verbose output
DEBUG=1 {baseDir}/scripts/speak.sh 'test'

# Show API request details
DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3

Pricing Notes

ElevenLabs API pricing (approximate):

Flash v2.5: ~$0.06/min
Turbo v2.5: ~$0.06/min
Multilingual v2: ~$0.12/min
Voice cloning: Included in plan
Sound effects: ~$0.02/generation
Transcription: ~$0.02/min (Scribe v1)

Free tier: ~10,000 characters/month

README.md

ElevenLabs Voice Studio for OpenClaw

A comprehensive voice production studio skill for OpenClaw, powered by ElevenLabs. Transform your personal AI assistant into a full-featured voice platform with text-to-speech, speech-to-text, voice cloning, sound effects, and multilingual dubbing capabilities.

🎙️ Features

Text-to-Speech (TTS) - Generate lifelike speech with multiple voice models
Speech-to-Text (STT) - Transcribe audio with high accuracy
Voice Cloning - Clone voices from audio samples
Sound Effects - Generate custom audio effects from text descriptions
Voice Isolation - Remove background noise from audio
Multilingual Dubbing - Translate and dub audio/video to 32+ languages
Voice Library Management - Browse and manage available voices

🚀 Quick Start

Installation

Copy this skill to your OpenClaw skills directory:

cp -r elevenlabs-voice-studio ~/.openclaw/skills/

Set your ElevenLabs API key:

export ELEVENLABS_API_KEY="your_api_key_here"

Or configure in ~/.openclaw/openclaw.json:

{
  skills: {
    entries: {
      "elevenlabs-voice-studio": {
        apiKey: "your_api_key_here"
      }
    }
  }
}

Get your API key from elevenlabs.io/app/settings/api-keys

Basic Usage

# Text to Speech
elevenlabs speak "Hello from ElevenLabs Voice Studio!"
elevenlabs speak -v Adam "Hello from Adam's voice"

# Transcribe Audio
elevenlabs transcribe meeting.mp3
elevenlabs transcribe -o transcript.txt recording.mp3

# Clone a Voice
elevenlabs clone -n MyVoice sample.mp3

# List Voices
elevenlabs voices list
elevenlabs voices info --name Rachel

# Generate Sound Effects
elevenlabs sfx "Thunder storm with heavy rain"

# Remove Background Noise
elevenlabs isolate noisy_recording.mp3 -o clean.mp3

# Dub to Another Language
elevenlabs dub -t es audio.mp3  # Spanish
elevenlabs dub -s en -t ja video.mp4  # English to Japanese

📚 Available Commands

speak - Text to Speech

elevenlabs speak [options] <text>

Options:
  -v, --voice <name>        Voice name or ID (default: Rachel)
  -m, --model <model>       TTS model: eleven_flash_v2_5, eleven_turbo_v2_5, eleven_multilingual_v2
  -o, --out <file>          Output file path
  -i, --input <file>        Read text from file
  --stability <0-1>         Voice stability
  --similarity <0-1>        Similarity boost
  --style <0-1>             Style exaggeration
  --speaker-boost           Enable speaker boost

transcribe - Speech to Text

elevenlabs transcribe [options] <audio_file>

Options:
  -o, --out <file>          Output file path
  -l, --language <code>     Language hint (en, es, fr, etc.)
  -t, --timestamps          Include word timestamps

clone - Voice Cloning

elevenlabs clone [options] <sample_files...>

Options:
  -n, --name <name>         Name for the cloned voice (required)
  -d, --description <text>  Voice description
  -l, --labels <json>       Labels as JSON
  --remove-bg-noise         Remove background noise from samples

voices - Voice Library

elevenlabs voices list
elevenlabs voices info --id <voice_id>
elevenlabs voices delete --id <voice_id>

sfx - Sound Effects

elevenlabs sfx [options] <description>

Options:
  -d, --duration <seconds>  Approximate duration
  -o, --out <file>          Output file path
  --influence <0-1>         Prompt influence

isolate - Voice Isolation

elevenlabs isolate [options] <audio_file>

Options:
  -o, --out <file>          Output file path

dub - Audio/Video Dubbing

elevenlabs dub [options] <file>

Options:
  -t, --target <lang>       Target language (required)
  -s, --source <lang>       Source language (auto-detected if not specified)
  -o, --out <file>          Output file path
  --status --id <id>        Check dubbing status
  --download --id <id>      Download dubbed audio

Supported Languages:
  en (English), es (Spanish), fr (French), de (German), it (Italian),
  pt (Portuguese), pl (Polish), hi (Hindi), ar (Arabic), zh (Chinese),
  ja (Japanese), ko (Korean), nl (Dutch), ru (Russian), tr (Turkish),
  vi (Vietnamese), sv (Swedish), da (Danish), fi (Finnish), and more...

🔧 Environment Variables

Variable	Description	Default
`ELEVENLABS_API_KEY`	Your ElevenLabs API key (required)	-
`ELEVENLABS_DEFAULT_VOICE`	Default voice name	Rachel
`ELEVENLABS_DEFAULT_MODEL`	Default TTS model	eleven_turbo_v2_5
`ELEVENLABS_OUTPUT_DIR`	Default output directory	~/.openclaw/audio
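
The fallback behavior in the table can be expressed with shell parameter expansion. `print_effective_settings` is a hypothetical helper that mirrors the documented defaults. How the scripts resolve these values internally may differ.

```shell
# Hypothetical helper: print the effective settings, one per line,
# falling back to the defaults from the table when a variable is unset or empty.
print_effective_settings() {
  echo "voice=${ELEVENLABS_DEFAULT_VOICE:-Rachel}"
  echo "model=${ELEVENLABS_DEFAULT_MODEL:-eleven_turbo_v2_5}"
  echo "output_dir=${ELEVENLABS_OUTPUT_DIR:-$HOME/.openclaw/audio}"
}

# Example: ELEVENLABS_DEFAULT_VOICE=Adam; print_effective_settings
```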

📖 Voice Models

Model	Latency	Languages	Best For
`eleven_flash_v2_5`	~75ms	32	Real-time, streaming
`eleven_turbo_v2_5`	~250ms	32	Balanced quality/speed
`eleven_multilingual_v2`	~500ms	29	Long-form, highest quality

🎭 Built-in Voices

Rachel - Calm and professional female voice
Adam - Confident male voice
Antoni - Energetic male voice
Bella - Soft female voice
Domi - Strong female voice
Elli - Warm female voice
Josh - Deep male voice
Sam - Young male voice

💰 Pricing

ElevenLabs API pricing (approximate):

Flash v2.5: ~$0.06/min
Turbo v2.5: ~$0.06/min
Multilingual v2: ~$0.12/min
Voice cloning: Included in plan
Sound effects: ~$0.02/generation

Free tier: ~10,000 characters/month

🧪 Testing

Run the test suite:

./test.sh <your_api_key>

📝 License

MIT License - See OpenClaw project license

🏆 Hackathon Submission

This skill was created for the Clawdbot x ElevenLabs Developer Challenge.

Features:

✅ Full ElevenLabs API coverage (TTS, STT, Clone, SFX, Dub, Isolate)
✅ OpenClaw-native implementation with SKILL.md
✅ Comprehensive CLI with help and error handling
✅ Multi-language support (32+ languages)
✅ Voice library management
✅ Test suite included

Technically Deep:

Implements all major ElevenLabs APIs
Proper error handling and rate limit awareness
JSON processing with jq
File handling and directory management
Form data and multipart uploads

Practically Useful:

Content creators: audiobooks, podcasts, video narration
Multi-channel users: voice messages across all OpenClaw channels
Business users: professional voice content
Accessibility: voice interaction for text-based tasks

Thoughtfully Implemented:

Follows OpenClaw skill conventions
Secure API key handling
Graceful error messages
Comprehensive documentation
Easy installation and setup

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

OpenClaw CLI installed and configured.
Language: Markdown
License: MIT
FAQ

How do I install clawvox?

Run openclaw add @abhishek-official1/clawvox in your terminal. This installs clawvox into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/abhishek-official1/clawvox. Review commits and README documentation before installing.