

clonev – OpenClaw Skill

Name: clonev
Author: instant-picture

clonev is an OpenClaw Skills integration for coding workflows. It clones any voice and generates speech with Coqui XTTS v2. It is deliberately simple: provide a voice sample (a 6-30 second WAV file) and some text, and you get back audio of that text in the cloned voice. It supports 14+ languages. Use it when the user wants to (1) clone their own or someone else's voice, (2) generate speech that sounds like a specific person, (3) create personalized voice messages, or (4) clone a voice across languages (the cloned voice speaks any supported language).

5.0k stars · 3.3k forks · Security L1

Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name	clonev
description	Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice). OpenClaw Skills integration.
owner	instant-picture
repository	instant-picture/clonev
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @instant-picture/clonev
last updated	Feb 7, 2026

Maintainer

instant-picture

Maintains clonev in the OpenClaw Skills directory.


File Explorer

6 files

references/complete-guide.md (6.2 KB)
scripts/clonev.sh (1.6 KB)
_meta.json (273 B)
SKILL.md (5.9 KB)

SKILL.md

name: clonev
description: Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).

CloneV Skill - Voice Cloning Made Simple

⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS

DO NOT use Docker containers directly. DO NOT interact with the coqui-xtts container: it is broken and keeps restarting. DO NOT use APIs or servers.

ONLY USE THE SCRIPT: scripts/clonev.sh

The script handles everything automatically. Call it with the text, the path to a voice sample, and a language code.

What This Skill Does

Clones any voice from a short audio sample and generates new speech in that voice.

Input:

Text to speak
Voice sample (WAV file, 6-30 seconds)
Language code

Output: OGG voice file (cloned voice speaking the text)
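You can confirm that a result really is an Ogg file before sending it by checking for the 4-byte "OggS" capture pattern. Every Ogg page starts with it. This is a minimal sketch. The helper name is_ogg is hypothetical, and it checks only the container, not whether the audio sounds right.

```bash
# Return 0 if the file exists and starts with the Ogg capture pattern "OggS".
# is_ogg is a hypothetical helper, not part of the skill.
is_ogg() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "OggS" ]
}
```

Usage: `is_ogg "$VOICE_FILE" || echo "not an Ogg file" >&2`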

Works with: Any voice! Yours, a celebrity, a character, etc.

The ONE Command You Need

VOICE_FILE=$(scripts/clonev.sh "Your text here" "/path/to/voice_sample.wav" LANGUAGE)

That's it! Nothing else needed.

Step-by-Step Usage (FOR AI MODELS)

Step 1: Get the required inputs

Text to speak (from user)
Path to voice sample WAV file (from user)
Language code (from user or default to en)

Step 2: Run the script

VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)

Step 3: Use the output

The variable $VOICE_FILE now contains the path to the generated OGG file.
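Steps 1-3 can be wrapped in one reusable function. This sketch is not part of the skill:

- clone_voice and the CLONEV_SCRIPT override variable are hypothetical.
- Like the examples on this page, it assumes that clonev.sh prints only the output path on stdout.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the skill's script, following Steps 1-3.
# CLONEV_SCRIPT is an assumed override; by default it points at scripts/clonev.sh.
CLONEV_SCRIPT="${CLONEV_SCRIPT:-scripts/clonev.sh}"

clone_voice() {
  # Step 1: text and sample come from the user; the language defaults to en
  local text="$1" sample="$2" lang="${3:-en}"
  local out
  # Step 2: run the script and capture the path it prints
  if ! out=$("$CLONEV_SCRIPT" "$text" "$sample" "$lang"); then
    echo "clonev.sh failed" >&2
    return 1
  fi
  # Step 3: pass the path on only if the file exists and is non-empty
  if [ ! -s "$out" ]; then
    echo "no output file at: $out" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

Usage: `VOICE_FILE=$(clone_voice "Hello" "/path/to/sample.wav")` uses `en` because no language is given.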

Complete Working Examples

Example 1: Clone voice and send to Telegram

# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 2: Clone voice in Czech

# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 3: Full workflow with check

#!/bin/bash

# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

# Verify the file was created and is not empty
if [ -s "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: voice file not created" >&2
    exit 1
fi

Common Language Codes

Code	Language	Example Usage
`en`	English	`scripts/clonev.sh "Hello" sample.wav en`
`cs`	Czech	`scripts/clonev.sh "Ahoj" sample.wav cs`
`de`	German	`scripts/clonev.sh "Hallo" sample.wav de`
`fr`	French	`scripts/clonev.sh "Bonjour" sample.wav fr`
`es`	Spanish	`scripts/clonev.sh "Hola" sample.wav es`

Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
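You can check a language code against the list above before calling the script. This is a minimal sketch. The names is_supported_lang and CLONEV_LANGS are hypothetical, and the list is copied from this page rather than queried from XTTS.

```bash
# Language codes copied from the "Full list" above (space-separated).
CLONEV_LANGS="en cs de fr es it pl pt tr ru nl ar zh ja hu ko"

# Return 0 if $1 is one of the listed codes.
is_supported_lang() {
  local code
  for code in $CLONEV_LANGS; do   # unquoted on purpose: split on spaces
    [ "$code" = "$1" ] && return 0
  done
  return 1
}
```

Usage: `is_supported_lang "$lang" || { echo "unsupported language: $lang" >&2; exit 1; }`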

Voice Sample Requirements

Format: WAV file
Length: 6-30 seconds (optimal: 10-15 seconds)
Quality: Clear audio, no background noise
Content: Any speech (the actual words don't matter)

Good samples:

✅ Recording of someone speaking clearly
✅ No music or noise in background
✅ Consistent volume

Bad samples:

❌ Music or songs
❌ Heavy background noise
❌ Very short (< 6 seconds)
❌ Very long (> 30 seconds)
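You can check the length rules above before cloning. This sketch assumes FFmpeg (ffprobe and ffmpeg) is installed. The helper names are hypothetical. Trimming to 15 s simply targets the upper end of the optimal range. The sketch checks length only, not noise or volume.

```bash
# Classify a duration in seconds against the 6-30 s range (10-15 s optimal).
# Prints a verdict; returns 1 if the sample is out of range.
check_sample_duration() {
  awk -v d="$1" 'BEGIN {
    d += 0                                   # force numeric comparison
    if (d < 6)  { print "too short"; exit 1 }
    if (d > 30) { print "too long";  exit 1 }
    if (d >= 10 && d <= 15) print "optimal"; else print "ok"
  }'
}

# Print a file's duration in seconds (requires ffprobe).
wav_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Convert any input FFmpeg can read to WAV, keeping at most the first 15 s.
to_wav_sample() {
  ffmpeg -v error -y -i "$1" -t 15 "$2"
}
```

Usage: `check_sample_duration "$(wav_duration "/path/to/sample.wav")"`, or `to_wav_sample input.m4a /path/to/sample.wav` to prepare a sample first.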

⚠️ Important Notes

Model Download

The first run downloads the ~1.87 GB model (one-time)
Model is stored at: /mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/
Status: ✅ Already downloaded
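You can confirm the "already downloaded" status without triggering a download by checking whether the model directory is non-empty. This is a sketch. model_present and XTTS_MODEL_DIR are hypothetical names. A non-empty directory is only a rough signal and does not prove the model files are complete.

```bash
# Path taken from the note above; override XTTS_MODEL_DIR if your install differs.
XTTS_MODEL_DIR="${XTTS_MODEL_DIR:-/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts}"

# Return 0 if the directory exists and contains at least one entry.
# This does not verify file integrity.
model_present() {
  local dir="${1:-$XTTS_MODEL_DIR}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if model_present; then
  echo "Model directory populated ($(du -sh "$XTTS_MODEL_DIR" | cut -f1))"
else
  echo "Model directory missing or empty: $XTTS_MODEL_DIR" >&2
fi
```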

Processing Time

Takes 20-40 seconds depending on text length
This is normal: voice cloning is computationally intensive
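A run normally takes 20-40 seconds, so a caller may want an upper bound rather than waiting indefinitely. This sketch uses GNU coreutils timeout, which exits with status 124 when the limit is hit. The 120 s limit in the usage line, run_with_limit, and CLONEV_SCRIPT are all assumptions.

```bash
# Assumed override; by default it points at the skill's script.
CLONEV_SCRIPT="${CLONEV_SCRIPT:-scripts/clonev.sh}"

# run_with_limit SECONDS TEXT SAMPLE LANG
# Runs clonev.sh under a time limit, reports elapsed time on stderr,
# prints the output path on success, and returns the script's exit status.
run_with_limit() {
  local limit="$1"; shift
  local start=$SECONDS out status
  out=$(timeout "$limit" "$CLONEV_SCRIPT" "$@")
  status=$?
  echo "clonev.sh finished in $((SECONDS - start)) s (exit $status)" >&2
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit} s" >&2
  fi
  [ "$status" -eq 0 ] && printf '%s\n' "$out"
  return "$status"
}
```

Usage: `VOICE=$(run_with_limit 120 "Task completed!" "/path/to/sample.wav" en)`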

Troubleshooting

"Command not found"

Make sure you're in the skill directory, or use the full path:

/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en

"Voice sample not found"

Check the path to the WAV file
Use absolute paths (starting with /)
Ensure file exists: ls -la /path/to/sample.wav
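You can script the checks above as a pre-flight step. This is a sketch. check_sample_path is a hypothetical helper, and its .wav test looks only at the file extension.

```bash
# Pre-flight check for the voice sample path; prints the reason on failure.
check_sample_path() {
  local p="$1"
  # Require an absolute path, as recommended above
  case "$p" in
    /*) ;;
    *) echo "not an absolute path: $p" >&2; return 1 ;;
  esac
  [ -f "$p" ] || { echo "file not found: $p" >&2; return 1; }
  [ -r "$p" ] || { echo "file not readable: $p" >&2; return 1; }
  # Extension check only; it does not inspect the audio format itself
  case "$p" in
    *.wav|*.WAV) ;;
    *) echo "expected a .wav file: $p" >&2; return 1 ;;
  esac
}
```

Usage: `check_sample_path "/mnt/c/TEMP/Recording 25.wav"`. Quoting the argument keeps paths with spaces intact.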

"Model not found"

The model should auto-download. If it does not, run a one-off download in a throwaway container (started with --rm; this is not the coqui-xtts container):

cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"

Poor voice quality

Use clearer voice sample
Ensure no background noise
Try different sample (some voices clone better)

Quick Reference Card (FOR AI MODELS)

USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"  
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

Output Location

Generated files are saved to:

/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg

The script returns this path, so you can use it directly.
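Output always goes to the same fixed path, so a later run presumably overwrites the previous file. To keep earlier results, you can copy each output to a timestamped name. This is a sketch. keep_copy and the destination directory are hypothetical.

```bash
# keep_copy SRC DESTDIR: copy SRC to DESTDIR/clonev_<timestamp>[_N].ogg
# without clobbering an earlier copy, and print the new path.
keep_copy() {
  local src="$1" destdir="$2" stamp dest n=0
  [ -s "$src" ] || { echo "no output at $src" >&2; return 1; }
  mkdir -p "$destdir" || return 1
  stamp=$(date +%Y%m%d_%H%M%S)
  dest="$destdir/clonev_${stamp}.ogg"
  # Add a counter if a copy with this timestamp already exists
  while [ -e "$dest" ]; do
    n=$((n + 1))
    dest="$destdir/clonev_${stamp}_${n}.ogg"
  done
  cp "$src" "$dest" && printf '%s\n' "$dest"
}
```

Usage: `SAVED=$(keep_copy "$VOICE" "$HOME/clonev-outputs")`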

Summary

ONLY use the script: scripts/clonev.sh
NEVER try to use Docker containers directly
NEVER try to interact with the coqui-xtts container
Script handles everything automatically
Returns path to OGG file ready to send

Simple. Just use the script.

Clone any voice. Speak any language. Just use the script.

README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.


FAQ

How do I install clonev?

Run openclaw add @instant-picture/clonev in your terminal. This installs clonev into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/instant-picture/clonev. Review commits and README documentation before installing.