qwen-tts – OpenClaw Skill
qwen-tts is an OpenClaw Skills integration for coding workflows. Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
Skill Snapshot
| name | qwen-tts |
| description | Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download. OpenClaw Skills integration. |
| owner | paki81 |
| repository | paki81/qwen-tts |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @paki81/qwen-tts |
| last updated | Feb 7, 2026 |
Qwen TTS
Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, hosted on Hugging Face.
Quick Start
Generate speech from text:
scripts/tts.py "Ciao, come va?" -l Italian -o output.wav
With voice instruction (emotion/style):
scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav
Installation
First-time setup (one-time):
cd skills/public/qwen-tts
bash scripts/setup.sh
This creates a local virtual environment and installs the qwen-tts package (~500MB).
Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.
Usage
scripts/tts.py [options] "Text to speak"
Options
- -o, --output PATH: Output file path (default: qwen_output.wav)
- -s, --speaker NAME: Speaker voice (default: Vivian)
- -l, --language LANG: Language (default: Auto)
- -i, --instruct TEXT: Voice instruction (emotion, style, tone)
- --list-speakers: Show available speakers
- --model NAME: Model name (default: CustomVoice 1.7B)
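When the script is driven from other Python code, the options above can be assembled into an argument list rather than a shell string, which avoids quoting problems with spaces and apostrophes in the text. The following is a minimal sketch; `build_tts_command` is a hypothetical helper and not part of the skill, and it assumes the flags behave exactly as listed.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: str = "qwen_output.wav",
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Assemble an argument list for scripts/tts.py from the documented flags.

    Hypothetical helper: flags left as None are omitted so that the
    script's own defaults (for example speaker Vivian) apply.
    """
    cmd = [script, "-o", output]
    if speaker is not None:
        cmd += ["-s", speaker]
    if language is not None:
        cmd += ["-l", language]
    if instruct is not None:
        cmd += ["-i", instruct]
    # The text goes last, matching the usage line: [options] "Text to speak"
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    print(build_tts_command("Ciao, come va?", output="output.wav", language="Italian"))
```

The resulting list can be passed directly to `subprocess.run` without `shell=True`.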
Examples
Basic Italian speech:
scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav
With emotion/instruction:
scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav
List available speakers:
scripts/tts.py --list-speakers
Available Speakers
The CustomVoice model includes 9 premium voices:
| Speaker | Language | Description |
|---|---|---|
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle_Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese (Beijing) | Youthful Beijing male, clear |
| Eric | Chinese (Sichuan) | Lively Chengdu male, husky |
| Ryan | English | Dynamic male, rhythmic |
| Aiden | English | Sunny American male |
| Ono_Anna | Japanese | Playful female, light nimble |
| Sohee | Korean | Warm female, rich emotion |
Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
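The recommendation to match speaker and language can be encoded as a small lookup. This sketch copies the native languages from the table above; `speakers_for` is a hypothetical helper, and falling back to Vivian simply mirrors the script's documented default speaker rather than any quality ranking.

```python
from typing import Dict, List

# Native language per speaker, copied from the speaker table.
SPEAKER_NATIVE_LANGUAGE: Dict[str, str] = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",  # Beijing dialect
    "Eric": "Chinese",   # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language: str, default: str = "Vivian") -> List[str]:
    """Return speakers whose native language matches, else [default]."""
    matches = [
        name
        for name, native in SPEAKER_NATIVE_LANGUAGE.items()
        if native == language
    ]
    return matches or [default]


if __name__ == "__main__":
    print(speakers_for("English"))
    print(speakers_for("Italian"))  # no native Italian speaker in the table
```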
Voice Instructions
Use -i, --instruct to control emotion, tone, and style:
Italian examples:
- "Parla con entusiasmo"
- "Tono serio e professionale"
- "Voce calma e rilassante"
- "Leggi come un narratore"
English examples:
- "Speak with excitement"
- "Very happy and energetic"
- "Calm and soothing voice"
- "Read like a narrator"
Integration with OpenClaw
The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:
# OpenClaw captures the output path
cd skills/public/qwen-tts
# tail keeps only the last stdout line, which is where the path is printed
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null | tail -n 1)
# OUTPUT = /tmp/audio.wav
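The same capture can be done from Python. Because the path is described as the last line of stdout, this sketch keeps only the final non-empty line in case the script prints anything before it. `last_stdout_line` and `synthesize` are hypothetical names, and the sketch assumes the script is executable and that the working directory is the one used in the shell example.

```python
import subprocess
from typing import List, Optional


def last_stdout_line(stdout: str) -> Optional[str]:
    """Return the last non-empty line of stdout, or None if there is none."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    return lines[-1] if lines else None


def synthesize(args: List[str], cwd: str = "skills/public/qwen-tts") -> Optional[str]:
    """Run scripts/tts.py with the given arguments and return the reported path."""
    result = subprocess.run(
        ["scripts/tts.py", *args],
        cwd=cwd,              # relative script path is resolved from here on POSIX
        capture_output=True,  # stderr is captured and ignored, like 2>/dev/null
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return last_stdout_line(result.stdout)


if __name__ == "__main__":
    print(last_stdout_line("progress message\n/tmp/audio.wav\n"))
```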
Performance
- GPU (CUDA): ~1-3 seconds for short phrases
- CPU: ~10-30 seconds for short phrases
- Model size: ~1.7GB (auto-downloads on first run)
- Venv size: ~500MB (installed dependencies)
Troubleshooting
Setup fails:
# Ensure Python 3.10-3.12 is available
python3.12 --version
# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
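The version requirement can also be checked programmatically before re-running setup. This is a small sketch with a hypothetical `is_supported` helper; the 3.10 to 3.12 range is taken from this document, not read from the package metadata.

```python
import sys
from typing import Optional, Sequence


def is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version is within 3.10 to 3.12."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 10) <= (major, minor) <= (3, 12)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "outside the documented range"
    print(f"Python {current}: {status}")
```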
Model download slow/fails:
# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
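If the script is launched from Python instead of a shell, the mirror can be applied to the child process only, leaving the parent environment untouched. A minimal sketch, with `mirror_env` as a hypothetical helper; it assumes the script's download step honours `HF_ENDPOINT` as the shell example implies.

```python
import os
from typing import Dict, Mapping, Optional


def mirror_env(
    endpoint: str = "https://hf-mirror.com",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env


if __name__ == "__main__":
    print(mirror_env(base={"PATH": "/usr/bin"}))
```

The returned mapping can be passed as the `env` argument of `subprocess.run`.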
Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.
Audio quality issues:
- Try a different speaker: --list-speakers
- Add an instruction: -i "Speak clearly and slowly"
- Check that the language matches the text: -l Italian for Italian text
Model Details
- Model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Source: Hugging Face (https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)
- License: Check model card for current license terms
- Sample Rate: 16kHz
- Output Format: WAV (uncompressed)
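To confirm what a generated file actually contains, the sample rate and duration can be read back with Python's standard `wave` module. This is a sketch with a hypothetical `wav_info` helper; it assumes the output is integer PCM, since `wave` raises `wave.Error` for other encodings such as floating-point WAV.

```python
import wave
from typing import Tuple


def wav_info(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / float(rate)


if __name__ == "__main__":
    import sys

    # Usage: python wav_info.py output.wav
    for wav_path in sys.argv[1:]:
        rate, channels, seconds = wav_info(wav_path)
        print(f"{wav_path}: {rate} Hz, {channels} channel(s), {seconds:.2f} s")
```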
Qwen3-TTS Skill
Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model.
Installation
cd /home/brewuser/.nvm/versions/node/v24.13.0/lib/node_modules/clawdbot/skills/public/qwen-tts
bash scripts/setup.sh
This will:
- Create a Python 3.12 virtual environment in ./venv
- Install the qwen-tts package and dependencies (~500MB)
- First synthesis auto-downloads the ~1.7GB model
Quick Test
scripts/tts.py "Ciao, questo è un test!" -l Italian -o test.wav
Play the audio:
aplay test.wav # Linux
# or
ffplay test.wav # Cross-platform
Usage
See SKILL.md for complete documentation.
Basic:
scripts/tts.py "Your text" -l Italian -o output.wav
List speakers:
scripts/tts.py --list-speakers
With emotion:
scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian
Integration with OpenClaw
The skill is automatically available to OpenClaw once installed. OpenClaw can call:
cd skills/public/qwen-tts && scripts/tts.py "Text" -l Italian -o /tmp/audio.wav
Output path is printed to stdout (last line).
Requirements
- Python 3.10-3.12 (tested with 3.12)
- ~2.2GB disk space (500MB venv + 1.7GB model)
- GPU recommended (CPU works but slower)
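The disk requirement can be checked up front. This sketch uses the ~2.2GB figure from the list above; `has_enough_space` and `check_disk` are hypothetical helpers. It checks a single path, whereas the model may be cached on a different filesystem from the venv (Hugging Face libraries default to a cache under the home directory), so each location may need its own check.

```python
import shutil

# ~2.2GB taken from the requirements list: 500MB venv + 1.7GB model.
REQUIRED_BYTES = int(2.2 * 1024**3)


def has_enough_space(free_bytes: int, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the free space covers the documented requirement."""
    return free_bytes >= required_bytes


def check_disk(path: str = ".") -> bool:
    """Check free space on the filesystem that contains path."""
    return has_enough_space(shutil.disk_usage(path).free)


if __name__ == "__main__":
    print("enough space" if check_disk(".") else "less than ~2.2GB free")
```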
License
Uses Qwen3-TTS, which is released under the Apache 2.0 license. Check the model card for details: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
- OpenClaw CLI installed and configured.
FAQ
How do I install qwen-tts?
Run openclaw add @paki81/qwen-tts in your terminal. This installs qwen-tts into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/paki81/qwen-tts. Review commits and README documentation before installing.
