elevenlabs-stt – OpenClaw Skill (7.5k★)
elevenlabs-stt is an OpenClaw Skills integration for coding workflows. Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Skill Snapshot
| name | elevenlabs-stt |
| description | Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2). OpenClaw Skills integration. |
| owner | clawdbotborges |
| repository | clawdbotborges/elevenlabs-stt |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @clawdbotborges/elevenlabs-stt |
| last updated | Feb 7, 2026 |

name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["curl"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
ElevenLabs Speech-to-Text
Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.
Quick Start
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3
# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize
# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en
# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
Options
| Flag | Description |
|---|---|
| --diarize | Identify different speakers |
| --lang CODE | ISO language code (e.g., en, pt, es) |
| --json | Output full JSON with word timestamps |
| --events | Tag audio events (laughter, music, etc.) |
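The internals of transcribe.sh are not shown on this page, so the following is only a sketch of how these flags might translate into a request to ElevenLabs. The endpoint URL, the `xi-api-key` header, the form field names (`model_id`, `language_code`, `diarize`, `tag_audio_events`) and the `scribe_v2` model identifier are assumptions that should be checked against the current ElevenLabs API reference. `build_stt_args` is a hypothetical helper, not part of this skill.

```shell
#!/usr/bin/env bash
# Sketch only: maps the documented flags to ASSUMED ElevenLabs form fields.
# build_stt_args is hypothetical; it prints one curl argument per line and
# never touches the network.
build_stt_args() {
  local file="$1"; shift
  local diarize=false events=false lang=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --diarize) diarize=true ;;
      --events)  events=true ;;
      --lang)
        [ "$#" -ge 2 ] || { echo "--lang needs a code" >&2; return 2; }
        lang="$2"; shift ;;
      --json) ;;  # assumed to change output formatting only, not the request
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
  printf '%s\n' -F "file=@${file}" -F "model_id=scribe_v2" \
                -F "diarize=${diarize}" -F "tag_audio_events=${events}"
  if [ -n "$lang" ]; then
    printf '%s\n' -F "language_code=${lang}"
  fi
}

# Possible use in bash (endpoint and header are assumptions as well):
#   mapfile -t args < <(build_stt_args meeting.mp3 --diarize --lang en)
#   curl -sS -X POST "https://api.elevenlabs.io/v1/speech-to-text" \
#        -H "xi-api-key: ${ELEVENLABS_API_KEY}" "${args[@]}"
```

Separating argument construction from the network call makes it possible to inspect the request before spending API credits.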
Supported Formats
All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.
API Key
Set the ELEVENLABS_API_KEY environment variable, or configure the key in clawdbot.json:
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
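How Clawdbot hands the configured apiKey to the script is not described here. The skill metadata lists ELEVENLABS_API_KEY as its primary environment variable, so a reasonable assumption is that the script reads the key from the environment. Under that assumption, a small preflight check could catch a missing key before any upload starts. `require_api_key` is a hypothetical helper, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: succeeds only when ELEVENLABS_API_KEY is
# set to a non-empty value. It never prints the key itself.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
}

# Possible use:
#   require_api_key && {baseDir}/scripts/transcribe.sh /path/to/audio.mp3
```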
Examples
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en
# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
🎙️ ElevenLabs Speech-to-Text Skill
A Clawdbot skill for transcribing audio files using ElevenLabs' Scribe v2 model.
Features
- 🌍 90+ languages supported with automatic detection
- 👥 Speaker diarization — identify different speakers
- 🎵 Audio event tagging — detect laughter, music, applause, etc.
- 📝 Word-level timestamps — precise timing in JSON output
- 🎧 All major formats — mp3, m4a, wav, ogg, webm, mp4, and more
Installation
For Clawdbot
Add to your clawdbot.json:
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}
Standalone
git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"
Usage
# Basic transcription
./scripts/transcribe.sh audio.mp3
# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize
# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en
# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json
# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events
Options
| Flag | Description |
|---|---|
| --diarize | Enable speaker diarization |
| --lang CODE | ISO language code (e.g., en, pt, es, fr) |
| --json | Output full JSON response with word timestamps |
| --events | Tag audio events like laughter, music, applause |
| -h, --help | Show help message |
Examples
Transcribe a voice message
./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."
Meeting with multiple speakers
./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}
Process with jq
# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'
# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
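For diarized output, the same approach can turn the word list into a readable transcript with one line per speaker turn. The sketch below assumes every entry in `.words` carries `text` and `speaker` fields, as in the sample response earlier in this section. Real responses may include other entry types or fields that would need filtering first. `speaker_turns` is a hypothetical helper and requires jq.

```shell
#!/usr/bin/env bash
# Sketch: collapse consecutive words from the same speaker into one line.
# Assumes each .words entry has "text" and "speaker" (as in the sample above).
# speaker_turns is a hypothetical helper; it reads JSON on stdin.
speaker_turns() {
  jq -r '
    reduce .words[] as $w ([];
      if length > 0 and .[-1].speaker == $w.speaker
      then .[:-1] + [ .[-1] | .text += (" " + $w.text) ]
      else . + [ {speaker: $w.speaker, text: $w.text} ]
      end)
    | .[] | "\(.speaker): \(.text)"
  '
}

# Possible use:
#   ./scripts/transcribe.sh meeting.mp3 --diarize --json | speaker_turns
```

The `reduce` keeps a running list of turns and appends to the last one while the speaker is unchanged, so the output order follows the order of `.words`.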
Requirements
- curl: for API requests
- jq: for JSON parsing (optional, but recommended)
- ElevenLabs API key with Speech-to-Text access
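A quick way to confirm the command-line tools are present before the first run is sketched below. `missing_tools` is a hypothetical helper, not something shipped with the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of each given tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Possible use:
#   missing="$(missing_tools curl jq)"
#   [ -z "$missing" ] || echo "missing tools: $missing" >&2
```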
API Key
Get your API key from ElevenLabs:
- Sign up or log in
- Go to Profile → API Keys
- Create a new key or copy an existing one
License
MIT
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
- OpenClaw CLI installed and configured.
- Language: Markdown
- License: MIT
FAQ
How do I install elevenlabs-stt?
Run openclaw add @clawdbotborges/elevenlabs-stt in your terminal. This installs elevenlabs-stt into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/clawdbotborges/elevenlabs-stt. Review commits and README documentation before installing.
