
elevenlabs-stt – OpenClaw Skill

by clawdbotborges

elevenlabs-stt is an OpenClaw Skills integration for coding workflows that transcribes audio files using ElevenLabs Speech-to-Text (Scribe v2).

7.5k stars · 3.5k forks · Security: L1
Created Feb 7, 2026 · Updated Feb 7, 2026 · Topic: coding

Skill Snapshot

name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2). OpenClaw Skills integration.
owner: clawdbotborges
repository: clawdbotborges/elevenlabs-stt
language: Markdown
license: MIT
topics: coding
security: L1
install: openclaw add @clawdbotborges/elevenlabs-stt
last updated: Feb 7, 2026

Maintainer

clawdbotborges

Maintains elevenlabs-stt in the OpenClaw Skills directory.

File Explorer (5 files)

  • scripts/transcribe.sh (2.3 KB)
  • _meta.json (299 B)
  • README.md (2.9 KB)
  • SKILL.md (1.7 KB)
SKILL.md

---
name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["curl"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
---

ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.

Quick Start

# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3

# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize

# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en

# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json

Options

Flag          Description
--diarize     Identify different speakers
--lang CODE   ISO language code (e.g., en, pt, es)
--json        Output full JSON with word timestamps
--events      Tag audio events (laughter, music, etc.)

Supported Formats

All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.

API Key

Set ELEVENLABS_API_KEY environment variable, or configure in clawdbot.json:

{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
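
A caller can fail fast when the key is missing instead of letting the API request fail later. A minimal preflight sketch (transcribe.sh may already perform its own validation):

```shell
# Preflight: verify the API key is available before calling the script.
# This is a sketch; transcribe.sh may do its own check.
check_key() {
  [ -n "${ELEVENLABS_API_KEY:-}" ] || {
    echo "ELEVENLABS_API_KEY is not set (see clawdbot.json)" >&2
    return 1
  }
}
```

Usage: check_key && {baseDir}/scripts/transcribe.sh audio.mp3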

Examples

# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg

# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en

# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
README.md

🎙️ ElevenLabs Speech-to-Text Skill

A Clawdbot skill for transcribing audio files using ElevenLabs' Scribe v2 model.

Features

  • 🌍 90+ languages supported with automatic detection
  • 👥 Speaker diarization — identify different speakers
  • 🎵 Audio event tagging — detect laughter, music, applause, etc.
  • 📝 Word-level timestamps — precise timing in JSON output
  • 🎧 All major formats — mp3, m4a, wav, ogg, webm, mp4, and more

Installation

For Clawdbot

Add to your clawdbot.json:

{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}

Standalone

git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"

Usage

# Basic transcription
./scripts/transcribe.sh audio.mp3

# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize

# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en

# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json

# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events
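
Under the hood the script is assumed to wrap a single HTTP call. The sketch below prints the curl invocation rather than executing it, so the request shape can be inspected without an API key. The endpoint and multipart fields follow the public ElevenLabs Speech-to-Text API; "scribe_v2" as the model_id is an assumption, so check scripts/transcribe.sh for the authoritative request.

```shell
# Print (not execute) the curl request transcribe.sh is assumed to build.
# Endpoint and form fields follow the public ElevenLabs Speech-to-Text API;
# the "scribe_v2" model_id is an assumption.
stt_request() {
  echo curl -sS "https://api.elevenlabs.io/v1/speech-to-text" \
    -H "xi-api-key: \$ELEVENLABS_API_KEY" \
    -F "file=@$1" \
    -F "model_id=scribe_v2"
}

stt_request audio.mp3
```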

Options

Flag          Description
--diarize     Enable speaker diarization
--lang CODE   ISO language code (e.g., en, pt, es, fr)
--json        Output full JSON response with word timestamps
--events      Tag audio events like laughter, music, applause
-h, --help    Show help message

Examples

Transcribe a voice message

./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."

Meeting with multiple speakers

./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
# Output:
{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}

Process with jq

# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'

# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
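
Beyond pulling single fields, the diarized --json output can be folded into a per-speaker transcript. A sketch working from a saved file; the sample JSON below mirrors the shape shown above, and transcript.json is assumed to come from a --diarize --json run:

```shell
# Sample transcript in the shape shown above (normally produced by
# ./scripts/transcribe.sh meeting.mp3 --diarize --json > transcript.json).
cat > transcript.json <<'EOF'
{"text": "Welcome everyone.",
 "words": [
   {"text": "Welcome",   "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
   {"text": "everyone.", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
 ]}
EOF

# Group words by speaker, then print one "speaker: text" line per speaker.
jq -r 'reduce .words[] as $w ({}; .[$w.speaker] += [$w.text])
       | to_entries[] | "\(.key): \(.value | join(" "))"' transcript.json
```

For the sample above this prints a single line for speaker_0; real diarized output yields one line per detected speaker.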

Requirements

  • curl — for API requests
  • jq — for JSON parsing (optional, but recommended)
  • ElevenLabs API key with Speech-to-Text access

API Key

Get your API key from ElevenLabs:

  1. Sign up or log in
  2. Go to Profile → API Keys
  3. Create a new key or copy existing one

License

MIT

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  • OpenClaw CLI installed and configured.
  • Language: Markdown
  • License: MIT
  • Topics: coding

FAQ

How do I install elevenlabs-stt?

Run openclaw add @clawdbotborges/elevenlabs-stt in your terminal. This installs elevenlabs-stt into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/clawdbotborges/elevenlabs-stt. Review commits and README documentation before installing.