by snail3d

universal-voice-agent – OpenClaw Skill

universal-voice-agent is an OpenClaw Skills integration for coding workflows. Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning.

3.4k stars · 4.0k forks · Security L1
Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name: universal-voice-agent
description: Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning. OpenClaw Skills integration.
owner: snail3d
repository: snail3d/voice-devotional
path: universal-voice-agent
language: Markdown
license: MIT
topics: (none listed)
security: L1
install: openclaw add @snail3d/voice-devotional:universal-voice-agent
last updated: Feb 7, 2026

Maintainer

snail3d

Maintains universal-voice-agent in the OpenClaw Skills directory.

File Explorer (12 files)

universal-voice-agent/
  references/
    ARCHITECTURE.md (5.3 KB)
    WEBSOCKET_SETUP.md (5.6 KB)
  scripts/
    agent.js (7.2 KB)
    websocket-server.js (14.3 KB)
  package-lock.json (44.2 KB)
  package.json (404 B)
  README.md (2.9 KB)
  run.sh (652 B)
  SETUP.md (4.5 KB)
  SKILL.md (4.4 KB)

SKILL.md

name: universal-voice-agent
description: Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning.

Universal Voice Agent

Make phone calls to achieve any goal. Haiku converses naturally, adapts in real-time, and handles the entire flow autonomously.

Quick Start

Make a Call

universal-voice-agent call \
  --goal "Order 2 large pepperonis for pickup at 6pm" \
  --phone "+1-555-123-4567" \
  --notify-to "+1-555-730-8926"

Haiku:

  1. Calls the number via Twilio
  2. Listens to responses (Groq Whisper transcription)
  3. Thinks about next move toward goal (Claude Haiku real-time reasoning)
  4. Speaks in your voice (ElevenLabs TTS)
  5. Repeats until goal achieved or timeout
  6. Sends SMS summary to you

Examples

Ordering:

universal-voice-agent call \
  --goal "Order 2 large pepperonis for pickup at 6pm" \
  --phone "+1-555-123-4567"

Customer Service:

universal-voice-agent call \
  --goal "Find out the cancellation policy and confirm my appointment" \
  --phone "+1-555-987-6543"

Encouragement:

universal-voice-agent call \
  --goal "Call John and encourage him about his recent wins" \
  --phone "+1-555-555-5555"

Support:

universal-voice-agent call \
  --goal "Get a refund for order #12345" \
  --phone "+1-800-123-4567"

How It Works

Real-Time Voice Loop

Goal: "Order 2 large pepperonis"
Phone: 555-123-4567

[DIAL]
  ↓
[LISTEN] (Groq Whisper) → "Hi, Mario's Pizza!"
  ↓
[THINK] (Haiku) → "They answered, now I'll state my order"
  ↓
[SPEAK] (ElevenLabs) → "Hi! I'd like to order 2 large pepperonis..."
  ↓
[LISTEN] (Groq Whisper) → "Sure, what else?"
  ↓
[THINK] (Haiku) → "They're ready. I should give toppings and details."
  ↓
[SPEAK] (ElevenLabs) → "No onions, and pickup at 6pm"
  ↓
[LISTEN] (Groq Whisper) → "$35, see you at 6."
  ↓
[THINK] (Haiku) → "Goal achieved! Order confirmed."
  ↓
[SEND SMS] → "✅ Order placed: 2 large pepperoni, pickup 6pm, $35"
  ↓
[HANGUP]
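
The loop in the diagram can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: `runCall`, `listen`, `think`, and `speak` are hypothetical names standing in for Groq Whisper, Claude Haiku, and ElevenLabs, and `maxTurns` is an assumed safety cap.

```javascript
// Sketch of the listen -> think -> speak loop.
// listen/think/speak are hypothetical stand-ins for the real services.
async function runCall({ goal, listen, think, speak, maxTurns = 50 }) {
  const history = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const heard = await listen(); // transcript of the other party
    history.push({ speaker: "them", text: heard });

    const decision = await think(goal, history); // expected shape: { say, done }
    if (decision.say) {
      await speak(decision.say);
      history.push({ speaker: "agent", text: decision.say });
    }
    if (decision.done) {
      return { status: "achieved", history };
    }
  }
  return { status: "timeout", history };
}

// Usage with scripted stubs in place of the real services.
const lines = ["Hi, Mario's Pizza!", "$35, see you at 6."];
const spoken = [];
const demo = runCall({
  goal: "Order 2 large pepperonis",
  listen: async () => lines.shift(),
  think: async (_goal, history) =>
    history.length >= 3
      ? { say: "Thanks, see you at 6!", done: true }
      : { say: "Hi! I'd like to order 2 large pepperonis.", done: false },
  speak: async (text) => {
    spoken.push(text);
  },
});
```

Passing the three steps in as functions keeps the loop testable without placing a real call.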

Key Features

Real-time Reasoning:

  • Haiku gets full conversation history
  • Decides next response based on context, not scripts
  • Adapts to unexpected responses naturally
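
One way to hand the full history to the model is to rebuild a system prompt plus a message list on every turn. The sketch below assumes the common system + messages chat layout; `buildThinkRequest`, the `speaker` field, and the prompt wording are all illustrative assumptions, not taken from the skill's source.

```javascript
// Assemble the input for the reasoning step from goal, history, and optional context.
// The prompt wording and the { speaker, text } history shape are assumptions.
function buildThinkRequest(goal, history, context = "") {
  const system = [
    "You are on a live phone call. Work toward this goal: " + goal,
    context ? "Extra context: " + context : "",
    "Reply with the next sentence to say, and nothing else.",
  ]
    .filter(Boolean)
    .join("\n");

  // The callee's transcribed speech maps to "user"; the agent's own lines map to "assistant".
  const messages = history.map((turn) => ({
    role: turn.speaker === "agent" ? "assistant" : "user",
    content: turn.text,
  }));

  return { system, messages };
}
```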

Silence Handling:

  • Detects when the other party goes silent (on hold, phone put down, etc.)
  • After 5 seconds of silence: keeps waiting
  • After 10 seconds: asks "Hello? Are you still there?"
  • After 5 minutes: hangs up
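
The thresholds above map naturally onto a small decision function. This is a sketch under assumptions: `silenceAction` and the `alreadyPrompted` flag (used so the agent does not repeat the question on every tick) are hypothetical and not taken from the skill's code.

```javascript
const PROMPT_AFTER_MS = 10 * 1000; // ask "Hello? Are you still there?"
const HANGUP_AFTER_MS = 5 * 60 * 1000; // give up on the hold

// Decide what to do after `silentMs` milliseconds without speech.
function silenceAction(silentMs, alreadyPrompted = false) {
  if (silentMs >= HANGUP_AFTER_MS) return "hangup";
  if (silentMs >= PROMPT_AFTER_MS && !alreadyPrompted) return "prompt";
  return "wait";
}
```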

Natural Pacing:

  • Targets response latency under 2 seconds (Haiku is fast)
  • Speaks at human pace (ElevenLabs)
  • Pauses for listening naturally

Smart Timeout:

  • Conversation timeout: 20 minutes max
  • Hold timeout: 5 minutes max
  • Asks "Is anyone there?" before giving up

SMS Summary:

  • After call ends, sends you a text with:
    • Status (✅ success, ✅ with issues, ❌ failed)
    • Brief recap of what happened
    • Key confirmations/details
    • Call duration
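
As an illustration of how those four pieces could be combined into one text message, here is a small formatter. The function name, the status keys, and the line layout are assumptions; the skill's real summary format is not shown in this document.

```javascript
// Build the SMS body from status, recap, key details, and call duration.
// Status keys ("success", "issues", "failed") are hypothetical.
function formatSmsSummary({ status, recap, details = [], durationSec }) {
  const icons = { success: "✅", issues: "✅ (with issues)", failed: "❌" };
  const minutes = Math.floor(durationSec / 60);
  const seconds = String(durationSec % 60).padStart(2, "0");

  const lines = [`${icons[status] ?? "?"} ${recap}`];
  if (details.length > 0) lines.push(details.join(", "));
  lines.push(`Duration: ${minutes}:${seconds}`);
  return lines.join("\n");
}
```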

Configuration

Goal Definition

Any natural language phrase:

  • "Order 2 large pepperonis for pickup at 6pm"
  • "Find out the cancellation policy"
  • "Call John and encourage him"
  • "Get a refund for order #12345"

Haiku interprets the goal and adapts the conversation to achieve it.

Phone Number

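E.164-style number, written as +1-555-123-4567 or just 555-123-4567. Strict E.164 has no separators (+15551234567), so hyphenated or local numbers need normalizing before dialing.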
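
A normalizer for the two accepted spellings might look like the sketch below. `toE164` is a hypothetical helper, and the default country code of "1" (US/Canada) for bare 10-digit numbers is an assumption; the document does not say how the skill itself normalizes input.

```javascript
// Normalize a phone number to strict E.164 ("+" followed by up to 15 digits).
// Assumption: 10-digit input without "+" belongs to defaultCountryCode.
function toE164(input, defaultCountryCode = "1") {
  const hasPlus = input.trim().startsWith("+");
  const digits = input.replace(/\D/g, ""); // drop hyphens, spaces, parentheses

  if (hasPlus) {
    if (digits.length === 0 || digits.length > 15) {
      throw new Error(`Not a valid E.164 number: ${input}`);
    }
    return "+" + digits;
  }
  if (digits.length === 10) return "+" + defaultCountryCode + digits;
  if (digits.length === 11 && digits.startsWith(defaultCountryCode)) return "+" + digits;
  throw new Error(`Cannot normalize phone number: ${input}`);
}
```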

Optional Context

universal-voice-agent call \
  --goal "Order 2 large pepperonis" \
  --phone "555-123-4567" \
  --context "Restaurant: Mario's Pizza, Budget: $40, Dietary: no onions" \
  --notify-to "555-730-8926"

Scripts

  • agent.js - Main orchestrator, Twilio loop, SMS summary
  • transcriber.js - Groq Whisper transcription pipeline
  • thinker.js - Haiku reasoning engine
  • speaker.js - ElevenLabs TTS output
  • silence-handler.js - Detect holds, silence, timeout

References

  • ARCHITECTURE.md - Real-time voice loop design
  • LATENCY.md - Optimization for sub-2s response times

Credentials Required

  • Twilio: Account SID, Auth Token, phone number
  • ElevenLabs: API key + your voice ID
  • Groq: API key for Whisper transcription
  • Claude API or Clawdbot Gateway: For Haiku reasoning

Store in environment or TOOLS.md.

README.md

Universal Voice Agent

Goal-oriented calling system using real-time voice streaming, AI reasoning, and natural language processing.

Quick Start

1. Start the WebSocket Server

cd /path/to/universal-voice-agent  # wherever the skill is installed
node scripts/websocket-server.js

2. Expose to Internet (ngrok)

In another terminal:

ngrok http 5000

Copy the ngrok URL (e.g., https://abc123.ngrok.io)

3. Update Twilio

Go to: Twilio Console → Phone Numbers → Your Number → Voice Configuration

Set Webhook URL to: https://abc123.ngrok.io/call-webhook
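
This document does not show what /call-webhook returns. If the server relies on Twilio Media Streams, a plausible response is TwiML that connects the call to a WebSocket. The sketch below assumes that design; `buildStreamTwiml` and the "/media-stream" path are hypothetical, so substitute whatever path the WebSocket server actually listens on.

```javascript
// Build a TwiML reply asking Twilio to open a bidirectional media stream.
// "/media-stream" is a hypothetical path, not taken from the skill's source.
function buildStreamTwiml(publicUrl, streamPath = "/media-stream") {
  // https://... becomes wss://..., http://... becomes ws://...
  const wsUrl = publicUrl.replace(/^http/, "ws").replace(/\/$/, "") + streamPath;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${wsUrl}" /></Connect></Response>`
  );
}
```

The URL is inserted as-is, so it should not contain characters that need XML escaping (a plain ngrok hostname does not).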

4. Make a Call

curl -X POST http://localhost:5000/make-call \
  -H "Content-Type: application/json" \
  -d '{
    "phoneNumber": "+1-555-123-4567",
    "goal": "Order 2 large pepperoni pizzas for pickup at 6pm"
  }'
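
The same request can be issued from Node.js. The sketch below only builds the request so it can be inspected without a running server; `buildMakeCallRequest` is a hypothetical helper, while the URL, header, and JSON fields mirror the curl command above.

```javascript
// Build the POST request for /make-call; mirrors the curl example.
function buildMakeCallRequest(phoneNumber, goal, baseUrl = "http://localhost:5000") {
  return {
    url: `${baseUrl}/make-call`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ phoneNumber, goal }),
    },
  };
}

// Sending it (needs Node 18+ for global fetch and the server from step 1 running):
// const { url, options } = buildMakeCallRequest("+1-555-123-4567", "Order 2 large pepperoni pizzas for pickup at 6pm");
// const response = await fetch(url, options);
```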

How It Works

  1. You submit a goal, e.g. "Order pizza"
  2. Twilio dials the restaurant
  3. WebSocket connects your server to the call
  4. Real-time loop:
    • 🔊 Listen (Groq Whisper transcribes their response)
    • 🤖 Think (Haiku reasons about what to say next)
    • 🎤 Speak (ElevenLabs generates audio in your voice)
  5. Repeats until goal achieved or timeout
  6. SMS summary sent to you with results

Files

  • SKILL.md - Full skill documentation
  • scripts/websocket-server.js - Main WebSocket server (handles real-time audio)
  • scripts/agent.js - Older agent (simulator, for reference)
  • references/WEBSOCKET_SETUP.md - Detailed setup guide
  • references/ARCHITECTURE.md - System architecture

Features

✅ Real-time voice streaming (WebSocket)
✅ Automatic speech-to-text (Groq Whisper)
✅ AI reasoning (Claude Haiku)
✅ Natural speech generation (ElevenLabs in your voice)
✅ Silence detection & intelligent timeout handling
✅ Goal-oriented conversation (not scripted)
✅ SMS summary after calls
✅ Works for: ordering, customer service, reservations, encouragement, etc.

Configuration

Credentials are read from the environment (or set in code):

  • TWILIO_ACCOUNT_SID - Your Twilio account SID
  • TWILIO_AUTH_TOKEN - Your Twilio auth token
  • TWILIO_PHONE - Your Twilio phone number
  • GROQ_API_KEY - Groq API key for Whisper transcription
  • ELEVENLABS_API_KEY - ElevenLabs API key for TTS
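
A startup check for these variables can catch a missing key before any call is placed. This is a minimal sketch; `missingCredentials` is a hypothetical helper, and only the variable names come from the list above.

```javascript
const REQUIRED_ENV = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE",
  "GROQ_API_KEY",
  "ELEVENLABS_API_KEY",
];

// Return the names of required variables that are unset or blank.
function missingCredentials(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === "");
}

// Typical use at server startup:
// const missing = missingCredentials();
// if (missing.length > 0) {
//   console.error("Missing credentials: " + missing.join(", "));
//   process.exit(1);
// }
```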

Next Steps

  1. Integrate Groq Whisper in processAudio() to transcribe incoming audio
  2. Integrate Haiku to generate responses based on goal + history
  3. Integrate ElevenLabs to convert responses to audio
  4. Test with real calls

See references/WEBSOCKET_SETUP.md for detailed implementation guide.

Troubleshooting

WebSocket won't connect?

  • Check ngrok is running and webhook URL matches

No audio coming through?

  • Check Groq API key is valid
  • Verify audio payload in WebSocket messages

Audio not playing back?

  • Check ElevenLabs integration
  • Verify audio codec matches Twilio's format
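
On the codec point: Twilio Media Streams carry 8 kHz μ-law audio as base64 payloads, so both incoming audio (before transcription) and outgoing audio (after TTS) usually need converting. The sketch below shows the incoming direction using the standard G.711 μ-law expansion; the function names are hypothetical, and how the skill's server actually converts audio is not shown in this document.

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function mulawByteToPcm16(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return u & 0x80 ? -magnitude : magnitude;
}

// Decode a base64 payload (as carried in a "media" message) to PCM16 samples.
function decodeMediaPayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, "base64");
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = mulawByteToPcm16(bytes[i]);
  }
  return pcm;
}
```

If playback sounds like static, a mismatch here (for example sending 16-bit PCM or MP3 where μ-law is expected) is a likely cause.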

See references/WEBSOCKET_SETUP.md for more troubleshooting.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  • OpenClaw CLI installed and configured.
  • Language: Markdown
  • License: MIT

FAQ

How do I install universal-voice-agent?

Run openclaw add @snail3d/voice-devotional:universal-voice-agent in your terminal. This installs universal-voice-agent into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/snail3d/voice-devotional. Review commits and README documentation before installing.