universal-voice-agent – OpenClaw Skill

Name: universal-voice-agent
Author: snail3d

universal-voice-agent is an OpenClaw Skills integration for coding workflows: a real-time, goal-oriented voice calling agent. Use it when you need to make phone calls with a specific objective, such as placing orders, making reservations, handling customer service, making encouragement calls, or pursuing any other conversational goal. Haiku runs the call in real time: it speaks in your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts to the conversation flow, handles silence and holds, and sends you an SMS summary. It follows no script; every reply is reasoned in real time.

3.4k stars · 4.0k forks · Security L1

Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name	universal-voice-agent
description	Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning. OpenClaw Skills integration.
owner	snail3d
repository	snail3d/voice-devotional
path	universal-voice-agent
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @snail3d/voice-devotional:universal-voice-agent
last updated	Feb 7, 2026

Maintainer

snail3d

Maintains universal-voice-agent in the OpenClaw Skills directory.

File Explorer

12 files

universal-voice-agent
  references
    ARCHITECTURE.md (5.3 KB)
    WEBSOCKET_SETUP.md (5.6 KB)
  scripts
    agent.js (7.2 KB)
    websocket-server.js (14.3 KB)
  package-lock.json (44.2 KB)
  package.json (404 B)
  README.md (2.9 KB)
  run.sh (652 B)
  SETUP.md (4.5 KB)
  SKILL.md (4.4 KB)

SKILL.md

name: universal-voice-agent
description: Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning.

Universal Voice Agent

Make phone calls to achieve any goal. Haiku converses naturally, adapts in real-time, and handles the entire flow autonomously.

Quick Start

Make a Call

universal-voice-agent call \
  --goal "Order 2 large pepperonis for pickup at 6pm" \
  --phone "+1-555-123-4567" \
  --notify-to "+1-555-730-8926"

On each call, the agent:

1. Calls the number via Twilio
2. Listens to responses (Groq Whisper transcription)
3. Thinks about the next move toward the goal (Claude Haiku real-time reasoning)
4. Speaks in your voice (ElevenLabs TTS)
5. Repeats steps 2 to 4 until the goal is achieved or the call times out
6. Sends an SMS summary to you

Examples

Ordering:

universal-voice-agent call \
  --goal "Order 2 large pepperonis for pickup at 6pm" \
  --phone "+1-555-123-4567"

Customer Service:

universal-voice-agent call \
  --goal "Find out the cancellation policy and confirm my appointment" \
  --phone "+1-555-987-6543"

Encouragement:

universal-voice-agent call \
  --goal "Call John and encourage him about his recent wins" \
  --phone "+1-555-555-5555"

Support:

universal-voice-agent call \
  --goal "Get a refund for order #12345" \
  --phone "+1-800-123-4567"

How It Works

Real-Time Voice Loop

Goal: "Order 2 large pepperonis"
Phone: 555-123-4567

[DIAL]
  ↓
[LISTEN] (Groq Whisper) → "Hi, Mario's Pizza!"
  ↓
[THINK] (Haiku) → "They answered, now I'll state my order"
  ↓
[SPEAK] (ElevenLabs) → "Hi! I'd like to order 2 large pepperonis..."
  ↓
[LISTEN] (Groq Whisper) → "Sure, what else?"
  ↓
[THINK] (Haiku) → "They're ready. I should give toppings and details."
  ↓
[SPEAK] (ElevenLabs) → "No onions, and pickup at 6pm"
  ↓
[LISTEN] (Groq Whisper) → "$35, see you at 6."
  ↓
[THINK] (Haiku) → "Goal achieved! Order confirmed."
  ↓
[SEND SMS] → "✅ Order placed: 2 large pepperoni, pickup 6pm, $35"
  ↓
[HANGUP]
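
The loop above can be sketched in a few lines of JavaScript. This is a hypothetical outline, not the skill's actual code: `runVoiceLoop`, `listen`, `think`, `speak`, and `maxTurns` are names invented for illustration, and the three callbacks stand in for the Groq Whisper, Haiku, and ElevenLabs calls.

```javascript
// Hypothetical sketch of the listen -> think -> speak loop.
// `listen`, `think`, and `speak` are injected stand-ins for the
// transcription, reasoning, and TTS services; they are assumptions.
async function runVoiceLoop({ goal, listen, think, speak, maxTurns = 50 }) {
  const history = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    // LISTEN: get the other party's latest utterance as text.
    const heard = await listen();
    history.push({ speaker: "callee", text: heard });

    // THINK: decide what to say next, given the goal and full history.
    // Expected shape (assumed): { say: string, done: boolean }
    const decision = await think(goal, history);

    // SPEAK: say the reply, if there is one, and record it.
    if (decision.say) {
      await speak(decision.say);
      history.push({ speaker: "agent", text: decision.say });
    }
    if (decision.done) return { status: "achieved", history };
  }
  // The turn budget ran out before the goal was reached.
  return { status: "timeout", history };
}
```

Injecting the three services as functions keeps the loop testable without placing a real call.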

Key Features

Real-time Reasoning:

Haiku gets full conversation history
Decides next response based on context, not scripts
Adapts to unexpected responses naturally
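
One way to hand the full conversation history to the model is to build a system prompt from the goal and map each turn to a chat message. The sketch below is an assumption about how that could look; `buildReasoningInput`, the prompt wording, and the `speaker` field are hypothetical and not taken from the skill's source.

```javascript
// Hypothetical helper: turn goal + history into a chat-style request body.
// The prompt text and the field names are illustrative assumptions.
function buildReasoningInput(goal, history, context = "") {
  const system = [
    "You are making a phone call on the user's behalf.",
    `Goal: ${goal}`,
    context ? `Context: ${context}` : null,
    "Reply with the next thing to say on the call.",
  ]
    .filter(Boolean) // drop the context line when none was given
    .join("\n");

  // The agent's own lines become "assistant" turns; the callee's become "user".
  const messages = history.map((turn) => ({
    role: turn.speaker === "agent" ? "assistant" : "user",
    content: turn.text,
  }));

  return { system, messages };
}
```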

Silence Handling:

Detects when the other party goes silent (on hold, put the phone down, etc.)
After 5 seconds of silence: waits
After 10 seconds: asks "Hello? Are you still there?"
After 5 minutes: hangs up
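
The thresholds above map naturally onto a small decision function. This is a minimal sketch using the documented 5 second, 10 second, and 5 minute values; the function name, the action labels, and the `alreadyPrompted` flag (so the agent asks only once) are assumptions made for illustration.

```javascript
// Thresholds taken from the list above, in milliseconds.
const WAIT_AFTER_MS = 5 * 1000;
const PROMPT_AFTER_MS = 10 * 1000;
const HANGUP_AFTER_MS = 5 * 60 * 1000;

// Hypothetical helper: decide what to do after `silentMs` of silence.
// `alreadyPrompted` is an assumed flag that prevents repeating the prompt.
function silenceAction(silentMs, alreadyPrompted = false) {
  if (silentMs >= HANGUP_AFTER_MS) return "hangup";
  if (silentMs >= PROMPT_AFTER_MS) return alreadyPrompted ? "wait" : "prompt";
  if (silentMs >= WAIT_AFTER_MS) return "wait";
  return "listen"; // a normal conversational pause
}
```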

Natural Pacing:

Response latency <2 seconds (Haiku is fast)
Speaks at human pace (ElevenLabs)
Pauses for listening naturally

Smart Timeout:

Conversation timeout: 20 minutes max
Hold timeout: 5 minutes max
Asks "Is anyone there?" before giving up

SMS Summary:

After call ends, sends you a text with:
- Status (✅, ✅ with issues, or ❌ failed)
- Brief recap of what happened
- Key confirmations/details
- Call duration
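
As an illustration of the summary contents listed above, the following sketch formats a text message from a call result. The function name, the input shape, and the exact wording are assumptions; the skill's real message format may differ.

```javascript
// Hypothetical status labels; the keys are invented for this sketch.
const STATUS_ICONS = {
  success: "✅",
  issues: "✅ (with issues)",
  failed: "❌",
};

// Hypothetical formatter: builds the SMS body from a call result object.
function formatCallSummary({ status, recap, details = [], durationSec }) {
  const icon = STATUS_ICONS[status] || STATUS_ICONS.failed;
  const minutes = Math.floor(durationSec / 60);
  const seconds = durationSec % 60;

  const lines = [`${icon} ${recap}`];
  if (details.length > 0) lines.push(`Details: ${details.join(", ")}`);
  lines.push(`Duration: ${minutes}m ${seconds}s`);
  return lines.join("\n");
}
```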

Configuration

Goal Definition

Any natural language phrase:

"Order 2 large pepperonis for pickup at 6pm"
"Find out the cancellation policy"
"Call John and encourage him"
"Get a refund for order #12345"

Haiku interprets the goal and adapts the conversation to achieve it.

Phone Number

Phone numbers can be given as +1-555-123-4567 or just 555-123-4567. Strict E.164 format contains no separators (+15551234567).
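
If you want to normalize a number before passing it on, a sketch like the one below strips separators and prepends a country code. `toE164` is a hypothetical helper, and the default country code of 1 is an assumption based on the US-style examples in this document.

```javascript
// Hypothetical normalizer: "+1-555-123-4567" or "555-123-4567" -> "+15551234567".
// `defaultCountryCode` is an assumption (1 for US/Canada numbers).
function toE164(input, defaultCountryCode = "1") {
  const trimmed = String(input).trim();
  const digits = trimmed.replace(/\D/g, ""); // keep digits only

  // Numbers that already start with "+" keep their own country code.
  const candidate = trimmed.startsWith("+")
    ? `+${digits}`
    : `+${defaultCountryCode}${digits}`;

  // E.164: a "+", a non-zero leading digit, at most 15 digits in total.
  if (!/^\+[1-9]\d{1,14}$/.test(candidate)) {
    throw new Error(`Not a valid E.164 number: ${input}`);
  }
  return candidate;
}
```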

Optional Context

universal-voice-agent call \
  --goal "Order 2 large pepperonis" \
  --phone "555-123-4567" \
  --context "Restaurant: Mario's Pizza, Budget: $40, Dietary: no onions" \
  --notify-to "555-730-8926"

Scripts

agent.js - Main orchestrator, Twilio loop, SMS summary
transcriber.js - Groq Whisper transcription pipeline
thinker.js - Haiku reasoning engine
speaker.js - ElevenLabs TTS output
silence-handler.js - Detect holds, silence, timeout

References

ARCHITECTURE.md - Real-time voice loop design
LATENCY.md - Optimization for sub-2s response times

Credentials Required

Twilio: Account SID, Auth Token, phone number
ElevenLabs: API key + your voice ID
Groq: API key for Whisper transcription
Claude API or Clawdbot Gateway: For Haiku reasoning

Store these in environment variables or in TOOLS.md.

README.md

Universal Voice Agent

Goal-oriented calling system using real-time voice streaming, AI reasoning, and natural language processing.

Quick Start

1. Start the WebSocket Server

cd /path/to/universal-voice-agent
node scripts/websocket-server.js

2. Expose to Internet (ngrok)

In another terminal:

ngrok http 5000

Copy the ngrok URL (e.g., https://abc123.ngrok.io)

3. Update Twilio

Go to: Twilio Console → Phone Numbers → Your Number → Voice Configuration

Set Webhook URL to: https://abc123.ngrok.io/call-webhook
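
For context, a Twilio voice webhook typically answers with TwiML, and Twilio's `<Connect><Stream>` verb is what attaches a call to a WebSocket. The sketch below builds such a response. Whether websocket-server.js responds this way is an assumption, and the `/media-stream` path and both helper names are hypothetical.

```javascript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the public https URL (e.g. from ngrok) into a wss URL.
// `path` is a hypothetical WebSocket endpoint on your server.
function toWebSocketUrl(baseUrl, path) {
  return baseUrl.replace(/^http/, "ws").replace(/\/$/, "") + path;
}

// Build TwiML that connects the call audio to a WebSocket stream.
function buildStreamTwiml(streamUrl) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

For example, `buildStreamTwiml(toWebSocketUrl("https://abc123.ngrok.io", "/media-stream"))` yields a `<Stream>` element pointing at `wss://abc123.ngrok.io/media-stream`.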

4. Make a Call

curl -X POST http://localhost:5000/make-call \
  -H "Content-Type: application/json" \
  -d '{
    "phoneNumber": "+1-555-123-4567",
    "goal": "Order 2 large pepperoni pizzas for pickup at 6pm"
  }'
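
The same request can be sent from Node.js. The sketch below mirrors the curl command above; `buildMakeCallRequest` and `makeCall` are hypothetical helper names, and because the response format of `/make-call` is not documented here, the body is returned as plain text. It assumes Node 18 or later for the global `fetch`.

```javascript
// Build the URL and fetch options for the /make-call endpoint shown above.
function buildMakeCallRequest(phoneNumber, goal, baseUrl = "http://localhost:5000") {
  return {
    url: `${baseUrl}/make-call`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ phoneNumber, goal }),
    },
  };
}

// Send the request. Requires the WebSocket server from step 1 to be running.
async function makeCall(phoneNumber, goal) {
  const { url, options } = buildMakeCallRequest(phoneNumber, goal);
  const response = await fetch(url, options); // global fetch: Node 18+
  if (!response.ok) {
    throw new Error(`make-call failed: HTTP ${response.status}`);
  }
  return response.text(); // response shape is not documented, so keep it raw
}
```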

How It Works

1. You submit a goal, for example "Order pizza"
2. Twilio dials the restaurant
3. A WebSocket connects your server to the call
4. The real-time loop runs:
   - 🔊 Listen (Groq Whisper transcribes their response)
   - 🤖 Think (Haiku reasons about what to say next)
   - 🎤 Speak (ElevenLabs generates audio in your voice)
5. The loop repeats until the goal is achieved or the call times out
6. An SMS summary with the results is sent to you

Files

SKILL.md - Full skill documentation
scripts/websocket-server.js - Main WebSocket server (handles real-time audio)
scripts/agent.js - Older agent (simulator, for reference)
references/WEBSOCKET_SETUP.md - Detailed setup guide
references/ARCHITECTURE.md - System architecture

Features

✅ Real-time voice streaming (WebSocket)
✅ Automatic speech-to-text (Groq Whisper)
✅ AI reasoning (Claude Haiku)
✅ Natural speech generation (ElevenLabs in your voice)
✅ Silence detection & intelligent timeout handling
✅ Goal-oriented conversation (not scripted)
✅ SMS summary after calls
✅ Works for: ordering, customer service, reservations, encouragement, etc.

Configuration

Credentials are read from the environment or set in code:

TWILIO_ACCOUNT_SID - Your Twilio account SID
TWILIO_AUTH_TOKEN - Your Twilio auth token
TWILIO_PHONE - Your Twilio phone number
GROQ_API_KEY - Groq API key for Whisper transcription
ELEVENLABS_API_KEY - ElevenLabs API key for TTS
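
A quick startup check can catch a missing variable before a call is placed. This is a minimal sketch using the variable names listed above; the `missingCredentials` helper itself is hypothetical and not part of the skill.

```javascript
// Variable names taken from the list above.
const REQUIRED_ENV_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE",
  "GROQ_API_KEY",
  "ELEVENLABS_API_KEY",
];

// Hypothetical helper: return the names of required variables
// that are unset or empty in the given environment object.
function missingCredentials(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}
```

At startup you might call `missingCredentials()` and exit with a message listing the returned names if the array is not empty.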

Next Steps

Integrate Groq Whisper in processAudio() to transcribe incoming audio
Integrate Haiku to generate responses based on goal + history
Integrate ElevenLabs to convert responses to audio
Test with real calls
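
As a starting point for the first step, the sketch below collects incoming audio from Twilio Media Streams messages, which arrive as JSON with an `event` field (`start`, `media`, `stop`) and carry base64 audio in `media.payload`. It assumes the server uses Media Streams; `handleStreamMessage` and the `state` shape are hypothetical, and how the collected audio is passed to processAudio() is left open.

```javascript
// Create an empty per-call state object (shape is an assumption).
function createStreamState() {
  return { streamSid: null, chunks: [], ended: false };
}

// Hypothetical handler: apply one raw WebSocket message to the call state.
function handleStreamMessage(state, raw) {
  const message = JSON.parse(raw);
  switch (message.event) {
    case "start":
      // Twilio sends the stream identifier once, at the start of the stream.
      state.streamSid = message.start.streamSid;
      break;
    case "media":
      // Audio arrives base64-encoded; keep the decoded bytes for transcription.
      state.chunks.push(Buffer.from(message.media.payload, "base64"));
      break;
    case "stop":
      state.ended = true;
      break;
    default:
      // Other events (for example "connected") need no handling here.
      break;
  }
  return state;
}
```

Once enough audio has accumulated, `Buffer.concat(state.chunks)` gives a single buffer that can be sent on for transcription.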

See references/WEBSOCKET_SETUP.md for detailed implementation guide.

Troubleshooting

WebSocket won't connect?

Check ngrok is running and webhook URL matches

No audio coming through?

Check Groq API key is valid
Verify audio payload in WebSocket messages

Audio not playing back?

Check ElevenLabs integration
Verify audio codec matches Twilio's format
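
On the codec point: Twilio Media Streams carry 8 kHz mu-law (G.711) audio, so audio from other services usually needs converting in one direction or the other. The sketch below shows the standard mu-law to 16-bit PCM decode, which is useful for inspecting incoming audio. It assumes the skill uses Media Streams; the function names are hypothetical.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit PCM sample.
function decodeMulawSample(byte) {
  const inverted = ~byte & 0xff;            // mu-law bytes are stored inverted
  const sign = inverted & 0x80;             // top bit: sign
  const exponent = (inverted >> 4) & 0x07;  // next 3 bits: segment
  const mantissa = inverted & 0x0f;         // low 4 bits: step within segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes into an Int16Array of PCM samples.
function decodeMulaw(buffer) {
  const pcm = new Int16Array(buffer.length);
  for (let i = 0; i < buffer.length; i++) {
    pcm[i] = decodeMulawSample(buffer[i]);
  }
  return pcm;
}
```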

See references/WEBSOCKET_SETUP.md for more troubleshooting.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

OpenClaw CLI installed and configured.
Language: Markdown
License: MIT
Topics:

FAQ

How do I install universal-voice-agent?

Run openclaw add @snail3d/voice-devotional:universal-voice-agent in your terminal. This installs universal-voice-agent into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/snail3d/voice-devotional. Review commits and README documentation before installing.