3.4k★ by snail3d
universal-voice-agent – OpenClaw Skill
universal-voice-agent is an OpenClaw Skills integration for coding workflows: a real-time, goal-oriented voice calling agent. Use it when you need to make a phone call with a specific objective, such as placing an order, making a reservation, handling customer service, or making an encouragement call. Claude Haiku runs the call in real time: it speaks in your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts to the conversation flow, handles silence and holds, and sends you an SMS summary afterward. There are no scripts; every reply is reasoned in real time.
Skill Snapshot
| name | universal-voice-agent |
| description | Real-time goal-oriented voice calling agent. Use when you need to make phone calls with a specific objective: place orders, make reservations, customer service, encouragement calls, or any conversational goal. Haiku runs the call in real-time with your voice (ElevenLabs), transcribes responses (Groq Whisper), adapts intelligently to conversation flow, handles silence/holds, and sends you an SMS summary. No scripts, pure real-time reasoning. OpenClaw Skills integration. |
| owner | snail3d |
| repository | snail3d/voice-devotional (path: universal-voice-agent) |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @snail3d/voice-devotional:universal-voice-agent |
| last updated | Feb 7, 2026 |
Universal Voice Agent
Make phone calls to achieve any goal. Haiku converses naturally, adapts in real-time, and handles the entire flow autonomously.
Quick Start
Make a Call
universal-voice-agent call \
--goal "Order 2 large pepperonis for pickup at 6pm" \
--phone "+1-555-123-4567" \
--notify-to "+1-555-730-8926"
The agent:
- Calls the number via Twilio
- Listens to responses (Groq Whisper transcription)
- Thinks about next move toward goal (Claude Haiku real-time reasoning)
- Speaks in your voice (ElevenLabs TTS)
- Repeats until goal achieved or timeout
- Sends SMS summary to you
Examples
Ordering:
universal-voice-agent call \
--goal "Order 2 large pepperonis for pickup at 6pm" \
--phone "+1-555-123-4567"
Customer Service:
universal-voice-agent call \
--goal "Find out the cancellation policy and confirm my appointment" \
--phone "+1-555-987-6543"
Encouragement:
universal-voice-agent call \
--goal "Call John and encourage him about his recent wins" \
--phone "+1-555-555-5555"
Support:
universal-voice-agent call \
--goal "Get a refund for order #12345" \
--phone "+1-800-123-4567"
How It Works
Real-Time Voice Loop
Goal: "Order 2 large pepperonis"
Phone: 555-123-4567
[DIAL]
↓
[LISTEN] (Groq Whisper) → "Hi, Mario's Pizza!"
↓
[THINK] (Haiku) → "They answered, now I'll state my order"
↓
[SPEAK] (ElevenLabs) → "Hi! I'd like to order 2 large pepperonis..."
↓
[LISTEN] (Groq Whisper) → "Sure, what else?"
↓
[THINK] (Haiku) → "They're ready. I should give toppings and details."
↓
[SPEAK] (ElevenLabs) → "No onions, and pickup at 6pm"
↓
[LISTEN] (Groq Whisper) → "$35, see you at 6."
↓
[THINK] (Haiku) → "Goal achieved! Order confirmed."
↓
[SEND SMS] → "✅ Order placed: 2 large pepperoni, pickup 6pm, $35"
↓
[HANGUP]
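The loop above can be sketched as a small driver function. This is a minimal sketch, not the skill's actual agent.js: `runCall`, the injected `listen`, `think`, and `speak` functions, the `maxTurns` cap, and the `{ say, done }` decision shape are all hypothetical stand-ins for the Groq Whisper, Haiku, and ElevenLabs steps, injected so the control flow can be read without any network calls.

```javascript
// Hypothetical driver for the listen -> think -> speak loop.
// listen(), think(), and speak() are injected stand-ins, not real SDK calls.
async function runCall({ goal, listen, think, speak, maxTurns = 50 }) {
  const history = []; // full conversation history, passed to the reasoner each turn

  for (let turn = 0; turn < maxTurns; turn++) {
    const heard = await listen(); // e.g. the transcribed speech of the other party
    history.push({ role: "callee", text: heard });

    // Assumed decision shape: { say: string, done: boolean }
    const decision = await think({ goal, history });
    if (decision.say) {
      await speak(decision.say);
      history.push({ role: "agent", text: decision.say });
    }
    if (decision.done) {
      return { status: "achieved", history };
    }
  }
  // Turn budget exhausted without the reasoner declaring the goal met
  return { status: "timeout", history };
}
```

Passing the three steps in as functions keeps the loop independent of any one provider, which also makes it easy to exercise with scripted transcripts before placing a real call.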
Key Features
Real-time Reasoning:
- Haiku gets full conversation history
- Decides next response based on context, not scripts
- Adapts to unexpected responses naturally
Silence Handling:
- Detects when the other party goes silent (on hold, put down the phone, etc.)
- After 5 seconds of silence: waits
- After 10 seconds: asks "Hello? Are you still there?"
- After 5 minutes: hangs up
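The thresholds above map naturally onto a small decision function. The sketch below is illustrative only: the 5-second, 10-second, and 5-minute values come from the list, but `silenceAction`, the `alreadyPrompted` flag, and the action names are hypothetical and are not taken from silence-handler.js.

```javascript
// Thresholds from the list above; function and action names are hypothetical.
const WAIT_AFTER_MS = 5 * 1000;        // 5 s of silence: keep waiting quietly
const PROMPT_AFTER_MS = 10 * 1000;     // 10 s: ask "Hello? Are you still there?"
const HANGUP_AFTER_MS = 5 * 60 * 1000; // 5 min: give up and end the call

function silenceAction(silentMs, alreadyPrompted = false) {
  if (silentMs >= HANGUP_AFTER_MS) return "hangup";
  if (silentMs >= PROMPT_AFTER_MS) {
    // Assumption: prompt only once per silent stretch, then go back to waiting
    return alreadyPrompted ? "wait" : "prompt";
  }
  if (silentMs >= WAIT_AFTER_MS) return "wait";
  return "listen"; // below every threshold: normal listening
}
```

For example, `silenceAction(12000)` yields `"prompt"`, while `silenceAction(12000, true)` yields `"wait"` so the agent does not repeat the question every polling tick.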
Natural Pacing:
- Response latency <2 seconds (Haiku is fast)
- Speaks at human pace (ElevenLabs)
- Pauses for listening naturally
Smart Timeout:
- Conversation timeout: 20 minutes max
- Hold timeout: 5 minutes max
- Asks "Is anyone there?" before giving up
SMS Summary:
- After the call ends, sends you a text with:
  - Status (✅ success, ✅ with issues, or ❌ failed)
  - Brief recap of what happened
  - Key confirmations/details
  - Call duration
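A summary with those four parts could be assembled along these lines. This is a sketch under stated assumptions: `formatSummary`, its field names, the status keys, and the exact message layout are hypothetical, since the source only lists what the SMS contains and not how it is formatted.

```javascript
// Hypothetical status keys mapped to the icons mentioned above.
const STATUS_ICONS = new Map([
  ["success", "✅"],
  ["issues", "✅ (with issues)"],
  ["failed", "❌"],
]);

function formatSummary({ status, recap, details = [], durationSec }) {
  const icon = STATUS_ICONS.get(status);
  if (!icon) throw new Error(`Unknown status: ${status}`);

  // Render the duration as m:ss
  const minutes = Math.floor(durationSec / 60);
  const seconds = String(durationSec % 60).padStart(2, "0");

  return [
    `${icon} ${recap}`,               // status + brief recap
    ...details.map((d) => `- ${d}`),  // key confirmations/details
    `Duration: ${minutes}:${seconds}`,
  ].join("\n");
}
```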
Configuration
Goal Definition
Any natural language phrase:
- "Order 2 large pepperonis for pickup at 6pm"
- "Find out the cancellation policy"
- "Call John and encourage him"
- "Get a refund for order #12345"
Haiku interprets the goal and adapts the conversation to achieve it.
Phone Number
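E.164 format is preferred. Strict E.164 has no separators (+15551234567); the examples here add hyphens for readability (+1-555-123-4567), and a bare national number such as 555-123-4567 is also accepted.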
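Since the examples mix hyphenated and bare national numbers, it can help to normalize input to strict E.164 before dialing. The helper below is a minimal sketch: `toE164` and its `defaultCountryCode` parameter are hypothetical, and defaulting to country code 1 is an assumption based on the US-style numbers used in this document, not something the skill documents.

```javascript
// Hypothetical helper: normalize a phone number to strict E.164 (+ and digits only).
// Assumes a 10-digit national number belongs to defaultCountryCode ("1" by default).
function toE164(input, defaultCountryCode = "1") {
  const trimmed = input.trim();
  const digits = trimmed.replace(/\D/g, ""); // drop hyphens, spaces, parentheses

  if (trimmed.startsWith("+")) {
    // E.164 allows at most 15 digits including the country code
    if (digits.length < 8 || digits.length > 15) {
      throw new Error(`Not a plausible E.164 number: ${input}`);
    }
    return `+${digits}`;
  }
  if (digits.length === 10) {
    return `+${defaultCountryCode}${digits}`;
  }
  if (digits.length === 10 + defaultCountryCode.length && digits.startsWith(defaultCountryCode)) {
    return `+${digits}`;
  }
  throw new Error(`Cannot normalize phone number: ${input}`);
}
```

With this sketch, both `toE164("+1-555-123-4567")` and `toE164("555-123-4567")` return `"+15551234567"`.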
Optional Context
universal-voice-agent call \
--goal "Order 2 large pepperonis" \
--phone "555-123-4567" \
--context "Restaurant: Mario's Pizza, Budget: $40, Dietary: no onions" \
--notify-to "555-730-8926"
Scripts
- agent.js - Main orchestrator, Twilio loop, SMS summary
- transcriber.js - Groq Whisper transcription pipeline
- thinker.js - Haiku reasoning engine
- speaker.js - ElevenLabs TTS output
- silence-handler.js - Detect holds, silence, timeout
References
- ARCHITECTURE.md - Real-time voice loop design
- LATENCY.md - Optimization for sub-2s response times
Credentials Required
- Twilio: Account SID, Auth Token, phone number
- ElevenLabs: API key + your voice ID
- Groq: API key for Whisper transcription
- Claude API or Clawdbot Gateway: For Haiku reasoning
Store in environment or TOOLS.md.
Universal Voice Agent
Goal-oriented calling system using real-time voice streaming, AI reasoning, and natural language processing.
Quick Start
1. Start the WebSocket Server
cd /path/to/universal-voice-agent
node scripts/websocket-server.js
2. Expose to Internet (ngrok)
In another terminal:
ngrok http 5000
Copy the ngrok URL (e.g., https://abc123.ngrok.io)
3. Update Twilio
Go to: Twilio Console → Phone Numbers → Your Number → Voice Configuration
Set Webhook URL to: https://abc123.ngrok.io/call-webhook
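When Twilio hits the webhook, the server has to answer with TwiML that attaches the call to a WebSocket. Assuming the server uses Twilio Media Streams (the source does not show its webhook handler), the response could be built roughly as below. `buildStreamTwiml` and the `/media-stream` path are hypothetical; `<Connect>` and `<Stream>` are standard TwiML verbs.

```javascript
// Hypothetical helper: build the TwiML a voice webhook could return to open a
// bidirectional media stream. The "/media-stream" path is an assumption.
function buildStreamTwiml(publicUrl, streamPath = "/media-stream") {
  // https://abc123.ngrok.io -> wss://abc123.ngrok.io
  const wsBase = publicUrl.replace(/^http/, "ws").replace(/\/+$/, "");
  const wsUrl = `${wsBase}${streamPath}`
    .replace(/&/g, "&amp;")   // escape for use inside an XML attribute
    .replace(/"/g, "&quot;");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${wsUrl}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The ngrok URL changes each time a free tunnel restarts, so deriving the `wss://` address from one configured base URL avoids updating it in two places.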
4. Make a Call
curl -X POST http://localhost:5000/make-call \
-H "Content-Type: application/json" \
-d '{
"phoneNumber": "+1-555-123-4567",
"goal": "Order 2 large pepperoni pizzas for pickup at 6pm"
}'
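The same request can be issued from Node instead of curl. The sketch below only assumes what the curl command shows (a JSON POST to /make-call with `phoneNumber` and `goal`); `buildMakeCallRequest` is a hypothetical helper name, and the shape of the server's response is not documented in the source.

```javascript
// Hypothetical helper: builds the same request as the curl command above.
function buildMakeCallRequest(phoneNumber, goal, baseUrl = "http://localhost:5000") {
  return {
    url: `${baseUrl}/make-call`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ phoneNumber, goal }),
    },
  };
}

// Usage (needs Node 18+ for the built-in fetch, and the server from step 1 running):
//   const { url, options } = buildMakeCallRequest(
//     "+1-555-123-4567",
//     "Order 2 large pepperoni pizzas for pickup at 6pm"
//   );
//   const res = await fetch(url, options);
//   console.log(res.status);
```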
How It Works
- You call with a goal, e.g. "Goal: Order pizza"
- Twilio dials the restaurant
- WebSocket connects your server to the call
- Real-time loop:
  - 🔊 Listen (Groq Whisper transcribes their response)
  - 🤖 Think (Haiku reasons about what to say next)
  - 🎤 Speak (ElevenLabs generates audio in your voice)
  - Repeats until goal achieved or timeout
- SMS summary sent to you with results
Files
- SKILL.md - Full skill documentation
- scripts/websocket-server.js - Main WebSocket server (handles real-time audio)
- scripts/agent.js - Older agent (simulator, for reference)
- references/WEBSOCKET_SETUP.md - Detailed setup guide
- references/ARCHITECTURE.md - System architecture
Features
- ✅ Real-time voice streaming (WebSocket)
- ✅ Automatic speech-to-text (Groq Whisper)
- ✅ AI reasoning (Claude Haiku)
- ✅ Natural speech generation (ElevenLabs in your voice)
- ✅ Silence detection & intelligent timeout handling
- ✅ Goal-oriented conversation (not scripted)
- ✅ SMS summary after calls
- ✅ Works for: ordering, customer service, reservations, encouragement, etc.
Configuration
All credentials in environment or code:
- TWILIO_ACCOUNT_SID - Your Twilio account SID
- TWILIO_AUTH_TOKEN - Your Twilio auth token
- TWILIO_PHONE - Your Twilio phone number
- GROQ_API_KEY - Groq API key for Whisper transcription
- ELEVENLABS_API_KEY - ElevenLabs API key for TTS
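A startup check for these variables catches a missing credential before a call is attempted. This is a minimal sketch: the variable names are the ones listed above, while `missingCredentials` is a hypothetical helper. The variable name for the Claude or Clawdbot Gateway credential is not given in the source, so it is deliberately left out.

```javascript
// Variable names as listed above; the helper itself is hypothetical.
const REQUIRED_ENV = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE",
  "GROQ_API_KEY",
  "ELEVENLABS_API_KEY",
];

// Returns the names of required variables that are unset or blank.
function missingCredentials(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === "");
}

// Usage at server startup:
//   const missing = missingCredentials();
//   if (missing.length > 0) {
//     console.error(`Missing credentials: ${missing.join(", ")}`);
//     process.exit(1);
//   }
```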
Next Steps
- Integrate Groq Whisper in processAudio() to transcribe incoming audio
- Integrate Haiku to generate responses based on goal + history
- Integrate ElevenLabs to convert responses to audio
- Test with real calls
See references/WEBSOCKET_SETUP.md for detailed implementation guide.
Troubleshooting
WebSocket won't connect?
- Check ngrok is running and webhook URL matches
No audio coming through?
- Check Groq API key is valid
- Verify audio payload in WebSocket messages
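To verify the audio payload, it helps to log what each WebSocket message actually carries. Assuming the server receives Twilio Media Streams messages (JSON text frames whose `media` events hold base64 audio in `media.payload`), a hypothetical `extractAudio` helper could look like this:

```javascript
// Hypothetical helper: pull raw audio bytes out of one Media Streams message.
// Returns null for non-media events such as "connected", "start", or "stop".
function extractAudio(rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.event !== "media" || !msg.media || !msg.media.payload) {
    return null;
  }
  return Buffer.from(msg.media.payload, "base64");
}

// Usage inside a WebSocket "message" handler:
//   const audio = extractAudio(data.toString());
//   if (audio) console.log(`received ${audio.length} audio bytes`);
```

If this never logs any bytes during a call, the problem is upstream of transcription (webhook or stream setup) rather than the Groq API key.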
Audio not playing back?
- Check ElevenLabs integration
- Verify audio codec matches Twilio's format
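On the codec point: Twilio Media Streams carry 8 kHz mu-law (G.711) audio, one byte per sample, so audio sent back to the call must be in that format, and inbound bytes usually need decoding to linear PCM before further processing. The decoder below is a sketch of standard G.711 mu-law expansion; `muLawToPcm16` is a hypothetical name, and whether the skill's transcriber needs this step depends on the input formats it passes to Groq, which the source does not state.

```javascript
// Decode G.711 mu-law bytes into 16-bit linear PCM samples.
function muLawToPcm16(muLawBytes) {
  const pcm = new Int16Array(muLawBytes.length);
  for (let i = 0; i < muLawBytes.length; i++) {
    const u = ~muLawBytes[i] & 0xff;   // mu-law bytes are stored bit-inverted
    const sign = u & 0x80;             // top bit: sign
    const exponent = (u >> 4) & 0x07;  // next 3 bits: segment number
    const mantissa = u & 0x0f;         // low 4 bits: step within the segment
    // Rebuild the magnitude, then remove the encoder's bias of 0x84 (132)
    const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    pcm[i] = sign ? -magnitude : magnitude;
  }
  return pcm;
}
```

A quick sanity check is that the byte 0xFF decodes to silence (0); if played-back audio sounds like loud static, the bytes are most likely being treated as linear PCM rather than mu-law.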
See references/WEBSOCKET_SETUP.md for more troubleshooting.
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
- OpenClaw CLI installed and configured.
- Language: Markdown
- License: MIT
- Topics:
FAQ
How do I install universal-voice-agent?
Run openclaw add @snail3d/voice-devotional:universal-voice-agent in your terminal. This installs universal-voice-agent into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/snail3d/voice-devotional. Review commits and README documentation before installing.
