3.6k★by kesslerio
phone-agent – OpenClaw Skill
phone-agent is an OpenClaw Skills integration for ai ml workflows. Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.
Skill Snapshot
| name | phone-agent |
| description | Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot. OpenClaw Skills integration. |
| owner | kesslerio |
| repository | kesslerio/phone-agent |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @kesslerio/phone-agent |
| last updated | Feb 7, 2026 |
Maintainer

name: phone-agent description: "Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot."
Phone Agent Skill
Runs a local FastAPI server that acts as a real-time voice bridge.
Architecture
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
|
+--> OpenAI (LLM)
+--> ElevenLabs (TTS)
Prerequisites
- Twilio Account: Phone number + TwiML App.
- Deepgram API Key: For fast speech-to-text.
- OpenAI API Key: For the conversation logic.
- ElevenLabs API Key: For realistic text-to-speech.
- Ngrok (or similar): To expose your local port 8080 to Twilio.
Setup
-
Install Dependencies:
pip install -r scripts/requirements.txt -
Set Environment Variables (in
~/.moltbot/.env,~/.clawdbot/.env, or export):export DEEPGRAM_API_KEY="your_key" export OPENAI_API_KEY="your_key" export ELEVENLABS_API_KEY="your_key" export TWILIO_ACCOUNT_SID="your_sid" export TWILIO_AUTH_TOKEN="your_token" export PORT=8080 -
Start the Server:
python3 scripts/server.py -
Expose to Internet:
ngrok http 8080 -
Configure Twilio:
- Go to your Phone Number settings.
- Set "Voice & Fax" -> "A Call Comes In" to Webhook.
- URL:
https://<your-ngrok-url>.ngrok.io/incoming - Method:
POST
Usage
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
Customization
- System Prompt: Edit
SYSTEM_PROMPTinscripts/server.pyto change the persona. - Voice: Change
ELEVENLABS_VOICE_IDto use different voices. - Model: Switch
gpt-4o-minitogpt-4for smarter (but slower) responses.
Phone Agent Moltbot Skill
A real-time AI voice agent that handles incoming phone calls using Twilio, transcribes speech with Deepgram, generates responses via OpenAI, and speaks back with ElevenLabs text-to-speech.
Features
- Real-time Voice Processing: Handles incoming Twilio calls with low-latency WebSocket audio
- Automatic Speech Recognition: Deepgram for fast, accurate transcription
- AI-Powered Responses: OpenAI GPT for intelligent conversation
- Natural Speech Output: ElevenLabs for realistic, streaming TTS
- Task-Based Automation: Configurable task definitions for specific agent behaviors
- Recording & Logging: Automatic call recording and conversation logs
Architecture
Incoming Call (Twilio Phone)
|
v
Twilio WebSocket (Audio Stream)
|
+---> Local FastAPI Server
| |
| +---> Deepgram (Speech-to-Text)
| |
| +---> OpenAI (LLM/Intelligence)
| |
| +---> ElevenLabs (Text-to-Speech)
| |
+---------- (Audio Response)
|
Phone Speaker Output
Prerequisites
Before you begin, ensure you have:
-
Twilio Account
- Active Twilio account with a phone number
- TwiML App configured
- Account SID and Auth Token
-
API Keys (free tier available for all)
- Deepgram API Key (https://console.deepgram.com/)
- OpenAI API Key (https://platform.openai.com/api-keys)
- ElevenLabs API Key (https://elevenlabs.io/)
-
Local Network Access
- Ngrok or similar tool to expose localhost to the internet
- Ability to accept incoming webhooks from Twilio
-
Python 3.9+ and pip
Installation
# Clone the repository
git clone https://github.com/kesslerio/phone-agent-moltbot-skill.git
cd phone-agent-moltbot-skill
# Install dependencies
pip install -r scripts/requirements.txt
Configuration
Set Environment Variables
Create a .env file or set environment variables:
# API Keys (required)
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"
export ELEVENLABS_API_KEY="your-elevenlabs-key"
# Twilio (required)
export TWILIO_ACCOUNT_SID="your-account-sid"
export TWILIO_AUTH_TOKEN="your-auth-token"
export TWILIO_PHONE_NUMBER="+18665515246" # Your Twilio number
# Server (optional)
export PORT=8080
export PUBLIC_URL="https://your-ngrok-url.ngrok.io" # For webhooks
# Voice Customization (optional)
export ELEVENLABS_VOICE_ID="onwK4e9ZLuTAKqWW03F9" # Daniel voice
Or add to ~/.moltbot/.env or ~/.clawdbot/.env:
DEEPGRAM_API_KEY=your-key
OPENAI_API_KEY=your-key
ELEVENLABS_API_KEY=your-key
TWILIO_ACCOUNT_SID=your-sid
TWILIO_AUTH_TOKEN=your-token
TWILIO_PHONE_NUMBER=+1...
Startup & Configuration
1. Start the Local Server
python3 scripts/server.py
The server will start on http://localhost:8080 by default.
2. Expose to Internet with Ngrok
In another terminal:
ngrok http 8080
Note the HTTPS URL (e.g., https://abc123.ngrok.io)
3. Configure Twilio Webhook
In Twilio Console:
- Go to Phone Numbers → Your number
- Under Voice & Fax:
- Set "A Call Comes In" to Webhook
- URL:
https://<your-ngrok-url>.ngrok.io/incoming - Method:
POST
- Save
4. Test Incoming Calls
Call your Twilio number. The agent will:
- Answer and greet you
- Listen to your speech
- Transcribe your words
- Generate a response via OpenAI
- Speak the response back to you
Customization
Change Agent Persona
Edit SYSTEM_PROMPT in scripts/server.py:
SYSTEM_PROMPT = """You are a helpful customer service agent. Be friendly, concise, and professional."""
Change Voice
Set a different ElevenLabs voice ID:
export ELEVENLABS_VOICE_ID="g1r0eKKcGkk7Ep0RVcVn" # Callum voice
Available ElevenLabs voices: https://elevenlabs.io/docs/getting-started/voices
Use Different Model
Edit scripts/server.py and change the OpenAI model:
response = await client.chat.completions.create(
model="gpt-4", # or "gpt-4-turbo" for faster responses
messages=messages,
)
Task-Based Behaviors
Create YAML task definitions in the tasks/ directory:
name: book_restaurant
description: "Help the user book a restaurant reservation"
system_prompt: "You are a friendly restaurant reservation assistant..."
actions:
- confirm_date
- confirm_time
- confirm_party_size
- book_reservation
Integration with Moltbot
Add this skill to your Moltbot configuration:
{
"skills": [
{
"name": "phone-agent",
"path": "/path/to/phone-agent-moltbot-skill",
"enabled": true
}
]
}
Then reference it in workflows:
- "Set up an incoming voice agent"
- "Configure a customer service chatbot"
- "Test voice AI capabilities"
Project Structure
phone-agent-moltbot-skill/
├── scripts/
│ ├── server.py # Main FastAPI server
│ ├── server_realtime.py # Realtime processing variant
│ ├── requirements.txt # Python dependencies
│ └── typing_sound.raw # Typing sound effect
├── tasks/
│ ├── book_restaurant.yaml # Example task definitions
│ └── get_quote.yaml # Example task definitions
├── calls/ # Recording storage directory
├── references/ # Supporting documentation
├── SKILL.md # Moltbot skill manifest
├── README.md # This file
└── LICENSE # MIT License
Troubleshooting
Server Won't Start
- Check Python version:
python3 --version(requires 3.9+) - Install dependencies:
pip install -r scripts/requirements.txt - Check PORT variable:
echo $PORT(should be 8080 or set value)
Twilio Webhook Not Connecting
- Verify ngrok is running and the URL matches your Twilio webhook
- Check server logs:
python3 scripts/server.py(should show incoming requests) - Test ngrok tunnel:
curl https://<your-ngrok-url>.ngrok.io/health
Poor Transcription Quality
- Ensure DEEPGRAM_API_KEY is valid
- Check microphone/audio quality on the calling phone
- Deepgram is very accurate; poor results indicate audio issues
Slow Responses
- OpenAI API latency varies; gpt-4o-mini is fast and cheap
- Switch to "gpt-3.5-turbo" for faster responses (less capable)
- Increase timeout in websocket settings if needed
Voice Not Speaking
- Verify ELEVENLABS_API_KEY is valid
- Check voice ID is correct: https://elevenlabs.io/docs/api-reference/voices
- Confirm audio is not muted on the receiving phone
API Reference
Incoming Call Webhook
POST /incoming
Twilio sends call information to this endpoint. The server responds with TwiML to establish WebSocket connection.
WebSocket Audio Stream
WS /ws
Bidirectional audio stream for incoming call processing.
Health Check
GET /health
Returns {"status": "ok"} if the server is running.
Performance & Scaling
Current implementation handles:
- Single concurrent call per server instance
- ~100ms RTT for transcription + LLM + TTS
- Suitable for demo/testing, hobby projects, and low-volume use
For production:
- Run multiple server instances behind a load balancer
- Use Twilio's call queuing
- Implement connection pooling for API clients
- Consider dedicated hardware for Deepgram/ElevenLabs processing
Deployment Options
Local Development
python3 scripts/server.py
ngrok http 8080
Docker
FROM python:3.11-slim
WORKDIR /app
COPY scripts/requirements.txt .
RUN pip install -r requirements.txt
COPY scripts/ .
CMD ["python3", "server.py"]
Build and run:
docker build -t phone-agent .
docker run -p 8080:8080 \
-e DEEPGRAM_API_KEY="..." \
-e OPENAI_API_KEY="..." \
-e ELEVENLABS_API_KEY="..." \
-e TWILIO_ACCOUNT_SID="..." \
-e TWILIO_AUTH_TOKEN="..." \
phone-agent
Cloud Deployment
- Heroku: Add
Procfile→web: python3 scripts/server.py - Railway.app: Auto-detects Python and builds
- AWS Lambda: Use WebSocket API Gateway + Lambda
- Google Cloud Run: Containerize and deploy
License
MIT
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Test thoroughly
- Submit a pull request
Support
- MCP Server: Deepgram | OpenAI | ElevenLabs
- Twilio Docs: Voice API
- Moltbot: Documentation
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
1. **Twilio Account**: Phone number + TwiML App. 2. **Deepgram API Key**: For fast speech-to-text. 3. **OpenAI API Key**: For the conversation logic. 4. **ElevenLabs API Key**: For realistic text-to-speech. 5. **Ngrok** (or similar): To expose your local port 8080 to Twilio.
FAQ
How do I install phone-agent?
Run openclaw add @kesslerio/phone-agent in your terminal. This installs phone-agent into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/kesslerio/phone-agent. Review commits and README documentation before installing.
