skills / openclaw / phone-agent

by kesslerio

phone-agent – OpenClaw Skill

phone-agent is an OpenClaw Skills integration for AI/ML workflows. Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when the user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.

3.6k stars · 9.1k forks · Security L1
Updated Feb 7, 2026 · Created Feb 7, 2026 · Topics: ai, ml

Skill Snapshot

name: phone-agent
description: Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot. OpenClaw Skills integration.
owner: kesslerio
repository: kesslerio/phone-agent
language: Markdown
license: MIT
topics: ai, ml
security: L1
install: openclaw add @kesslerio/phone-agent
last updated: Feb 7, 2026

Maintainer

kesslerio

Maintains phone-agent in the OpenClaw Skills directory.
File Explorer (10 files)

scripts/
  requirements.txt      111 B
  server_realtime.py    8.9 KB
  server.py             28.2 KB
tasks/
  book_restaurant.yaml  1.0 KB
  get_quote.yaml        1.0 KB
_meta.json              283 B
README.md               8.6 KB
SKILL.md                2.2 KB
SKILL.md

name: phone-agent
description: "Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot."

Phone Agent Skill

Runs a local FastAPI server that acts as a real-time voice bridge.

Architecture

Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                  |
                                                  +--> OpenAI (LLM)
                                                  +--> ElevenLabs (TTS)

Prerequisites

  1. Twilio Account: Phone number + TwiML App.
  2. Deepgram API Key: For fast speech-to-text.
  3. OpenAI API Key: For the conversation logic.
  4. ElevenLabs API Key: For realistic text-to-speech.
  5. Ngrok (or similar): To expose your local port 8080 to Twilio.

Setup

  1. Install Dependencies:

    pip install -r scripts/requirements.txt
    
  2. Set Environment Variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):

    export DEEPGRAM_API_KEY="your_key"
    export OPENAI_API_KEY="your_key"
    export ELEVENLABS_API_KEY="your_key"
    export TWILIO_ACCOUNT_SID="your_sid"
    export TWILIO_AUTH_TOKEN="your_token"
    export PORT=8080
    
  3. Start the Server:

    python3 scripts/server.py
    
  4. Expose to Internet:

    ngrok http 8080
    
  5. Configure Twilio:

    • Go to your Phone Number settings.
    • Set "Voice & Fax" -> "A Call Comes In" to Webhook.
    • URL: https://<your-ngrok-url>.ngrok.io/incoming
    • Method: POST

Usage

Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.

Customization

  • System Prompt: Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
  • Voice: Change ELEVENLABS_VOICE_ID to use different voices.
  • Model: Switch gpt-4o-mini to gpt-4 for smarter (but slower) responses.
README.md

Phone Agent Moltbot Skill

A real-time AI voice agent that handles incoming phone calls using Twilio, transcribes speech with Deepgram, generates responses via OpenAI, and speaks back with ElevenLabs text-to-speech.

Features

  • Real-time Voice Processing: Handles incoming Twilio calls with low-latency WebSocket audio
  • Automatic Speech Recognition: Deepgram for fast, accurate transcription
  • AI-Powered Responses: OpenAI GPT for intelligent conversation
  • Natural Speech Output: ElevenLabs for realistic, streaming TTS
  • Task-Based Automation: Configurable task definitions for specific agent behaviors
  • Recording & Logging: Automatic call recording and conversation logs

Architecture

Incoming Call (Twilio Phone)
         |
         v
  Twilio WebSocket (Audio Stream)
         |
         +---> Local FastAPI Server
         |           |
         |           +---> Deepgram (Speech-to-Text)
         |           |
         |           +---> OpenAI (LLM/Intelligence)
         |           |
         |           +---> ElevenLabs (Text-to-Speech)
         |           |
         +---------- (Audio Response)
         |
    Phone Speaker Output
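
The per-turn pipeline the diagram describes can be sketched as three composable stages. This is a hypothetical sketch, not code from `scripts/server.py`: the provider clients (Deepgram, OpenAI, ElevenLabs) are replaced with stand-in callables so the shape of the flow is visible.

```python
import asyncio

# Hypothetical sketch of one conversation turn in the voice bridge.
# The real server wires these stages to Deepgram, OpenAI, and
# ElevenLabs; here the providers are stand-in async callables.

async def handle_turn(audio_chunk: bytes, stt, llm, tts) -> bytes:
    """Run one STT -> LLM -> TTS turn and return audio to play back."""
    transcript = await stt(audio_chunk)   # Deepgram in the real server
    reply_text = await llm(transcript)    # OpenAI chat completion
    reply_audio = await tts(reply_text)   # ElevenLabs streaming TTS
    return reply_audio

async def demo() -> bytes:
    async def fake_stt(audio: bytes) -> str:
        return "hello"

    async def fake_llm(text: str) -> str:
        return f"You said: {text}"

    async def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")

    return await handle_turn(b"\x00\x01", fake_stt, fake_llm, fake_tts)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the providers as parameters also makes each stage swappable and easy to test in isolation.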

Prerequisites

Before you begin, ensure you have:

  1. Twilio Account

    • Active Twilio account with a phone number
    • TwiML App configured
    • Account SID and Auth Token
  2. API Keys (free tier available for all)

    • Deepgram (speech-to-text)
    • OpenAI (conversation logic)
    • ElevenLabs (text-to-speech)

  3. Local Network Access

    • Ngrok or similar tool to expose localhost to the internet
    • Ability to accept incoming webhooks from Twilio
  4. Python 3.9+ and pip

Installation

# Clone the repository
git clone https://github.com/kesslerio/phone-agent-moltbot-skill.git
cd phone-agent-moltbot-skill

# Install dependencies
pip install -r scripts/requirements.txt

Configuration

Set Environment Variables

Create a .env file or set environment variables:

# API Keys (required)
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"
export ELEVENLABS_API_KEY="your-elevenlabs-key"

# Twilio (required)
export TWILIO_ACCOUNT_SID="your-account-sid"
export TWILIO_AUTH_TOKEN="your-auth-token"
export TWILIO_PHONE_NUMBER="+18665515246"  # Your Twilio number

# Server (optional)
export PORT=8080
export PUBLIC_URL="https://your-ngrok-url.ngrok.io"  # For webhooks

# Voice Customization (optional)
export ELEVENLABS_VOICE_ID="onwK4e9ZLuTAKqWW03F9"  # Daniel voice

Or add to ~/.moltbot/.env or ~/.clawdbot/.env:

DEEPGRAM_API_KEY=your-key
OPENAI_API_KEY=your-key
ELEVENLABS_API_KEY=your-key
TWILIO_ACCOUNT_SID=your-sid
TWILIO_AUTH_TOKEN=your-token
TWILIO_PHONE_NUMBER=+1...

Startup & Configuration

1. Start the Local Server

python3 scripts/server.py

The server will start on http://localhost:8080 by default.

2. Expose to Internet with Ngrok

In another terminal:

ngrok http 8080

Note the HTTPS URL (e.g., https://abc123.ngrok.io)

3. Configure Twilio Webhook

In Twilio Console:

  1. Go to Phone Numbers → Your number
  2. Under Voice & Fax:
    • Set "A Call Comes In" to Webhook
    • URL: https://<your-ngrok-url>.ngrok.io/incoming
    • Method: POST
  3. Save

4. Test Incoming Calls

Call your Twilio number. The agent will:

  1. Answer and greet you
  2. Listen to your speech
  3. Transcribe your words
  4. Generate a response via OpenAI
  5. Speak the response back to you

Customization

Change Agent Persona

Edit SYSTEM_PROMPT in scripts/server.py:

SYSTEM_PROMPT = """You are a helpful customer service agent. Be friendly, concise, and professional."""

Change Voice

Set a different ElevenLabs voice ID:

export ELEVENLABS_VOICE_ID="g1r0eKKcGkk7Ep0RVcVn"  # Callum voice

Available ElevenLabs voices: https://elevenlabs.io/docs/getting-started/voices

Use Different Model

Edit scripts/server.py and change the OpenAI model:

response = await client.chat.completions.create(
    model="gpt-4",  # or "gpt-4-turbo" for faster responses
    messages=messages,
)

Task-Based Behaviors

Create YAML task definitions in the tasks/ directory:

name: book_restaurant
description: "Help the user book a restaurant reservation"
system_prompt: "You are a friendly restaurant reservation assistant..."
actions:
  - confirm_date
  - confirm_time
  - confirm_party_size
  - book_reservation
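
Once a task file is parsed (e.g. with PyYAML), its `system_prompt` can seed the conversation. In this sketch the dict stands in for the parsed YAML above, and `build_messages` is a hypothetical helper rather than a function from the real server:

```python
# Sketch of applying a parsed task definition to a conversation.
# The dict mirrors tasks/book_restaurant.yaml as it would look after
# yaml.safe_load(); build_messages is an illustrative helper.

task = {
    "name": "book_restaurant",
    "description": "Help the user book a restaurant reservation",
    "system_prompt": "You are a friendly restaurant reservation assistant...",
    "actions": [
        "confirm_date",
        "confirm_time",
        "confirm_party_size",
        "book_reservation",
    ],
}

def build_messages(task: dict, user_text: str) -> list[dict]:
    """Seed an OpenAI-style message list with the task's system prompt."""
    return [
        {"role": "system", "content": task["system_prompt"]},
        {"role": "user", "content": user_text},
    ]

if __name__ == "__main__":
    messages = build_messages(task, "I'd like a table for two tonight.")
    print(messages[0]["content"])
```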

Integration with Moltbot

Add this skill to your Moltbot configuration:

{
  "skills": [
    {
      "name": "phone-agent",
      "path": "/path/to/phone-agent-moltbot-skill",
      "enabled": true
    }
  ]
}

Then reference it in workflows:

  • "Set up an incoming voice agent"
  • "Configure a customer service chatbot"
  • "Test voice AI capabilities"

Project Structure

phone-agent-moltbot-skill/
├── scripts/
│   ├── server.py              # Main FastAPI server
│   ├── server_realtime.py     # Realtime processing variant
│   ├── requirements.txt       # Python dependencies
│   └── typing_sound.raw       # Typing sound effect
├── tasks/
│   ├── book_restaurant.yaml   # Example task definitions
│   └── get_quote.yaml         # Example task definitions
├── calls/                     # Recording storage directory
├── references/                # Supporting documentation
├── SKILL.md                   # Moltbot skill manifest
├── README.md                  # This file
└── LICENSE                    # MIT License

Troubleshooting

Server Won't Start

  • Check Python version: python3 --version (requires 3.9+)
  • Install dependencies: pip install -r scripts/requirements.txt
  • Check PORT variable: echo $PORT (should be 8080 or the value you configured)

Twilio Webhook Not Connecting

  • Verify ngrok is running and the URL matches your Twilio webhook
  • Check server logs: python3 scripts/server.py (should show incoming requests)
  • Test ngrok tunnel: curl https://<your-ngrok-url>.ngrok.io/health

Poor Transcription Quality

  • Ensure DEEPGRAM_API_KEY is valid
  • Check microphone/audio quality on the calling phone
  • Deepgram is generally accurate; persistently poor results usually point to audio-quality issues

Slow Responses

  • OpenAI API latency varies; gpt-4o-mini is fast and cheap
  • Switch to "gpt-3.5-turbo" for faster responses (less capable)
  • Increase timeout in websocket settings if needed
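
One common pattern for the latency problem is to bound the LLM call with a timeout and fall back to a short filler reply so the caller isn't left in silence. A sketch under that assumption (the real server's timeout handling may differ):

```python
import asyncio

# Sketch: bound a slow provider call with asyncio.wait_for and fall
# back to a short filler reply on timeout. Illustrative only.

async def reply_with_timeout(llm_call, timeout_s: float = 5.0) -> str:
    try:
        return await asyncio.wait_for(llm_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "One moment, please."

async def demo() -> str:
    async def slow_llm() -> str:
        await asyncio.sleep(10)  # simulated slow completion
        return "full answer"

    # Tiny timeout forces the fallback path.
    return await reply_with_timeout(slow_llm, timeout_s=0.05)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```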

Voice Not Speaking

  • Verify ELEVENLABS_API_KEY is valid and the configured voice ID exists
  • Check server logs for TTS errors or rate-limit responses
  • Confirm audio is being streamed back over the Twilio WebSocket

API Reference

Incoming Call Webhook

POST /incoming

Twilio sends call information to this endpoint. The server responds with TwiML to establish WebSocket connection.

WebSocket Audio Stream

WS /ws

Bidirectional audio stream for incoming call processing.
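
Twilio's Media Streams protocol sends JSON text messages over this WebSocket; `media` events carry base64-encoded 8 kHz mu-law audio in `media.payload`, alongside `connected`, `start`, and `stop` control events. A stdlib-only parsing sketch (field names follow Twilio's documented protocol; `decode_frame` is an illustrative helper):

```python
import base64
import json
from typing import Optional

# Sketch of parsing one Twilio Media Streams frame as received on /ws.
# Error handling is omitted for brevity.

def decode_frame(raw: str) -> Optional[bytes]:
    """Return raw mu-law audio bytes for media events, None otherwise."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # "connected", "start", "stop" control events
    return base64.b64decode(msg["media"]["payload"])

if __name__ == "__main__":
    frame = json.dumps(
        {"event": "media",
         "media": {"payload": base64.b64encode(b"\xff\x7f").decode()}}
    )
    print(decode_frame(frame))
```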

Health Check

GET /health

Returns {"status": "ok"} if the server is running.

Performance & Scaling

Current implementation handles:

  • Single concurrent call per server instance
  • ~100ms RTT for transcription + LLM + TTS
  • Suitable for demo/testing, hobby projects, and low-volume use

For production:

  • Run multiple server instances behind a load balancer
  • Use Twilio's call queuing
  • Implement connection pooling for API clients
  • Consider dedicated hardware for Deepgram/ElevenLabs processing

Deployment Options

Local Development

python3 scripts/server.py
ngrok http 8080

Docker

FROM python:3.11-slim
WORKDIR /app
COPY scripts/requirements.txt .
RUN pip install -r requirements.txt
COPY scripts/ .
CMD ["python3", "server.py"]

Build and run:

docker build -t phone-agent .
docker run -p 8080:8080 \
  -e DEEPGRAM_API_KEY="..." \
  -e OPENAI_API_KEY="..." \
  -e ELEVENLABS_API_KEY="..." \
  -e TWILIO_ACCOUNT_SID="..." \
  -e TWILIO_AUTH_TOKEN="..." \
  phone-agent

Cloud Deployment

  • Heroku: Add a Procfile containing web: python3 scripts/server.py
  • Railway.app: Auto-detects Python and builds
  • AWS Lambda: Use WebSocket API Gateway + Lambda
  • Google Cloud Run: Containerize and deploy

License

MIT

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Test thoroughly
  4. Submit a pull request


Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  1. Twilio Account: Phone number + TwiML App.
  2. Deepgram API Key: For fast speech-to-text.
  3. OpenAI API Key: For the conversation logic.
  4. ElevenLabs API Key: For realistic text-to-speech.
  5. Ngrok (or similar): To expose your local port 8080 to Twilio.

FAQ

How do I install phone-agent?

Run openclaw add @kesslerio/phone-agent in your terminal. This installs phone-agent into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/kesslerio/phone-agent. Review commits and README documentation before installing.