skills / openclaw / phone-agent

by kesslerio

phone-agent – OpenClaw Skill

phone-agent is an OpenClaw Skills integration for AI/ML workflows. Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when the user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.

3.6k stars · 9.1k forks · Security L1
Updated Feb 7, 2026 · Created Feb 7, 2026 · Topics: ai, ml

Skill Snapshot

name: phone-agent
description: Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot. OpenClaw Skills integration.
owner: kesslerio
repository: kesslerio/phone-agent
language: Markdown
license: MIT
topics: ai, ml
security: L1
install: openclaw add @kesslerio/phone-agent
last updated: Feb 7, 2026

Maintainer

kesslerio

Maintains phone-agent in the OpenClaw Skills directory.
File Explorer (10 files)

scripts/
  requirements.txt      111 B
  server_realtime.py    8.9 KB
  server.py             28.2 KB
tasks/
  book_restaurant.yaml  1.0 KB
  get_quote.yaml        1.0 KB
_meta.json              283 B
README.md               8.6 KB
SKILL.md                2.2 KB
SKILL.md

name: phone-agent
description: "Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot."

Phone Agent Skill

Runs a local FastAPI server that acts as a real-time voice bridge.

Architecture

Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                  |
                                                  +--> OpenAI (LLM)
                                                  +--> ElevenLabs (TTS)

Prerequisites

  1. Twilio Account: Phone number + TwiML App.
  2. Deepgram API Key: For fast speech-to-text.
  3. OpenAI API Key: For the conversation logic.
  4. ElevenLabs API Key: For realistic text-to-speech.
  5. Ngrok (or similar): To expose your local port 8080 to Twilio.

Setup

  1. Install Dependencies:

    pip install -r scripts/requirements.txt
    
  2. Set Environment Variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):

    export DEEPGRAM_API_KEY="your_key"
    export OPENAI_API_KEY="your_key"
    export ELEVENLABS_API_KEY="your_key"
    export TWILIO_ACCOUNT_SID="your_sid"
    export TWILIO_AUTH_TOKEN="your_token"
    export PORT=8080
    
  3. Start the Server:

    python3 scripts/server.py
    
  4. Expose to Internet:

    ngrok http 8080
    
  5. Configure Twilio:

    • Go to your Phone Number settings.
    • Set "Voice & Fax" -> "A Call Comes In" to Webhook.
    • URL: https://<your-ngrok-url>.ngrok.io/incoming
    • Method: POST

Usage

Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.

Customization

  • System Prompt: Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
  • Voice: Change ELEVENLABS_VOICE_ID to use different voices.
  • Model: Switch gpt-4o-mini to gpt-4 for smarter (but slower) responses.
README.md

Phone Agent Moltbot Skill

A real-time AI voice agent that handles incoming phone calls using Twilio, transcribes speech with Deepgram, generates responses via OpenAI, and speaks back with ElevenLabs text-to-speech.

Features

  • Real-time Voice Processing: Handles incoming Twilio calls with low-latency WebSocket audio
  • Automatic Speech Recognition: Deepgram for fast, accurate transcription
  • AI-Powered Responses: OpenAI GPT for intelligent conversation
  • Natural Speech Output: ElevenLabs for realistic, streaming TTS
  • Task-Based Automation: Configurable task definitions for specific agent behaviors
  • Recording & Logging: Automatic call recording and conversation logs

Architecture

Incoming Call (Twilio Phone)
         |
         v
  Twilio WebSocket (Audio Stream)
         |
         +---> Local FastAPI Server
         |           |
         |           +---> Deepgram (Speech-to-Text)
         |           |
         |           +---> OpenAI (LLM/Intelligence)
         |           |
         |           +---> ElevenLabs (Text-to-Speech)
         |           |
         +---------- (Audio Response)
         |
    Phone Speaker Output
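
The per-turn pipeline the diagram describes can be sketched as three composable stages. This is a hypothetical sketch, not code from `scripts/server.py`: the provider clients (Deepgram, OpenAI, ElevenLabs) are replaced with stand-in callables so the shape of the flow is visible.

```python
import asyncio

# Hypothetical sketch of one conversation turn in the voice bridge.
# The real server wires these stages to Deepgram, OpenAI, and
# ElevenLabs; here the providers are stand-in async callables.

async def handle_turn(audio_chunk: bytes, stt, llm, tts) -> bytes:
    """Run one STT -> LLM -> TTS turn and return audio to play back."""
    transcript = await stt(audio_chunk)   # Deepgram in the real server
    reply_text = await llm(transcript)    # OpenAI chat completion
    reply_audio = await tts(reply_text)   # ElevenLabs streaming TTS
    return reply_audio

async def demo() -> bytes:
    async def fake_stt(audio: bytes) -> str:
        return "hello"

    async def fake_llm(text: str) -> str:
        return f"You said: {text}"

    async def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")

    return await handle_turn(b"\x00\x01", fake_stt, fake_llm, fake_tts)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the providers as parameters also makes each stage swappable and easy to test in isolation.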

Prerequisites

Before you begin, ensure you have:

  1. Twilio Account

    • Active Twilio account with a phone number
    • TwiML App configured
    • Account SID and Auth Token
  2. API Keys (free tier available for all)

    • Deepgram (speech-to-text)
    • OpenAI (conversation logic)
    • ElevenLabs (text-to-speech)

  3. Local Network Access

    • Ngrok or similar tool to expose localhost to the internet
    • Ability to accept incoming webhooks from Twilio
  4. Python 3.9+ and pip

Installation

# Clone the repository
git clone https://github.com/kesslerio/phone-agent-moltbot-skill.git
cd phone-agent-moltbot-skill

# Install dependencies
pip install -r scripts/requirements.txt

Configuration

Set Environment Variables

Create a .env file or set environment variables:

# API Keys (required)
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"
export ELEVENLABS_API_KEY="your-elevenlabs-key"

# Twilio (required)
export TWILIO_ACCOUNT_SID="your-account-sid"
export TWILIO_AUTH_TOKEN="your-auth-token"
export TWILIO_PHONE_NUMBER="+18665515246"  # Your Twilio number

# Server (optional)
export PORT=8080
export PUBLIC_URL="https://your-ngrok-url.ngrok.io"  # For webhooks

# Voice Customization (optional)
export ELEVENLABS_VOICE_ID="onwK4e9ZLuTAKqWW03F9"  # Daniel voice

Or add to ~/.moltbot/.env or ~/.clawdbot/.env:

DEEPGRAM_API_KEY=your-key
OPENAI_API_KEY=your-key
ELEVENLABS_API_KEY=your-key
TWILIO_ACCOUNT_SID=your-sid
TWILIO_AUTH_TOKEN=your-token
TWILIO_PHONE_NUMBER=+1...

Startup & Configuration

1. Start the Local Server

python3 scripts/server.py

The server will start on http://localhost:8080 by default.

2. Expose to Internet with Ngrok

In another terminal:

ngrok http 8080

Note the HTTPS URL (e.g., https://abc123.ngrok.io)

3. Configure Twilio Webhook

In Twilio Console:

  1. Go to Phone Numbers → Your number
  2. Under Voice & Fax:
    • Set "A Call Comes In" to Webhook
    • URL: https://<your-ngrok-url>.ngrok.io/incoming
    • Method: POST
  3. Save

4. Test Incoming Calls

Call your Twilio number. The agent will:

  1. Answer and greet you
  2. Listen to your speech
  3. Transcribe your words
  4. Generate a response via OpenAI
  5. Speak the response back to you

Customization

Change Agent Persona

Edit SYSTEM_PROMPT in scripts/server.py:

SYSTEM_PROMPT = """You are a helpful customer service agent. Be friendly, concise, and professional."""

Change Voice

Set a different ElevenLabs voice ID:

export ELEVENLABS_VOICE_ID="g1r0eKKcGkk7Ep0RVcVn"  # Callum voice

Available ElevenLabs voices: https://elevenlabs.io/docs/getting-started/voices

Use Different Model

Edit scripts/server.py and change the OpenAI model:

response = await client.chat.completions.create(
    model="gpt-4",  # or "gpt-4-turbo" for faster responses
    messages=messages,
)

Task-Based Behaviors

Create YAML task definitions in the tasks/ directory:

name: book_restaurant
description: "Help the user book a restaurant reservation"
system_prompt: "You are a friendly restaurant reservation assistant..."
actions:
  - confirm_date
  - confirm_time
  - confirm_party_size
  - book_reservation
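
Once a task file is parsed (e.g. with PyYAML), its `system_prompt` can seed the conversation. In this sketch the dict stands in for the parsed YAML above, and `build_messages` is a hypothetical helper rather than a function from the real server:

```python
# Sketch of applying a parsed task definition to a conversation.
# The dict mirrors tasks/book_restaurant.yaml as it would look after
# yaml.safe_load(); build_messages is an illustrative helper.

task = {
    "name": "book_restaurant",
    "description": "Help the user book a restaurant reservation",
    "system_prompt": "You are a friendly restaurant reservation assistant...",
    "actions": [
        "confirm_date",
        "confirm_time",
        "confirm_party_size",
        "book_reservation",
    ],
}

def build_messages(task: dict, user_text: str) -> list[dict]:
    """Seed an OpenAI-style message list with the task's system prompt."""
    return [
        {"role": "system", "content": task["system_prompt"]},
        {"role": "user", "content": user_text},
    ]

if __name__ == "__main__":
    messages = build_messages(task, "I'd like a table for two tonight.")
    print(messages[0]["content"])
```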

Integration with Moltbot

Add this skill to your Moltbot configuration:

{
  "skills": [
    {
      "name": "phone-agent",
      "path": "/path/to/phone-agent-moltbot-skill",
      "enabled": true
    }
  ]
}

Then reference it in workflows:

  • "Set up an incoming voice agent"
  • "Configure a customer service chatbot"
  • "Test voice AI capabilities"

Project Structure

phone-agent-moltbot-skill/
├── scripts/
│   ├── server.py              # Main FastAPI server
│   ├── server_realtime.py     # Realtime processing variant
│   ├── requirements.txt       # Python dependencies
│   └── typing_sound.raw       # Typing sound effect
├── tasks/
│   ├── book_restaurant.yaml   # Example task definitions
│   └── get_quote.yaml         # Example task definitions
├── calls/                     # Recording storage directory
├── references/                # Supporting documentation
├── SKILL.md                   # Moltbot skill manifest
├── README.md                  # This file
└── LICENSE                    # MIT License

Troubleshooting

Server Won't Start

  • Check Python version: python3 --version (requires 3.9+)
  • Install dependencies: pip install -r scripts/requirements.txt
  • Check PORT variable: echo $PORT (should be 8080 or the value you configured)

Twilio Webhook Not Connecting

  • Verify ngrok is running and the URL matches your Twilio webhook
  • Check server logs: python3 scripts/server.py (should show incoming requests)
  • Test ngrok tunnel: curl https://<your-ngrok-url>.ngrok.io/health

Poor Transcription Quality

  • Ensure DEEPGRAM_API_KEY is valid
  • Check microphone/audio quality on the calling phone
  • Deepgram is generally accurate; persistently poor results usually point to audio-quality issues

Slow Responses

  • OpenAI API latency varies; gpt-4o-mini is fast and cheap
  • Switch to "gpt-3.5-turbo" for faster responses (less capable)
  • Increase timeout in websocket settings if needed
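
One common pattern for the latency problem is to bound the LLM call with a timeout and fall back to a short filler reply so the caller isn't left in silence. A sketch under that assumption (the real server's timeout handling may differ):

```python
import asyncio

# Sketch: bound a slow provider call with asyncio.wait_for and fall
# back to a short filler reply on timeout. Illustrative only.

async def reply_with_timeout(llm_call, timeout_s: float = 5.0) -> str:
    try:
        return await asyncio.wait_for(llm_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "One moment, please."

async def demo() -> str:
    async def slow_llm() -> str:
        await asyncio.sleep(10)  # simulated slow completion
        return "full answer"

    # Tiny timeout forces the fallback path.
    return await reply_with_timeout(slow_llm, timeout_s=0.05)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```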

Voice Not Speaking

  • Verify ELEVENLABS_API_KEY is valid and the configured voice ID exists
  • Check server logs for TTS errors or rate-limit responses
  • Confirm audio is being streamed back over the Twilio WebSocket

API Reference

Incoming Call Webhook

POST /incoming

Twilio sends call information to this endpoint. The server responds with TwiML to establish WebSocket connection.

WebSocket Audio Stream

WS /ws

Bidirectional audio stream for incoming call processing.
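
Twilio's Media Streams protocol sends JSON text messages over this WebSocket; `media` events carry base64-encoded 8 kHz mu-law audio in `media.payload`, alongside `connected`, `start`, and `stop` control events. A stdlib-only parsing sketch (field names follow Twilio's documented protocol; `decode_frame` is an illustrative helper):

```python
import base64
import json
from typing import Optional

# Sketch of parsing one Twilio Media Streams frame as received on /ws.
# Error handling is omitted for brevity.

def decode_frame(raw: str) -> Optional[bytes]:
    """Return raw mu-law audio bytes for media events, None otherwise."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # "connected", "start", "stop" control events
    return base64.b64decode(msg["media"]["payload"])

if __name__ == "__main__":
    frame = json.dumps(
        {"event": "media",
         "media": {"payload": base64.b64encode(b"\xff\x7f").decode()}}
    )
    print(decode_frame(frame))
```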

Health Check

GET /health

Returns {"status": "ok"} if the server is running.

Performance & Scaling

Current implementation handles:

  • Single concurrent call per server instance
  • ~100ms RTT for transcription + LLM + TTS
  • Suitable for demo/testing, hobby projects, and low-volume use

For production:

  • Run multiple server instances behind a load balancer
  • Use Twilio's call queuing
  • Implement connection pooling for API clients
  • Consider dedicated hardware for Deepgram/ElevenLabs processing

Deployment Options

Local Development

python3 scripts/server.py
ngrok http 8080

Docker

FROM python:3.11-slim
WORKDIR /app
COPY scripts/requirements.txt .
RUN pip install -r requirements.txt
COPY scripts/ .
CMD ["python3", "server.py"]

Build and run:

docker build -t phone-agent .
docker run -p 8080:8080 \
  -e DEEPGRAM_API_KEY="..." \
  -e OPENAI_API_KEY="..." \
  -e ELEVENLABS_API_KEY="..." \
  -e TWILIO_ACCOUNT_SID="..." \
  -e TWILIO_AUTH_TOKEN="..." \
  phone-agent

Cloud Deployment

  • Heroku: Add a Procfile containing web: python3 scripts/server.py
  • Railway.app: Auto-detects Python and builds
  • AWS Lambda: Use WebSocket API Gateway + Lambda
  • Google Cloud Run: Containerize and deploy

License

MIT

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Test thoroughly
  4. Submit a pull request


Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  1. Twilio Account: Phone number + TwiML App.
  2. Deepgram API Key: For fast speech-to-text.
  3. OpenAI API Key: For the conversation logic.
  4. ElevenLabs API Key: For realistic text-to-speech.
  5. Ngrok (or similar): To expose your local port 8080 to Twilio.

FAQ

How do I install phone-agent?

Run openclaw add @kesslerio/phone-agent in your terminal. This installs phone-agent into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/kesslerio/phone-agent. Review commits and README documentation before installing.