skills$openclaw/audio-cog

5.3k★

audio-cog – OpenClaw Skill

Name: audio-cog
Author: nitishgargiitd

audio-cog is an OpenClaw Skills integration for coding workflows. AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.

5.3k stars5.4k forksSecurity L1

Updated Feb 7, 2026Created Feb 7, 2026coding

Skill Snapshot

name	audio-cog
description	AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. OpenClaw Skills integration.
owner	nitishgargiitd
repository	nitishgargiitd/audio-cog
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @nitishgargiitd/audio-cog
last updated	Feb 7, 2026

Maintainer

nitishgargiitd

Maintains audio-cog in the OpenClaw Skills directory.

View GitHub profile

File Explorer

2 files

_meta.json

456 B

SKILL.md

6.0 KB

SKILL.md

name: audio-cog description: AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. metadata: openclaw: emoji: "🎵"

Audio Cog - AI Audio Generation Powered by CellCog

Create professional audio with AI - from voiceovers and narration to background music and sound design.

Prerequisites

This skill requires the CellCog mothership skill for SDK setup and API calls.

clawhub install cellcog

Read the cellcog skill first for SDK setup. This skill shows you what's possible.

Quick pattern (v1.0+):

# Fire-and-forget - returns immediately
result = client.create_chat(
    prompt="[your audio request]",
    notify_session_key="agent:main:main",
    task_label="audio-task",
    chat_mode="agent"  # Agent mode is optimal for all audio tasks
)
# Daemon notifies you when complete - do NOT poll

What Audio You Can Create

Text-to-Speech / Voiceover

Convert text to natural-sounding speech:

Narration: "Generate a professional male voiceover for this product video script"
Audiobook Style: "Create an engaging narration of this short story with emotional delivery"
Podcast Intros: "Generate a warm, friendly podcast intro: 'Welcome to The Daily Tech...'"
E-Learning: "Create clear, instructional voiceover for this training module"
IVR/Phone Systems: "Generate professional phone menu prompts"

Voice Customization

Control how the voice sounds:

Gender: Male, female, neutral
Age: Young adult, middle-aged, senior
Emotion: Neutral, happy, sad, excited, serious, friendly, authoritative
Accent: American, British, Australian, Indian, and many more
Pacing: Slow and deliberate, conversational, fast and energetic
Tone: Professional, casual, warm, mysterious, dramatic

Music Generation

Create original background music and soundtracks:

Background Music: "Create calm lo-fi background music for a study video, 2 minutes"
Podcast Music: "Generate an upbeat intro jingle for a tech podcast, 15 seconds"
Video Soundtracks: "Create cinematic orchestral music for a product launch video"
Ambient/Mood: "Generate peaceful ambient sounds for a meditation app"
Genre-Specific: "Create energetic electronic music for a fitness video"

Music Specifications

Parameter	Options
Duration	15 seconds to 5+ minutes
Genre	Electronic, rock, classical, jazz, ambient, lo-fi, cinematic, pop, hip-hop
Tempo	60 BPM (slow) to 180+ BPM (fast)
Mood	Upbeat, calm, dramatic, mysterious, inspiring, melancholic
Instruments	Piano, guitar, synth, strings, drums, brass, etc.

Audio Output Formats

Format	Best For
MP3	Standard audio delivery, voiceovers, music
Combined with video	Background music for video-cog outputs

Chat Mode for Audio

Use chat_mode="agent" for all audio generation tasks.

Audio generation—whether voiceovers, music, or sound design—executes efficiently in agent mode. CellCog's audio capabilities don't require multi-angle deliberation; they require precise execution, which agent mode excels at.

There's no scenario where agent team mode provides meaningfully better audio output. Save agent team for research and complex creative work that benefits from multiple reasoning passes.

Example Audio Prompts

Professional voiceover:

"Generate a professional voiceover for this script:

'Introducing TaskFlow - the project management tool that actually works. With intelligent automation, seamless collaboration, and powerful analytics, TaskFlow helps teams do their best work.'

Voice: Female, American accent, confident and friendly, medium pace. Suitable for a product launch video."

Podcast intro:

"Create a podcast intro voiceover:

'Welcome to Future Forward, the podcast where we explore the technologies shaping tomorrow. I'm your host, and today we're diving into...'

Voice: Male, warm and engaging, conversational tone. Also generate a 10-second upbeat intro music bed to go underneath."

Background music:

"Generate 2 minutes of calm, lo-fi hip-hop style background music. Should be chill and unobtrusive, good for studying or working. Include soft piano, mellow beats, and gentle vinyl crackle. 75 BPM."

Audiobook narration:

"Create an audiobook-style narration of this passage:

[passage text]

Voice: Male, British accent, 50s, warm and storytelling quality. Pace should be measured, with appropriate pauses for drama. Think classic BBC narrator."

Cinematic music:

"Generate 90 seconds of cinematic orchestral music for a tech company's 'About Us' video. Start soft and inspiring, build to a confident crescendo, then resolve to a hopeful ending. Think Hans Zimmer meets corporate inspiration."

Multi-Language Support

CellCog can generate speech in many languages:

English (multiple accents)
Spanish, French, German, Italian, Portuguese
Chinese (Mandarin, Cantonese)
Japanese, Korean
Hindi, Arabic
And many more

Specify the language in your prompt:

"Generate this text in Japanese with a native female speaker: 'いらっしゃいませ...'"

Tips for Better Audio

For voiceovers: Provide the complete script. Don't say "something about our product" - write out exactly what should be said.
Describe the voice: Age, gender, accent, emotion, pace. "Professional female, American, confident but warm, medium pace" is clear.
For music: Specify duration, tempo (BPM if you know it), mood, and genre. Reference artists or songs if helpful: "Think lo-fi beats like ChilledCow playlists."
Pronunciation guidance: For names or technical terms, add pronunciation hints: "CellCog (pronounced SELL-kog)"
Emotional beats: For longer voiceovers, indicate tone shifts: "[excited] And now for the big reveal... [serious] But there's a catch."

README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

This skill requires the CellCog mothership skill for SDK setup and API calls. ```bash clawhub install cellcog ``` **Read the cellcog skill first** for SDK setup. This skill shows you what's possible. **Quick pattern (v1.0+):** ```python

FAQ

How do I install audio-cog?

Run openclaw add @nitishgargiitd/audio-cog in your terminal. This installs audio-cog into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/nitishgargiitd/audio-cog. Review commits and README documentation before installing.