5.3k★audio-cog – OpenClaw Skill
audio-cog is an OpenClaw Skills integration for coding workflows. AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.
Skill Snapshot
| name | audio-cog |
| description | AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. OpenClaw Skills integration. |
| owner | nitishgargiitd |
| repository | nitishgargiitd/audio-cog |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @nitishgargiitd/audio-cog |
| last updated | Feb 7, 2026 |
Maintainer

name: audio-cog description: AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. metadata: openclaw: emoji: "🎵"
Audio Cog - AI Audio Generation Powered by CellCog
Create professional audio with AI - from voiceovers and narration to background music and sound design.
Prerequisites
This skill requires the CellCog mothership skill for SDK setup and API calls.
clawhub install cellcog
Read the cellcog skill first for SDK setup. This skill shows you what's possible.
Quick pattern (v1.0+):
# Fire-and-forget - returns immediately
result = client.create_chat(
prompt="[your audio request]",
notify_session_key="agent:main:main",
task_label="audio-task",
chat_mode="agent" # Agent mode is optimal for all audio tasks
)
# Daemon notifies you when complete - do NOT poll
What Audio You Can Create
Text-to-Speech / Voiceover
Convert text to natural-sounding speech:
- Narration: "Generate a professional male voiceover for this product video script"
- Audiobook Style: "Create an engaging narration of this short story with emotional delivery"
- Podcast Intros: "Generate a warm, friendly podcast intro: 'Welcome to The Daily Tech...'"
- E-Learning: "Create clear, instructional voiceover for this training module"
- IVR/Phone Systems: "Generate professional phone menu prompts"
Voice Customization
Control how the voice sounds:
- Gender: Male, female, neutral
- Age: Young adult, middle-aged, senior
- Emotion: Neutral, happy, sad, excited, serious, friendly, authoritative
- Accent: American, British, Australian, Indian, and many more
- Pacing: Slow and deliberate, conversational, fast and energetic
- Tone: Professional, casual, warm, mysterious, dramatic
Music Generation
Create original background music and soundtracks:
- Background Music: "Create calm lo-fi background music for a study video, 2 minutes"
- Podcast Music: "Generate an upbeat intro jingle for a tech podcast, 15 seconds"
- Video Soundtracks: "Create cinematic orchestral music for a product launch video"
- Ambient/Mood: "Generate peaceful ambient sounds for a meditation app"
- Genre-Specific: "Create energetic electronic music for a fitness video"
Music Specifications
| Parameter | Options |
|---|---|
| Duration | 15 seconds to 5+ minutes |
| Genre | Electronic, rock, classical, jazz, ambient, lo-fi, cinematic, pop, hip-hop |
| Tempo | 60 BPM (slow) to 180+ BPM (fast) |
| Mood | Upbeat, calm, dramatic, mysterious, inspiring, melancholic |
| Instruments | Piano, guitar, synth, strings, drums, brass, etc. |
Audio Output Formats
| Format | Best For |
|---|---|
| MP3 | Standard audio delivery, voiceovers, music |
| Combined with video | Background music for video-cog outputs |
Chat Mode for Audio
Use chat_mode="agent" for all audio generation tasks.
Audio generation—whether voiceovers, music, or sound design—executes efficiently in agent mode. CellCog's audio capabilities don't require multi-angle deliberation; they require precise execution, which agent mode excels at.
There's no scenario where agent team mode provides meaningfully better audio output. Save agent team for research and complex creative work that benefits from multiple reasoning passes.
Example Audio Prompts
Professional voiceover:
"Generate a professional voiceover for this script:
'Introducing TaskFlow - the project management tool that actually works. With intelligent automation, seamless collaboration, and powerful analytics, TaskFlow helps teams do their best work.'
Voice: Female, American accent, confident and friendly, medium pace. Suitable for a product launch video."
Podcast intro:
"Create a podcast intro voiceover:
'Welcome to Future Forward, the podcast where we explore the technologies shaping tomorrow. I'm your host, and today we're diving into...'
Voice: Male, warm and engaging, conversational tone. Also generate a 10-second upbeat intro music bed to go underneath."
Background music:
"Generate 2 minutes of calm, lo-fi hip-hop style background music. Should be chill and unobtrusive, good for studying or working. Include soft piano, mellow beats, and gentle vinyl crackle. 75 BPM."
Audiobook narration:
"Create an audiobook-style narration of this passage:
[passage text]
Voice: Male, British accent, 50s, warm and storytelling quality. Pace should be measured, with appropriate pauses for drama. Think classic BBC narrator."
Cinematic music:
"Generate 90 seconds of cinematic orchestral music for a tech company's 'About Us' video. Start soft and inspiring, build to a confident crescendo, then resolve to a hopeful ending. Think Hans Zimmer meets corporate inspiration."
Multi-Language Support
CellCog can generate speech in many languages:
- English (multiple accents)
- Spanish, French, German, Italian, Portuguese
- Chinese (Mandarin, Cantonese)
- Japanese, Korean
- Hindi, Arabic
- And many more
Specify the language in your prompt:
"Generate this text in Japanese with a native female speaker: 'いらっしゃいませ...'"
Tips for Better Audio
-
For voiceovers: Provide the complete script. Don't say "something about our product" - write out exactly what should be said.
-
Describe the voice: Age, gender, accent, emotion, pace. "Professional female, American, confident but warm, medium pace" is clear.
-
For music: Specify duration, tempo (BPM if you know it), mood, and genre. Reference artists or songs if helpful: "Think lo-fi beats like ChilledCow playlists."
-
Pronunciation guidance: For names or technical terms, add pronunciation hints: "CellCog (pronounced SELL-kog)"
-
Emotional beats: For longer voiceovers, indicate tone shifts: "[excited] And now for the big reveal... [serious] But there's a catch."
No README available.
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
This skill requires the CellCog mothership skill for SDK setup and API calls. ```bash clawhub install cellcog ``` **Read the cellcog skill first** for SDK setup. This skill shows you what's possible. **Quick pattern (v1.0+):** ```python
FAQ
How do I install audio-cog?
Run openclaw add @nitishgargiitd/audio-cog in your terminal. This installs audio-cog into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/nitishgargiitd/audio-cog. Review commits and README documentation before installing.
