skills$openclaw/audio-cog
nitishgargiitd5.3k

by nitishgargiitd

audio-cog – OpenClaw Skill

audio-cog is an OpenClaw Skills integration for coding workflows. AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.

5.3k stars5.4k forksSecurity L1
Updated Feb 7, 2026Created Feb 7, 2026coding

Skill Snapshot

nameaudio-cog
descriptionAI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. OpenClaw Skills integration.
ownernitishgargiitd
repositorynitishgargiitd/audio-cog
languageMarkdown
licenseMIT
topics
securityL1
installopenclaw add @nitishgargiitd/audio-cog
last updatedFeb 7, 2026

Maintainer

nitishgargiitd

nitishgargiitd

Maintains audio-cog in the OpenClaw Skills directory.

View GitHub profile
File Explorer
2 files
.
_meta.json
456 B
SKILL.md
6.0 KB
SKILL.md

name: audio-cog description: AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI. metadata: openclaw: emoji: "🎵"

Audio Cog - AI Audio Generation Powered by CellCog

Create professional audio with AI - from voiceovers and narration to background music and sound design.


Prerequisites

This skill requires the CellCog mothership skill for SDK setup and API calls.

clawhub install cellcog

Read the cellcog skill first for SDK setup. This skill shows you what's possible.

Quick pattern (v1.0+):

# Fire-and-forget - returns immediately
result = client.create_chat(
    prompt="[your audio request]",
    notify_session_key="agent:main:main",
    task_label="audio-task",
    chat_mode="agent"  # Agent mode is optimal for all audio tasks
)
# Daemon notifies you when complete - do NOT poll

What Audio You Can Create

Text-to-Speech / Voiceover

Convert text to natural-sounding speech:

  • Narration: "Generate a professional male voiceover for this product video script"
  • Audiobook Style: "Create an engaging narration of this short story with emotional delivery"
  • Podcast Intros: "Generate a warm, friendly podcast intro: 'Welcome to The Daily Tech...'"
  • E-Learning: "Create clear, instructional voiceover for this training module"
  • IVR/Phone Systems: "Generate professional phone menu prompts"

Voice Customization

Control how the voice sounds:

  • Gender: Male, female, neutral
  • Age: Young adult, middle-aged, senior
  • Emotion: Neutral, happy, sad, excited, serious, friendly, authoritative
  • Accent: American, British, Australian, Indian, and many more
  • Pacing: Slow and deliberate, conversational, fast and energetic
  • Tone: Professional, casual, warm, mysterious, dramatic

Music Generation

Create original background music and soundtracks:

  • Background Music: "Create calm lo-fi background music for a study video, 2 minutes"
  • Podcast Music: "Generate an upbeat intro jingle for a tech podcast, 15 seconds"
  • Video Soundtracks: "Create cinematic orchestral music for a product launch video"
  • Ambient/Mood: "Generate peaceful ambient sounds for a meditation app"
  • Genre-Specific: "Create energetic electronic music for a fitness video"

Music Specifications

ParameterOptions
Duration15 seconds to 5+ minutes
GenreElectronic, rock, classical, jazz, ambient, lo-fi, cinematic, pop, hip-hop
Tempo60 BPM (slow) to 180+ BPM (fast)
MoodUpbeat, calm, dramatic, mysterious, inspiring, melancholic
InstrumentsPiano, guitar, synth, strings, drums, brass, etc.

Audio Output Formats

FormatBest For
MP3Standard audio delivery, voiceovers, music
Combined with videoBackground music for video-cog outputs

Chat Mode for Audio

Use chat_mode="agent" for all audio generation tasks.

Audio generation—whether voiceovers, music, or sound design—executes efficiently in agent mode. CellCog's audio capabilities don't require multi-angle deliberation; they require precise execution, which agent mode excels at.

There's no scenario where agent team mode provides meaningfully better audio output. Save agent team for research and complex creative work that benefits from multiple reasoning passes.


Example Audio Prompts

Professional voiceover:

"Generate a professional voiceover for this script:

'Introducing TaskFlow - the project management tool that actually works. With intelligent automation, seamless collaboration, and powerful analytics, TaskFlow helps teams do their best work.'

Voice: Female, American accent, confident and friendly, medium pace. Suitable for a product launch video."

Podcast intro:

"Create a podcast intro voiceover:

'Welcome to Future Forward, the podcast where we explore the technologies shaping tomorrow. I'm your host, and today we're diving into...'

Voice: Male, warm and engaging, conversational tone. Also generate a 10-second upbeat intro music bed to go underneath."

Background music:

"Generate 2 minutes of calm, lo-fi hip-hop style background music. Should be chill and unobtrusive, good for studying or working. Include soft piano, mellow beats, and gentle vinyl crackle. 75 BPM."

Audiobook narration:

"Create an audiobook-style narration of this passage:

[passage text]

Voice: Male, British accent, 50s, warm and storytelling quality. Pace should be measured, with appropriate pauses for drama. Think classic BBC narrator."

Cinematic music:

"Generate 90 seconds of cinematic orchestral music for a tech company's 'About Us' video. Start soft and inspiring, build to a confident crescendo, then resolve to a hopeful ending. Think Hans Zimmer meets corporate inspiration."


Multi-Language Support

CellCog can generate speech in many languages:

  • English (multiple accents)
  • Spanish, French, German, Italian, Portuguese
  • Chinese (Mandarin, Cantonese)
  • Japanese, Korean
  • Hindi, Arabic
  • And many more

Specify the language in your prompt:

"Generate this text in Japanese with a native female speaker: 'いらっしゃいませ...'"


Tips for Better Audio

  1. For voiceovers: Provide the complete script. Don't say "something about our product" - write out exactly what should be said.

  2. Describe the voice: Age, gender, accent, emotion, pace. "Professional female, American, confident but warm, medium pace" is clear.

  3. For music: Specify duration, tempo (BPM if you know it), mood, and genre. Reference artists or songs if helpful: "Think lo-fi beats like ChilledCow playlists."

  4. Pronunciation guidance: For names or technical terms, add pronunciation hints: "CellCog (pronounced SELL-kog)"

  5. Emotional beats: For longer voiceovers, indicate tone shifts: "[excited] And now for the big reveal... [serious] But there's a catch."

README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

This skill requires the CellCog mothership skill for SDK setup and API calls. ```bash clawhub install cellcog ``` **Read the cellcog skill first** for SDK setup. This skill shows you what's possible. **Quick pattern (v1.0+):** ```python

FAQ

How do I install audio-cog?

Run openclaw add @nitishgargiitd/audio-cog in your terminal. This installs audio-cog into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/nitishgargiitd/audio-cog. Review commits and README documentation before installing.