skills / openclaw / video-ad-analyzer

by fortytwode

video-ad-analyzer – OpenClaw Skill

video-ad-analyzer is an OpenClaw Skills integration for coding workflows. It extracts and analyzes content from video ads using Gemini Vision AI, supporting frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use it when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

5.6k stars · 1.6k forks · Security L1
Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name: video-ad-analyzer
description: Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions. OpenClaw Skills integration.
owner: fortytwode
repository: fortytwode/meta-video-ad-analyzer
language: Markdown
license: MIT
topics: (none listed)
security: L1
install: openclaw add @fortytwode/meta-video-ad-analyzer
last updated: Feb 7, 2026

Maintainer

fortytwode

Maintains video-ad-analyzer in the OpenClaw Skills directory.

View GitHub profile
File Explorer (9 files)

  • prompts/
      • scene_analysis.md (3.3 KB)
      • scene_reconciliation.md (3.6 KB)
  • scripts/
      • models.py (548 B)
      • prompt_manager.py (4.9 KB)
      • video_extractor.py (28.0 KB)
  • _meta.json (300 B)
  • SKILL.md (3.6 KB)
SKILL.md

---
name: video-ad-analyzer
version: 1.0.0
description: Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
---

Video Ad Analyzer

AI-powered video content extraction using Google Gemini Vision.

What This Skill Does

  • Frame Extraction: Smart sampling with scene change detection
  • OCR Text Detection: Extract text overlays using EasyOCR
  • Audio Transcription: Convert speech to text with Google Cloud Speech
  • AI Scene Analysis: Describe each scene using Gemini Vision
  • Native Video Analysis: Direct video understanding for longer content
  • Thumbnail Generation: Auto-generate thumbnails from first frame

Setup

1. Environment Variables

# Required for Gemini Vision
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Required for audio transcription
# (same service account needs Speech-to-Text API enabled)
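Before running the extractor, it can help to confirm the credentials variable actually points at a readable key file. A minimal pre-flight sketch (the helper name `credentials_ok` is ours, not part of the skill):

```python
import os

def credentials_ok(var="GOOGLE_APPLICATION_CREDENTIALS"):
    """True when the service-account env var is set and points to a real file."""
    path = os.environ.get(var, "")
    return bool(path) and os.path.isfile(path)

if not credentials_ok():
    print("Set GOOGLE_APPLICATION_CREDENTIALS before using Gemini Vision.")
```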

2. Dependencies

pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client

Also requires ffmpeg and ffprobe installed on system.
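Since ffmpeg and ffprobe are system binaries rather than pip packages, a quick check avoids confusing runtime errors later. A small sketch (`check_system_deps` is an illustrative helper, not part of the skill):

```python
import shutil

def check_system_deps(binaries=("ffmpeg", "ffprobe")):
    """Return the required binaries that are missing from PATH."""
    return [b for b in binaries if shutil.which(b) is None]

missing = check_system_deps()
if missing:
    print("Missing system dependencies:", ", ".join(missing))
```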

Usage

Basic Video Analysis

from scripts.video_extractor import VideoExtractor
from scripts.models import ExtractedVideoContent
import vertexai
from vertexai.generative_models import GenerativeModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
gemini_model = GenerativeModel("gemini-1.5-flash")

# Create extractor
extractor = VideoExtractor(gemini_model=gemini_model)

# Analyze video
result = extractor.extract_content("/path/to/video.mp4")

print(f"Duration: {result.duration}s")
print(f"Scenes: {len(result.scene_timeline)}")
print(f"Text overlays: {len(result.text_timeline)}")
print(f"Transcript: {result.transcript[:200]}...")

Extract Only Frames

frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
    "/path/to/video.mp4",
    scene_interval=2,    # Check for scene changes every 2s
    text_interval=0.5    # Check for text every 0.5s
)

Analyze Images

# Works with images too
result = extractor.extract_content("/path/to/image.jpg")
print(result.scene_timeline[0]['description'])

Output Structure

ExtractedVideoContent(
    video_path="/path/to/video.mp4",
    duration=30.5,
    transcript="Here's what we found...",
    text_timeline=[
        {"at": 0.0, "text": ["Download Now"]},
        {"at": 5.5, "text": ["50% Off Today"]}
    ],
    scene_timeline=[
        {"timestamp": 0.0, "description": "Woman using phone app..."},
        {"timestamp": 2.0, "description": "Product showcase with features..."}
    ],
    thumbnail_url="/static/thumbnails/video_thumb.jpg",
    extraction_complete=True
)
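The timeline fields are plain lists of dicts, so they are easy to query directly. For example, to pull every overlay shown in a given time window, using sample data in the text_timeline shape above (`overlays_between` is an illustrative helper, not part of the skill):

```python
# Sample data in the text_timeline shape shown above
text_timeline = [
    {"at": 0.0, "text": ["Download Now"]},
    {"at": 5.5, "text": ["50% Off Today"]},
]

def overlays_between(timeline, start, end):
    """Collect overlay strings whose timestamp falls in [start, end)."""
    return [t for entry in timeline
            if start <= entry["at"] < end
            for t in entry["text"]]

print(overlays_between(text_timeline, 5.0, 10.0))  # ['50% Off Today']
```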

Key Features

Feature | Description
--- | ---
Scene Detection | Histogram-based change detection (threshold=65)
OCR Confidence | Tiered thresholds (0.5 high, 0.3 low)
AI Proofreading | Gemini cleans up OCR errors
Source Reconciliation | Merges OCR + Vision text intelligently
Native Video | Direct Gemini analysis for <20MB files
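The histogram comparison behind scene detection can be sketched in a few lines. This is our illustrative reconstruction, not the actual implementation in video_extractor.py; we assume the threshold of 65 applies to a 0-100 difference score over normalized grayscale histograms:

```python
import numpy as np

def hist_diff(frame_a, frame_b, bins=64):
    """Difference between two grayscale frames' histograms, scaled to 0-100."""
    h_a, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    h_b, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    h_a = h_a / max(h_a.sum(), 1)
    h_b = h_b / max(h_b.sum(), 1)
    # L1 distance between probability histograms is in [0, 2]; rescale to [0, 100].
    return float(np.abs(h_a - h_b).sum()) * 50.0

def is_scene_change(frame_a, frame_b, threshold=65):
    return hist_diff(frame_a, frame_b) > threshold

dark = np.zeros((64, 64))
bright = np.full((64, 64), 255.0)
print(is_scene_change(dark, bright))  # True: the histograms share no bins
```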

Prompts

Customize AI behavior by editing prompts in the prompts/ folder:

  • scene_analysis.md - Frame analysis prompts
  • scene_reconciliation.md - Scene enrichment prompts
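The skill ships a prompt_manager.py whose API is not documented on this page; for a quick experiment you can also read a template directly from disk. `load_prompt` below is an illustrative helper, not the skill's API:

```python
from pathlib import Path

def load_prompt(name, prompts_dir="prompts"):
    """Read a prompt template such as scene_analysis.md from the prompts folder."""
    return Path(prompts_dir, f"{name}.md").read_text(encoding="utf-8")
```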

Common Questions This Answers

  • "What text appears in this video ad?"
  • "Describe each scene in this creative"
  • "What does the narrator say?"
  • "Extract the call-to-action from this ad"
README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  • OpenClaw CLI installed and configured.
  • Language: Markdown
  • License: MIT
  • Topics:

FAQ

How do I install video-ad-analyzer?

Run openclaw add @fortytwode/meta-video-ad-analyzer in your terminal. This installs video-ad-analyzer into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/fortytwode/meta-video-ad-analyzer. Review commits and README documentation before installing.