1.4k ★ by johanesalxd
Vision Sandbox – OpenClaw Skill
Vision Sandbox is an OpenClaw Skills integration for coding workflows. It provides agentic vision via Gemini's native Code Execution sandbox: use it for spatial grounding, visual math, and UI auditing.
Skill Snapshot
| Field | Value |
| --- | --- |
| name | Vision Sandbox |
| description | Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing. OpenClaw Skills integration. |
| owner | johanesalxd |
| repository | johanesalxd/vision-sandbox |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | `openclaw add @johanesalxd/vision-sandbox` |
| last updated | Feb 7, 2026 |
Maintainer
johanesalxd
```yaml
name: Vision Sandbox
slug: vision-sandbox
version: 1.1.0
description: Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
metadata:
  openclaw:
    emoji: "🔭"
    primaryEnv: "GEMINI_API_KEY"
    requires:
      bins: ["uv"]
      env: ["GEMINI_API_KEY"]
```
Vision Sandbox 🔭
Leverage Gemini's native code execution to analyze images with high precision. The model writes and runs Python code in a Google-hosted sandbox to verify visual data, perfect for UI auditing, spatial grounding, and visual reasoning.
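For orientation, a call like this can be reproduced directly with the google-genai SDK. The sketch below is illustrative, not the skill's actual source; it assumes `GEMINI_API_KEY` is set in the environment and uses the model this skill defaults to:

```python
# Minimal sketch: Gemini code execution over an image (google-genai SDK).
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("path/to/image.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Identify all buttons and provide [x, y] coordinates."],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    ),
)

# The response interleaves text, the Python the model wrote, and its output.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
    if part.code_execution_result:
        print(part.code_execution_result.output)
```

The key design point is the `code_execution` tool: instead of answering from raw perception, the model can write Python, run it in Google's sandbox, and ground its answer in the computed result.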
Installation
```bash
clawhub install vision-sandbox
```
Usage
```bash
uv run vision-sandbox --image "path/to/image.png" --prompt "Identify all buttons and provide [x, y] coordinates."
```
Pattern Library
📍 Spatial Grounding
Ask the model to find specific items and return coordinates.
- Prompt: "Locate the 'Submit' button in this screenshot. Use code execution to verify its center point and return the [x, y] coordinates in a [0, 1000] scale."
🧮 Visual Math
Ask the model to count or calculate based on the image.
- Prompt: "Count the number of items in the list. Use Python to sum their values if prices are visible."
🖥️ UI Audit
Check layout and readability.
- Prompt: "Check if the header text overlaps with any icons. Use the sandbox to calculate the bounding box intersections."
🖐️ Counting & Logic
Solve visual counting tasks with code verification.
- Prompt: "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."
Integration with OpenCode
This skill is designed to provide Visual Grounding for automated coding agents like OpenCode.
- Step 1: Use `vision-sandbox` to extract UI metadata (coordinates, sizes, colors).
- Step 2: Pass the JSON output to OpenCode to generate or fix CSS/HTML (see the sketch after this list).
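A minimal glue script for this two-step handoff might look like the following. It assumes the CLI prints JSON to stdout; the prompt text and downstream handoff are hypothetical:

```python
# Hypothetical glue step: run the skill and hand its JSON to a coding agent.
import json
import subprocess

result = subprocess.run(
    ["uv", "run", "vision-sandbox",
     "--image", "screenshot.png",
     "--prompt", "Return the login card's bounding box as JSON."],
    capture_output=True, text=True, check=True,
)
ui_metadata = json.loads(result.stdout)  # assumes JSON on stdout

# Step 2: feed the structured data into the coding agent's prompt.
prompt = f"Update styles.css so the login card matches: {ui_metadata}"
```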
Configuration
- GEMINI_API_KEY: Required environment variable.
- Model: Defaults to `gemini-3-flash-preview`.
Vision Sandbox 🔭
Agentic Vision via Gemini's native Python code execution sandbox.
Instead of just "guessing" what's in an image, the model can write and execute code to verify spatial relationships, count objects, or perform complex visual reasoning with pixel-level precision.
🚀 Primary Use Cases
Designed as a core skill for OpenClaw, Vision Sandbox provides visual grounding for agentic workflows:
- Spatial Grounding: Get precise [x, y] coordinates for UI elements.
- Visual Calculation: Let the model use Python to calculate values from visual data.
- UI Auditing: Automatically check for overlaps, alignment, and accessibility.
🛠 Prerequisites
- uv (Python package manager)
- Python 3.11 (Locked for stability)
- `GEMINI_API_KEY` set in your environment.
📦 Installation
Via ClawHub (Recommended)
```bash
clawhub install vision-sandbox
```
For Local Development
```bash
git clone https://github.com/johanesalxd/vision-sandbox.git
cd vision-sandbox
uv sync
```
📖 Quick Start
Run a vision task using the CLI:
```bash
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the fingers."
```
Example: Visual Reasoning
```bash
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."
```
Result: The model writes Python code to define bounding boxes for each digit, ensuring an accurate count rather than a visual guess.
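The exact code the model emits varies per run, but it tends to follow this shape: propose one bounding box per detected digit, then count the boxes rather than eyeball the image. The coordinates below are invented for illustration:

```python
# Illustrative of the sandbox code the model tends to write for this task:
# one box per detected digit, then count boxes instead of guessing visually.
finger_boxes = [
    (120, 40, 180, 220),   # thumb
    (200, 20, 250, 240),   # index
    (260, 10, 310, 250),   # middle
    (320, 20, 370, 240),   # ring
    (380, 50, 430, 210),   # pinky
]
print(f"Finger count: {len(finger_boxes)}")  # Finger count: 5
```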

🤖 OpenCode Integration
Vision Sandbox is a powerful companion for OpenCode.
Installation for OpenCode
- Global Installation: Copy `SKILL.md` to your global OpenCode skills directory:

  ```bash
  mkdir -p ~/.config/opencode/skills/vision-sandbox
  cp SKILL.md ~/.config/opencode/skills/vision-sandbox/SKILL.md
  ```

- Project Installation: If you want the skill available only for a specific project:

  ```bash
  mkdir -p .opencode/skills/vision-sandbox
  cp SKILL.md .opencode/skills/vision-sandbox/SKILL.md
  ```
Example Interaction
"Hey OpenCode, run the
vision-sandboxskill on this screenshot to find the exact padding of the login card, then updatestyles.cssaccordingly."
🧑‍💻 Development
Linting & Formatting
This project uses ruff for code quality.
```bash
uv run ruff format .
uv run ruff check --fix .
```
Running Tests
```bash
uv run pytest
```
📜 License
MIT
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
- OpenClaw CLI installed and configured.
- Language: Markdown
- License: MIT
- Topics:
Configuration
- **GEMINI_API_KEY**: Required environment variable.
- **Model**: Defaults to `gemini-3-flash-preview`.
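A fail-fast configuration check could look like the sketch below. Note that `GEMINI_MODEL` as an override variable is a hypothetical convention, not something this skill documents:

```python
# Minimal config check; GEMINI_MODEL as an override knob is hypothetical.
import os

api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("GEMINI_API_KEY is not set; the skill cannot call Gemini.")

model = os.environ.get("GEMINI_MODEL", "gemini-3-flash-preview")
```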
FAQ
How do I install Vision Sandbox?
Run `openclaw add @johanesalxd/vision-sandbox` in your terminal. This installs Vision Sandbox into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/johanesalxd/vision-sandbox. Review commits and README documentation before installing.
