1.4k ★ by johanesalxd
Vision Sandbox – OpenClaw Skill
Vision Sandbox is an OpenClaw Skills integration for coding workflows. It provides agentic vision via Gemini's native Code Execution sandbox: use it for spatial grounding, visual math, and UI auditing.
Skill Snapshot
| Field | Value |
| --- | --- |
| name | Vision Sandbox |
| description | Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing. OpenClaw Skills integration. |
| owner | johanesalxd |
| repository | johanesalxd/vision-sandbox |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | `openclaw add @johanesalxd/vision-sandbox` |
| last updated | Feb 7, 2026 |
Maintainer
johanesalxd
```yaml
name: Vision Sandbox
slug: vision-sandbox
version: 1.1.0
description: Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
metadata:
  openclaw:
    emoji: "🔭"
    primaryEnv: "GEMINI_API_KEY"
    requires:
      bins: ["uv"]
      env: ["GEMINI_API_KEY"]
```
Vision Sandbox 🔭
Leverage Gemini's native code execution to analyze images with high precision. The model writes and runs Python code in a Google-hosted sandbox to verify visual data, perfect for UI auditing, spatial grounding, and visual reasoning.
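For orientation, a call like this can be reproduced directly with the google-genai SDK. The sketch below is illustrative, not the skill's actual source; it assumes `GEMINI_API_KEY` is set in the environment and uses the model this skill defaults to:

```python
# Minimal sketch: Gemini code execution over an image (google-genai SDK).
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("path/to/image.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Identify all buttons and provide [x, y] coordinates."],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    ),
)

# The response interleaves text, the Python the model wrote, and its output.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
    if part.code_execution_result:
        print(part.code_execution_result.output)
```

The key design point is the `code_execution` tool: instead of answering from raw perception, the model can write Python, run it in Google's sandbox, and ground its answer in the computed result.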
Installation
```bash
clawhub install vision-sandbox
```
Usage
```bash
uv run vision-sandbox --image "path/to/image.png" --prompt "Identify all buttons and provide [x, y] coordinates."
```
Pattern Library
📍 Spatial Grounding
Ask the model to find specific items and return coordinates.
- Prompt: "Locate the 'Submit' button in this screenshot. Use code execution to verify its center point and return the [x, y] coordinates in a [0, 1000] scale."
🧮 Visual Math
Ask the model to count or calculate based on the image.
- Prompt: "Count the number of items in the list. Use Python to sum their values if prices are visible."
🖥️ UI Audit
Check layout and readability.
- Prompt: "Check if the header text overlaps with any icons. Use the sandbox to calculate the bounding box intersections."
🖐️ Counting & Logic
Solve visual counting tasks with code verification.
- Prompt: "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."
Integration with OpenCode
This skill is designed to provide Visual Grounding for automated coding agents like OpenCode.
- Step 1: Use `vision-sandbox` to extract UI metadata (coordinates, sizes, colors).
- Step 2: Pass the JSON output to OpenCode to generate or fix CSS/HTML (see the sketch after this list).
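A minimal glue script for this two-step handoff might look like the following. It assumes the CLI prints JSON to stdout; the prompt text and downstream handoff are hypothetical:

```python
# Hypothetical glue step: run the skill and hand its JSON to a coding agent.
import json
import subprocess

result = subprocess.run(
    ["uv", "run", "vision-sandbox",
     "--image", "screenshot.png",
     "--prompt", "Return the login card's bounding box as JSON."],
    capture_output=True, text=True, check=True,
)
ui_metadata = json.loads(result.stdout)  # assumes JSON on stdout

# Step 2: feed the structured data into the coding agent's prompt.
prompt = f"Update styles.css so the login card matches: {ui_metadata}"
```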
Configuration
- GEMINI_API_KEY: Required environment variable.
- Model: Defaults to `gemini-3-flash-preview`.
Vision Sandbox 🔭
Agentic Vision via Gemini's native Python code execution sandbox.
Instead of just "guessing" what's in an image, the model can write and execute code to verify spatial relationships, count objects, or perform complex visual reasoning with pixel-level precision.
🚀 Primary Use Cases
Designed as a core skill for OpenClaw, Vision Sandbox provides visual grounding for agentic workflows:
- Spatial Grounding: Get precise [x, y] coordinates for UI elements.
- Visual Calculation: Let the model use Python to calculate values from visual data.
- UI Auditing: Automatically check for overlaps, alignment, and accessibility.
🛠 Prerequisites
- uv (Python package manager)
- Python 3.11 (Locked for stability)
- `GEMINI_API_KEY` set in your environment.
📦 Installation
Via ClawHub (Recommended)
```bash
clawhub install vision-sandbox
```
For Local Development
```bash
git clone https://github.com/johanesalxd/vision-sandbox.git
cd vision-sandbox
uv sync
```
📖 Quick Start
Run a vision task using the CLI:
```bash
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the fingers."
```
Example: Visual Reasoning
```bash
uv run vision-sandbox --image "sample/how-many-fingers.png" --prompt "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."
```
Result: The model writes Python code to define bounding boxes for each digit, ensuring an accurate count rather than a visual guess.
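The exact code the model emits varies per run, but it tends to follow this shape: propose one bounding box per detected digit, then count the boxes rather than eyeball the image. The coordinates below are invented for illustration:

```python
# Illustrative of the sandbox code the model tends to write for this task:
# one box per detected digit, then count boxes instead of guessing visually.
finger_boxes = [
    (120, 40, 180, 220),   # thumb
    (200, 20, 250, 240),   # index
    (260, 10, 310, 250),   # middle
    (320, 20, 370, 240),   # ring
    (380, 50, 430, 210),   # pinky
]
print(f"Finger count: {len(finger_boxes)}")  # Finger count: 5
```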

🤖 OpenCode Integration
Vision Sandbox is a powerful companion for OpenCode.
Installation for OpenCode
- Global Installation: Copy `SKILL.md` to your global OpenCode skills directory:

  ```bash
  mkdir -p ~/.config/opencode/skills/vision-sandbox
  cp SKILL.md ~/.config/opencode/skills/vision-sandbox/SKILL.md
  ```

- Project Installation: If you want the skill available only for a specific project:

  ```bash
  mkdir -p .opencode/skills/vision-sandbox
  cp SKILL.md .opencode/skills/vision-sandbox/SKILL.md
  ```
Example Interaction
"Hey OpenCode, run the
vision-sandboxskill on this screenshot to find the exact padding of the login card, then updatestyles.cssaccordingly."
🧑‍💻 Development
Linting & Formatting
This project uses ruff for code quality.
```bash
uv run ruff format .
uv run ruff check --fix .
```
Running Tests
```bash
uv run pytest
```
📜 License
MIT
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
Requirements
- OpenClaw CLI installed and configured.
- Language: Markdown
- License: MIT
- Topics:
Configuration
- **GEMINI_API_KEY**: Required environment variable.
- **Model**: Defaults to `gemini-3-flash-preview`.
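A fail-fast configuration check could look like the sketch below. Note that `GEMINI_MODEL` as an override variable is a hypothetical convention, not something this skill documents:

```python
# Minimal config check; GEMINI_MODEL as an override knob is hypothetical.
import os

api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("GEMINI_API_KEY is not set; the skill cannot call Gemini.")

model = os.environ.get("GEMINI_MODEL", "gemini-3-flash-preview")
```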
FAQ
How do I install Vision Sandbox?
Run `openclaw add @johanesalxd/vision-sandbox` in your terminal. This installs Vision Sandbox into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/johanesalxd/vision-sandbox. Review commits and README documentation before installing.
