5.4k★ by spliff7777
windows-control – OpenClaw Skill
windows-control is an OpenClaw Skills integration for coding workflows. It provides full Windows desktop control (mouse, keyboard, screenshots) so an agent can interact with any Windows application like a human.
Skill Snapshot
| name | windows-control |
| description | Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human. OpenClaw Skills integration. |
| owner | spliff7777 |
| repository | spliff7777/windows-control |
| language | Markdown |
| license | MIT |
| topics | |
| security | L1 |
| install | openclaw add @spliff7777/windows-control |
| last updated | Feb 7, 2026 |
name: windows-control
description: Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.
Windows Control Skill
Full desktop automation for Windows. Control mouse, keyboard, and screen like a human user.
Quick Start
All scripts are in skills/windows-control/scripts/
Screenshot
py screenshot.py > output.b64
Returns base64 PNG of entire screen.
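To view the capture, the base64 text has to be decoded back into a PNG file. The sketch below assumes screenshot.py prints nothing but the base64 payload; `decode_screenshot` is a hypothetical helper, not part of the skill. It also allows for the redirect encoding: Windows PowerShell 5.1 writes `>` output as UTF-16 with a byte-order mark, while cmd.exe passes the bytes through unchanged.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(raw: bytes) -> bytes:
    """Turn the redirected output of screenshot.py into raw PNG bytes."""
    # A UTF-16 byte-order mark means the shell re-encoded the text on redirect.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8-sig")
    # Drop newlines and stray whitespace before decoding.
    png = base64.b64decode("".join(text.split()))
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    return png
```

A possible use, with screen.png as an arbitrary output name: `Path("screen.png").write_bytes(decode_screenshot(Path("output.b64").read_bytes()))` after `from pathlib import Path`.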
Click
py click.py 500 300 # Left click at (500, 300)
py click.py 500 300 right # Right click
py click.py 500 300 left 2 # Double click
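The three invocations above imply a command line of the form `x y [button] [count]`. As a rough illustration of that shape, here is how such arguments could be parsed with argparse. This is an assumption about the interface, not the actual source of click.py, and `build_click_parser` is a hypothetical name.

```python
import argparse


def build_click_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented usage: click.py X Y [button] [count]."""
    parser = argparse.ArgumentParser(prog="click.py")
    parser.add_argument("x", type=int, help="absolute X coordinate")
    parser.add_argument("y", type=int, help="absolute Y coordinate")
    # Only the two buttons shown in the examples are accepted here.
    parser.add_argument("button", nargs="?", default="left",
                        choices=["left", "right"])
    # A count of 2 corresponds to the double-click example.
    parser.add_argument("clicks", nargs="?", type=int, default=1)
    return parser
```

With pyautogui, the parsed values would map onto `pyautogui.click(x=args.x, y=args.y, clicks=args.clicks, button=args.button)`, though the skill's script may be implemented differently.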
Type Text
py type_text.py "Hello World"
Types text at current cursor position (10ms between keys).
Press Keys
py key_press.py "enter"
py key_press.py "ctrl+s"
py key_press.py "alt+tab"
py key_press.py "ctrl+shift+esc"
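The combo strings above join key names with `+`. A minimal sketch of splitting such a string into individual key names follows; `parse_combo` is a hypothetical helper, and how key_press.py really parses its argument is not documented here.

```python
def parse_combo(combo: str) -> list[str]:
    """Split a combo such as "ctrl+shift+esc" into lowercase key names."""
    keys = [part.strip().lower() for part in combo.split("+")]
    # An empty part means a doubled or trailing "+", which this sketch rejects.
    if not all(keys):
        raise ValueError(f"empty key name in combo: {combo!r}")
    return keys
```

The resulting list could be handed to `pyautogui.hotkey(*keys)`, which presses the keys in order and releases them in reverse. One limitation of this approach is that a literal `+` key cannot be expressed.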
Move Mouse
py mouse_move.py 500 300
Moves mouse to coordinates (smooth 0.2s animation).
Scroll
py scroll.py up 5 # Scroll up 5 notches
py scroll.py down 10 # Scroll down 10 notches
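A direction plus a notch count has to become a signed number before it reaches the mouse API. In pyautogui, positive values passed to `scroll()` move up and negative values move down. The helper below is a hypothetical sketch of that conversion. The `units_per_notch` parameter is an assumption added for illustration: Windows defines one wheel notch as 120 units (WHEEL_DELTA), and whether scroll.py applies any scaling is not stated.

```python
def scroll_amount(direction: str, notches: int, units_per_notch: int = 1) -> int:
    """Convert ("up" | "down", notches) into a signed scroll value."""
    if notches < 0:
        raise ValueError("notches must be non-negative")
    signs = {"up": 1, "down": -1}
    try:
        sign = signs[direction.lower()]
    except KeyError:
        raise ValueError(f"unknown direction: {direction!r}") from None
    return sign * notches * units_per_notch
```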
Window Management (NEW!)
py focus_window.py "Chrome" # Bring window to front
py minimize_window.py "Notepad" # Minimize window
py maximize_window.py "VS Code" # Maximize window
py close_window.py "Calculator" # Close window
py get_active_window.py # Get title of active window
Advanced Actions (NEW!)
# Click by text (No coordinates needed!)
py click_text.py "Save" # Click "Save" button anywhere
py click_text.py "Submit" "Chrome" # Click "Submit" in Chrome only
# Drag and Drop
py drag.py 100 100 500 300 # Drag from (100,100) to (500,300)
# Robust Automation (Wait/Find)
py wait_for_text.py "Ready" "App" 30 # Wait up to 30s for text
py wait_for_window.py "Notepad" 10 # Wait for window to appear
py find_text.py "Login" "Chrome" # Get coordinates of text
py list_windows.py # List all open windows
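Commands like wait_for_text.py and wait_for_window.py take a timeout in seconds, which suggests a poll-until-deadline loop. The following is a generic sketch of that pattern, not the scripts' actual code; `wait_until` and its `interval` parameter are illustrative names.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float,
               interval: float = 0.5) -> bool:
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout stable if the system clock is adjusted while waiting.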
Read Window Text
py read_window.py "Notepad" # Read all text from Notepad
py read_window.py "Visual Studio" # Read text from VS Code
py read_window.py "Chrome" # Read text from browser
Uses Windows UI Automation to extract actual text (not OCR). Much faster and more accurate than screenshots!
Read UI Elements (NEW!)
py read_ui_elements.py "Chrome" # All interactive elements
py read_ui_elements.py "Chrome" --buttons-only # Just buttons
py read_ui_elements.py "Chrome" --links-only # Just links
py read_ui_elements.py "Chrome" --json # JSON output
Returns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.
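Once the elements are available as JSON, a caller can look up one element by name and feed its coordinates to click.py. The JSON schema is not documented on this page, so the field names below (`name`, `type`, `x`, `y`) are assumptions made purely for illustration, as is the helper `find_click_point`.

```python
import json
from typing import Optional


def find_click_point(elements_json: str, name: str,
                     control_type: Optional[str] = None
                     ) -> Optional[tuple[int, int]]:
    """Return (x, y) of the first element matching name, or None."""
    for element in json.loads(elements_json):
        # Hypothetical schema: each element is an object with name/type/x/y.
        if element.get("name", "").lower() != name.lower():
            continue
        if control_type and element.get("type") != control_type:
            continue
        return int(element["x"]), int(element["y"])
    return None
```

Check the real output of `py read_ui_elements.py "Chrome" --json` and adjust the keys before relying on this.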
Read Webpage Content (NEW!)
py read_webpage.py # Read active browser
py read_webpage.py "Chrome" # Target Chrome specifically
py read_webpage.py "Chrome" --buttons # Include buttons
py read_webpage.py "Chrome" --links # Include links with coords
py read_webpage.py "Chrome" --full # All elements (inputs, images)
py read_webpage.py "Chrome" --json # JSON output
Enhanced browser content extraction with headings, text, buttons, and links.
Handle Dialogs (NEW!)
# List all open dialogs
py handle_dialog.py list
# Read current dialog content
py handle_dialog.py read
py handle_dialog.py read --json
# Click button in dialog
py handle_dialog.py click "OK"
py handle_dialog.py click "Save"
py handle_dialog.py click "Yes"
# Type into dialog text field
py handle_dialog.py type "myfile.txt"
py handle_dialog.py type "C:\path\to\file" --field 0
# Dismiss dialog (auto-finds OK/Close/Cancel)
py handle_dialog.py dismiss
# Wait for dialog to appear
py handle_dialog.py wait --timeout 10
py handle_dialog.py wait "Save As" --timeout 5
Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.
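The dismiss command is described as auto-finding OK/Close/Cancel. One way such a choice could work is a fixed priority list checked against the dialog's button names. The order and the helper `pick_dismiss_button` below are assumptions, not the skill's confirmed behavior.

```python
from typing import Optional, Sequence

# Assumed priority, taken from the order the documentation lists the buttons.
DISMISS_ORDER = ("OK", "Close", "Cancel")


def pick_dismiss_button(button_names: Sequence[str],
                        order: Sequence[str] = DISMISS_ORDER) -> Optional[str]:
    """Return the first preferred button present in the dialog, if any."""
    by_lower = {name.lower(): name for name in button_names}
    for candidate in order:
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None
```

For confirmation prompts where OK triggers something destructive, an explicit `py handle_dialog.py click "Cancel"` is the safer choice than an automatic dismiss.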
Click Element by Name (NEW!)
py click_element.py "Save" # Click "Save" anywhere
py click_element.py "OK" --window "Notepad" # In specific window
py click_element.py "Submit" --type Button # Only buttons
py click_element.py "File" --type MenuItem # Menu items
py click_element.py --list # List clickable elements
py click_element.py --list --window "Chrome" # List in specific window
Click buttons, links, menu items by name without needing coordinates.
Read Screen Region (OCR - Optional)
py read_region.py 100 100 500 300 # Read text from coordinates
Note: Requires Tesseract OCR installation. Use read_window.py instead for better results.
Workflow Pattern
- Read window - Extract text from specific window (fast, accurate)
- Read UI elements - Get buttons, links with coordinates
- Screenshot (if needed) - See visual layout
- Act - Click element by name or coordinates
- Handle dialogs - Interact with popups/save dialogs
- Read window - Verify changes
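The workflow above can be driven from Python by invoking the scripts as subprocesses. This is a sketch under the assumption that the scripts live in skills/windows-control/scripts/ relative to the working directory and that the `py` launcher is on PATH; `build_command` and `run_script` are hypothetical wrappers.

```python
import subprocess
from pathlib import Path

SCRIPTS_DIR = Path("skills/windows-control/scripts")


def build_command(script: str, *args: object) -> list[str]:
    """Build the argument list for one skill script."""
    return ["py", str(SCRIPTS_DIR / script), *map(str, args)]


def run_script(script: str, *args: object, timeout: float = 60.0) -> str:
    """Run a skill script and return its standard output."""
    result = subprocess.run(build_command(script, *args),
                            capture_output=True, text=True,
                            check=True, timeout=timeout)
    return result.stdout


# One pass through the pattern (Windows only, so left commented out):
# before = run_script("read_window.py", "Notepad")
# run_script("click_element.py", "Save", "--window", "Notepad")
# run_script("handle_dialog.py", "type", "myfile.txt")
# after = run_script("read_window.py", "Notepad")
```

Passing arguments as a list avoids shell quoting problems with window titles that contain spaces, such as "VS Code".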
Screen Coordinates
- Origin (0, 0) is top-left corner
- Example resolution: 2560x1440 (confirm yours with a screenshot)
- Use coordinates from screenshot analysis
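Because the scripts expect absolute screen coordinates, a position measured inside a window has to be offset by that window's top-left corner, and it helps to check the result against the screen size before clicking. The helpers below are a hypothetical sketch. They assume a single monitor at the 2560x1440 example resolution, and the window origin has to come from elsewhere since no script on this page is documented as returning it.

```python
def to_absolute(window_left: int, window_top: int,
                rel_x: int, rel_y: int) -> tuple[int, int]:
    """Convert window-relative coordinates to absolute screen coordinates."""
    return window_left + rel_x, window_top + rel_y


def in_bounds(x: int, y: int, width: int = 2560, height: int = 1440) -> bool:
    """True if (x, y) lies on a screen whose origin (0, 0) is the top-left."""
    return 0 <= x < width and 0 <= y < height
```

For example, a point 50 pixels right and 100 pixels down inside a window whose corner sits at (200, 150) maps to (250, 250) on screen.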
Examples
Open Notepad and type
# Press Windows key
py key_press.py "win"
# Type "notepad"
py type_text.py "notepad"
# Press Enter
py key_press.py "enter"
# Wait for Notepad to open, then type
py wait_for_window.py "Notepad" 10
py type_text.py "Hello from AI!"
# Save
py key_press.py "ctrl+s"
Click in VS Code
# Read current VS Code content
py read_window.py "Visual Studio Code"
# Click at specific location (e.g., file explorer)
py click.py 50 100
# Type filename
py type_text.py "test.js"
# Press Enter
py key_press.py "enter"
# Verify new file opened
py read_window.py "Visual Studio Code"
Monitor Notepad changes
# Read current content
py read_window.py "Notepad"
# User types something...
# Read updated content (no screenshot needed!)
py read_window.py "Notepad"
Text Reading Methods
Method 1: Windows UI Automation (BEST)
- Use read_window.py for any window
- Use read_ui_elements.py for buttons/links with coordinates
- Use read_webpage.py for browser content with structure
- Gets actual text data (not image-based)
Method 2: Click by Name (NEW)
- Use click_element.py to click buttons/links by name
- No coordinates needed - finds elements automatically
- Works across all windows or target specific window
Method 3: Dialog Handling (NEW)
- Use handle_dialog.py for popups, save dialogs, alerts
- Read dialog content, click buttons, type text
- Auto-dismiss with common buttons (OK, Cancel, etc.)
Method 4: Screenshot + Vision (Fallback)
- Take full screenshot
- AI reads text visually
- Slower but works for any content
Method 5: OCR (Optional)
- Use read_region.py with Tesseract
- Requires additional installation
- Good for images/PDFs with text
Safety Features
- pyautogui.FAILSAFE = True (move mouse to top-left to abort)
- Small delays between actions
- Smooth mouse movements (not instant jumps)
Requirements
- Python 3.11+
- pyautogui (installed ✅)
- pillow (installed ✅)
Tips
- Always screenshot first to see current state
- Coordinates are absolute (not relative to windows)
- Wait briefly after clicks for UI to update
- Prefer actions that can be undone with ctrl+z when possible
Status: ✅ READY FOR USE (v2.0 - Dialog & UI Elements)
Created: 2026-02-01
Updated: 2026-02-02
Permissions & Security
Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.
FAQ
How do I install windows-control?
Run openclaw add @spliff7777/windows-control in your terminal. This installs windows-control into your OpenClaw Skills catalog.
Does this skill run locally or in the cloud?
OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.
Where can I verify the source code?
The source repository is available at https://github.com/openclaw/skills/tree/main/skills/spliff7777/windows-control. Review commits and README documentation before installing.
