pymupdf-pdf – OpenClaw Skill

Name: pymupdf-pdf
Author: bsinriclawd

pymupdf-pdf is an OpenClaw Skills integration for coding workflows. Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

1.1k stars | 3.7k forks | Security L1

Updated Feb 7, 2026 | Created Feb 7, 2026 | coding

Skill Snapshot

name	pymupdf-pdf
description	Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders. OpenClaw Skills integration.
owner	bsinriclawd
repository	bsinriclawd/hvac-estimate-takeoff (path: pymupdf-pdf-parser-clawdbot-skill)
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @bsinriclawd/hvac-estimate-takeoff:pymupdf-pdf-parser-clawdbot-skill
last updated	Feb 7, 2026

Maintainer

bsinriclawd

Maintains pymupdf-pdf in the OpenClaw Skills directory.

File Explorer

6 files

pymupdf-pdf-parser-clawdbot-skill/
├── references/
│   └── pymupdf-notes.md    # 595 B
├── scripts/
│   └── pymupdf_parse.py    # 3.4 KB
├── README.md               # 3.4 KB
└── SKILL.md                # 1.4 KB

SKILL.md

name: pymupdf-pdf
description: Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

references/pymupdf-notes.md

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output

Options

--format md|json|both (default: md)
--images to extract images
--tables to extract a simple line-based table JSON (quick/rough)
--outroot DIR to change output root
--lang adds a language hint into JSON output metadata
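
The options above map naturally onto an `argparse` parser. The sketch below is only an illustration of that mapping, not the actual contents of `scripts/pymupdf_parse.py`; the function name `build_parser` is hypothetical, and the defaults for `--outroot` and `--lang` are taken from the README's option table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="pymupdf_parse.py",
        description="Parse a single PDF into Markdown and/or JSON.",
    )
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md",
                        help="output format (default: md)")
    parser.add_argument("--outroot", default="./pymupdf-output",
                        help="output root directory")
    parser.add_argument("--images", action="store_true",
                        help="extract embedded images")
    parser.add_argument("--tables", action="store_true",
                        help="extract rough line-based tables")
    parser.add_argument("--lang", default="en",
                        help="language hint stored in JSON metadata")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["report.pdf", "--format", "both", "--images"])
print(args.format, args.images, args.tables, args.outroot)
```

Both `--images` and `--tables` are modeled as on/off flags because the documentation lists them as off by default.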

Output conventions

Create ./pymupdf-output/<pdf-basename>/ by default.
Markdown output: output.md
JSON output: output.json (includes lang)
Images: images/ subdir
Tables: tables.json (rough line-based)
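
The naming conventions above can be expressed as a small path helper. This is a minimal sketch, assuming that `<pdf-basename>` means the file name without its `.pdf` extension; the helper name `output_paths` is hypothetical and is not taken from the script.

```python
from pathlib import Path


def output_paths(pdf_path, outroot="./pymupdf-output"):
    """Return the per-document output locations for one PDF.

    Assumption: the per-document folder is named after the PDF's stem.
    """
    doc_dir = Path(outroot) / Path(pdf_path).stem
    return {
        "dir": doc_dir,
        "markdown": doc_dir / "output.md",
        "json": doc_dir / "output.json",
        "images": doc_dir / "images",
        "tables": doc_dir / "tables.json",
    }


paths = output_paths("/path/to/file.pdf")
for name, location in paths.items():
    print(f"{name}: {location}")
```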

Notes

PyMuPDF is fast but less robust on complex PDFs.
For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.

README.md

PyMuPDF PDF Parser - Clawdbot Skill

A Clawdbot skill for fast, lightweight PDF parsing using PyMuPDF (fitz). Ideal for quick text extraction when speed matters.

Features

Fast processing — Parses PDFs in ~1 second per page
Lightweight — Single pip dependency, no heavy models
Markdown output — Clean text extraction with page markers
JSON output — Simple structured text per page
Image extraction — Optional embedded image extraction
NixOS compatible — Includes notes for libstdc++ issues
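
To make the Markdown and JSON features concrete, here is a rough sketch of how per-page text could be turned into both outputs. The `## Page N` marker format, the JSON field names, and all three function names are assumptions for illustration; the script's real output layout may differ. Only `fitz.open` and `page.get_text()` are PyMuPDF calls, and the import is deferred so the formatting helpers run without PyMuPDF installed.

```python
import json


def pages_to_markdown(pages):
    """Join per-page text, placing a page marker heading before each page."""
    parts = []
    for number, text in enumerate(pages, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}\n")
    return "\n".join(parts)


def pages_to_json(pages, lang="en"):
    """Serialize per-page text plus a language hint (hypothetical schema)."""
    payload = {
        "lang": lang,
        "pages": [
            {"page": number, "text": text.strip()}
            for number, text in enumerate(pages, start=1)
        ],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


def extract_pages(pdf_path):
    """Read plain text from every page of a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


print(pages_to_markdown(["First page text", "Second page text"]))
```

With PyMuPDF installed, `pages_to_markdown(extract_pages("/path/to/file.pdf"))` would produce the Markdown for a real document.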

Installation

Prerequisites

Python 3.8+
PyMuPDF: pip install pymupdf
Clawdbot installed

Install the skill

# Clone the repo
git clone https://github.com/kesslerio/PyMuPDF-PDF-Parser-Clawdbot-Skill.git

# Then copy the pymupdf-pdf/ folder into your Clawdbot skills directory
cp -r PyMuPDF-PDF-Parser-Clawdbot-Skill/pymupdf-pdf ~/.clawdbot/skills/

# Install dependency
pip install pymupdf

NixOS users

If you hit libstdc++ import errors:

export LD_LIBRARY_PATH=/nix/store/<your-gcc-lib-path>/lib

See pymupdf-pdf/references/pymupdf-notes.md for details.
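
Before running the parser, it can help to check whether PyMuPDF imports at all and which kind of failure you are facing. The snippet below is a small diagnostic sketch; the function name `check_pymupdf` and the wording of its messages are hypothetical, and it assumes a missing `libstdc++` surfaces as an `ImportError` whose text mentions the library.

```python
import importlib


def check_pymupdf():
    """Return (ok, message) describing whether PyMuPDF can be imported."""
    try:
        module = importlib.import_module("fitz")
    except ImportError as exc:
        # Assumption: shared-library problems mention libstdc++ in the error text.
        if "libstdc++" in str(exc):
            hint = "set LD_LIBRARY_PATH to your gcc lib directory"
        else:
            hint = "pip install pymupdf"
        return False, f"PyMuPDF import failed: {exc}. Try: {hint}"
    location = getattr(module, "__file__", "unknown location")
    return True, f"PyMuPDF imported from {location}"


ok, message = check_pymupdf()
print(message)
```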

Usage

Quick start

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/document.pdf

Options

./scripts/pymupdf_parse.py /path/to/document.pdf --format json
./scripts/pymupdf_parse.py /path/to/document.pdf --format both --images
./scripts/pymupdf_parse.py /path/to/document.pdf --outroot ./my-output

Option	Default	Description
`--format`	`md`	Output format: `md`, `json`, or `both`
`--outroot`	`./pymupdf-output`	Output root directory
`--images`	off	Extract embedded images
`--tables`	off	Extract line-based table approximation
`--lang`	`en`	Language hint (stored in JSON metadata)

Output

Creates a per-document folder under the output root:

./pymupdf-output/
└── document-name/
    ├── output.md      # Markdown with page markers
    ├── output.json    # Simple JSON (~1KB, text per page)
    ├── images/        # Extracted images (if --images)
    └── tables.json    # Line-based tables (if --tables)

Output quality

PyMuPDF produces fast, minimal output:

Plain text extraction (no layout preservation)
Simple JSON with text per page
Optional image extraction

Best for: Quick text extraction, batch processing, or when speed matters.
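
For the optional image extraction mentioned above, PyMuPDF exposes `Page.get_images()` and `Document.extract_image()`. The following is a hedged sketch of how embedded images could be written to an `images/` folder; the file naming scheme and both function names are assumptions, not the script's actual behavior.

```python
from pathlib import Path


def image_filename(page_number, index, ext):
    """Hypothetical naming scheme: page and image index, zero-padded."""
    return f"page{page_number:03d}_img{index:02d}.{ext}"


def extract_images(pdf_path, images_dir):
    """Write every embedded image of a PDF into images_dir and list the files."""
    import fitz  # PyMuPDF; imported lazily so image_filename works without it

    images_dir = Path(images_dir)
    images_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for index, info in enumerate(page.get_images(full=True), start=1):
                xref = info[0]  # first tuple item is the image's xref number
                image = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                target = images_dir / image_filename(page_number, index, image["ext"])
                target.write_bytes(image["image"])
                written.append(target)
    return written


print(image_filename(3, 1, "png"))
```

An image referenced on several pages would be written once per page in this sketch; deduplicating by `xref` is a possible refinement.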

Comparison with MinerU

Aspect	PyMuPDF	MinerU
Speed	Fast (~1s/page)	Slower (~15-30s/page)
JSON output	Minimal (~1KB, text only)	Rich (~50KB+, layout data)
Image extraction	Optional	Automatic
Layout preservation	Basic	Excellent
Dependencies	Light (pip install)	Heavy (~20GB models)

Use PyMuPDF when: Speed matters or for simple text extraction.
Use MinerU when: Quality and structure matter more than speed.

License

Apache 2.0

Contributing

Issues and PRs welcome. Please test changes with various PDF types before submitting.

Related

MinerU PDF Parser Skill — Rich, layout-aware alternative
PyMuPDF — The underlying PDF library
Clawdbot — The AI agent framework

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

OpenClaw CLI installed and configured.
Language: Markdown
License: MIT
Topics:

FAQ

How do I install pymupdf-pdf?

Run openclaw add @bsinriclawd/hvac-estimate-takeoff:pymupdf-pdf-parser-clawdbot-skill in your terminal. This installs pymupdf-pdf into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/bsinriclawd/hvac-estimate-takeoff. Review commits and README documentation before installing.
