skills$openclaw/docstrange
shhdwi9.5k

by shhdwi

docstrange – OpenClaw Skill

docstrange is an OpenClaw Skills integration for data analytics workflows. Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.

9.5k stars7.1k forksSecurity L1
Updated Feb 7, 2026Created Feb 7, 2026data analytics

Skill Snapshot

namedocstrange
descriptionDocument extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data. OpenClaw Skills integration.
ownershhdwi
repositoryshhdwi/docstrange
languageMarkdown
licenseMIT
topics
securityL1
installopenclaw add @shhdwi/docstrange
last updatedFeb 7, 2026

Maintainer

shhdwi

shhdwi

Maintains docstrange in the OpenClaw Skills directory.

View GitHub profile
File Explorer
3 files
.
_meta.json
284 B
package.json
353 B
SKILL.md
6.1 KB
SKILL.md

name: docstrange description: Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.

DocStrange by Nanonets

Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.

Get your API key: https://docstrange.nanonets.com/app

Quick Start

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Response:

{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "markdown": {
      "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
    }
  }
}

Setup

1. Get Your API Key

# Visit the dashboard
https://docstrange.nanonets.com/app

Save your API key:

export DOCSTRANGE_API_KEY="your_api_key_here"

2. OpenClaw Configuration (Optional)

Add to your ~/.openclaw/openclaw.json:

{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        apiKey: "your_api_key_here",
        env: {
          DOCSTRANGE_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}

Common Tasks

Extract to Markdown

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Access content: response["result"]["markdown"]["content"]

Extract JSON Fields

Simple field list:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
  -F "include_metadata=confidence_score"

With JSON schema:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'

Response with confidence scores:

{
  "result": {
    "json": {
      "content": {
        "invoice_number": "INV-2024-001",
        "total_amount": 500.00
      },
      "metadata": {
        "confidence_score": {
          "invoice_number": 98,
          "total_amount": 99
        }
      }
    }
  }
}

Extract Tables to CSV

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@table.pdf" \
  -F "output_format=csv" \
  -F "csv_options=table"

Async Extraction (Large Documents)

For documents >5 pages, use async and poll:

Queue the document:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@large-document.pdf" \
  -F "output_format=markdown"

# Returns: {"record_id": "12345", "status": "processing"}

Poll for results:

curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY"

# Returns: {"status": "completed", "result": {...}}

Advanced Features

Bounding Boxes

Get element coordinates for layout analysis:

-F "include_metadata=bounding_boxes"

Hierarchy Output

Extract document structure (sections, tables, key-value pairs):

-F "json_options=hierarchy_output"

Financial Documents Mode

Enhanced table and number formatting:

-F "markdown_options=financial-docs"

Custom Instructions

Guide extraction with prompts:

-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"

Multiple Formats

Request multiple formats in one call:

-F "output_format=markdown,json"

When to Use

Use DocStrange For:

  • Invoice and receipt processing
  • Contract text extraction
  • Bank statement parsing
  • Form digitization
  • Image OCR (scanned documents)

Don't Use For:

  • Documents >5 pages with sync (use async)
  • Video/audio transcription
  • Non-document images

Best Practices

Document SizeEndpointNotes
<=5 pages/extract/syncImmediate response
>5 pages/extract/asyncPoll for results

JSON Extraction:

  • Field list: ["field1", "field2"] — quick extractions
  • JSON schema: {"type": "object", ...} — strict typing, nested data

Confidence Scores:

  • Add include_metadata=confidence_score
  • Scores are 0-100 per field
  • Review fields <80 manually

Schema Templates

Invoice

{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "date": {"type": "string"},
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "price": {"type": "number"}
        }
      }
    }
  }
}

Receipt

{
  "type": "object",
  "properties": {
    "merchant": {"type": "string"},
    "date": {"type": "string"},
    "total": {"type": "number"},
    "items": {
      "type": "array",
      "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
    }
  }
}

Troubleshooting

400 Bad Request:

  • Provide exactly one input: file, file_url, or file_base64
  • Verify API key is valid

Sync Timeout:

  • Use async for documents >5 pages
  • Poll /extract/results/{record_id}

Missing Confidence Scores:

  • Requires json_options (field list or schema)
  • Add include_metadata=confidence_score

References

README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

  • OpenClaw CLI installed and configured.
  • Language: Markdown
  • License: MIT
  • Topics:

Configuration

Add to your `~/.openclaw/openclaw.json`: ```json5 { skills: { entries: { "docstrange": { enabled: true, apiKey: "your_api_key_here", env: { DOCSTRANGE_API_KEY: "your_api_key_here", }, }, }, }, } ```

FAQ

How do I install docstrange?

Run openclaw add @shhdwi/docstrange in your terminal. This installs docstrange into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/shhdwi/docstrange. Review commits and README documentation before installing.