
xiaohongshu-scraper – OpenClaw Skill

Name: xiaohongshu-scraper
Author: ty-teo

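xiaohongshu-scraper is an OpenClaw Skills integration for coding workflows. It scrapes and organizes Xiaohongshu (小红书) content: it searches Xiaohongshu notes, extracts their details (body text, comments, images), and generates organized Markdown documents. Use it when the user asks to search Xiaohongshu, find Xiaohongshu guides, or organize Xiaohongshu content.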

9.1k stars · 8.6k forks · Security L1

Updated Feb 7, 2026 · Created Feb 7, 2026 · coding

Skill Snapshot

name	xiaohongshu-scraper
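description	Scrapes and organizes Xiaohongshu content. Searches Xiaohongshu notes, extracts details (body text, comments, images), and generates organized Markdown documents. Use when the user asks to search Xiaohongshu, find Xiaohongshu guides, or organize Xiaohongshu content. OpenClaw Skills integration.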
owner	ty-teo
repository	ty-teo/xiaohongshu-scraper
language	Markdown
license	MIT
topics
security	L1
install	openclaw add @ty-teo/xiaohongshu-scraper
last updated	Feb 7, 2026

Maintainer

ty-teo

Maintains xiaohongshu-scraper in the OpenClaw Skills directory.


File Explorer

18 files

scripts
legacy
extract_chrome_cookies.py	4.9 KB
fetch_note_api.py	14.7 KB
fetch_note_chrome.py	10.0 KB
fetch_note_v2.py	11.0 KB
fetch_note_v3.py	13.2 KB
fetch_note.py	8.5 KB
(file name missing)	2.9 KB
scrape.py	5.7 KB
xhs_download.py	7.5 KB
xhs_fetch_stealth.py	12.8 KB
xhs_fetch.py	13.3 KB
xhs_api_client.py	2.7 KB
xhs_scraper.py	9.0 KB
xhs-api-service.sh	2.2 KB
_meta.json	290 B
SKILL.md	5.5 KB

SKILL.md

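name: xiaohongshu-scraper
version: 1.0.0
description: Scrapes and organizes Xiaohongshu content. Searches Xiaohongshu notes, extracts details (body text, comments, images), and generates organized Markdown documents. Use when the user asks to search Xiaohongshu, find Xiaohongshu guides, or organize Xiaohongshu content.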

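Xiaohongshu Content Scraping

A Xiaohongshu note scraper built on XHS-Downloader.

Quick Start

1. Start the API service (on first use, or whenever the service is not running)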

cd /Users/lixiaoji/clawd/skills/xiaohongshu-scraper/scripts
./xhs-api-service.sh start

2. Full scrape and save (recommended)

# Scrape a note, download its images, run OCR, and save everything to the given folder
python xhs_scraper.py "NOTE_URL" --output /tmp/xhs_note

# Scrape several links in one batch
python xhs_scraper.py "URL1" "URL2" "URL3" --output /tmp/xhs_notes

# Skip OCR (faster)
python xhs_scraper.py "NOTE_URL" --output /tmp/xhs_note --no-ocr

# Fetch the note information only, without downloading
python xhs_scraper.py "NOTE_URL" --info-only

# Output as JSON
python xhs_scraper.py "NOTE_URL" --json
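
The flag combinations above can also be assembled programmatically. The following is a minimal sketch, not part of the skill: `build_scraper_command` is a hypothetical helper, it uses only the options shown above (`--output`, `--no-ocr`, `--info-only`, `--json`), and it assumes the command is run from the scripts directory so that `xhs_scraper.py` resolves.

```python
from typing import List, Optional, Sequence


def build_scraper_command(
    urls: Sequence[str],
    output: Optional[str] = None,
    ocr: bool = True,
    info_only: bool = False,
    as_json: bool = False,
    script: str = "xhs_scraper.py",  # assumption: run from the scripts directory
) -> List[str]:
    """Build an argv list for xhs_scraper.py (hypothetical helper)."""
    if not urls:
        raise ValueError("at least one note URL is required")
    cmd = ["python", script, *urls]
    if output is not None:
        cmd += ["--output", output]
    if not ocr:
        cmd.append("--no-ocr")
    if info_only:
        cmd.append("--info-only")
    if as_json:
        cmd.append("--json")
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with URLs that contain `?` or `&`.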

3. Supported link formats

https://www.xiaohongshu.com/explore/NOTE_ID?xsec_token=XXX
https://www.xiaohongshu.com/discovery/item/NOTE_ID?xsec_token=XXX
https://www.xiaohongshu.com/user/profile/AUTHOR_ID/NOTE_ID?xsec_token=XXX
https://xhslink.com/SHARE_CODE (short link)
Multiple links can be passed at once, separated by spaces.
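
Before handing links to the scraper, it can help to check them against the formats listed above. The sketch below is an illustration under assumptions, not the skill's own validation: `classify_link` and `split_links` are hypothetical names, and the host check accepts only the two hosts that appear in the list.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse

# Path patterns taken from the supported link formats listed above.
_NOTE_PATHS = (
    re.compile(r"^/explore/(?P<note_id>[^/]+)/?$"),
    re.compile(r"^/discovery/item/(?P<note_id>[^/]+)/?$"),
    re.compile(r"^/user/profile/[^/]+/(?P<note_id>[^/]+)/?$"),
)


def classify_link(url: str) -> Optional[dict]:
    """Return {'kind': ..., 'note_id': ...} for a supported link, else None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = parsed.netloc.lower()
    if host == "xhslink.com":
        # Short links carry a share code; the note ID is unknown until resolved.
        code = parsed.path.strip("/")
        return {"kind": "short", "note_id": None} if code else None
    if host == "www.xiaohongshu.com":
        for pattern in _NOTE_PATHS:
            match = pattern.match(parsed.path)
            if match:
                return {"kind": "full", "note_id": match.group("note_id")}
    return None


def split_links(text: str) -> List[str]:
    """Split space-separated input into individual links."""
    return text.split()
```

Query parameters such as `xsec_token` are ignored here because `urlparse` keeps them out of the path; the original URL should still be passed to the scraper unchanged.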

Service Management

# Start the service
./xhs-api-service.sh start

# Stop the service
./xhs-api-service.sh stop

# Restart the service
./xhs-api-service.sh restart

# Check the status
./xhs-api-service.sh status

Calling the API Directly

Once the service is running, you can call the API directly:

# Fetch note information
curl -X POST http://127.0.0.1:5556/xhs/detail \
  -H "Content-Type: application/json" \
  -d '{"url": "NOTE_URL", "download": true}'

API docs: http://127.0.0.1:5556/docs
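
The same call can be made from Python with only the standard library. This is a sketch under assumptions: the endpoint and the `url`/`download` fields come from the curl example above, while the helper names and the 60-second timeout are illustrative, and the shape of the JSON response is not documented here.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5556/xhs/detail"  # endpoint from the curl example


def build_detail_request(note_url: str, download: bool = True) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command (hypothetical helper)."""
    payload = json.dumps({"url": note_url, "download": download}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_detail(note_url: str, download: bool = True, timeout: float = 60.0) -> dict:
    """Send the request; the local API service must already be running."""
    request = build_detail_request(note_url, download)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running service.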

Output File Structure

output_dir/
├── NOTE_ID/
│   ├── note.json           # Structured data (full details)
│   ├── note.md             # Markdown document
│   ├── images/             # Downloaded images
│   │   ├── 01.jpeg
│   │   ├── 02.jpeg
│   │   └── ...
│   └── ocr/                # OCR results
│       ├── 01.md           # OCR text for the image with the same number
│       ├── 02.md
│       └── ...
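
Given this layout, the files for one note can be collected with `pathlib`. The sketch below assumes the directory names shown above (`images/`, `ocr/`) and that an OCR file shares its stem with its image (`01.jpeg` and `01.md`); `list_note_artifacts` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path
from typing import Dict


def list_note_artifacts(output_dir: str, note_id: str) -> Dict[str, object]:
    """Collect the files of one note folder, following the layout shown above."""
    note_dir = Path(output_dir) / note_id
    images = sorted(p for p in (note_dir / "images").glob("*") if p.is_file())
    ocr = sorted((note_dir / "ocr").glob("*.md"))
    return {
        "json": note_dir / "note.json",
        "markdown": note_dir / "note.md",
        "images": images,
        "ocr": ocr,
        # Pair each image with the OCR file that shares its stem (01.jpeg <-> 01.md)
        "pairs": [(img, txt) for img in images for txt in ocr if img.stem == txt.stem],
    }
```

Images without a matching OCR file (for example after a `--no-ocr` run) simply do not appear in `pairs`.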

note.json Format

{
  "note_id": "note ID",
  "fetch_time": "time of scraping",
  "title": "title",
  "desc": "description / body text",
  "type": "note type (image-text / video)",
  "author": {
    "nickname": "author nickname",
    "user_id": "author ID",
    "profile_url": "author profile page"
  },
  "interact": {
    "liked_count": 123,
    "collected_count": 456,
    "comment_count": 78,
    "share_count": 9
  },
  "tags": ["tag 1", "tag 2"],
  "publish_time": "publish time",
  "last_update_time": "last update time",
  "url": "note link",
  "download_urls": ["list of download URLs"],
  "local_files": ["local file paths"],
  "ocr_results": [
    {"image": "01.jpeg", "text": "text recognized by OCR"}
  ]
}
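
A consumer of `note.json` usually needs only a few of these fields. The following sketch reads the keys shown above into a flat summary; `summarize_note` is a hypothetical helper, and the fallback defaults are an assumption in case a real file omits a field.

```python
import json


def summarize_note(raw: str) -> dict:
    """Parse note.json text into a small flat summary (hypothetical helper).

    Missing keys fall back to empty defaults, because the format above is an
    example and real files may omit fields.
    """
    data = json.loads(raw)
    author = data.get("author") or {}
    interact = data.get("interact") or {}
    ocr_results = data.get("ocr_results") or []
    return {
        "note_id": data.get("note_id", ""),
        "title": data.get("title", ""),
        "author": author.get("nickname", ""),
        "liked_count": int(interact.get("liked_count", 0)),
        "tags": list(data.get("tags") or []),
        # Join the OCR text of all images, one image per line
        "ocr_text": "\n".join(item.get("text", "") for item in ocr_results),
    }
```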

Python Usage Example

import json
import subprocess

SCRIPT = "/Users/lixiaoji/clawd/skills/xiaohongshu-scraper/scripts/xhs_scraper.py"

def scrape_xhs_note(url: str, output_dir: str) -> dict:
    """Scrape a Xiaohongshu note and return the script's parsed JSON output."""
    result = subprocess.run(
        ["python", SCRIPT, url, "--output", output_dir, "--json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the script's stderr so the failure reason is visible
        raise RuntimeError(result.stderr)
    return json.loads(result.stdout)

# Usage example
data = scrape_xhs_note("https://www.xiaohongshu.com/explore/xxx", "/tmp/xhs")
print(data["title"])

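Scripts

Script	Purpose
`xhs_scraper.py`	Full scraping tool (recommended): download + OCR + save
`xhs_api_client.py`	Simple client: fetches information only
`xhs-api-service.sh`	API service management script
`xhs_download.py`	Downloads by calling the source code directly (no API needed)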

Dependencies

XHS-Downloader (installed at /Users/lixiaoji/Downloads/XHS-Downloader-master-2)
Python 3.12+
requests

Configuration Paths

Item	Path
XHS-Downloader source	`/Users/lixiaoji/Downloads/XHS-Downloader-master-2`
Default download directory	`/Users/lixiaoji/Downloads/XHS-Downloader_V2/_internal/Volume/Download`
Default output directory	`/Users/lixiaoji/clawd/data/xhs`
API log	`/tmp/xhs-downloader-api.log`

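Notes

The API service must be started before first use.
Downloaded files are saved to the XHS-Downloader directory by default.
Use the --output option to collect the files into a directory of your choice.
OCR uses the Vision framework built into macOS, so it works on macOS only.
Some notes may be access-restricted (author settings, deleted notes, and so on).
Short links are resolved through a network request, so make sure the network connection is working.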

Troubleshooting

The API service fails to start

Check whether port 5556 is already in use: lsof -i :5556
Check the log: cat /tmp/xhs-downloader-api.log

Start it manually to test:

cd /Users/lixiaoji/Downloads/XHS-Downloader-master-2
source venv/bin/activate
python main.py api
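
The port check can also be scripted. The sketch below is a rough complement to `lsof -i :5556`: it reports whether anything accepts TCP connections on the port, but unlike `lsof` it cannot say which process holds it. `port_in_use` is a hypothetical helper and the 0.5-second timeout is an arbitrary choice.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(5556)` is true while the service reports that it is stopped, another program is probably occupying the port.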

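Data retrieval fails

Check that the link format is correct.
Some notes may be access-restricted (author settings, deleted notes, and so on).
Open the link in a browser to confirm that the note is accessible.

Downloaded files cannot be found

Check whether new files have appeared in the download directory.
File names follow the format: publish time_author nickname_title_index.extension
Wait for the download to finish before looking.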
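
One way to check the download directory for new files is to list whatever was modified recently. This is a minimal sketch, assuming a ten-minute window is a reasonable default; `recent_files` is a hypothetical helper, and the `now` parameter exists only to make the cutoff testable.

```python
import time
from pathlib import Path
from typing import List, Optional


def recent_files(directory: str, within_seconds: float = 600.0,
                 now: Optional[float] = None) -> List[Path]:
    """List files under `directory` modified in the last `within_seconds`, newest first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    cutoff = (time.time() if now is None else now) - within_seconds
    files = [p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime >= cutoff]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

An empty result shortly after a scrape suggests the download has not finished or went to a different directory.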

README.md

No README available.

Permissions & Security

Security level L1: Low-risk skills with minimal permissions. Review inputs and outputs before running in production.

Requirements

OpenClaw CLI installed and configured.
Language: Markdown
License: MIT
Topics:

FAQ

How do I install xiaohongshu-scraper?

Run openclaw add @ty-teo/xiaohongshu-scraper in your terminal. This installs xiaohongshu-scraper into your OpenClaw Skills catalog.

Does this skill run locally or in the cloud?

OpenClaw Skills execute locally by default. Review the SKILL.md and permissions before running any skill.

Where can I verify the source code?

The source repository is available at https://github.com/openclaw/skills/tree/main/skills/ty-teo/xiaohongshu-scraper. Review commits and README documentation before installing.