docs-from-site

Builds documentation from a website URL. Crawls the site, extracts content, and produces structured Markdown files in docs-output/<name>/. Use when you have a product URL and want to create documentation from it. Use /docs-create if you want the full pipeline (crawl + publish + Docsbook setup).

Local install
npx docs-skills install
Try in MCP
@docsbook find_skill "docs-from-site"

Build documentation from a website URL. Crawl the site, extract content, and produce structured Markdown files ready for Docsbook or GitHub.

Arguments

If no URL is provided, ask for it before proceeding.

Step 1: Branding extraction

Fetch the homepage HTML using WebFetch. Only fall back to Chrome if WebFetch fails.

From <head>, extract:
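One value the branding file in Step 5 needs from the head is the favicon. A minimal sketch of pulling it out with a regex follows; the function name `extractFavicon` is hypothetical, and the fallback to `/favicon.ico` is an assumption based on common browser behaviour, not something this skill specifies.

```javascript
// Hypothetical helper: resolve the favicon URL from raw homepage HTML.
function extractFavicon(html, baseUrl) {
  // Assumed fallback when no <link rel="icon"> is present.
  const fallback = new URL("/favicon.ico", baseUrl).href;
  // Matches <link rel="icon" ...> and <link rel="shortcut icon" ...>.
  const link = html.match(/<link\b[^>]*rel=["'](?:shortcut )?icon["'][^>]*>/i);
  if (!link) return fallback;
  const href = link[0].match(/href=["']([^"']+)["']/i);
  // Resolve relative hrefs against the site URL.
  return href ? new URL(href[1], baseUrl).href : fallback;
}

const head = '<head><link rel="icon" href="/static/fav.png"></head>';
console.log(extractFavicon(head, "https://example.com"));
```

A regex is enough for a quick pass over well-formed markup; an HTML parser would be more robust against unusual attribute quoting.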

From inline CSS and <style> blocks, extract color tokens using regex:
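As an illustration of the regex approach, the sketch below collects CSS custom properties whose value is a hex color. The pattern and the name `extractColorTokens` are assumptions for this example; the skill's actual patterns are not shown on this page.

```javascript
// Hypothetical helper: map CSS custom property names to hex color values.
function extractColorTokens(css) {
  const tokens = {};
  // Matches declarations such as `--accent: #ff6600;`.
  // {3,8} is deliberately loose and also lets through 5- or 7-digit values.
  const pattern = /(--[\w-]+)\s*:\s*(#[0-9a-fA-F]{3,8})\b/g;
  for (const [, name, value] of css.matchAll(pattern)) {
    tokens[name] = value.toLowerCase();
  }
  return tokens;
}

const sampleCss = ":root { --accent: #FF6600; --bg: #fff; --radius: 4px; }";
console.log(extractColorTokens(sampleCss));
```

Colors written as `rgb()`, `hsl()`, or named keywords would need additional patterns.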

Determine the detected color scheme:
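One plausible heuristic, offered here only as a sketch, is to classify the page by the luminance of its background color. The code uses the WCAG relative luminance formula; the 0.5 cutoff and the function names are assumptions, not rules taken from this skill.

```javascript
// Convert #rgb, #rgba, #rrggbb or #rrggbbaa to [r, g, b]; alpha is ignored.
function hexToRgb(hex) {
  let h = hex.replace("#", "");
  if (h.length <= 4) h = h.slice(0, 3).split("").map((c) => c + c).join("");
  const n = parseInt(h.slice(0, 6), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// WCAG 2.x relative luminance, from 0 (black) to 1 (white).
function relativeLuminance(hex) {
  const [r, g, b] = hexToRgb(hex).map((v) => {
    const s = v / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Assumed cutoff: backgrounds brighter than 0.5 count as a light scheme.
function detectScheme(backgroundHex) {
  return relativeLuminance(backgroundHex) > 0.5 ? "light" : "dark";
}

console.log(detectScheme("#ffffff"), detectScheme("#0d1117"));
```

A site that declares `prefers-color-scheme` rules may deserve a more direct check than this background-only guess.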

Check for a theme toggle by scanning for data-theme-toggle, [class*="theme-toggle"], or similar attributes.
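The scan can be as simple as two regex tests over the raw HTML. This is a sketch under the assumption that the markup is available as a string; `hasThemeToggle` is a hypothetical name chosen to match the field in the branding JSON.

```javascript
// Hypothetical helper: true if the HTML carries a theme-toggle marker.
function hasThemeToggle(html) {
  // A data-theme-toggle attribute on any element.
  const dataAttr = /\bdata-theme-toggle\b/i;
  // Any class attribute whose value contains "theme-toggle".
  const classAttr = /class\s*=\s*["'][^"']*theme-toggle[^"']*["']/i;
  return dataAttr.test(html) || classAttr.test(html);
}

console.log(hasThemeToggle('<button class="nav theme-toggle-btn">Theme</button>'));
```

Toggles injected by client-side JavaScript will not appear in the fetched HTML, so a negative result is only a hint.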

Step 2: Content discovery

Find all pages to crawl:

  1. Fetch /sitemap.xml — parse all <loc> URLs
  2. Collect <a href> links from the homepage (same domain only)
  3. Check standard paths:
    • Documentation first: /docs, /docs/getting-started, /help, /guides, /tutorials
    • Product: /features, /pricing, /about
    • Technical: /api, /integrations
    • Reference: /faq, /changelog

Crawl in that priority order. Skip: /login, /signup, /auth, /checkout, /cart.
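The discovery rules above can be sketched as a small planning function: parse `<loc>` entries, keep same-origin URLs, drop the skipped paths, and sort by the priority list. The names `parseSitemap` and `planCrawl` are hypothetical, and matching on path prefixes is an assumption about how the priorities apply to nested pages.

```javascript
// Priority and skip lists taken from the paths named in this step.
const PRIORITY = ["/docs", "/help", "/guides", "/tutorials", "/features",
  "/pricing", "/about", "/api", "/integrations", "/faq", "/changelog"];
const SKIP = ["/login", "/signup", "/auth", "/checkout", "/cart"];

// Pull every <loc> URL out of a sitemap document.
function parseSitemap(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// True for "/docs" itself and anything below it, but not "/docs-old".
function matchesPrefix(pathname, prefix) {
  return pathname === prefix || pathname.startsWith(prefix + "/");
}

// origin must have no trailing slash, e.g. "https://example.com".
function planCrawl(urls, origin) {
  const rank = (pathname) => {
    const i = PRIORITY.findIndex((p) => matchesPrefix(pathname, p));
    return i === -1 ? PRIORITY.length : i; // unlisted paths go last
  };
  const seen = new Set();
  return urls
    .map((u) => new URL(u, origin))
    .filter((u) => u.origin === origin) // same domain only
    .filter((u) => !SKIP.some((p) => matchesPrefix(u.pathname, p)))
    .filter((u) => !seen.has(u.pathname) && seen.add(u.pathname))
    .sort((a, b) => rank(a.pathname) - rank(b.pathname))
    .map((u) => u.href);
}

const xml = "<urlset><url><loc>https://example.com/pricing</loc></url>" +
  "<url><loc>https://example.com/login</loc></url>" +
  "<url><loc>https://example.com/docs/getting-started</loc></url></urlset>";
console.log(planCrawl(parseSitemap(xml), "https://example.com"));
```

Links collected from the homepage can be passed through the same `planCrawl` call, since relative hrefs are resolved against the origin.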

Step 3: Content extraction

If scripts/crawl-site.js is available locally, run:

node scripts/crawl-site.js "$URL" "docs-output/$NAME" \
  --max-pages=50 \
  --priority-paths=docs,help,guides,tutorials \
  --skip-paths=login,signup,auth,checkout,cart

If the script is not available, fetch each discovered URL manually and convert HTML to Markdown:
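For the manual path, a deliberately small regex-based converter is sketched below. It is an assumption-laden illustration, not the conversion that `scripts/crawl-site.js` performs: it handles only headings, links, list items, and paragraphs, and it does not decode HTML entities or nested structures.

```javascript
// Hypothetical minimal converter for simple, well-formed HTML fragments.
function htmlToMarkdown(html) {
  return html
    // Drop script and style blocks entirely.
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, "")
    // <h1>..<h6> become ATX headings surrounded by blank lines.
    .replace(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_, level, text) => `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    // <a href="url">text</a> becomes [text](url).
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // <li> items become bullet lines.
    .replace(/<li\b[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n")
    // Paragraph ends become blank lines.
    .replace(/<\/p>/gi, "\n\n")
    // Strip whatever tags remain, then tidy whitespace.
    .replace(/<[^>]+>/g, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

console.log(htmlToMarkdown('<h1>Title</h1><p>See <a href="/docs">the docs</a>.</p>'));
```

For real pages with tables, code blocks, or nested lists, a dedicated HTML-to-Markdown library is likely the safer choice.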

Step 4: Structure organization

Organize the extracted files into this layout (skip empty sections):

docs-output/<name>/
├── README.md                    # Intro: what the product does, key value props
├── getting-started/
│   ├── README.md                # Quick start — the most important page
│   └── installation.md          # If applicable
├── features/
│   └── <feature-name>.md        # One file per major feature
├── guides/
│   └── <guide-name>.md          # How-to guides
├── api/
│   └── reference.md             # If API docs were found
└── faq.md                       # If a FAQ was found
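To make the "skip empty sections" rule concrete, the sketch below turns a map of sections into the list of files to write. The input shape (section name to page slugs) and the name `planFiles` are assumptions made for this example.

```javascript
// Hypothetical planner: sections maps a directory name to its page slugs.
function planFiles(name, sections) {
  const root = `docs-output/${name}`;
  const files = [`${root}/README.md`]; // the intro page is always written
  for (const [section, slugs] of Object.entries(sections)) {
    if (slugs.length === 0) continue; // skip empty sections
    for (const slug of slugs) files.push(`${root}/${section}/${slug}.md`);
  }
  return files;
}

console.log(planFiles("acme", {
  "getting-started": ["README", "installation"],
  features: ["search"],
  guides: [], // nothing found, so no guides/ directory
}));
```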

Write each page following the shared writing rules:

Step 5: Branding JSON

Write a _branding.json file in docs-output/<name>/:

{
  "accentColor": "#xxx",
  "background": "#xxx",
  "foreground": "#xxx",
  "favicon": "https://...",
  "hasThemeToggle": true,
  "detectedScheme": "light"
}

This file is consumed by /docs-setup-workspace when configuring Docsbook.
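A small builder can catch malformed colors before the file is written. This is a sketch: `buildBranding` is a hypothetical name, and the hex-only validation is an assumption based on the `#xxx` placeholders above rather than a documented requirement of /docs-setup-workspace.

```javascript
const HEX = /^#[0-9a-fA-F]{3,8}$/;

// Hypothetical builder: returns the _branding.json content as a string.
function buildBranding({ accentColor, background, foreground, favicon,
  hasThemeToggle, detectedScheme }) {
  // Assumed rule: the three color fields must be hex values.
  for (const [key, value] of Object.entries({ accentColor, background, foreground })) {
    if (!HEX.test(value)) throw new Error(`${key} is not a hex color: ${value}`);
  }
  // Keys are emitted in the same order as the example above.
  return JSON.stringify({
    accentColor, background, foreground, favicon,
    hasThemeToggle: Boolean(hasThemeToggle), detectedScheme,
  }, null, 2) + "\n";
}

console.log(buildBranding({
  accentColor: "#ff6600", background: "#ffffff", foreground: "#111111",
  favicon: "https://example.com/favicon.ico",
  hasThemeToggle: true, detectedScheme: "light",
}));
```

The returned string can then be written to docs-output/<name>/_branding.json, for example with Node's `fs.writeFileSync`.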

Output

Print the created directory tree. Then suggest next steps:

Docs written to docs-output/<name>/

Next steps:
  /docs-publish    — push to GitHub
  /docs-setup-workspace — configure Docsbook workspace
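
The directory tree itself can be rendered from the list of written files. The sketch below assumes paths relative to docs-output/<name>/ and uses the same box-drawing style as the layout in Step 4; `renderTree` is a hypothetical name.

```javascript
// Hypothetical renderer: turn relative file paths into a tree listing.
function renderTree(paths) {
  // Build a nested object keyed by path segment.
  const root = {};
  for (const p of paths) {
    let node = root;
    for (const part of p.split("/")) node = node[part] ??= {};
  }
  const lines = [];
  const walk = (node, prefix) => {
    const names = Object.keys(node).sort();
    names.forEach((name, i) => {
      const last = i === names.length - 1;
      lines.push(prefix + (last ? "└── " : "├── ") + name);
      // Children of the last entry are indented without a vertical bar.
      walk(node[name], prefix + (last ? "    " : "│   "));
    });
  };
  walk(root, "");
  return lines.join("\n");
}

console.log(renderTree(["README.md", "features/search.md", "faq.md"]));
```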

Keywords: crawl, website, site, extract, markdown, generate