docs-from-site
Build documentation from a website URL. Crawl the site, extract content, and produce structured Markdown files ready for Docsbook or GitHub.
Arguments#
$ARGUMENTS[0]— website URL (required)$ARGUMENTS[1]— output name (optional; derived from the domain if not provided)
If no URL is provided, ask for it before proceeding.
Step 1 — Branding extraction#
Fetch the homepage HTML using WebFetch. Only fall back to Chrome if WebFetch fails.
From <head>, extract:
<title>— site name<meta name="description">— tagline<link rel="icon">— favicon URL<meta property="og:image">— OpenGraph image
From inline CSS and <style> blocks, extract color tokens using regex:
--primary,--color-primary,--accent,--background,--foreground- Button color from
.btn,buttoninline styles
Determine the detected color scheme:
- Parse the
--backgroundorbackground-colorvalue - If luminance > 50%, scheme is
"light"; otherwise"dark"
Check for a theme toggle by scanning for data-theme-toggle, [class*="theme-toggle"], or similar attributes.
Step 2 — Content discovery#
Find all pages to crawl:
- Fetch
/sitemap.xml— parse all<loc>URLs - Collect
<a href>links from the homepage (same domain only) - Check standard paths:
- Documentation first:
/docs,/docs/getting-started,/help,/guides,/tutorials - Product:
/features,/pricing,/about - Technical:
/api,/integrations - Reference:
/faq,/changelog
- Documentation first:
Crawl in that priority order. Skip: /login, /signup, /auth, /checkout, /cart.
Step 3 — Content extraction#
If scripts/crawl-site.js is available locally, run:
node scripts/crawl-site.js "$URL" "docs-output/$NAME" \
--max-pages=50 \
--priority-paths=docs,help,guides,tutorials \
--skip-paths=login,signup,auth,checkout,cartIf the script is not available, fetch each discovered URL manually and convert HTML to Markdown:
- Extract content from
<main>,<article>, or.content - Skip
<header>,<footer>,<nav>,<aside>
Step 4 — Structure organization#
Organize the extracted files into this layout (skip empty sections):
docs-output/<name>/
├── README.md # Intro: what the product does, key value props
├── getting-started/
│ ├── README.md # Quick start — the most important page
│ └── installation.md # If applicable
├── features/
│ └── <feature-name>.md # One file per major feature
├── guides/
│ └── <guide-name>.md # How-to guides
├── api/
│ └── reference.md # If API docs were found
└── faq.md # If a FAQ was found
Write each page following the shared writing rules:
- Active voice, second person ("you"), imperative mood for instructions
- Sentence case headings ("How to configure X")
- No filler words: remove "simply", "just", "easily", "powerful"
- All code blocks must have a language specifier
- Link related pages using relative internal links
Step 5 — Branding JSON#
Write a _branding.json file in docs-output/<name>/:
{
"accentColor": "#xxx",
"background": "#xxx",
"foreground": "#xxx",
"favicon": "https://...",
"hasThemeToggle": true,
"detectedScheme": "light"
}This file is consumed by /docs-setup-workspace when configuring Docsbook.
Output#
Print the created directory tree. Then suggest next steps:
Docs written to docs-output/<name>/
Next steps:
/docs-publish — push to GitHub
/docs-setup-workspace — configure Docsbook workspace