docs-visitor-cohort — Find the behavioral cohorts behind aggregate numbers
Page-level analytics tell you which page is doing badly. Cohort analysis tells you which kind of user is failing — and that's often what marketing/product actually need to know.
This skill takes the top ~20 most active anonymous visitors (get_top_visitors), pulls their full activity timeline (get_visitor_activity), and uses the LLM to cluster them into 3–6 named behavioral cohorts (e.g. buyer-blocker, mcp-debugger, tire-kicker). Each cohort becomes a finding with severity tied to how blocking the pattern is.
When to run
- Monthly.
- After major launches — who arrived?
- Before a pricing change — what's the current top-user profile?
Workflow
Standard four-stage docs-insights pipeline. Slice = cohort. The collector fans out: get_top_visitors then one get_visitor_activity per returned visitor_id (5 parallel).
- Collector: COHORT_SIZE defaults to 20. Each visitor's timeline is captured (pageviews, cta_clicks, feedback).
- Clusterer: LLM-clusters timelines by behavior pattern. Produces 3–6 cohorts.
- Reporter: one cohort_pattern finding per cluster. Severity by blocker score.
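The collector fan-out described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the two async functions are hypothetical stubs standing in for the real get_top_visitors and get_visitor_activity tools, and the shape of their return values is assumed.

```python
import asyncio

MAX_PARALLEL = 5   # "5 parallel" from the workflow description
COHORT_SIZE = 20   # documented default


async def get_top_visitors(limit):
    """Hypothetical stub standing in for the real get_top_visitors tool."""
    return [{"visitor_id": f"v{i}"} for i in range(limit)]


async def get_visitor_activity(visitor_id):
    """Hypothetical stub standing in for the real get_visitor_activity tool."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {
        "visitor_id": visitor_id,
        "pageviews": [],
        "cta_clicks": [],
        "feedback": [],
    }


async def collect(cohort_size=COHORT_SIZE):
    visitors = await get_top_visitors(cohort_size)
    gate = asyncio.Semaphore(MAX_PARALLEL)

    async def fetch(visitor_id):
        async with gate:  # at most MAX_PARALLEL timelines in flight
            return await get_visitor_activity(visitor_id)

    # gather preserves input order, so timelines line up with visitors
    return await asyncio.gather(*(fetch(v["visitor_id"]) for v in visitors))


timelines = asyncio.run(collect())
```

A semaphore keeps the fan-out bounded without batching, so a slow timeline does not hold up the other four slots.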
What this skill catches
| Cohort label (example) | Pattern | Action |
|---|---|---|
| buyer-blocker | landing → quick-start → billing → 👎 on billing, no Upgrade click | add_to_todo + notify_slack; this is a sales-critical pattern |
| mcp-debugger | repeated visits to mcp.md and webhooks.md, no CTA hits | invoke_skill: docs-editor; likely missing examples |
| deep-reader | wide path coverage, long dwell, no negative signals | info; replicate what they read in onboarding |
| tire-kicker | many pages, 0 CTA, never returns | info; context, not problem |
Guardrails
- PRO+ only (get_top_visitors, get_visitor_activity).
- Privacy: visitor_id is a random anonymous ID; report MAY include up to 20 of them in samples for downstream debugging. NEVER include user_agent, IPs, or referrer query strings.
- Minimum 10 visitors needed to attempt clustering; below that, exit with no_data.
- Cohort labels must be descriptive, never numeric. Use lowercase-kebab-case.
- LLM-clustering with < 10 visitors is unreliable; the clusterer sets confidence: 0.5 in that case.
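A few of these guardrails are mechanical enough to check in code. The sketch below is illustrative only: the function names, the event field names (ip, referrer), and the return shapes are assumptions rather than the skill's real schema.

```python
import re
from urllib.parse import urlsplit

MIN_VISITORS = 10                     # below this, exit with no_data
MAX_SAMPLES = 20                      # cap on visitor_ids in samples
FORBIDDEN_KEYS = {"user_agent", "ip"}  # assumed field names
LABEL_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def is_valid_label(label):
    """Lowercase kebab-case that starts with a letter, so never purely numeric."""
    return bool(LABEL_RE.match(label))


def scrub_event(event):
    """Drop forbidden fields and strip the query string from the referrer."""
    clean = {k: v for k, v in event.items() if k not in FORBIDDEN_KEYS}
    if "referrer" in clean:
        clean["referrer"] = urlsplit(clean["referrer"])._replace(query="").geturl()
    return clean


def check_guardrails(timelines):
    """Return no_data for small inputs, otherwise a capped list of sample IDs."""
    if len(timelines) < MIN_VISITORS:
        return {"status": "no_data"}
    samples = [t["visitor_id"] for t in timelines[:MAX_SAMPLES]]
    return {"status": "ok", "samples": samples}
```

Stripping only the query string keeps the referrer's host and path available for debugging while removing the part most likely to carry identifying parameters.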
Output for downstream consumption
Each cohort finding's suggested_actions[] is mapped to:
- add_to_todo for behavioral patterns that need product/marketing/sales input.
- notify_slack if a blocker cohort exceeds 30% of the top visitors (escalation worth a human's attention).
- invoke_skill: docs-editor only when the cohort's drop page is clearly a documentation problem (e.g. mcp-debugger cohort dropping on mcp.md).
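The mapping above could be expressed roughly as follows. The cohort dictionary and its flags (needs_human_input, is_blocker, docs_problem) are hypothetical names chosen for illustration; only the three action names and the 30% threshold come from this document.

```python
ESCALATION_PERCENT = 30  # notify_slack when a blocker cohort exceeds this share


def suggested_actions(cohort, total_visitors):
    """Map one cohort to its suggested_actions list (illustrative sketch)."""
    actions = []
    if cohort.get("needs_human_input"):
        actions.append("add_to_todo")
    # Integer comparison avoids float rounding at exactly 30%.
    over_threshold = (
        len(cohort["visitor_ids"]) * 100 > ESCALATION_PERCENT * total_visitors
    )
    if cohort.get("is_blocker") and over_threshold:
        actions.append("notify_slack")
    if cohort.get("docs_problem"):
        actions.append("invoke_skill: docs-editor")
    return actions


buyer_blocker = {
    "label": "buyer-blocker",
    "visitor_ids": [f"v{i}" for i in range(7)],  # 7 of 20 = 35%
    "needs_human_input": True,
    "is_blocker": True,
    "docs_problem": False,
}
actions = suggested_actions(buyer_blocker, total_visitors=20)
```

With 7 of 20 top visitors in the cohort, the share is 35%, so the example escalates; at exactly 6 of 20 (30%) it would not, since the rule says "exceeds".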
Acceptance criteria
Same shape as docs-utm-analyzer. Cohort labels are present in findings[].title.
Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| workspace | string | required | id or owner/repo |
| period | string | 30d | 30d is the working default |
| cohort_size | number | 20 | How many top visitors to drill into |
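As a quick reference, the arguments and their defaults can be modeled like this. The CohortArgs class is a hypothetical container for illustration, not part of the skill, and the workspace value is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class CohortArgs:
    """Hypothetical container mirroring the arguments table."""
    workspace: str          # required: id or owner/repo
    period: str = "30d"     # working default
    cohort_size: int = 20   # how many top visitors to drill into


args = CohortArgs(workspace="acme/docs")  # example workspace, assumed
```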
Related skills
- docs-funnel-mapper: aggregate journeys
- docs-engagement-analyzer: page-level dwell
- docs-utm-analyzer: entry-side, often correlates with cohort