docs-visitor-cohort — Find the behavioral cohorts behind aggregate numbers

Page-level analytics tell you which page is doing badly. Cohort analysis tells you which kind of user is failing — and that's often what marketing/product actually need to know.

This skill takes the top ~20 most active anonymous visitors (get_top_visitors), pulls their full activity timeline (get_visitor_activity), and uses the LLM to cluster them into 3–6 named behavioral cohorts (e.g. buyer-blocker, mcp-debugger, tire-kicker). Each cohort becomes a finding with severity tied to how blocking the pattern is.

When to run

Monthly.
After major launches — who arrived?
Before a pricing change — what's the current top-user profile?

Workflow

Standard four-stage docs-insights pipeline, with the cohort as the slice. The collector fans out: it calls get_top_visitors once, then get_visitor_activity once per returned visitor_id, with up to 5 calls in parallel.

Collector: COHORT_SIZE defaults to 20. Each visitor's timeline is captured (pageviews, cta_clicks, feedback).
Clusterer: LLM-clusters timelines by behavior pattern. Produces 3–6 cohorts.
Reporter: emits one cohort_pattern finding per cluster, with severity set by the blocker score.
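
The collector's fan-out can be sketched as a bounded-concurrency fetch. This is a minimal illustration, not the skill's actual implementation: the two async functions below are hypothetical stand-ins for the get_top_visitors and get_visitor_activity tools, and their signatures and return shapes are assumptions.

```python
import asyncio

COHORT_SIZE = 20   # default from the Collector stage
MAX_PARALLEL = 5   # at most 5 activity fetches in flight


# Hypothetical stand-ins for the real tools; signatures and return
# shapes are assumptions made for this sketch only.
async def get_top_visitors(limit: int) -> list[dict]:
    return [{"visitor_id": f"v{i}"} for i in range(limit)]


async def get_visitor_activity(visitor_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the real network call
    return {
        "visitor_id": visitor_id,
        "pageviews": [],
        "cta_clicks": [],
        "feedback": [],
    }


async def collect(cohort_size: int = COHORT_SIZE) -> list[dict]:
    visitors = await get_top_visitors(cohort_size)
    gate = asyncio.Semaphore(MAX_PARALLEL)

    async def fetch(visitor_id: str) -> dict:
        # The semaphore caps how many fetches run concurrently.
        async with gate:
            return await get_visitor_activity(visitor_id)

    # gather preserves the order of the input visitors.
    return await asyncio.gather(*(fetch(v["visitor_id"]) for v in visitors))


timelines = asyncio.run(collect())
```

A semaphore keeps the fan-out simple: every visitor gets a task up front, but only five hold a slot at any time.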

What this skill catches

| Cohort label (example) | Pattern | Action |
| --- | --- | --- |
| `buyer-blocker` | landing → quick-start → billing → 👎 on billing, no Upgrade click | `add_to_todo` + `notify_slack`: this is a sales-critical pattern |
| `mcp-debugger` | repeated visits to mcp.md and webhooks.md, no CTA hits | `invoke_skill: docs-editor`: likely missing examples |
| `deep-reader` | wide path coverage, long dwell, no negative signals | `info`: replicate what they read in onboarding |
| `tire-kicker` | many pages, 0 CTA, never returns | `info`: context, not a problem |

Guardrails

PRO+ only (get_top_visitors, get_visitor_activity).
Privacy: visitor_id is a random anonymous ID; report MAY include up to 20 of them in samples for downstream debugging. NEVER include user_agent, IPs, or referrer query strings.
Minimum 10 visitors needed to attempt clustering; below that, exit with no_data.
Cohort labels must be descriptive — never numeric. Use lowercase-kebab-case.
LLM clustering with fewer than 10 visitors is unreliable, hence the minimum above. If the clusterer is ever run on fewer than 10 visitors anyway, it sets confidence: 0.5.
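
The guardrails above can be expressed as a few small checks. The sketch below is illustrative only: the event field names (`user_agent`, `ip`, `referrer`) are assumed, and the label pattern reads "never numeric" strictly by rejecting any digit, which may be tighter than the skill intends.

```python
import re
from urllib.parse import urlsplit

MIN_VISITORS = 10
# Assumed field names for the data that must never reach the report.
DROP_FIELDS = {"user_agent", "ip"}
# lowercase-kebab-case, letters only (a strict reading of "never numeric").
KEBAB = re.compile(r"[a-z]+(?:-[a-z]+)*")


def precheck(timelines: list[dict]) -> str:
    """Return "no_data" when there are too few visitors to cluster."""
    return "ok" if len(timelines) >= MIN_VISITORS else "no_data"


def is_valid_label(label: str) -> bool:
    """Accept labels such as "buyer-blocker", reject "cohort-1" or "3"."""
    return KEBAB.fullmatch(label) is not None


def scrub(event: dict) -> dict:
    """Drop user agent and IP, and strip the query string from the referrer."""
    clean = {k: v for k, v in event.items() if k not in DROP_FIELDS}
    if clean.get("referrer"):
        clean["referrer"] = urlsplit(clean["referrer"])._replace(query="").geturl()
    return clean
```

Running `scrub` on every event before the reporter sees it keeps the privacy rule in one place instead of relying on each stage to remember it.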

Output for downstream consumption

Each cohort finding's suggested_actions[] is mapped to:

add_to_todo for behavioral patterns that need product/marketing/sales input.
notify_slack if a blocker cohort exceeds 30% of the top visitors (escalation worth a human's attention).
invoke_skill: docs-editor only when the cohort's drop page is clearly a documentation problem (e.g. mcp-debugger cohort dropping on mcp.md).
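
As a rough sketch, the mapping above could look like the function below. The cohort fields `needs_human_input`, `is_blocker`, `docs_problem`, and `visitor_count` are hypothetical names introduced for illustration; only the action names and the 30% threshold come from this document.

```python
BLOCKER_SHARE_THRESHOLD = 0.30  # "exceeds 30% of the top visitors"


def suggested_actions(cohort: dict, top_visitor_count: int) -> list[str]:
    """Map one cohort to its suggested actions (field names are assumed)."""
    actions = []
    if cohort.get("needs_human_input"):
        actions.append("add_to_todo")
    share = cohort["visitor_count"] / top_visitor_count
    # Strictly greater than: a cohort at exactly 30% does not escalate.
    if cohort.get("is_blocker") and share > BLOCKER_SHARE_THRESHOLD:
        actions.append("notify_slack")
    if cohort.get("docs_problem"):
        actions.append("invoke_skill: docs-editor")
    return actions


buyer_blocker = {
    "label": "buyer-blocker",
    "visitor_count": 7,
    "is_blocker": True,
    "needs_human_input": True,
}
actions = suggested_actions(buyer_blocker, top_visitor_count=20)
```

With 7 of 20 top visitors (35%) in a blocker cohort, the example escalates to Slack in addition to creating a todo.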

Acceptance criteria

Same shape as docs-utm-analyzer. Cohort labels are present in findings[].title.
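
The label criterion can be checked mechanically. The following is a small sketch that assumes the report is a dict with a `findings` list whose entries carry a `title` string; the rest of the report shape is not modeled here.

```python
def missing_labels(report: dict, labels: list[str]) -> list[str]:
    """Return the cohort labels that appear in no finding title."""
    titles = [finding.get("title", "") for finding in report.get("findings", [])]
    return [label for label in labels if not any(label in title for title in titles)]


# Illustrative report fragment; the titles are made up for this example.
report = {
    "findings": [
        {"title": "buyer-blocker: drops on billing"},
        {"title": "mcp-debugger: loops on mcp.md"},
    ]
}
missing = missing_labels(report, ["buyer-blocker", "mcp-debugger", "tire-kicker"])
```

An empty result means every cohort label is present in some `findings[].title`.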

Arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `workspace` | string | required | Workspace id or `owner/repo` |
| `period` | string | `30d` | Analysis window; `30d` is the working default |
| `cohort_size` | number | `20` | How many top visitors to drill into |

Related skills

docs-funnel-mapper: aggregate journeys
docs-engagement-analyzer: page-level dwell
docs-utm-analyzer: entry-side, often correlates with cohort
