snappy-os — the full stack, named honestly

This is the total description. Every layer has one on-disk home. If a word here isn't backed by a file path, that's a bug in this doc — flag it and fix it. Written 2026-04-22 after repeated drift into taxonomy words that didn't match the code.

Ground rules for this document:

Every primitive is the word the code uses for itself (checked against headers in state/lib/*.ts, program.md, and state/index.md).
No layer is "harness" or "plugin" generically — every layer has a concrete file.
Where a thing is enforced by a script, the script is named.
Where a thing is convention-only (not enforced), that's called out.
snappy-os is not a plugin. It is a scaffolded tree that lives in the user's git. Distribution shape is Option A (full tree) / Option B (plain skills only) — see bottom.

1. The atom: one skill

A skill has four parts on disk. They support each other. Any part other than the .md can be absent if the skill doesn't need it: a pure-prose skill is fine with just the .md. A skill that calls APIs needs the .ts. A skill that should be preloaded into relevant turns needs the .agents.md.

1a. `state/skills/<name>.md` — the Skill (Anthropic primitive)

The canonical file. Matches the Anthropic Agent Skills spec: YAML frontmatter (name, description, optional category, triggers, etc.) + progressive-disclosure body. This is the file a user could copy out of snappy-os and drop into any Anthropic-spec consumer (Claude Code, agentskills.io reader, Cloudflare Marketplace).

In snappy-os the folder-based SKILL.md layout is flattened to one file per skill (see state/index.md). 135 of these currently. This is the portable atom.

1b. `state/lib/<name>.ts` — `api.ts` (the library module)

This is the TypeScript module the skill's prose refers to when it says "run npx tsx state/lib/<name>.ts <verb>." Internally, every one of these files headers itself as snappy-<name>/api.ts. Example, verbatim from the tree:

state/lib/skill.ts header: snappy-skill/api.ts — Meta-skill scaffolder for the Snappy namespace.
state/lib/mine.ts header: snappy-mine/api.ts -- Content mining operations for all snappy-* skills.
state/lib/docs.ts header: snappy-docs/api.ts -- Notion REST API operations for all snappy-* skills.
state/lib/website.ts header: snappy-website/api.ts -- snappy.ai site health and deploy operations for all snappy-* skills.
state/lib/desktop.ts header: snappy-desktop/api.ts -- macOS desktop automation via Midscene vision AI.

So the word the code uses for itself is api.ts. The filename on disk is the skill slug (docs.ts, mine.ts) because a flat state/lib/ directory is easier to scan than state/lib/docs/api.ts × 101 directories. But inside, each file:

Starts with a #!/usr/bin/env npx tsx shebang.
Exports importable TypeScript functions (so another api.ts can import { foo } from "../lib/docs.ts").
Also carries a CLI dispatch at the bottom (so a user can run npx tsx state/lib/docs.ts search "meeting notes" — the usage strings inside the file literally read npx tsx api.ts search "meeting notes", because inside the file's own mental model, it is api.ts).

101 api.ts modules currently in state/lib/. Includes substrate modules (eval.ts, env.ts, log.ts, dispatch.ts, agents.ts) and per-skill modules (docs.ts, mine.ts, ffmpeg.ts, drive.ts, freshbooks.ts, ...).

When you read api.ts in any snappy-os doc or message, it means: state/lib/<slug>.ts, the TypeScript library module for that skill, shebanged and dual-purpose (importable AND runnable). Not a "runner," not a "service layer," not a "sidecar" in the abstract. api.ts.
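
A minimal sketch of the dual-purpose shape described above, assuming a hypothetical skill with a single search verb. The function names, the verb table, and the sample corpus are illustrative and are not taken from any real module in state/lib/.

```typescript
// Hypothetical sketch of a dual-purpose module: importable functions plus a CLI dispatch.
// A real module in state/lib/ would begin with the shebang `#!/usr/bin/env npx tsx`.

// Importable surface: other modules can import this function directly.
export function search(query: string, corpus: string[]): string[] {
  const needle = query.toLowerCase();
  return corpus.filter((entry) => entry.toLowerCase().includes(needle));
}

// Verb table for the CLI surface. Each verb maps its arguments to printable output.
const verbs: Record<string, (args: string[]) => string> = {
  search: (args) => search(args[0] ?? "", ["Meeting notes 04-21", "Invoice draft"]).join("\n"),
};

// CLI dispatch: returns an exit code so the caller decides whether to exit.
export function dispatch(argv: string[], out: (line: string) => void = console.log): number {
  const [verb, ...rest] = argv;
  const handler = verb === undefined ? undefined : verbs[verb];
  if (handler === undefined) {
    out(`usage: npx tsx api.ts <${Object.keys(verbs).join("|")}> [args]`);
    return 1;
  }
  out(handler(rest));
  return 0;
}

// A real module would end with a call such as dispatch(process.argv.slice(2)),
// guarded so it only runs when the file is executed directly and not when imported.
```

Returning an exit code instead of calling process.exit inside dispatch keeps the same function usable from tests and from other modules.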

1c. `state/bin/<name>` or `state/bin/<name>/*` — scripts

Separate from api.ts. Where api.ts is a library (TypeScript, importable, internally dispatches CLI verbs), state/bin/ is where shell scripts and standalone scripts live. 64 entries currently. Two shapes:

Single-file scripts: state/bin/commit-report.ts, state/bin/pid-detect.ts, state/bin/control.sh, state/bin/health.sh. Direct executables. Usually shebanged. Called by name.
Multi-file bundles: state/bin/agents/ (contains ctl.ts, go.sh, pause.sh, resume.sh, stop.sh, tick.sh, list.sh, dispatch-tick.ts), state/bin/autopilot/ (break.sh, fix.sh, open-count.sh, regen.sh), state/bin/brain/ (capabilities.ts, growth.ts, insights.ts, showcase.ts, insights-cron.sh), state/bin/inbox-sweep/, etc. A skill gets its own sub-directory when one script isn't enough, e.g. a skill with pause/resume/cron subcommands.

Why two places? state/lib/<name>.ts is for things meant to be imported by other api.ts modules. state/bin/<name>/* is for operational scripts — one-shots, crons, glue shell. Same skill can have both: api.ts in lib for the library surface, state/bin/<skill>/ for the ops scripts that call it.

This is the split the prose-sidecar-drift lint enforces: a skill with ≥3 executable commands in prose needs either state/bin/<slug>/* or state/lib/<slug>.ts. Not both required; at least one.
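
The rule above can be sketched as follows. This is an illustration under stated assumptions, not the code in state/lint/prose-sidecar-drift.ts: the heuristic for what counts as an "executable command" (a line starting with `npx tsx` or `./`) and all function names are hypothetical.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Assumption: an "executable command" is a prose line that starts with `npx tsx` or `./`,
// optionally behind a `$ ` prompt. The real lint may count commands differently.
export function countCommands(prose: string): number {
  return prose
    .split("\n")
    .map((line) => line.trim().replace(/^\$\s+/, ""))
    .filter((line) => line.startsWith("npx tsx ") || line.startsWith("./")).length;
}

// A skill drifts when it has >= 3 commands in prose and neither sidecar exists.
export function hasSidecarDrift(commandCount: number, hasBin: boolean, hasLib: boolean): boolean {
  return commandCount >= 3 && !hasBin && !hasLib;
}

// Filesystem wrapper: probes state/bin/<slug>/ and state/lib/<slug>.ts under a repo root.
export function checkSkill(root: string, slug: string, prose: string): boolean {
  const hasBin = existsSync(join(root, "state", "bin", slug));
  const hasLib = existsSync(join(root, "state", "lib", `${slug}.ts`));
  return hasSidecarDrift(countCommands(prose), hasBin, hasLib);
}
```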

1d. `state/skills/<name>.agents.md` — the loader

Short per-turn context the agent sees when the skill is relevant. Inspired by Vercel's AGENTS.md pattern (the Dec 2025 Vercel blog post: "always-loaded AGENTS.md outperforms on-demand skills, 100% vs 79% pass rate"). snappy-os adapts this by making the loader keyword-gated (only injected when the agent's prompt touches the skill's triggers) instead of always-on. A skill without an .agents.md just doesn't get preloaded — the agent has to read the full .md to learn it.

The loader is the short path. The .md is the long path. Same content distilled.
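
A minimal sketch of keyword gating, assuming case-insensitive substring matching against a skill's trigger list. The Loader shape and both function names are hypothetical; the real logic lives in the snappy-os-inject.sh hook and may tokenize or weight matches differently.

```typescript
// Hypothetical loader record: the slug, its trigger keywords, and the loader text.
interface Loader {
  slug: string;
  triggers: string[];
  body: string;
}

// Return the loaders whose triggers appear in the prompt.
// Assumption: plain case-insensitive substring matching.
export function selectLoaders(prompt: string, loaders: Loader[]): Loader[] {
  const text = prompt.toLowerCase();
  return loaders.filter((loader) =>
    loader.triggers.some((trigger) => trigger.length > 0 && text.includes(trigger.toLowerCase())),
  );
}

// Concatenate the selected loader bodies into the context block to inject.
// A prompt that touches no trigger yields an empty string, so nothing is preloaded.
export function buildInjection(prompt: string, loaders: Loader[]): string {
  return selectLoaders(prompt, loaders)
    .map((loader) => `## ${loader.slug}\n${loader.body}`)
    .join("\n\n");
}
```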

1e. summary table for one skill

| Part | Path | What it is | Used by |
| --- | --- | --- | --- |
| Skill (the atom) | `state/skills/<name>.md` | Anthropic-spec markdown + YAML frontmatter | Any Anthropic-compatible reader; the long path |
| api.ts | `state/lib/<name>.ts` | TypeScript library module with shebang, exports + CLI dispatch | Other api.ts imports; `npx tsx` CLI |
| scripts | `state/bin/<name>/*` or `state/bin/<name>.{sh,ts}` | Shell / standalone scripts for ops | Crons, hooks, user invocations |
| Loader | `state/skills/<name>.agents.md` | Short keyword-gated per-turn context | The `snappy-os-inject.sh` hook |

Absent layers are fine. A PID-rule skill might have only .md + .agents.md. A pure library skill might have only .md + api.ts.

2. The cross-cutting parts (same across all skills)

2a. `state/regen/footer.md` — the self-correction rule

One file. Appended to every regenerated loader. Reads (verbatim excerpt):

1. Fix gaps (P — proportional). MANDATORY. If this loader didn't cover your case — if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk — you MUST attempt an Edit to this .agents.md before you log. Only LOGGED is allowed when: the fix needs >10 lines, spans multiple files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.
2. Log the result. echo "[$(date -u +%FT%TZ)] <skill-name>: <what was missing or fixed> [FIXED|LOGGED]" >> state/log/agents-md-feedback.log

This rule drives the PID loop: the loader is the setpoint, the agent is the sensor, the gap is the error signal, and closing the gap is the correction. Ported from snappy-kernel's dom-cartographer skill where it demonstrably worked (that skill mapped Skool admin UI once; every later ship-agent inherited the map for free).
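
For callers that write the feedback line from TypeScript instead of the shell one-liner in the footer, a minimal sketch of the same line format follows. The function names are hypothetical; only the line layout comes from the footer excerpt.

```typescript
type Outcome = "FIXED" | "LOGGED";

// Mirror `date -u +%FT%TZ`: ISO 8601 in UTC with second precision (no milliseconds).
export function utcStamp(now: Date): string {
  return now.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Build one feedback line in the footer's format:
// [<utc timestamp>] <skill-name>: <what was missing or fixed> [FIXED|LOGGED]
export function feedbackLine(skill: string, note: string, outcome: Outcome, now: Date = new Date()): string {
  return `[${utcStamp(now)}] ${skill}: ${note} [${outcome}]`;
}
```

Appending the result to state/log/agents-md-feedback.log is then a single call to appendFileSync from node:fs with the line plus a trailing newline.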

2b. `state/hooks/*` — the wiring

12 shell scripts. These are what make the loaders fire. The entry point is state/hooks/snappy-os-inject.sh — the hook body. Claude Code and Codex each have their own wiring:

Claude Code wires snappy-os-inject.sh into UserPromptSubmit and PreToolUse:Task|Agent (via ~/.claude/settings.json). Fires on every prompt and every subagent dispatch.
Codex wires equivalent behavior via ~/.codex/hooks.json using SessionStart + Stop only. Not UserPromptSubmit — in Codex that produces a visible repeated block for the user (measured 2026-04-18).
openclaw / Gemini CLI / Cursor / Windsurf currently get context-file sync (GEMINI.md / AGENTS.md parity files written by bin/cli.js push) but no execution-time hook. "Context-only" in the parity measurements.

Other hooks in state/hooks/:

snappy-os-stop.sh — fires on Stop event, drains regen queue.
snappy-os-auto-regen.sh — bridges the regen queue to a headless Claude invocation.
preload-skill-context.sh / preload-skill-context-user.sh — mirror of inject, wired at the per-machine user settings level.
enqueue-skill-regen.sh — PostToolUse:Edit|Write on SKILL.md, enqueues the slug for regen.
drain-skill-regen.sh — Stop hook reads the queue and rewrites loaders.
auto-regen-skills.sh — shim installed by snappy-skills init.
collect-pid-status.sh, detect-pid-trends.sh, skill-check-session.sh — telemetry + pre-ship lint on Stop.

Parity is measured, not claimed. 62 rows in state/log/parity.ndjson as of 2026-04-18:

Claude Code: agentic, hook-wired, mean 0.69 across 16 runs.
Codex: agentic, hook-wired, mean 0.21 across 14 runs (climbing).
openclaw: context-only, mean 0.24 across 21 runs.
Gemini CLI: context-only, mean 0.00 across 11 runs.

Running npx tsx state/lint/parity-test.ts refreshes these numbers.
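
The per-runtime means above are simple aggregates over the ledger. A minimal sketch of that computation follows, assuming each row in parity.ndjson carries a runtime string and a numeric score; the actual field names in the file are not documented here, so both are assumptions, as is the function name.

```typescript
// Assumed row shape: the real parity.ndjson fields may be named differently.
interface ParityRow {
  runtime: string;
  score: number;
}

// Group rows by runtime and compute the run count and mean score for each.
export function meanByRuntime(ndjson: string): Map<string, { runs: number; mean: number }> {
  const totals = new Map<string, { runs: number; sum: number }>();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines and a trailing newline
    const row = JSON.parse(line) as ParityRow;
    const entry = totals.get(row.runtime) ?? { runs: 0, sum: 0 };
    entry.runs += 1;
    entry.sum += row.score;
    totals.set(row.runtime, entry);
  }
  const result = new Map<string, { runs: number; mean: number }>();
  totals.forEach((entry, runtime) => {
    result.set(runtime, { runs: entry.runs, mean: entry.sum / entry.runs });
  });
  return result;
}
```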

3. Feedback ledgers (write side of the PID loop)

Every time a skill runs, it emits one row — either an eval or a friction, never both, per run. These rows are feedback for the next agent. They are not a dashboard. program.md §6 is explicit about this: "The row is feedback for the next agent, not a dashboard."

3a. `state/log/evals.ndjson` — eval rows

Written by score() in state/lib/eval.ts. Shape, per row:

skill, score (0.0 / 0.5 / 1.0 — no floats between), actor_session_id,
auditor_session_id, ts, run_id, primary_issue?, writer_id?, touched?, ...

Read back by:

tailRecentEvals(skill, n) — the UserPromptSubmit hook uses this to surface "recent trouble:" in the injected loader.
pid-detect (state/bin/pid-detect.ts) — trend computation.
the landing-page aggregator — for public stats.
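
The row shape and the read-back path can be sketched as types plus a pure function. This is an assumption-laden illustration: the type of touched is a guess, and tailRecentFromText is a hypothetical stand-in that takes the ledger text as an argument, not the real tailRecentEvals(skill, n) in state/lib/eval.ts.

```typescript
// Only three score values are legal: no floats in between.
type Score = 0 | 0.5 | 1;

interface EvalRow {
  skill: string;
  score: Score;
  actor_session_id: string;
  auditor_session_id: string;
  ts: string;
  run_id: string;
  primary_issue?: string;
  writer_id?: string;
  touched?: string[]; // assumption: a list of touched paths
}

// Type guard that rejects any score outside 0 / 0.5 / 1.
export function isScore(value: unknown): value is Score {
  return value === 0 || value === 0.5 || value === 1;
}

// Return the last n well-formed rows for one skill, oldest first.
export function tailRecentFromText(ndjson: string, skill: string, n: number): EvalRow[] {
  const rows: EvalRow[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line) as EvalRow;
    if (row.skill === skill && isScore(row.score)) rows.push(row);
  }
  // slice(-0) would return everything, so n <= 0 is handled explicitly.
  return n > 0 ? rows.slice(-n) : [];
}
```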

3b. `state/log/frictions.ndjson` — friction rows

Failure rows. Written by friction() in state/lib/eval.ts. A row carries area, severity (P0/P1/P2), surface (file path the gap was found in), expected, actual, repro command, and fix. The prose-sidecar-drift lint writes friction rows here; so do the other lints.
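
A minimal sketch of the friction row as a type, with a completeness check before serialization. The key names (in particular repro) are assumptions derived from the field list above, and both helpers are hypothetical; the real writer is friction() in state/lib/eval.ts.

```typescript
type Severity = "P0" | "P1" | "P2";

// Assumed key names, one per field described in the prose above.
interface FrictionRow {
  area: string;
  severity: Severity;
  surface: string; // file path the gap was found in
  expected: string;
  actual: string;
  repro: string; // command that reproduces the gap
  fix: string;
}

const REQUIRED: (keyof FrictionRow)[] = ["area", "severity", "surface", "expected", "actual", "repro", "fix"];

// List the fields that are absent or empty, so a caller can refuse to write a partial row.
export function missingFields(row: Partial<FrictionRow>): string[] {
  return REQUIRED.filter((key) => {
    const value = row[key];
    return value === undefined || value === "";
  });
}

// One NDJSON line: compact JSON plus a trailing newline.
export function toNdjsonLine(row: FrictionRow): string {
  return JSON.stringify(row) + "\n";
}
```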

3c. `state/log/diagnostics.ndjson` — quarantined evals

Skills listed in state/lib/eval-quarantine.json are diagnostic harnesses (contract-test, view-toggle stub) — they emit fixed-shape rows that would pollute real trends. score() auto-routes them here so evals.ndjson stays a real-skill signal. (Audit P1-8, 2026-04-19.)
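
The routing decision is small enough to sketch directly. The sketch assumes eval-quarantine.json is a plain JSON array of skill names, which is not confirmed by this document; the function names are hypothetical.

```typescript
// Assumption: the quarantine file is a JSON array of skill names.
export function parseQuarantine(json: string): Set<string> {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array of skill names");
  return new Set(parsed.filter((item): item is string => typeof item === "string"));
}

// Quarantined skills write to the diagnostics ledger; everything else goes to evals.
export function ledgerFor(skill: string, quarantined: ReadonlySet<string>): string {
  return quarantined.has(skill) ? "state/log/diagnostics.ndjson" : "state/log/evals.ndjson";
}
```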

3d. `state/log/agents-md-feedback.log` — the LOGGED stream

The third line of the PID loop. When a gap is too big to fix inline (needs >10 lines, spans multiple files, or requires a structural rewrite), the agent appends [LOGGED] here; the drain hook reads it next Stop and regenerates the loader from scratch.

4. The PID loop — three honest modes per run

For every skill turn:

[FIXED] — inline edit. The agent hit a gap ≤10 lines. It edits the .agents.md directly (one line in a table, a missing example) and appends [FIXED] to state/log/agents-md-feedback.log. Surgical. No restructuring.
[LOGGED] — queue for drain. The gap is too big for inline. The agent appends [LOGGED] with the description to agents-md-feedback.log. Also, if a SKILL.md was edited, PostToolUse:Edit|Write fires enqueue-skill-regen.sh, which writes the slug to state/log/regen-pending.txt.
Drain on Stop. snappy-os-stop.sh + drain-skill-regen.sh run at end of turn (or snappy-os-auto-regen.sh in autopilot mode). They read the queue and the LOGGED lines, dispatch a headless Claude to rewrite affected .agents.md files, clear the queue.

No fourth mode. "Every run scores itself" would be a fourth mode that does not exist. Scoring is a separate subprocess (see §5), not self-report.
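
The choice between [FIXED] and [LOGGED], and the drain's view of the feedback log, can be sketched as two pure functions. The Gap shape and the function names are hypothetical; the thresholds come from the footer rule, and the log line layout from the footer's echo command.

```typescript
// Hypothetical description of a gap the agent ran into.
interface Gap {
  linesNeeded: number;
  filesTouched: number;
  structural: boolean;
}

// LOGGED is only allowed when the fix needs >10 lines, spans multiple files,
// or requires a structural rewrite. Everything else is an inline FIXED edit.
export function classifyGap(gap: Gap): "FIXED" | "LOGGED" {
  const tooBig = gap.linesNeeded > 10 || gap.filesTouched > 1 || gap.structural;
  return tooBig ? "LOGGED" : "FIXED";
}

// Drain input: the distinct skills that have at least one [LOGGED] line.
export function loggedSkills(log: string): string[] {
  const pattern = /^\[[^\]]+\] ([^:]+): .* \[LOGGED\]$/;
  const seen = new Set<string>();
  for (const line of log.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match !== null && match[1] !== undefined) seen.add(match[1]);
  }
  return Array.from(seen);
}
```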

5. Actor ≠ auditor — how it's enforced

This is the one rule that has to be right. program.md §5: "The thing that generates output cannot be the thing that grades it."

5a. The identity

sessionId() in state/lib/eval.ts:

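// Module-level cache (declaration assumed; it is not part of the excerpt).
let cachedSessionId: string | null = null;

export function sessionId(): string {
  // Reuse the id for the lifetime of this process.
  if (cachedSessionId !== null) return cachedSessionId;
  // Random base-36 suffix, at most six characters.
  const rnd = Math.random().toString(36).slice(2, 8);
  cachedSessionId = `s-${process.pid}-${rnd}`;
  return cachedSessionId;
}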

The id is s-<process.pid>-<6 random chars>, cached per process. A different OS process gets a different pid and therefore a different id. An agent turn that generates output is one pid. The lint/check subprocess spawned afterward to grade that output is a different pid. The ids differ because the pids differ.

5b. The cutoff and the check

ACTOR_AUDITOR_REQUIRED_AFTER = "2026-04-20T18:00:00Z". Rows written after this instant must carry actor_session_id and auditor_session_id, and the two values must differ.

Enforcement is state/lint/eval-row-mandatory.ts --publish. It scans every row in evals.ndjson written post-cutoff; if a row is missing one of the ids, or the two are equal, the lint returns exit 1. This is a ship gate — --publish is wired into pre-ship CI.

5c. Honest scope of what's enforced

Enforced: both fields present, both non-empty, not equal. (Shape check.)
Not enforced: that the two ids correspond to the specific processes that actually did the generating vs. the grading. In principle a caller could pass score({actor_session_id: "a", auditor_session_id: "b", ...}) with two made-up strings and the lint would accept it. The sessionId() helper exists so honest callers write honest rows; the lint catches lazy callers (forgot to pass it, passed the same one twice). It does not catch adversarial callers.
Measured state (ledger audit 2026-04-20): 2979 total rows, 497 post-cutoff, 485 honored (distinct actor+auditor), 12 missing fields, 0 equality violations.

So: the enforcement is shape-level. The property we care about (different pids in practice) is maintained because the default path through sessionId() produces distinct ids for distinct subprocesses, and callers use the default.
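
A minimal sketch of the shape check follows, to make the enforced and not-enforced halves concrete. It is not the code in state/lint/eval-row-mandatory.ts: the function names and the Verdict labels are hypothetical, and it assumes ts is an ISO 8601 timestamp.

```typescript
const ACTOR_AUDITOR_REQUIRED_AFTER = "2026-04-20T18:00:00Z";

interface RowIds {
  ts: string; // assumption: ISO 8601
  actor_session_id?: string;
  auditor_session_id?: string;
}

type Verdict = "pre-cutoff" | "honored" | "missing" | "equal";

// Shape check only: both ids present, both non-empty, and not equal.
export function checkRow(row: RowIds, cutoff: string = ACTOR_AUDITOR_REQUIRED_AFTER): Verdict {
  if (Date.parse(row.ts) <= Date.parse(cutoff)) return "pre-cutoff";
  const actor = row.actor_session_id ?? "";
  const auditor = row.auditor_session_id ?? "";
  if (actor === "" || auditor === "") return "missing";
  return actor === auditor ? "equal" : "honored";
}

// Tally verdicts across a ledger.
export function auditLedger(rows: RowIds[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { "pre-cutoff": 0, honored: 0, missing: 0, equal: 0 };
  for (const row of rows) counts[checkRow(row)] += 1;
  return counts;
}

// Ship gate: any missing or equal row fails the lint with exit 1.
export function exitCode(counts: Record<Verdict, number>): number {
  return counts.missing + counts.equal > 0 ? 1 : 0;
}
```

Note that a row with the made-up ids "a" and "b" comes back as honored, which is exactly the limit described above: the check sees shape, not provenance.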

6. The lint suite — what is actually checked

46 scripts in state/lint/. The ones that matter for the contract:

check.ts — top-level structural lint. Every skill has required frontmatter, a body, valid references. Also checks snappy-os invariants (one skill per file, no orphaned api.ts, etc.).
eval-row-mandatory.ts — the actor≠auditor + writer_id + chain-eval + touched-has-eval ship gate (§5).
prose-sidecar-drift.ts — catches a skill whose prose grew to ≥3 executable commands without a sidecar (state/bin/<slug>/* or state/lib/<slug>.ts). Writes a friction row + enqueues for regen.
sync-integrity.ts — verifies gateway round-trip (each file's sha+size matches state/log/sync-manifest.json).
parity-test.ts — refreshes state/log/parity.ndjson (the runtime-by-runtime measurement).
cron-coherence.ts / cron-drift.ts — Class A (scaffolding) crons banned; Class B (skill sidecar crons with actor≠auditor) allowed. See §9.
evals-dedup.ts / evals-integrity.ts — ledger health.
loaders-sync.ts — every skill with an .agents.md matches its .md in the regen-window.
lib-smoke-import.ts — every api.ts module is importable (no dead imports, no circular imports).
library-shape.ts — api.ts modules export what they claim.
gateway-health.ts / eval-endpoint-live.ts / e2e-receipts.ts — network-facing contracts.

What the lint suite is not: it is not provenance-level. It does not prove an eval row was written by the true grader process. It proves the row has the right shape.

7. Sync — two layers, both bidirectional

7a. git — code and schema

What lives here:

program.md (the schema)
bin/ (top-level CLI entry; runtime scripts like cli.js)
state/lib/*.ts (api.ts modules)
state/lint/*.ts (lints)
state/skills/*.md (the skill atoms themselves — the portable Anthropic-spec files)
state/hooks/*.sh (hook bodies)
state/regen/* (footer + drain/enqueue scripts)
state/recipes/*.md (recipe bundles — see §8)

Command: git push origin main from one machine; git pull --rebase origin main on the others. Every change to anything under state/ requires rebase-before-push (rule in CLAUDE.md).

7b. gateway + manifest — runtime artifacts

What lives here:

live copies of state/skills/* and state/skills/*.agents.md (served from skills.snappy.ai)
hook bodies (served for bootstrap — see CLAUDE.md "New machine setup" step 2)
eval aggregates (state/log/aggregates/*)

Command: node bin/cli.js push --auto (default scope=state). Worker writes through to DO Spaces. Each write records sha+size in state/log/sync-manifest.json. A per-batch probe verifies the write landed; on drift, a friction row is logged. Writes that were never verified were the root cause of the 4 silent-drop incidents on 2026-04-18.

Verification: npx tsx state/lint/sync-integrity.ts --gateway --manifest.
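
The sha+size comparison can be sketched as below. The manifest shape (a path-to-entry map) and the choice of SHA-256 are assumptions; this document only says that each write records "sha+size". The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumed manifest entry: hex digest plus byte length.
interface ManifestEntry {
  sha: string;
  size: number;
}

// Fingerprint one file's content. SHA-256 is an assumption; the real manifest may use another hash.
export function fingerprint(content: Buffer): ManifestEntry {
  return { sha: createHash("sha256").update(content).digest("hex"), size: content.length };
}

// Return the paths whose content is absent from the manifest or no longer matches it.
export function findDrift(files: Map<string, Buffer>, manifest: Record<string, ManifestEntry>): string[] {
  const drifted: string[] = [];
  files.forEach((content, path) => {
    const recorded = manifest[path];
    const actual = fingerprint(content);
    if (recorded === undefined || recorded.sha !== actual.sha || recorded.size !== actual.size) {
      drifted.push(path);
    }
  });
  return drifted.sort();
}
```

Each drifted path would then become one friction row, in line with the per-batch probe described above.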

8. Recipes and engagement — opt-in bundles

state/recipes/*.md — 4 recipes currently:

ambient-sync.md — passive background sync of the gateway artifacts.
autopilot.md — the regen drain dispatches a headless Claude to fix queued issues on Stop.
nightly-digest.md — cron-driven digest of the day's frictions + eval trends.
pid-loop.md — the .agents.md feedback behavior (fix inline / log for drain).

A recipe is not a plugin. It's a named bundle of hook behaviors declared in markdown. To engage a recipe, its name goes into state/engaged.json:

{
  "recipes": ["ambient-sync", "pid-loop", "autopilot", "evolve"],
  "last_changed": "2026-04-18T22:25:00.000Z",
  "by": "pod-b-evolve-audit"
}

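Hooks and cron workers check engaged.json before acting. An empty recipes array means a quiet harness (safety valve). This is the user's explicit opt-in. Nothing runs unless it's named in this file. Note that "evolve" in the example above has no matching file among the four recipes listed in this section, so either the recipe list or the example needs updating.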
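
The engagement check that a hook or cron worker performs before acting can be sketched as follows. The function names are hypothetical, and treating a missing or malformed recipes key as "nothing engaged" is an assumption chosen to match the safety-valve behavior.

```typescript
interface Engaged {
  recipes: string[];
}

// Parse engaged.json. A missing or non-array `recipes` key is treated as empty,
// which keeps the harness quiet by default (assumption, matching the safety valve).
export function parseEngaged(json: string): Engaged {
  const parsed = JSON.parse(json) as { recipes?: unknown };
  const recipes = Array.isArray(parsed.recipes)
    ? parsed.recipes.filter((item): item is string => typeof item === "string")
    : [];
  return { recipes };
}

// A hook calls this first and exits without doing anything when it returns false.
export function isEngaged(recipe: string, engaged: Engaged): boolean {
  return engaged.recipes.includes(recipe);
}
```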

9. Cron — two classes

From program.md §8:

Class A — scaffolding cron (BANNED). A cron that runs against the snappy-os tree itself, spawning agents to "clean up" or "maintain." Banned because it violates actor≠auditor at scale (the cron is the actor AND the auditor, writing to the same ledgers without an external grader).
Class B — skill sidecar cron (ALLOWED). A cron attached to one skill's api.ts, running a single operation (poll an inbox, refresh a token, emit a digest), with its own actor process and a separate grader invocation. Must honor the actor≠auditor shape.

state/lint/cron-coherence.ts and cron-drift.ts enforce the class split.

10. Environment — where credentials live

.env.cache at the repo root is the canonical file. snappy-os owns it. Loaded by state/lib/env.ts. Every api.ts that hits an external API reads credentials through env.ts.

~/.claude/skills/snappy-settings/.env.cache is a symlink pointing at the repo-root file (back-compat for the old path). If either is missing or broken, anything that hits an external API fails — fix the symlink direction first.
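
A minimal sketch of a check for the two paths above, useful when an API call fails and the symlink is the first suspect. The function name and the status labels are hypothetical; it uses only existsSync and realpathSync from node:fs.

```typescript
import { existsSync, realpathSync } from "node:fs";

type EnvStatus = "ok" | "canonical-missing" | "link-missing-or-broken" | "link-points-elsewhere";

// canonical: the repo-root .env.cache; link: the back-compat path under ~/.claude/.
export function checkEnvLink(canonical: string, link: string): EnvStatus {
  if (!existsSync(canonical)) return "canonical-missing";
  // existsSync follows symlinks, so a dangling link reports false here.
  if (!existsSync(link)) return "link-missing-or-broken";
  // Both paths resolve: they must land on the same real file.
  return realpathSync(link) === realpathSync(canonical) ? "ok" : "link-points-elsewhere";
}
```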

Setup from scratch is documented in CLAUDE.md under "New machine setup."

11. Seed-owned vs user-owned

program.md §9 lists which files the snappy-os project owns (ship with every clone) vs which files belong to the user's tenant (their specific skills, their eval history, their engaged recipes).

Seed-owned: program.md, bin/cli.js, state/lib/*.ts, state/lint/*.ts, state/hooks/*.sh, state/regen/*, state/skills/_template.md, the substrate skills (bootstrap, ops, eval, sync-*, etc.).
User-owned: state/engaged.json, state/log/*, the user-authored skills that aren't in the seed list, .env.cache.

The seed manifest is state/lib/seed-manifest.ts. A user pulling snappy-os updates gets new seed files; their own state/log/* and user skills are untouched.

12. Distribution shape — Option A / Option B (no "plugin" word)

snappy-os is a scaffolded tree that lives in the user's git. Not a plugin. Not a package you install once. The tree IS the system. Updates come via git pull.

Two distribution flavors the user can opt into, both building on the same Anthropic-spec skill atoms:

Option A — full tree

The user clones snappy-os, gets all 12 layers above. Keyword-gated loaders via hooks. The PID loop self-heals gaps. Evals ledger. Frictions ledger. Recipe engagement. Sync to gateway. Works across Claude Code, Codex, openclaw, Gemini CLI, Cursor, Windsurf (parity measured in parity.ndjson).

Option B — plain skills only

The user brings their own agent runtime (Claude Code, or any Anthropic-spec consumer) and wants nothing more than the portable state/skills/*.md files. They can copy out single skills and drop them into ~/.claude/skills/<name>/ (renamed to SKILL.md, since the folder-based layout expects that filename). No loader, no hooks, no ledger. Pure Anthropic-spec. Still useful: the skill atom stands alone.

The goal for skills.snappy.ai: help a user who comes in wanting AI to "have hands and eyes in X" (automate a process, scrape a site, render images, etc.). Help them assemble the skills they need. They choose whether to take the full tree (Option A) or just the atoms (Option B). We don't impose.

The hero tagline, confirmed 2026-04-22: treat markdown like code.

13. Index of every named file

Grouped by layer. Every word in this doc has a file.

The atom

Skill: state/skills/<name>.md
api.ts: state/lib/<name>.ts (headers itself as snappy-<name>/api.ts)
Scripts: state/bin/<name>/* or state/bin/<name>.{sh,ts}
Loader: state/skills/<name>.agents.md

The cross-cutting layer

PID footer: state/regen/footer.md
Drain: state/regen/drain.sh
Enqueue: state/regen/enqueue.sh
Entry hook: state/hooks/snappy-os-inject.sh
Stop hook: state/hooks/snappy-os-stop.sh
Autopilot hook: state/hooks/snappy-os-auto-regen.sh
Preload (user): state/hooks/preload-skill-context-user.sh
Preload (subagent): state/hooks/preload-skill-context.sh
Regen enqueue (PostToolUse): state/hooks/enqueue-skill-regen.sh
Regen drain (Stop): state/hooks/drain-skill-regen.sh
Session check: state/hooks/skill-check-session.sh

Ledgers

Evals: state/log/evals.ndjson
Frictions: state/log/frictions.ndjson
Diagnostics: state/log/diagnostics.ndjson
LOGGED stream: state/log/agents-md-feedback.log
Regen queue: state/log/regen-pending.txt
Sync manifest: state/log/sync-manifest.json
Parity: state/log/parity.ndjson

Eval contract

Session id + score + friction + pending: state/lib/eval.ts
Ship gate: state/lint/eval-row-mandatory.ts (run with --publish)
Cutoff constant: ACTOR_AUDITOR_REQUIRED_AFTER = "2026-04-20T18:00:00Z"
Quarantine: state/lib/eval-quarantine.json

Recipes and engagement

Recipes: state/recipes/*.md (ambient-sync, autopilot, nightly-digest, pid-loop)
Engagement: state/engaged.json

Env

Loader: state/lib/env.ts
File: .env.cache (repo root)

Schema and catalog

Schema: program.md
Catalog: state/index.md
Seed manifest: state/lib/seed-manifest.ts

Distribution

Full tree: Option A. Sync via git + node bin/cli.js push --auto.
Plain skills: Option B. Copy state/skills/<name>.md out.
Never: "plugin."

14. What is NOT in the system (so future doc writers don't invent it)

There is no "core package." No skill has category: core in frontmatter. Don't build a Core tile.
There is no "starter." trigger: is empty catalog-wide.
There is no "runner" layer. The TypeScript library module is api.ts. The shell scripts are state/bin/*. Use those names.
There is no "plugin." snappy-os is a scaffolded tree.
There is no self-scoring. An agent does not grade its own run. Grading happens in a different OS process (different pid, different sessionId).
The home page of skills.snappy.ai does not slice the catalog into derived tiles by frontmatter. When a future UI writer proposes a "Kernel / Ops / Integrations / Clients" row, they are re-inventing something that was tried 2026-04-22 and explicitly rejected.