User-install audit — 2026-04-18

Trace of what a fresh user gets when they run npx snappy-os, vs what Robert (the operator of the control plane) runs locally. Evidence on every claim.


1. Operator plane vs user plane

The distinction IS modeled — tenant IDs, tier grants, invite codes, separate DO Spaces prefixes all exist. But on disk today, only Robert's tenant is active.

| Concern | Operator plane (Robert) | User plane (Joe) |
| --- | --- | --- |
| Tenant ID | sha256(SNAPPY_MASTER_KEY).first(12); one-of-one today | Same derivation; would get its own 12-char prefix |
| Master key location | .env.cache at ~/projects/snappy-os/.env.cache (13k, full secret bundle) | Single SNAPPY_MASTER_KEY env var from invite URL; no .env.cache required |
| DO Spaces prefix | Writes to snappy-os/ public canonical + snappy-os-tenants/<robert_id>/ | Reads public canonical; writes ONLY to snappy-os-tenants/<joe_id>/ |
| Worker role | Mints invites via POST /invite/mint with Bearer MASTER_KEY | Never authenticates as admin; redeems via GET /invite/<code> |
| Kernel library (~/projects/snappy-kernel) | Was the frozen reference library; not present on THIS machine | Never shipped to Joe machines per bin/install.js:326 comment |
| Skills authoring | Edits state/skills/<name>.md, pushes via cli.js push | Can edit locally; push is tenant-scoped; cross-tenant PID requires quorum |
| Worker deploy | Writes ~/projects/snappy-skills/ + wrangler deploy | Never touches Worker code |
| Auth tier | tier: personal with is_master: true (auth.ts:63-65) | tier: public by default, tier: personal only if their key matches a KV API_KEYS record |

Source evidence: ~/projects/snappy-skills/src/auth.ts:62-65, pull.ts:126-145, state/wiki/multi-tenant.md:34-41, state/bin/sync/tenant-id.sh:4-8.
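To make the tenant-ID and prefix rows concrete, here is a minimal sketch of the derivation and the write scoping they describe. It assumes the digest is hex-encoded and that "first(12)" means the first 12 hex characters; the authoritative derivation is in state/bin/sync/tenant-id.sh and may differ (for example in how the key is fed to the hash). The function names deriveTenantId, tenantPrefix, and canWrite are hypothetical and do not come from the codebase.

```typescript
import { createHash } from "node:crypto";

// Assumption: hex digest, first 12 characters. Check tenant-id.sh for the real rule.
function deriveTenantId(masterKey: string): string {
  return createHash("sha256").update(masterKey, "utf8").digest("hex").slice(0, 12);
}

// Prefix a tenant is allowed to write under, per the "DO Spaces prefix" row.
function tenantPrefix(tenantId: string): string {
  return `snappy-os-tenants/${tenantId}/`;
}

// Hypothetical guard: a user-plane write must stay inside the caller's own prefix.
function canWrite(tenantId: string, objectKey: string): boolean {
  return objectKey.startsWith(tenantPrefix(tenantId));
}

const joeId = deriveTenantId("example-master-key-not-a-real-secret");
console.log(joeId.length); // 12
console.log(canWrite(joeId, `${tenantPrefix(joeId)}state/skills/demo.md`)); // true
console.log(canWrite(joeId, "snappy-os/state/skills/demo.md")); // false
```

The real enforcement is server-side in the Worker; a client-side check like this would only be a convenience, not a security boundary.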


2. Install timeline — what actually runs on npx snappy-os

Entry point: bin/install.js (Joe never clones from GitHub — canonicals come from the Worker via bin/cli.js pull, comment at install.js:5).

| # | Step | Mechanism | Failure mode | Est. time |
| --- | --- | --- | --- | --- |
| 0 | Default mode is dry-run | SNAPPY_BOOTSTRAP_APPLY=1 required to mutate | User thinks nothing happened | <1s |
| 1 | Detect platform | os.platform() + /proc/version WSL check | Unsupported OS → warn, continue | <1ms |
| 2 | Detect runtimes + version-gate | which(claude/codex/gemini/openclaw/cursor/windsurf) + --version | refuse-too-old aborts; warn-degraded continues | ~1s (6 which calls) |
| 3 | Resolve SNAPPY_MASTER_KEY | Priority: env var → $BOOTSTRAP_INVITE_CODE → op read → doppler secrets get → interactive prompt | Dry-run skips prompt | <100ms |
| 4 | Pull snappy-os canonical | node bin/cli.js pull --force --scope all --repo os (presigned DO URLs from Worker) | Requires bin/cli.js to already exist; the bootstrap chicken-and-egg is handled by the npm tarball shipping cli.js | varies (manifest-diff) |
| 5 | symlink-runtimes.sh | Fan-out per-skill symlinks into ~/.claude/skills/, ~/.codex/, etc. | Backs up existing dirs to ~/.claude/_backups/<ts>/ before overwriting | 2-5s |
| 6 | wire-hooks.js | Idempotent ~/.claude/settings.json + ~/.codex/config.toml edit; dedupes by command string | Bare {command} rejects whole settings.json silently (user memory note) | <1s |
| 7 | sync-runtimes.ts --codex-expand | 3-file expansion for Codex | | <1s |
| 8 | sync-runtimes.ts | CLAUDE.md → AGENTS.md / GEMINI.md / .cursorrules / .windsurfrules / .github/copilot-instructions.md | | <1s |
| 9 | Cron install | Idempotent crontab - append: `30 /6 ... cli.js doctor --silent... alert doctor-failed` | Windows skipped; crontab absent → fail | <1s |
| 10 | Smoke tests | program.md exists, claude mirror resolves, skills.snappy.ai/_status 200, per-runtime snappy-* visibility | Each reported independently | ~5s (HTTP roundtrip) |
| 11 | Print next steps | Prose | | <1ms |

Verified on THIS machine (Robert's) via .bootstrap-report.json: mode dry-run, all 11 steps ok, all smoke checks PASS, 222 snappy-* skills visible under ~/.claude/skills/, Worker HTTP 200.
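Step 3 of the timeline resolves the master key through an ordered chain of sources and skips the interactive prompt in dry-run mode. The sketch below shows one way such a chain could be structured. It is not the resolveMasterKey() implementation from install.js; the KeyResolver type, the resolveKey function, and the stub resolvers are all hypothetical, and the real resolvers would shell out to the 1Password and Doppler CLIs rather than return fixed strings.

```typescript
// A resolver returns the key, or undefined when its source has nothing to offer.
type KeyResolver = { name: string; resolve: () => string | undefined };

interface ResolveResult {
  key: string | undefined;
  source: string | undefined;
}

// Try resolvers in priority order and stop at the first non-empty value.
// In dry-run mode the interactive prompt is left out of the chain entirely.
function resolveKey(resolvers: KeyResolver[], prompt: KeyResolver, dryRun: boolean): ResolveResult {
  const chain = dryRun ? resolvers : [...resolvers, prompt];
  for (const r of chain) {
    const value = r.resolve();
    if (value !== undefined && value.trim() !== "") {
      return { key: value.trim(), source: r.name };
    }
  }
  return { key: undefined, source: undefined };
}

// Stubbed sources, in the priority order listed in the table (values are placeholders).
const env: Record<string, string | undefined> = { SNAPPY_MASTER_KEY: undefined };
const stubs: KeyResolver[] = [
  { name: "env", resolve: () => env["SNAPPY_MASTER_KEY"] },
  { name: "invite-code", resolve: () => undefined },
  { name: "op read", resolve: () => undefined },
  { name: "doppler secrets get", resolve: () => "key-from-doppler-stub" },
];
const promptStub: KeyResolver = { name: "prompt", resolve: () => "typed-by-user" };

console.log(resolveKey(stubs, promptStub, true).source); // doppler secrets get
```

Keeping the prompt outside the main list makes the dry-run rule a single branch instead of a special case inside the loop.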

Live gateway probe: GET https://skills.snappy.ai/.well-known/skills/snappy-os/hooks/snappy-os-inject.sh returns HTTP 200, 5119 bytes — fresh-machine path verified accessible.

The CLAUDE.md 6-step manual block (from ~/.claude/CLAUDE.md "New machine setup") is LEGACY prose. It tells users to clone from github.com, symlink .env.cache, and curl-install the hook by hand. The live code path (bin/install.js) does NONE of that — it uses npx snappy-os + the Worker-backed cli.js pull. The manual steps are a diverged fallback.

Broken: symlink ~/.claude/skills/snappy-settings/.env.cache referenced in both the user CLAUDE.md and the settings skill loader does not exist on this machine. Only SKILL.md is symlinked. The settings skill loader claims "fix the symlink first when anything that hits the network mysteriously fails" — but the symlink that would need fixing isn't there. Scripts that read .env.cache via the kernel fallback path would silently 404.
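A small check can distinguish "link never created" from "link present but pointing at nothing", which is the distinction this finding turns on. The sketch below is an illustration using Node's fs module, not a script from the repository; checkSymlink and the LinkState labels are hypothetical names. The demo runs in a throwaway temp directory so nothing under ~/.claude is touched.

```typescript
import { existsSync, lstatSync, mkdtempSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type LinkState = "missing" | "not-a-symlink" | "dangling" | "ok";

function checkSymlink(path: string): LinkState {
  try {
    // lstat inspects the link itself rather than following it.
    if (!lstatSync(path).isSymbolicLink()) return "not-a-symlink";
  } catch {
    // Any lstat failure (typically ENOENT) is treated as "missing" in this sketch.
    return "missing";
  }
  // existsSync follows the link, so false here means the target is gone.
  return existsSync(path) ? "ok" : "dangling";
}

const dir = mkdtempSync(join(tmpdir(), "symlink-check-"));
const target = join(dir, "env.cache.real");
const link = join(dir, ".env.cache");

symlinkSync(target, link); // target does not exist yet
console.log(checkSymlink(link)); // dangling
writeFileSync(target, "PLACEHOLDER=1\n");
console.log(checkSymlink(link)); // ok
console.log(checkSymlink(join(dir, "never-created"))); // missing
```

On the machine audited here, the equivalent check against ~/.claude/skills/snappy-settings/.env.cache would be expected to report "missing", matching the finding above.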


3. Infra dependencies touched on install

| Service | Called from | Purpose | Required for Joe? |
| --- | --- | --- | --- |
| Cloudflare Worker (skills.snappy.ai) | cli.js pull/push/alert/_status, hook snappy-os-inject.sh | Authenticated ingress to DO Spaces; KV-cached (60s) skill serve; tenant derivation; invite redemption | YES (hard dependency) |
| DO Spaces (robert-storage) | Via Worker only (Joe never holds DO creds per install.js:5 + worker-architecture.md:14) | Canonical storage: snappy-os/, snappy-kernel/, snappy-os-tenants/<id>/, snappy-os-meta/tenants/<id>.json | INDIRECT (Worker gates it) |
| npm registry | npx snappy-os first invocation | Tarball distribution (replaced the old curl-pipe-bash installer per install.ts:2-20, Robert's 2026-04-17 directive) | YES (bootstrap entry) |
| GitHub | bin/install.js:5 comment explicitly says "no git clone, no github.com" | NOT used on fresh install | NO |
| Anthropic / OpenAI | Runtime skill invocations (NOT install-time) | Skill execution | NO for install itself |
| 1Password / Doppler | Optional key resolvers in resolveMasterKey() (install.js:171-178) | Secret retrieval convenience | NO (falls back to prompt) |

Worker routes Joe hits (~/projects/snappy-skills/src/index.ts + route table in worker-architecture.md:39-54): GET /_pull, POST /_alert, GET /_status, GET /invite/<code> (HTML landing page, no longer bash). POST /_push only after first action.


4. Tenant model — honest answer: modeled, single-tenant in practice

What exists in code:

snappy-os-clients/<slug>/, snappy-os-subscriber/.

What's present on disk / in live state today:

invite-walkthrough.md:44 names it.

means auth.ts:69-71 returns tier: public, name: invalid-key.

Verdict: the tenancy infrastructure is in place and authorization is server-side enforced, but no second tenant has been onboarded. To onboard one, Robert must:

  1. Generate a fresh SNAPPY_MASTER_KEY for Joe (no tooling found for this; Joe supplies his own, per invite-mint.ts:86-88, which validates length ≥16).
  2. Run invite-mint.ts --tenant=<joes-key> --label="..." against the Worker.
  3. Send Joe the invite_url.
  4. Add Joe's tenant to s3://robert-storage/snappy-os-meta/tenants/<id>.json if non-public tier access is needed.
  5. Joe pastes the npx line into his terminal.

Step 4 has no helper script — it's a manual DO Spaces write today.
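As an illustration of what the missing helper would need to compute before any write, the sketch below validates a candidate key against the length rule cited from invite-mint.ts and builds the grants object key from a tenant ID. Both function names are hypothetical, and the 12-character hex shape of the tenant ID is an assumption carried over from the derivation described in section 1. The contents of the grants JSON are not documented in this audit, so the sketch deliberately stops at the path.

```typescript
// Assumption: mirrors the "length ≥16" validation cited at invite-mint.ts:86-88.
const MIN_KEY_LENGTH = 16;

function isAcceptableTenantKey(key: string): boolean {
  return key.length >= MIN_KEY_LENGTH;
}

// Object key of the grants file, relative to the robert-storage bucket.
// Assumption: tenant IDs are 12 lowercase hex characters.
function grantsObjectKey(tenantId: string): string {
  if (!/^[0-9a-f]{12}$/.test(tenantId)) {
    throw new Error(`expected a 12-char hex tenant ID, got "${tenantId}"`);
  }
  return `snappy-os-meta/tenants/${tenantId}.json`;
}

console.log(isAcceptableTenantKey("too-short")); // false
console.log(isAcceptableTenantKey("a-sufficiently-long-key")); // true
console.log(grantsObjectKey("0123456789ab")); // snappy-os-meta/tenants/0123456789ab.json
```

Rejecting malformed tenant IDs up front would keep a typo from creating a stray grants object that no tenant ever resolves to.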


5. Expansion story — how a user adds their own skill

There is a prose skill named skill (state/skills/skill.md) with a scaffolder at state/lib/skill.ts. There is no /snappy-new-skill slash command.

The documented flow:

  1. Scaffold: Joe creates state/skills/<name>.md with required frontmatter (name, description, eval). Start with graduation: prose, eval: manual.
  2. Add sidecar loader: state/skills/<name>.agents.md with Critical Rules, Commands, Pitfalls, Self-Test per the program.md sidecar spec.
  3. Update catalog: Manually append to state/index.md (not auto-generated; state/skills/skill.md:50-54 explicitly flags this as recurring drift).
  4. Run lint: npx tsx state/lint/check.ts validates shape. No ## Eval section = fail; sidecar present = pass.
  5. Optionally graduate: write actor (state/lib/<name>.ts or state/bin/<name>/) + independent auditor. Flip eval: manual to eval: auto only after actor/auditor split exists.
  6. Push: node bin/cli.js push --auto writes to DO Spaces under tenant prefix. Public canonical requires quorum promotion (≥3 tenants, ≥0.85 score, ≥5 runs per quorum.ts).
  7. Cross-tenant promotion: Automatic via Worker scheduled handler every minute (worker-architecture.md:57-62). No manual review meeting.
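The shape rules from the scaffold and lint steps can be sketched as a pure function. This is not the logic of state/lint/check.ts, whose actual rules were not inspected line by line here; it assumes YAML-style frontmatter delimited by --- lines and checks only the three things the steps name: required frontmatter keys, a ## Eval section, and a sidecar. All identifiers below are hypothetical.

```typescript
interface LintInput {
  markdown: string; // contents of state/skills/<name>.md
  hasSidecar: boolean; // whether state/skills/<name>.agents.md exists
}

const REQUIRED_KEYS = ["name", "description", "eval"];

// Collect top-level keys from a leading `---` frontmatter block (assumed format).
function frontmatterKeys(markdown: string): Set<string> {
  const keys = new Set<string>();
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return keys;
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break;
    const m = /^([A-Za-z_][\w-]*):/.exec(line);
    if (m && m[1] !== undefined) keys.add(m[1]);
  }
  return keys;
}

// Returns a list of problems; an empty list means the shape checks pass.
function lintSkill(input: LintInput): string[] {
  const problems: string[] = [];
  const keys = frontmatterKeys(input.markdown);
  for (const k of REQUIRED_KEYS) {
    if (!keys.has(k)) problems.push(`missing frontmatter key: ${k}`);
  }
  if (!/^## Eval\b/m.test(input.markdown)) problems.push("missing ## Eval section");
  if (!input.hasSidecar) problems.push("missing sidecar loader");
  return problems;
}

const draft = [
  "---",
  "name: demo",
  "description: Example skill",
  "eval: manual",
  "graduation: prose",
  "---",
  "# demo",
  "",
  "## Eval",
  "Manual review.",
].join("\n");

console.log(lintSkill({ markdown: draft, hasSidecar: true })); // []
```

Returning a list of problems rather than a boolean lets a caller report every shape failure in one pass.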

Gap: Joe's tenant skill lives at snappy-os-tenants/<joe_id>/state/skills/<name>.md but the pull gate (pull.ts:128) allows ALL tenants to read state/skills/ in the public repo. Tenant-private skills need a separate prefix — today any skill Joe writes either stays local or goes into public canonical via quorum. No middle ground ("my team's private skills") is wired.
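For reference, the quorum thresholds quoted in the push step (≥3 tenants, ≥0.85 score, ≥5 runs) can be read in more than one way. The sketch below encodes one plausible reading, stated as an assumption: a tenant counts toward quorum when it has at least 5 runs and a mean score of at least 0.85, and promotion needs at least 3 such tenants. The actual rule is in quorum.ts and may aggregate differently; the types and function names here are hypothetical.

```typescript
interface TenantRuns {
  tenantId: string;
  scores: number[]; // one score per run, each in [0, 1]
}

const MIN_TENANTS = 3;
const MIN_SCORE = 0.85;
const MIN_RUNS = 5;

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Assumed per-tenant rule: enough runs, and a high enough average score.
function qualifies(t: TenantRuns): boolean {
  return t.scores.length >= MIN_RUNS && mean(t.scores) >= MIN_SCORE;
}

// Assumed promotion rule: enough distinct qualifying tenants.
function reachesQuorum(tenants: TenantRuns[]): boolean {
  const distinct = new Set(tenants.filter(qualifies).map((t) => t.tenantId));
  return distinct.size >= MIN_TENANTS;
}

const strong = [0.9, 0.9, 0.9, 0.9, 0.9];
const sample: TenantRuns[] = [
  { tenantId: "aaaaaaaaaaaa", scores: strong },
  { tenantId: "bbbbbbbbbbbb", scores: strong },
  { tenantId: "cccccccccccc", scores: [0.9, 0.9] }, // too few runs
];

console.log(reachesQuorum(sample)); // false
console.log(reachesQuorum([...sample, { tenantId: "dddddddddddd", scores: strong }])); // true
```

Under any reading of these thresholds, a single-tenant deployment cannot reach quorum, which is why a skill Joe writes today would stay in his tenant prefix or on his machine.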


Honest gaps (3 concrete)

  1. Symlink ~/.claude/skills/snappy-settings/.env.cache does not exist on this machine. Both ~/.claude/CLAUDE.md setup block AND the settings skill loader assume it does. Legacy kernel-path fallback silently 404s.
  2. No tooling to onboard a second tenant end-to-end. invite-mint.ts mints but the tenant grants file (snappy-os-meta/tenants/<id>.json) has no helper; the operator edits DO Spaces by hand.
  3. No private-tenant skill tier. Code distinguishes public canonical from tenant-private state, but skill content is treated as public (pull.ts:128 allows all tenants to read state/skills/). A user who writes a skill either pushes it to quorum promotion or keeps it local. "Personal team skills" is unmodeled.

Verdict

Operator vs user plane is formally modeled, practically single-tenant. The npm package + Worker ingress architecture is real and works (.bootstrap-report.json shows 11/11 green on this machine, gateway HTTP 200). A second user could onboard via invite-mint.ts + npx snappy-os today, but would land in a tenant-of-one shape with no team-private skill space and with the legacy .env.cache symlink advice in CLAUDE.md that doesn't match the install.js flow. The multi-tenant spec is further along than the multi-tenant usage — the substrate is 80% built, the flight path has been walked once (Robert), and the documented onboarding loop would require at least one manual DO Spaces write per new tier grant.