Secrets rotation
What this layer does
Phase 10 keeps every secret in the substrate rotatable without downtime. Three secret families exist: DO Spaces credentials (Worker only), per-tenant SNAPPY_MASTER_KEY, and the Wrangler API token (Robert's CI only). Each rotates independently with a 24h grace window. Joe machines never hold DO creds, so DO rotation never touches a Joe machine.
Files involved
- state/bin/sync/rotate-do-creds.sh: Worker-secret rotation flow.
- state/log/secrets-rotation.ndjson: append-only audit trail (in SYNC_DENY; per-machine).
- ~/projects/snappy-skills/src/auth.ts: reads DO_SPACES_KEY_NEW first, falls back to DO_SPACES_KEY for 24h.
- wrangler.json: secret bindings for new + old key pairs.
- .env.cache:113: per-tenant SNAPPY_MASTER_KEY storage on Joe machines.
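The NEW-first, OLD-as-fallback read attributed to auth.ts can be sketched as below. This is an illustration under stated assumptions, not the actual auth.ts: the SecretEnv shape, the rotatedAtMs input, and the function name candidateCredentials are hypothetical. Only the four secret names come from this document.

```typescript
// Hypothetical shape of the Worker's secret bindings; only the four
// binding names are taken from this document.
interface SecretEnv {
  DO_SPACES_KEY?: string;
  DO_SPACES_SECRET?: string;
  DO_SPACES_KEY_NEW?: string;
  DO_SPACES_SECRET_NEW?: string;
}

interface Credentials {
  key: string;
  secret: string;
  source: "new" | "primary";
}

const GRACE_MS = 24 * 60 * 60 * 1000; // 24h grace window

// Returns credentials in the order they should be tried: the NEW pair
// first, then the primary pair while the grace window is still open.
// rotatedAtMs (when the NEW pair was installed) is an assumed input;
// pass null when no rotation is in progress.
function candidateCredentials(
  env: SecretEnv,
  nowMs: number,
  rotatedAtMs: number | null,
): Credentials[] {
  const out: Credentials[] = [];
  if (env.DO_SPACES_KEY_NEW && env.DO_SPACES_SECRET_NEW) {
    out.push({
      key: env.DO_SPACES_KEY_NEW,
      secret: env.DO_SPACES_SECRET_NEW,
      source: "new",
    });
  }
  const inGrace = rotatedAtMs === null || nowMs - rotatedAtMs < GRACE_MS;
  if (inGrace && env.DO_SPACES_KEY && env.DO_SPACES_SECRET) {
    out.push({
      key: env.DO_SPACES_KEY,
      secret: env.DO_SPACES_SECRET,
      source: "primary",
    });
  }
  return out;
}
```

Returning an ordered list rather than a single pair lets the caller retry with the fallback when a request signed with the new pair is rejected.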
A. DO Spaces credentials (Worker secret only)
# Generate new pair in DO panel
wrangler secret put DO_SPACES_KEY_NEW
wrangler secret put DO_SPACES_SECRET_NEW
# Worker reads NEW first, OLD as 24h fallback
state/bin/sync/rotate-do-creds.sh # smoke push with new creds
# On success:
wrangler secret delete DO_SPACES_KEY
wrangler secret delete DO_SPACES_SECRET
# Then promote NEW to primary: re-run wrangler secret put for DO_SPACES_KEY and DO_SPACES_SECRET with the new values
Joe machines are never affected: they never held DO creds (per the Worker-only-ingress rule).
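The ordering in the steps above matters: the old pair is deleted only after the smoke push with the new pair succeeds. A minimal sketch of that guard follows. The RotationSteps hooks and the function name rotateDoCreds are hypothetical stand-ins for the wrangler commands and rotate-do-creds.sh, and they are synchronous here only to keep the example short.

```typescript
// Hypothetical step hooks; in practice each would shell out to the
// wrangler commands or to rotate-do-creds.sh listed above.
interface RotationSteps {
  putNewPair: () => void;     // wrangler secret put DO_SPACES_*_NEW
  smokePush: () => boolean;   // smoke push with new creds; true on success
  deleteOldPair: () => void;  // wrangler secret delete DO_SPACES_*
  promoteNewPair: () => void; // re-put the new values as primary
}

type RotationResult = "rotated" | "aborted-old-pair-kept";

// Runs the steps in the documented order and refuses to delete the old
// pair unless the smoke push with the new credentials succeeded.
function rotateDoCreds(steps: RotationSteps): RotationResult {
  steps.putNewPair();
  if (!steps.smokePush()) {
    // Old pair is still bound, so the Worker keeps working on fallback.
    return "aborted-old-pair-kept";
  }
  steps.deleteOldPair();
  steps.promoteNewPair();
  return "rotated";
}
```

If the smoke push fails, nothing has been deleted yet, so the rotation can simply be retried with a freshly generated pair.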
B. SNAPPY_MASTER_KEY (per-tenant)
# Tenant generates new key locally
NEW_KEY=$(openssl rand -hex 32)
curl -X POST https://skills.snappy.ai/_rotate \
  -H "Authorization: Bearer $OLD_KEY" \
  -d "{\"new_key\":\"$NEW_KEY\"}"
# Worker writes new tenant grants to KV API_KEYS
# Old key valid for 24h grace window
# Tenant updates .env.cache:113 locally
# Bootstrap re-prompts on next init
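How the Worker might honor the 24h grace window for a rotated tenant key can be sketched as below. The TenantKeyRecord shape and both function names are assumptions for illustration, since the real layout of KV API_KEYS is not described here.

```typescript
// Hypothetical record kept per tenant; the real KV API_KEYS layout is
// not described in this document.
interface TenantKeyRecord {
  currentKey: string;
  previousKey: string | null;
  rotatedAtMs: number | null; // when currentKey replaced previousKey
}

const KEY_GRACE_MS = 24 * 60 * 60 * 1000; // 24h grace window

// Records a rotation: the old current key becomes the previous key.
function applyRotation(
  record: TenantKeyRecord,
  newKey: string,
  nowMs: number,
): TenantKeyRecord {
  return { currentKey: newKey, previousKey: record.currentKey, rotatedAtMs: nowMs };
}

// Accepts the current key always, and the previous key only while the
// grace window is open. Plain === is used for brevity; production code
// comparing secrets would normally use a constant-time comparison.
function isKeyAccepted(
  record: TenantKeyRecord,
  presentedKey: string,
  nowMs: number,
): boolean {
  if (presentedKey === record.currentKey) return true;
  if (record.previousKey === null || record.rotatedAtMs === null) return false;
  const inGrace = nowMs - record.rotatedAtMs < KEY_GRACE_MS;
  return inGrace && presentedKey === record.previousKey;
}
```

Keeping only one previous key means a second rotation inside the grace window would invalidate the oldest key immediately. Whether that is the intended behavior is not stated in this document.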
C. Wrangler API token
Used only by Robert's CI for wrangler deploy. Rotation: regenerate in Cloudflare dashboard, update GitHub repo secret + 1Password vault. Never distributed to Joe machines.
Audit trail
state/log/secrets-rotation.ndjson row shape:
{"ts":"2026-04-16T19:30:00Z","kind":"do-spaces","action":"rotate","machine":"mbpro-rb","success":true}
Append-only. Per-machine (in SYNC_DENY): syncing rotation events across tenants would leak per-tenant rotation cadence.
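A row of this shape can be produced as in the sketch below. The field names and example values mirror the row above. The type name, the helper names, and the choice to leave kind and action as plain strings are assumptions, since the full set of allowed values is not listed.

```typescript
// Field names mirror the example row; the allowed values of `kind` and
// `action` beyond the example are not stated, so both are plain strings.
interface RotationAuditRow {
  ts: string;
  kind: string;
  action: string;
  machine: string;
  success: boolean;
}

// Formats a Date as UTC ISO 8601 without milliseconds, matching the
// "2026-04-16T19:30:00Z" style of the example row.
function auditTimestamp(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Serializes one row as a single NDJSON line: one JSON object plus "\n".
function toNdjsonLine(row: RotationAuditRow): string {
  return JSON.stringify(row) + "\n";
}

const line = toNdjsonLine({
  ts: auditTimestamp(new Date(Date.UTC(2026, 3, 16, 19, 30, 0))), // month 3 = April
  kind: "do-spaces",
  action: "rotate",
  machine: "mbpro-rb",
  success: true,
});
```

Here line is the example row followed by a newline. Appending it to the log, rather than rewriting the file, is what keeps the trail append-only.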
Operational gotchas
- 24h grace is mandatory. Pulling old creds before grace expires
breaks Joes who haven't pulled the new tenant grant yet (per-tenant key change) or in-flight Worker requests (DO key change).
- _rotate requires the OLD key for auth. A tenant who lost their
key cannot self-rotate; recovery is via /install/<inviteCode> with a Robert-minted single-use code.
- The rotation script SHOULD smoke-push with the new creds before
deleting the old pair. Delete-then-fail is unrecoverable without the Cloudflare audit log.
- Wrangler secrets are per-environment. Staging and production rotate
independently; never share a single key across environments.
- Audit row success: false MUST trigger a /_alert POST so the
failure surfaces in /_status.
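The last rule above, that a failed audit row must raise an alert, could be wired up along these lines. The request body, the AlertRequest type, and the idea that /_alert lives under the same base URL as /_status are all assumptions. This document specifies only the path and the POST method.

```typescript
// Minimal view of an audit row; fields follow the documented row shape.
interface AuditRowLike {
  ts: string;
  kind: string;
  action: string;
  machine: string;
  success: boolean;
}

// Hypothetical description of the HTTP call to make; the real /_alert
// payload is not specified in this document.
interface AlertRequest {
  method: "POST";
  url: string;
  body: string;
}

// Returns the alert to send for a row, or null when the row succeeded.
function alertFor(row: AuditRowLike, baseUrl: string): AlertRequest | null {
  if (row.success) return null;
  return {
    method: "POST",
    url: `${baseUrl}/_alert`,
    body: JSON.stringify({
      ts: row.ts,
      kind: row.kind,
      action: row.action,
      machine: row.machine,
    }),
  };
}
```

Building the request as plain data keeps the decision (alert or not) separate from the network call, so the rule itself can be checked without a live Worker.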
How to verify it's working
- After DO rotation, wrangler tail shows requests succeeding with
the new key over a 24h window.
- state/log/secrets-rotation.ndjson gains one row per rotation.
- A tenant that runs _rotate then snappy-os push succeeds with
both old and new keys for 24h, fails with old key after grace.
- curl https://skills.snappy.ai/_status shows `do_spaces_reachable: true`
continuously through a rotation window.
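Two of the checks above lend themselves to automation. The sketch below assumes /_status responses have been polled and parsed into samples and that the audit log has been read into a string. The type and function names are hypothetical, and only do_spaces_reachable and the action field come from this document.

```typescript
// Hypothetical parsed /_status sample; only do_spaces_reachable is
// named in this document.
interface StatusSample {
  do_spaces_reachable: boolean;
}

// True only if at least one sample exists and every sample taken during
// the rotation window reported DO Spaces as reachable.
function reachableThroughout(samples: StatusSample[]): boolean {
  return samples.length > 0 && samples.every((s) => s.do_spaces_reachable === true);
}

// Counts rows with action "rotate" in the text of an NDJSON audit log,
// skipping blank lines such as the one after the trailing newline.
function countRotationRows(ndjson: string): number {
  return ndjson
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as { action?: string })
    .filter((r) => r.action === "rotate").length;
}
```

Comparing countRotationRows before and after a rotation should show an increase of exactly one if the log gains one row per rotation.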