Secrets rotation

What this layer does

Phase 10 keeps every secret in the substrate rotatable without downtime. Three secret families exist: DO Spaces credentials (Worker only), per-tenant SNAPPY_MASTER_KEY, and the Wrangler API token (Robert's CI only). Each rotates independently with a 24h grace window. Joe machines never hold DO creds, so DO rotation never touches a Joe machine.

Files involved

(in SYNC_DENY; per-machine).

first, falls back to DO_SPACES_KEY for 24h.

Joe machines.

A. DO Spaces credentials (Worker secret only)

# Generate new pair in DO panel
wrangler secret put DO_SPACES_KEY_NEW
wrangler secret put DO_SPACES_SECRET_NEW
# Worker reads NEW first, OLD as 24h fallback
state/bin/sync/rotate-do-creds.sh   # smoke push with new creds
# On success:
wrangler secret delete DO_SPACES_KEY
wrangler secret delete DO_SPACES_SECRET
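# Then promote NEW to primary. Wrangler has no rename command, so re-run
# `wrangler secret put DO_SPACES_KEY` and `wrangler secret put DO_SPACES_SECRET` with the new values.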

Joe machines are never affected, because they never hold DO creds (per the Worker-only-ingress rule).
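
The "on success, delete the old pair" step can be sketched as a guard that refuses to delete anything unless the smoke push passes. This is a minimal sketch, not the contents of rotate-do-creds.sh: the function name `retire_old_do_creds` and the `SMOKE_CMD` override are hypothetical, and it assumes the smoke script exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Sketch: delete the old DO Spaces pair only after the smoke push succeeds.
# SMOKE_CMD is a hypothetical override; the default is the script named above.
SMOKE_CMD="${SMOKE_CMD:-state/bin/sync/rotate-do-creds.sh}"

retire_old_do_creds() {
  # Run the smoke push first. On failure, leave the old pair in place
  # so the Worker can keep using it as the fallback.
  if ! "$SMOKE_CMD"; then
    echo "smoke push failed; old DO creds kept" >&2
    return 1
  fi
  # Only reached when the new pair has been proven to work.
  wrangler secret delete DO_SPACES_KEY &&
    wrangler secret delete DO_SPACES_SECRET
}
```

Calling `retire_old_do_creds` after the two `wrangler secret put` commands keeps the order fixed: smoke test first, delete second.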

B. SNAPPY_MASTER_KEY (per-tenant)

# Tenant generates new key locally
NEW_KEY=$(openssl rand -hex 32)
curl -X POST https://skills.snappy.ai/_rotate \
  -H "Authorization: Bearer $OLD_KEY" \
  -d "{\"new_key\":\"$NEW_KEY\"}"
# Worker writes new tenant grants to KV API_KEYS
# Old key valid for 24h grace window
# Tenant updates .env.cache:113 locally
# Bootstrap re-prompts on next init
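
The 24h grace rule can be sketched as a small check that a tenant-side script might run before discarding the old key. The names `old_key_in_grace` and `GRACE_SECONDS` are hypothetical, and treating the exact 24h boundary as expired is an assumption. How the Worker itself enforces expiry is not shown in this document.

```shell
#!/usr/bin/env bash
# Sketch: is the old key still inside the 24h grace window?
GRACE_SECONDS=$((24 * 60 * 60))   # 24h, per the rotation policy

# Usage: old_key_in_grace <rotated_at_epoch> [now_epoch]
# Returns 0 while the old key should still be accepted, 1 afterwards.
# Assumption: a key rotated exactly 24h ago counts as expired.
old_key_in_grace() {
  local rotated_at="$1"
  local now="${2:-$(date +%s)}"   # default to the current time
  [ $((now - rotated_at)) -lt "$GRACE_SECONDS" ]
}
```

For example, `old_key_in_grace "$ROTATED_AT" || echo "old key expired"` would report when the window has closed, with `ROTATED_AT` holding the epoch second of the `_rotate` call.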

C. Wrangler API token

Used only by Robert's CI for wrangler deploy. To rotate, regenerate the token in the Cloudflare dashboard, then update the GitHub repo secret and the 1Password vault. The token is never distributed to Joe machines.

Audit trail

state/log/secrets-rotation.ndjson row shape:

{"ts":"2026-04-16T19:30:00Z","kind":"do-spaces","action":"rotate","machine":"mbpro-rb","success":true}

The log is append-only and per-machine (in SYNC_DENY), because auditing rotation events across tenants would leak per-tenant rotation cadence.
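
A row of this shape could be written by a helper along the following lines. This is a sketch under stated assumptions: `log_rotation` and the `ROTATION_LOG` override are hypothetical names, and the arguments are assumed to be plain identifiers that need no JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch: append one rotation event to the per-machine audit log.
# ROTATION_LOG is a hypothetical override for the path given above.
ROTATION_LOG="${ROTATION_LOG:-state/log/secrets-rotation.ndjson}"

# Usage: log_rotation <kind> <action> <machine> <true|false>
log_rotation() {
  local ts
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # UTC timestamp, as in the sample row
  # >> appends, so existing rows are never rewritten.
  printf '{"ts":"%s","kind":"%s","action":"%s","machine":"%s","success":%s}\n' \
    "$ts" "$1" "$2" "$3" "$4" >> "$ROTATION_LOG"
}
```

For example, `log_rotation do-spaces rotate mbpro-rb true` appends a row like the sample above, stamped with the current UTC time.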

Operational gotchas

breaks Joes who haven't pulled the new tenant grant yet (per-tenant key change) or in-flight Worker requests (DO key change).

key cannot self-rotate; recovery is via /install/<inviteCode> with a Robert-minted single-use code.

deleting the old pair. Delete-then-fail is unrecoverable without the Cloudflare audit log.

independently; never share a single key across environments.

failure surfaces in /_status.

How to verify it's working

the new key over a 24h window.

both old and new keys for 24h, fails with old key after grace.

true` continuously through a rotation window.