Disaster recovery

What this layer does

Phase 9 makes the substrate recoverable from any single failure mode. DO Spaces bucket versioning protects against accidental deletes and overwrites. Manual snapshots provide named checkpoints. Restore copies versioned bytes into a holding directory without touching live state. Rollback is the explicit, label-confirmed full reversion. A weekly cross-region backup keeps a cold copy outside the primary bucket's failure domain.

Files involved

s3://robert-storage/snappy-os-snapshots/<ts>-<label>/.

skill at an ISO timestamp into a holding dir.

refuses without --apply and label confirmation.

backup, baked into v1.

(in SYNC_ALLOW).

Bucket versioning

Versioning is enabled once on the robert-storage bucket, via the DO panel or s3cmd. Prior versions are retained for 30 days. Every overwrite preserves the prior version, addressable by version-id; every delete writes a tombstone and leaves the prior version retrievable.
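
As a rough sketch of the one-time setup, the s3cmd route could look like the following. The bucket name and the 30-day figure come from the text above; the rule ID and the temporary file path are hypothetical, and whether Spaces honors this exact lifecycle element should be checked against the provider's documentation before relying on it.

```shell
# Sketch only: the rule ID below is hypothetical.
BUCKET="s3://robert-storage"

# Emit an S3-style lifecycle document that expires noncurrent
# (overwritten or deleted) versions after the given number of days.
lifecycle_xml() {
  local days="$1"
  cat <<EOF
<LifecycleConfiguration>
  <Rule>
    <ID>expire-noncurrent-versions</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>${days}</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>
EOF
}

# One-time setup, deliberately left as comments so nothing runs on source:
#   s3cmd setversioning "$BUCKET" enable
#   lifecycle_xml 30 > /tmp/lifecycle.xml
#   s3cmd setlifecycle /tmp/lifecycle.xml "$BUCKET"
#   s3cmd info "$BUCKET"    # inspect the resulting bucket settings
```

Keeping the XML in a function makes the retention window a single reviewable parameter instead of a value buried in a console setting.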

Procedures

# Manual checkpoint before a risky change
state/bin/sync/snapshot.sh "pre-quorum-rewrite-2026-04-16"

# Restore a single skill to a point in time (non-destructive)
state/bin/sync/restore.sh snappy-image 2026-04-15T18:00:00Z
# → writes to ~/projects/snappy-os/_restore/snappy-image-<ts>/

# Full rollback (requires explicit apply + matching label)
state/bin/sync/rollback.sh 2026-04-16T12:00:00Z-pre-quorum-rewrite-2026-04-16 \
  --apply --confirm-label="pre-quorum-rewrite-2026-04-16"
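
The refusal behavior of the rollback command can be illustrated with a small guard function. This is a minimal sketch, not the actual contents of rollback.sh: the function name rollback_guard is hypothetical, and it assumes the snapshot ID has the form <ts>-<label> with an ISO timestamp ending in "Z".

```shell
# Hypothetical guard: succeeds only when --apply is present and
# --confirm-label equals the label embedded in the snapshot ID.
rollback_guard() {
  local snapshot_id="$1"
  shift
  local apply=0 confirm="" arg
  for arg in "$@"; do
    case "$arg" in
      --apply) apply=1 ;;
      --confirm-label=*) confirm="${arg#--confirm-label=}" ;;
    esac
  done

  # Assumption: <ts> ends in "Z", so the label is whatever follows the first "Z-".
  local label="${snapshot_id#*Z-}"

  if [ "$apply" -ne 1 ]; then
    echo "refusing: pass --apply to perform the rollback" >&2
    return 1
  fi
  if [ "$confirm" != "$label" ]; then
    echo "refusing: --confirm-label does not match the snapshot label" >&2
    return 1
  fi
  return 0
}

# Usage:
#   rollback_guard "2026-04-16T12:00:00Z-pre-quorum-rewrite-2026-04-16" \
#     --apply --confirm-label="pre-quorum-rewrite-2026-04-16" && echo "proceed"
```

Requiring both flags means a command recalled from shell history without the label, or with a stale label, fails closed rather than reverting live state.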

Worker DR

wrangler deploy from Robert's machine.

to nyc3 gives a cold copy in a separate failure domain.

Operational gotchas

and prints the diff. Manual review and a manual mv are mandatory. This prevents the restore tool from becoming a foot-gun.

matches the snapshot label exactly. This is intentional friction.

versions — they are independent prefixes that survive the version retention window.

If it stalls, state/lint/sync-freshness.ts flags the missing cross-region-backup row in state/log/snapshots.ndjson.

must not auto-push. Holding bytes leak otherwise.
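
The freshness check on the cross-region backup can be approximated from the shell. This is an illustrative stand-in, not the logic of state/lint/sync-freshness.ts: the row shape (compact JSON with "ts" and "kind" fields) and both function names are assumptions, since the log schema is not shown here.

```shell
# Assumed row shape, one per line, compact JSON:
#   {"ts":"2026-04-12T03:00:00Z","kind":"cross-region-backup"}

# Print the newest timestamp among cross-region-backup rows (empty if none).
latest_backup_ts() {
  grep '"kind":"cross-region-backup"' "$1" \
    | sed -n 's/.*"ts":"\([^"]*\)".*/\1/p' \
    | LC_ALL=C sort | tail -n 1
}

# Succeed when the newest backup row is at or after the cutoff timestamp.
# UTC ISO timestamps of equal width sort lexicographically, so sort(1) is enough.
backup_is_fresh() {
  local log="$1" cutoff="$2" latest earliest
  latest="$(latest_backup_ts "$log")"
  [ -n "$latest" ] || return 1
  earliest="$(printf '%s\n%s\n' "$cutoff" "$latest" | LC_ALL=C sort | head -n 1)"
  [ "$earliest" = "$cutoff" ]
}

# Usage (the date invocation is GNU-specific):
#   cutoff="$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)"
#   backup_is_fresh state/log/snapshots.ndjson "$cutoff" || echo "backup is stale"
```

A log with no backup rows at all is treated as stale, which matches the "row missing" condition described above.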

How to verify it's working

state/log/snapshots.ndjson and the new prefix is visible at s3://robert-storage/snappy-os-snapshots/.

~/projects/snappy-os/_restore/<skill>-<ts>/ and exits 0 without touching live state.

with the refusal message.

7 days of bootstrap.

retention.
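
The restore check can be scripted along these lines. This is a sketch under stated assumptions: the holding path layout follows the ~/projects/snappy-os/_restore/<skill>-<ts>/ pattern from the text, while the function names and the location of the live copy passed to review_restore are hypothetical.

```shell
# Build the holding path: <root>/_restore/<skill>-<ts>
holding_dir() {
  local root="$1" skill="$2" ts="$3"
  printf '%s/_restore/%s-%s\n' "$root" "$skill" "$ts"
}

# Print a recursive diff between the live copy and the restored copy.
# Read-only: nothing is moved or overwritten.
# Returns 0 if identical, 1 if they differ, 2 if the restored copy is missing.
review_restore() {
  local live="$1" held="$2"
  if [ ! -d "$held" ]; then
    echo "no restored copy at $held" >&2
    return 2
  fi
  diff -ru "$live" "$held"
}

# Usage (the live path is a placeholder to be supplied by the operator):
#   held="$(holding_dir "$HOME/projects/snappy-os" snappy-image 2026-04-15T18:00:00Z)"
#   review_restore "<live skill dir>" "$held"
```

Because the function only reads, promoting the restored bytes stays a separate, manual mv.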