CI for sync

What this layer does

Phase 15 protects the sync layer from regressions. Robert's mirror runs GitHub Actions on PR open and on every push to main. A 6-hour cron runs the read-only smoke suite against live canonical. Every Joe machine runs snappy-os doctor every 6 hours and POSTs an _alert to the Worker on failure. That gives three feedback loops: PR-time, schedule-time, and field-time.

Files involved

internal mirror only; runs on PR open + push to main.

smoke read-only against live.

s3://robert-storage/snappy-os-ci/<sha>/ prefixes older than 7d.

latency probe.

cell; exit code = failure count.

GitHub Actions (Robert's mirror only)

sync-ci.yml runs on PR open and push to main:

s3://robert-storage/snappy-os-ci/<commit-sha>/ (never touches production)
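As a rough illustration of the "never touches production" rule, a CI helper could derive its scratch prefix from the commit SHA and refuse any write target outside it. This is a minimal sketch, not the workflow's actual code: the names `ci_prefix` and `assert_ci_target` are hypothetical, and the SHA format check is an assumption. Only the prefix layout comes from this document.

```python
import re

# Prefix layout taken from this document.
CI_ROOT = "s3://robert-storage/snappy-os-ci/"


def ci_prefix(commit_sha: str) -> str:
    """Return the per-commit scratch prefix for one CI run."""
    # Assumption: SHAs are 7-40 lowercase hex characters (abbreviated or full).
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"not a git commit SHA: {commit_sha!r}")
    return f"{CI_ROOT}{commit_sha}/"


def assert_ci_target(url: str, commit_sha: str) -> None:
    """Raise if a write target falls outside this run's scratch prefix."""
    if not url.startswith(ci_prefix(commit_sha)):
        raise PermissionError(f"refusing write outside CI prefix: {url}")
```

Calling `assert_ci_target` before every upload makes a stray production write fail loudly in the job instead of landing silently.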

Cleanup job removes snappy-os-ci/<sha>/ prefixes older than 7 days to keep the bucket cost flat.
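The selection step of such a cleanup might look like the sketch below. It assumes the caller has already listed the CI prefixes along with the timestamp of the newest object in each; the listing and deletion calls against DO Spaces are omitted. The function name `expired_prefixes` and the "age of newest object" rule are assumptions; the 7-day window is from this document.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in this document.
RETENTION = timedelta(days=7)


def expired_prefixes(prefixes, now=None):
    """Return the prefixes whose newest object is older than RETENTION.

    `prefixes` maps a prefix string to the timezone-aware datetime of the
    newest object under it. A prefix exactly 7 days old is kept.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return sorted(p for p, newest in prefixes.items() if newest < cutoff)
```

Passing `now` explicitly keeps the function deterministic, so the retention rule can be unit-tested without touching the bucket.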

DO Spaces credentials come from a GitHub repo secret. CI uses the same key set as the Worker's secret rotation flow, so the Phase 10 rotation covers CI as well.

Robert-machine cron

0 */6 * * * ~/projects/snappy-os/state/bin/sync/ci-loop.sh

ci-loop.sh runs the smoke against live canonical in read-only mode: pull --dry and Worker GET only, no writes. Failures land in state/log/alerts/sync-degraded-<ts>.md and surface in /snappy-ops "System / ops" → "Sync status".
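The failure path could be sketched as follows. ci-loop.sh itself is a shell script; this Python version only illustrates the behavior described above. The function name `write_sync_alert`, the `<ts>` format, and the file body are assumptions; only the `sync-degraded-<ts>.md` naming comes from this document.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_sync_alert(failures, alerts_dir, now=None):
    """Write one sync-degraded-<ts>.md listing failed checks.

    `failures` is a list of (check_name, detail) pairs. Returns the path
    written, or None when nothing failed (green runs write no file).
    """
    if not failures:
        return None
    now = now or datetime.now(timezone.utc)
    ts = now.strftime("%Y%m%dT%H%M%SZ")  # assumed <ts> format
    alerts_dir = Path(alerts_dir)
    alerts_dir.mkdir(parents=True, exist_ok=True)
    path = alerts_dir / f"sync-degraded-{ts}.md"
    lines = [f"# Sync degraded at {ts}", ""]
    lines += [f"- {name}: {detail}" for name, detail in failures]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```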

Joe-machine cron

30 */6 * * * snappy-os doctor --silent || snappy-os alert "doctor-failed"

Installed by bootstrap. Runs the local section-A lints + parity-matrix cells. On non-zero exit, POSTs _alert to the Worker so failures across the tenant base aggregate in /_status.
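The cron line above depends only on doctor exiting non-zero when something fails. Assuming doctor follows the "exit code = failure count" convention mentioned in the files list, a runner could be sketched like this. The names `run_checks` and `exit_code` are hypothetical, and the real lints and parity-matrix cells are not shown.

```python
def run_checks(checks):
    """Run each (name, callable) check; return the names that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(name)
    return failed


def exit_code(failed):
    """Map a list of failed checks to a process exit status."""
    # Exit statuses are 8 bits wide, so an uncapped count of 256 would
    # wrap to 0 and read as success; cap at 255.
    return min(len(failed), 255)
```

A caller would finish with `sys.exit(exit_code(failed))`, so the `||` in the cron entry fires the alert whenever at least one check fails.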

Pre-launch gates

These run once before launch, not on schedule:

concurrent _push (10 KB, distinct tenants). Asserts <2s p95 install, <5s p95 push, 0 5xx. Block launch on fail.

Cloudflare regions. p95 <500ms; p99 <1500ms. Block launch on fail.

No green-on-three = no launch.
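A gate of this kind reduces to computing percentiles over the collected samples and comparing them against fixed limits. The sketch below uses the nearest-rank percentile definition, which is an assumption: the document does not say which method the real harness uses. The function names are hypothetical, and the default limits are the latency thresholds quoted above.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of `samples` for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering at least p% of the samples.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_gate(samples_ms, p95_limit_ms=500, p99_limit_ms=1500):
    """Return True only if both percentiles are strictly below their limits."""
    return (
        percentile(samples_ms, 95) < p95_limit_ms
        and percentile(samples_ms, 99) < p99_limit_ms
    )
```

Keeping the limits as parameters with the documented defaults makes any threshold change visible in a diff, which suits the rule against softening thresholds to ship.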

Operational gotchas

that write to production bucket are rejected at the lint step.

feedback loops (CI write → catalog rebuild → CI runs again).

filling logs with green rows. Failures get the explicit alert POST.

unbounded and DO Spaces cost climbs.

tests means the launch waits for the underlying fix; do not soften thresholds to ship.

How to verify it's working

every lint + smoke step green.

/_status.

cleanup run.