CI for sync

What this layer does

Phase 15 protects the sync layer from regressions with three feedback loops: PR-time, schedule-time, and field-time. Robert's mirror runs GitHub Actions on every PR open and push to main. A 6-hour cron on Robert's machine runs the read-only smoke suite against live canonical. Every Joe machine runs snappy-os doctor every 6 hours and POSTs an _alert on failure.

Files involved

internal mirror only; runs on PR open + push to main.

smoke read-only against live.

s3://robert-storage/snappy-os-ci/<sha>/ prefixes older than 7d.

latency probe.

cell; exit code = failure count.
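The "exit code = failure count" convention is simple, but it hides one trap worth showing. What follows is a minimal sketch, assuming each parity-matrix cell can be modelled as a callable that returns True on pass. The names `run_cells` and `exit_code` are hypothetical, not part of snappy-os. POSIX exit statuses are 8-bit, so an uncapped count of 256 failures would wrap to 0 and read as success. The sketch caps the code at 255 for that reason.

```python
import sys
from typing import Callable, Dict


def run_cells(cells: Dict[str, Callable[[], bool]]) -> int:
    """Run every (hypothetical) cell and return how many failed."""
    failures = 0
    for name, check in cells.items():
        try:
            ok = bool(check())
        except Exception:  # a cell that crashes counts as a failed cell
            ok = False
        if not ok:
            failures += 1
            print(f"FAIL {name}", file=sys.stderr)
    return failures


def exit_code(failures: int) -> int:
    # Exit statuses are 8-bit: cap so 256 failures cannot wrap to 0 ("success").
    return min(failures, 255)
```

A wrapper script would end with `sys.exit(exit_code(run_cells(cells)))`. This lets the caller treat any non-zero status as "something failed" and still read the count.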

GitHub Actions (Robert's mirror only)

sync-ci.yml runs on PR open and push to main:

s3://robert-storage/snappy-os-ci/<commit-sha>/ (never touches production)

Cleanup job removes snappy-os-ci/<sha>/ prefixes older than 7 days to keep the bucket cost flat.
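The cleanup rule reduces to a small selection step, sketched below. It assumes that a prefix's age is the LastModified time of its newest object, and that some listing step has already built a prefix-to-timestamp map. DO Spaces exposes an S3-compatible API, so any S3 client can produce that map, but the listing is left out here. The function name `stale_prefixes` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

CI_ROOT = "snappy-os-ci/"
MAX_AGE = timedelta(days=7)


def stale_prefixes(newest_by_prefix: Dict[str, datetime],
                   now: Optional[datetime] = None) -> List[str]:
    """Return per-commit prefixes whose newest object is older than MAX_AGE.

    newest_by_prefix maps e.g. "snappy-os-ci/<sha>/" to the LastModified time
    of the most recent object under it (assumed timezone-aware UTC).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - MAX_AGE
    return sorted(
        prefix
        for prefix, newest in newest_by_prefix.items()
        # Only CI prefixes are ever candidates; anything else is out of scope.
        if prefix.startswith(CI_ROOT) and newest < cutoff
    )
```

Keying on the newest object means a prefix that is still receiving writes is never swept mid-run. The `CI_ROOT` filter also keeps the job from touching production keys, even if the listing is broader than intended.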

DO Spaces credentials come from a GitHub repo secret. It is the same key set the Worker's secret-rotation flow uses, so the Phase 10 rotation covers CI as well.

Robert-machine cron

0 */6 * * * ~/projects/snappy-os/state/bin/sync/ci-loop.sh

ci-loop.sh runs the smoke suite against live canonical in read-only mode: pull --dry and Worker GETs only, no writes. Failures land in state/log/alerts/sync-degraded-<ts>.md and surface in /snappy-ops under "System / ops" → "Sync status".
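The alert-file step might look roughly like this in Python. It is a sketch only: ci-loop.sh is a shell script, and both the `<ts>` format (compact UTC ISO 8601, chosen because it sorts lexically) and the file body are assumptions. `write_degraded_alert` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def write_degraded_alert(state_dir: Path, failures: List[str],
                         now: Optional[datetime] = None) -> Path:
    """Write state/log/alerts/sync-degraded-<ts>.md listing failed smoke steps."""
    now = now or datetime.now(timezone.utc)
    ts = now.strftime("%Y%m%dT%H%M%SZ")  # assumed <ts> format: UTC, sortable
    alerts_dir = state_dir / "log" / "alerts"
    alerts_dir.mkdir(parents=True, exist_ok=True)
    path = alerts_dir / f"sync-degraded-{ts}.md"
    lines = [f"# Sync degraded ({ts})", ""]
    lines += [f"- {step}" for step in failures]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Each run writes a new file rather than appending to one log. A sortable timestamp in the name lets the newest alert sit last in a directory listing.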

Joe-machine cron

30 */6 * * * snappy-os doctor --silent || snappy-os alert "doctor-failed"

Installed by bootstrap. doctor runs the local section-A lints and the parity-matrix cells. On a non-zero exit, snappy-os alert POSTs an _alert to the Worker, so failures across the tenant base aggregate in /_status.
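The sketch below shows roughly what the field-time alert amounts to on the wire, built with only the standard library. The Worker URL, the `/_alert` path shape, and every payload field are assumptions for illustration, and the real Worker contract may differ. `build_alert_request` is a hypothetical name. It builds the request but does not send it.

```python
import json
import socket
import urllib.request
from datetime import datetime, timezone


def build_alert_request(worker_url: str, tenant: str,
                        check: str) -> urllib.request.Request:
    """Build (but do not send) the POST reporting a failed doctor run.

    The "/_alert" path and payload fields are illustrative assumptions.
    """
    payload = {
        "tenant": tenant,
        "check": check,  # e.g. "doctor-failed", as passed in the cron line
        "host": socket.gethostname(),
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        worker_url.rstrip("/") + "/_alert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req, timeout=10)`. A short timeout matters here, because a hung alert POST should not keep a cron job alive until the next 6-hour slot.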

Pre-launch gates

These run once before launch, not on schedule:

concurrent _push (10 KB, distinct tenants). Asserts p95 install <2s, p95 push <5s, and zero 5xx responses. Block launch on failure.

Cloudflare regions. p95 <500ms; p99 <1500ms. Block launch on failure.

All three gates must be green; anything less means no launch.
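The gate thresholds above can be checked mechanically once the raw samples are collected. This sketch uses the nearest-rank percentile, which is one common definition among several. A load-testing tool may interpolate instead and report slightly different numbers. It also assumes the probe thresholds apply to samples pooled across regions; for per-region gating, call it once per region. `percentile` and `launch_gate_failures` are hypothetical names.

```python
from typing import List, Sequence


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for integer q in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= q <= 100:
        raise ValueError("q must be in 1..100")
    ordered = sorted(samples)
    rank = -(-q * len(ordered) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def launch_gate_failures(install_s: Sequence[float], push_s: Sequence[float],
                         status_codes: Sequence[int],
                         probe_ms: Sequence[float]) -> List[str]:
    """Return violated thresholds; an empty list means every gate is green."""
    failures: List[str] = []
    if percentile(install_s, 95) >= 2.0:
        failures.append("install p95 >= 2s")
    if percentile(push_s, 95) >= 5.0:
        failures.append("push p95 >= 5s")
    if any(500 <= code <= 599 for code in status_codes):
        failures.append("5xx responses seen")
    if percentile(probe_ms, 95) >= 500:
        failures.append("probe p95 >= 500ms")
    if percentile(probe_ms, 99) >= 1500:
        failures.append("probe p99 >= 1500ms")
    return failures
```

Integer arithmetic for the rank avoids a float such as `95 * n / 100` landing just above a whole number and shifting the chosen sample by one.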

Operational gotchas

that write to the production bucket are rejected at the lint step.

feedback loops (CI write → catalog rebuild → CI runs again).

filling logs with green rows. Failures get the explicit alert POST.

unbounded and DO Spaces cost climbs.

tests means the launch waits for the underlying fix; do not soften thresholds to ship.
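The first gotcha, rejecting production writes at lint time, could be approximated with a plain text scan of CI config and scripts. This is a hypothetical heuristic, not the project's actual lint. It flags every s3:// URI outside s3://robert-storage/snappy-os-ci/, reads included. That is stricter than "writes only", but the check stays simple.

```python
import re
from typing import List

ALLOWED_PREFIX = "s3://robert-storage/snappy-os-ci/"
S3_URI = re.compile(r"s3://[^\s'\"]+")  # stop at whitespace or quotes


def production_write_violations(text: str) -> List[str]:
    """Return s3:// URIs in CI text that fall outside the per-commit CI prefix."""
    return [uri for uri in S3_URI.findall(text)
            if not uri.startswith(ALLOWED_PREFIX)]
```

The trailing slash in `ALLOWED_PREFIX` matters. Without it, a look-alike such as `snappy-os-ci-old/` would pass the prefix check.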

How to verify it's working

every lint + smoke step green.

/_status.

cleanup run.