snappy-autopilot-twocron

Splits finding problems from fixing them so the two never grade themselves.

description: "Triggers on prompt mention of 'autopilot' or 'snappy-autopilot-twocron'."

personal · 2 files · 10 recent evals

How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

For developers: how this skill is built, graded, and run

at a glance - the short version

eval mode: auto-shape

category: Ops

stages: 1

depends: sweep, commit-report

what's inside - the parts that make up a skill 2/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill

state/skills/snappy-autopilot-twocron.md present

the skill itself, in plain text

The main file. It says what the skill is and lays out the steps in plain English.

Code

state/lib/snappy-autopilot-twocron.ts not present

code the skill can run

Optional. Many skills are just words and need no code at all.

Scripts

state/bin/snappy-autopilot-twocron/ not present

helper scripts

Optional. Added when a skill has a few commands to run.

Loader

state/skills/snappy-autopilot-twocron.agents.md present

what the AI loads on the fly

Loaded automatically the moment this skill is needed. Kept short on purpose.

how it runs - the shared frame every skill uses 3/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Any part this skill doesn't use shows a one-line note explaining why, so the omission reads as deliberate rather than accidental.

makes the work The worker

not present

No work step here. This is probably a skill that reads or coordinates, not one that produces something.

checks the work The reviewer

inferred

shape gate an automatic check

The check is an automatic pass or fail on the shape of the result, run separately from the work itself.

frame

learns Self-correction

present

fixes itself learns from gaps

When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.

tidies up Background fixes

present

queued for rewrite runs in the background

Bigger fixes that can't be made on the spot get queued and rewritten in the background later.

remembers Run history

present

state/log/evals.ndjson auto runs

Every run is written down here, so the next time this skill is used it already knows how the last runs went.

Critical rules the things this skill must not get wrong

Breaker NEVER fixes; fixer NEVER adds new p0/p1 rows — actor ≠ auditor at the autopilot loop level (lint enforces breaker-only writes for p0/p1)
The shared report (state/log/breakage-report.ndjson) IS the queue — append-only from breaker; in-place row rewrite (status open → resolved) from fixer
Breaker MUST append at least one row per invocation — either a finding or a clean row ({sev:"clean", status:"none-found"}); silent runs score 0
Both scripts still check state/engaged.json for the autopilot recipe; if absent (current default), they exit 0 silently — to re-engage for a manual session, add "autopilot" back to the recipes array
Per-invocation budget cap $1 / timeout 5min still applies for safety when run by hand
Fixer rewrites resolved row in place: {...original fields..., status:"resolved", resolved_ts, resolution:"<one-sentence>"}

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the gap is recorded in the skill itself so the same mistake should not recur.

how the work flows - step by step

inputs: sweep, commit-report

1 data

eval log

`state/log/evals.ndjson` (skill: "snappy-autopilot-twocron", verb: "break"|"fix")

SKILL.md - the skill, written out in plain English

snappy-autopilot-twocron

Two cron entries, one shared report file, clean split between finding breakage and fixing it. Extends program.md rule 5 (actor ≠ auditor) to the autopilot loop itself.

The model

                 state/log/breakage-report.ndjson
                       ▲                 ▲
                       │ appends         │ reads + resolves
  ┌────────────────────┘                 └───────────────────┐
  │                                                           │
┌─┴──────────────┐                                   ┌────────┴───────┐
│ break.sh       │  every 30m (:00, :30)             │ fix.sh         │
│                │                                   │                │
│ acts as a USER │                                   │ acts as a      │
│ of snappy-os;  │                                   │ developer; one │
│ tries to break │                                   │ open row per   │
│ something;     │                                   │ tick; writes   │
│ appends ONE    │  offset 15m (:15, :45)            │ fix, commits,  │
│ finding row.   │                                   │ pushes, marks  │
│                │                                   │ row resolved.  │
└────────────────┘                                   └────────────────┘

The report is the queue. Breaker never fixes. Fixer never adds new rows.

Files

state/bin/autopilot/break.sh - breaker tick. Spawns headless Claude with a "you are a user of snappy-os, try to break it" prompt. Budget $1, timeout 5m, writes one row to the report per tick.

state/bin/autopilot/fix.sh - fixer tick. Same spawn shape; the prompt is "read the report, pick one open row, fix it, commit, mark resolved."

state/log/breakage-report.ndjson - the shared queue. Append-only from the breaker; per-row in-place rewrite from the fixer (status open → resolved).

Report schema

Breaker appends:

{"ts":"<iso>","sev":"p0|p1|p2","area":"<tag>","surface":"<tried>","expected":"<docs>","actual":"<reality>","repro":"<cmd>","status":"open"}

Or, if nothing found on the surface tried:

{"ts":"<iso>","sev":"clean","area":"<tag>","surface":"<tried>","status":"none-found"}

Fixer rewrites row in place:

{...original fields..., "status":"resolved","resolved_ts":"<iso>","resolution":"<one-sentence>"}
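The schema and the in-place rewrite can be sketched in TypeScript. This is a minimal illustration under assumptions, not the code fix.sh runs: the type and function names (`Finding`, `CleanRow`, `parseReport`, `resolveFirstOpen`) are hypothetical, the "first open row" selection policy is a choice made for the sketch, and the function works on the report text rather than touching the file.

```typescript
// Row shapes follow the schema above; type and function names are illustrative.
type Finding = {
  ts: string;
  sev: "p0" | "p1" | "p2";
  area: string;
  surface: string;
  expected: string;
  actual: string;
  repro: string;
  status: "open" | "resolved";
  resolved_ts?: string;
  resolution?: string;
};

type CleanRow = {
  ts: string;
  sev: "clean";
  area: string;
  surface: string;
  status: "none-found";
};

type ReportRow = Finding | CleanRow;

// Parse NDJSON text into rows, skipping blank lines. No field validation here.
function parseReport(ndjson: string): ReportRow[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ReportRow);
}

function isOpenFinding(row: ReportRow): row is Finding {
  return row.status === "open";
}

// Resolve the first open finding in place: same row count, same order,
// original fields kept, three fields set. Returns the input unchanged
// when no open row exists. (Picking the FIRST open row is an assumption.)
function resolveFirstOpen(ndjson: string, resolvedTs: string, resolution: string): string {
  const rows = parseReport(ndjson);
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (row !== undefined && isOpenFinding(row)) {
      const resolved: Finding = {
        ...row,
        status: "resolved",
        resolved_ts: resolvedTs,
        resolution,
      };
      rows[i] = resolved;
      return rows.map((r) => JSON.stringify(r)).join("\n") + "\n";
    }
  }
  return ndjson;
}
```

Because the row is overwritten at its own index, the repro and the expected/actual fields stay next to the resolution, which is the point of rewriting in place instead of appending a separate resolution row.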

Cron

0,30 * * * * state/bin/autopilot/break.sh  >> logs/snappy-os-breaker.log 2>&1
15,45 * * * * state/bin/autopilot/fix.sh   >> logs/snappy-os-fixer.log 2>&1

Each script runs every 30 minutes: 48 ticks/day × 2 scripts = 96 cron fires, each bounded to $1. Worst-case daily spend: $96; realistic spend with the engagement gate and the clean-row idle path: $10-25/day.
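The cadence arithmetic can be checked with a few lines of TypeScript. The function names are hypothetical. The $10-25/day realistic figure depends on how often ticks exit early or idle, so it is not derived here.

```typescript
// Worked version of the cadence arithmetic; names are illustrative.
const MINUTES_PER_DAY = 24 * 60;

// Number of cron fires per day for `scripts` scripts on the same interval.
function dailyFires(intervalMinutes: number, scripts: number): number {
  return (MINUTES_PER_DAY / intervalMinutes) * scripts;
}

// Upper bound on daily spend when every fire hits its budget cap.
function worstCaseDailySpend(fires: number, capPerFireUsd: number): number {
  return fires * capPerFireUsd;
}

const fires = dailyFires(30, 2); // 48 ticks per script × 2 scripts = 96
const worstCase = worstCaseDailySpend(fires, 1); // 96 fires × $1 cap = $96
```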

Engagement gate

Both scripts check state/engaged.json for the autopilot recipe. If disengaged, exit 0 silently. Disengage by removing "autopilot" from the recipes array in state/engaged.json (no CLI - the file is the source of truth).
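A minimal sketch of the gate check, in TypeScript. The only facts taken from the text are the `recipes` array and the `"autopilot"` entry; the overall file shape (`{"recipes": [...]}`), the function name `isEngaged`, and the choice to treat an unreadable file as disengaged are assumptions for illustration.

```typescript
// Assumption: state/engaged.json looks like {"recipes": ["...", "autopilot"]}.
function isEngaged(engagedJson: string, recipe: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(engagedJson);
  } catch {
    // Unparseable file: treated as disengaged (a choice made for this sketch).
    return false;
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const recipes = (parsed as { recipes?: unknown }).recipes;
  return Array.isArray(recipes) && recipes.indexOf(recipe) !== -1;
}
```

A caller would run this check first and exit 0 when it returns false, before any model budget is spent.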

Why two crons, not one

Prior design was a single autopilot --tick that tried to do everything. Two problems:

Conflated detection and remediation → fixer had to redo detection work and the subagent lost focus trying to hold both roles.

No clean signal for "did we actually close a gap" - because the same agent that found the problem was grading its own fix.

Splitting gives:

Adversarial pressure on the system from the breaker (it's incentivized to find cracks, not forgive them).

Focused fixes from the fixer (one row = one PR-sized change).
PID signal from the delta between open-vs-resolved row ratios over time.
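The text does not define the open-vs-resolved ratio exactly, so the following TypeScript is one possible reading, not the skill's formula: take the share of findings that are resolved in a snapshot of the report, then compare two snapshots. The names `resolvedShare` and `shareDelta` are hypothetical, and ignoring clean rows is an assumption.

```typescript
type StatusRow = { status: string };

// Share of findings that are resolved. Clean rows ("none-found") are ignored.
// Returns null when there are no findings, to avoid dividing by zero.
function resolvedShare(rows: StatusRow[]): number | null {
  const open = rows.filter((r) => r.status === "open").length;
  const resolved = rows.filter((r) => r.status === "resolved").length;
  const total = open + resolved;
  return total === 0 ? null : resolved / total;
}

// Change in resolved share between an earlier and a later snapshot.
// Positive means the fixer is closing rows faster than the breaker opens them.
function shareDelta(before: StatusRow[], after: StatusRow[]): number | null {
  const earlier = resolvedShare(before);
  const later = resolvedShare(after);
  return earlier === null || later === null ? null : later - earlier;
}
```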

Eval

Shape-gate:

break.sh must append at least one row per successful tick (either a finding or a clean row - never silent).

fix.sh must never append new p0/p1 rows; lint enforces breaker-only writes for those severities.

Every tick logs an eval row to state/log/evals.ndjson with skill: "snappy-autopilot-twocron", verb: "break"|"fix", score = 1 if exit 0 else 0.
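The two shape checks and the eval row can be sketched in TypeScript as pure functions over before/after snapshots of the report. This is an illustration, not the lint or gate that actually runs: the function names are hypothetical, and the fixer check assumes new rows only ever land at the tail of the file.

```typescript
type GateRow = { sev: string; status: string };

// Breaker gate: the tick must have appended at least one row.
function breakerTickPasses(rowsBefore: number, rowsAfter: number): boolean {
  return rowsAfter - rowsBefore >= 1;
}

// Fixer gate: the tick must not have appended any p0/p1 row.
// Assumes appends land at the tail, so new rows are those past before.length.
function fixerTickPasses(before: GateRow[], after: GateRow[]): boolean {
  const appended = after.slice(before.length);
  return appended.every((r) => r.sev !== "p0" && r.sev !== "p1");
}

// Eval row as described in the text: score is 1 when the script exited 0, else 0.
function evalRow(verb: "break" | "fix", exitCode: number) {
  return {
    skill: "snappy-autopilot-twocron",
    verb,
    score: exitCode === 0 ? 1 : 0,
  };
}
```

An in-place resolve leaves the row count unchanged, so it passes the fixer check with an empty `appended` list.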

Graduation

When break.sh consistently finds high-signal breakages (not noise) and fix.sh closes them with low revert rate (<10% regressions per week), the skill graduates to having its own PID-tuned prompts - the system rewrites its own breaker/fixer prompts based on the eval log.

AGENTS.md - what the AI loads when this skill comes up

snappy-autopilot-twocron - loader

Per-turn rules for the snappy-autopilot-twocron skill. Full reference: state/skills/snappy-autopilot-twocron.md. Do not skip these.

STATUS (2026-04-23): cron is OFF

The breaker + fixer cron lines were removed from the crontab on 2026-04-19 (REMOVED 2026-04-19: PID loop carries load via evals; /snappy-fix is on-demand path). "autopilot" was also dropped from state/engaged.json on 2026-04-23 during a cron-drift audit. The skill is not running autonomously. Both scripts below still work when invoked by hand; they are the on-demand path, not a continuous loop.

Critical Rules

Breaker NEVER fixes; fixer NEVER adds new p0/p1 rows - actor ≠ auditor at the autopilot loop level (lint enforces breaker-only writes for p0/p1)
The shared report (state/log/breakage-report.ndjson) IS the queue - append-only from breaker; in-place row rewrite (status open → resolved) from fixer
Breaker MUST append at least one row per invocation - either a finding or a clean row ({sev:"clean", status:"none-found"}); silent runs score 0
Both scripts still check state/engaged.json for the autopilot recipe; if absent (current default), they exit 0 silently - to re-engage for a manual session, add "autopilot" back to the recipes array
Per-invocation budget cap $1 / timeout 5min still applies for safety when run by hand
Fixer rewrites resolved row in place: {...original fields..., status:"resolved", resolved_ts, resolution:"<one-sentence>"}

Commands

breaker: state/bin/autopilot/break.sh - on-demand only (cron REMOVED 2026-04-19) → logs/snappy-os-breaker.log
fixer: state/bin/autopilot/fix.sh - on-demand only (cron REMOVED 2026-04-19) → logs/snappy-os-fixer.log
queue: state/log/breakage-report.ndjson
engagement gate: state/engaged.json - autopilot NOT in recipes as of 2026-04-23; add it back before invoking if you want the engagement-gate path
eval log: state/log/evals.ndjson (skill: "snappy-autopilot-twocron", verb: "break"|"fix")
replacement: PID loop via evals + /snappy-fix for on-demand remediation

Known Pitfalls

A single autopilot --tick script that does both detection AND remediation conflates the roles; the fixer ends up redoing detection work and grading its own fix - kept the two scripts split for a reason, even now that they're on-demand
Silent runs (no row appended) make the PID delta unmeasurable - always emit either a finding or a clean row
Do NOT restore the cron lines without operator sign-off. The removal on 2026-04-19 was deliberate; PID-via-evals is the current load-bearing replacement.

Self-Test

An agent reading this should correctly:

[ ] Refuse to let the fixer append new p0/p1 findings, even when it notices something while fixing
[ ] Always emit a row (finding OR clean) per breaker tick, never silent
[ ] Check state/engaged.json and exit 0 when disengaged, before spending any model budget
[ ] Rewrite the original row in place when resolving (not append a separate "resolution" row that loses the original context)

Self-report

If this loader fell short, append a line:

echo "[$(date -u +%FT%TZ)] snappy-autopilot-twocron: <what was missing>" >> ~/.claude/logs/snappy-os-loader-feedback.log

api.ts - the code it can call

⚠ no api.ts - this skill has no typed action surface

scripts - helper scripts it can run

prose-only skill - 5 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

how we check it - the checks, plus the last 10 runs

rubric: auto-shape (no rubric declared)

recent mean: 1.00 · 10 runs · actor/auditor: unverifiable

deps: sweep, commit-report

timestamp	verb	score	primary_issue	artifact
2026-04-25 04:11Z	-	1.00	-	-
2026-04-21 15:59Z	-	1.00	-	-
2026-04-21 15:57Z	-	1.00	-	-
2026-04-21 03:53Z	-	1.00	-	-
2026-04-25 04:11Z	-	1.00	-	-
2026-04-21 15:59Z	-	1.00	-	-
2026-04-21 15:57Z	-	1.00	-	-
2026-04-21 03:53Z	-	1.00	-	-
2026-04-25 04:11Z	-	1.00	-	-
2026-04-21 15:59Z	-	1.00	-	-