snappy-autopilot-twocron

Splits finding problems from fixing them so the two never grade themselves.
description: "Triggers on prompt mention of 'autopilot' or 'snappy-autopilot-twocron'."
personal · 2 files · 10 recent evals

How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

For developers - how this skill is built, graded, and how it runs

at a glance - the short version

eval mode: auto-shape
category: Ops
stages: 1
depends: sweep, commit-report

what's inside - the parts that make up a skill 2/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill
state/skills/snappy-autopilot-twocron.md present
the skill itself, in plain text
The main file. It says what the skill is and lays out the steps in plain English.
Code
state/lib/snappy-autopilot-twocron.ts not present
code the skill can run
Optional. Many skills are just words and need no code at all.
Scripts
state/bin/snappy-autopilot-twocron/ not present
helper scripts
Optional. Added when a skill has a few commands to run.
Loader
state/skills/snappy-autopilot-twocron.agents.md present
what the AI loads on the fly
Loaded automatically the moment this skill is needed. Kept short on purpose.

how it runs - the shared frame every skill uses 3/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, on purpose, not by accident.

The worker - makes the work
not present
No work step here. This is probably a skill that reads or coordinates, not one that produces something.

The reviewer - checks the work
inferred: shape gate (an automatic check)
The check is an automatic pass or fail on the shape of the result, run separately from the work itself.

Self-correction - learns
present: fixes itself, learns from gaps
When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.

Background fixes - tidies up
present: queued for rewrite, runs in the background
Bigger fixes that can't be made on the spot get queued and rewritten in the background later.

Run history - remembers
present: state/log/evals.ndjson (auto runs)
Every run is written down here, so the next time this skill is used it already knows how the last runs went.
Critical rules - the things this skill must not get wrong
  1. Breaker NEVER fixes; fixer NEVER adds new p0/p1 rows - actor ≠ auditor at the autopilot loop level (lint enforces breaker-only writes for p0/p1)
  2. The shared report (state/log/breakage-report.ndjson) IS the queue - append-only from breaker; in-place row rewrite (status open → resolved) from fixer
  3. Breaker MUST append at least one row per invocation - either a finding or a clean row ({sev:"clean", status:"none-found"}); silent runs score 0
  4. Both scripts still check state/engaged.json for the autopilot recipe; if absent (current default), they exit 0 silently - to re-engage for a manual session, add "autopilot" back to the recipes array
  5. Per-invocation budget cap $1 / timeout 5min still applies for safety when run by hand
  6. Fixer rewrites resolved row in place: {...original fields..., status:"resolved", resolved_ts, resolution:"<one-sentence>"}

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.

how the work flows - step by step

inputs: sweep, commit-report
data (1)
eval log: `state/log/evals.ndjson` (skill: "snappy-autopilot-twocron", verb: "break"|"fix")

SKILL.md - the skill, written out in plain English

snappy-autopilot-twocron

Two cron entries, one shared report file, clean split between finding breakage and fixing it. Extends program.md rule 5 (actor ≠ auditor) to the autopilot loop itself.

The model

                 state/log/breakage-report.ndjson
                       ▲                 ▲
                       │ appends         │ reads + resolves
  ┌────────────────────┘                 └───────────────────┐
  │                                                           │
┌─┴──────────────┐                                   ┌────────┴───────┐
│ break.sh       │  every 30m (:00, :30)             │ fix.sh         │
│                │                                   │                │
│ acts as a USER │                                   │ acts as a      │
│ of snappy-os;  │                                   │ developer; one │
│ tries to break │                                   │ open row per   │
│ something;     │                                   │ tick; writes   │
│ appends ONE    │  offset 15m (:15, :45)            │ fix, commits,  │
│ finding row.   │                                   │ pushes, marks  │
│                │                                   │ row resolved.  │
└────────────────┘                                   └────────────────┘

The report is the queue. Breaker never fixes. Fixer never adds new rows.

Files

  • state/bin/autopilot/break.sh - breaker tick. Spawns headless Claude with a "you are a user of snappy-os, try to break it" prompt. Budget $1, timeout 5m, writes one row to the report per tick.
  • state/bin/autopilot/fix.sh - fixer tick. Same spawn shape; the prompt is "read the report, pick one open row, fix it, commit, mark resolved."
  • state/log/breakage-report.ndjson - the shared queue. Append-only from breaker; in-place rewrite per-row from fixer (status open → resolved).

Report schema

Breaker appends:

{"ts":"<iso>","sev":"p0|p1|p2","area":"<tag>","surface":"<tried>","expected":"<docs>","actual":"<reality>","repro":"<cmd>","status":"open"}

Or, if nothing found on the surface tried:

{"ts":"<iso>","sev":"clean","area":"<tag>","surface":"<tried>","status":"none-found"}
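The two breaker row shapes above can be sketched in Python as follows. This is a minimal illustration, not the contents of break.sh: the function names `make_breaker_row` and `append_row` are hypothetical, and the rule that a finding must carry `expected`, `actual` and `repro` is an assumption read off the schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

FINDING_SEVERITIES = {"p0", "p1", "p2"}


def make_breaker_row(area, surface, sev="clean", expected=None, actual=None, repro=None):
    """Build one report row: a finding (p0/p1/p2) or a clean row (hypothetical helper)."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if sev == "clean":
        # Nothing found on the surface tried: still emit a row, never stay silent.
        return {"ts": ts, "sev": "clean", "area": area, "surface": surface,
                "status": "none-found"}
    if sev not in FINDING_SEVERITIES:
        raise ValueError(f"unknown severity: {sev!r}")
    if not (expected and actual and repro):
        # Assumption: every finding carries all three evidence fields.
        raise ValueError("a finding needs expected, actual and repro")
    return {"ts": ts, "sev": sev, "area": area, "surface": surface,
            "expected": expected, "actual": actual, "repro": repro,
            "status": "open"}


def append_row(report_path, row):
    """Append exactly one NDJSON line; existing lines are never touched."""
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Opening the file in append mode keeps the breaker's side of the contract simple: it can only add lines, so it cannot disturb rows the fixer has already resolved.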

Fixer rewrites row in place:

{...original fields..., "status":"resolved","resolved_ts":"<iso>","resolution":"<one-sentence>"}
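One way the in-place rewrite could work is sketched below. It is an assumption-laden illustration rather than what fix.sh does: `resolve_first_open` is a hypothetical name, "pick one open row" is read here as "pick the first open row in file order", and the temp-file-then-rename step is a safety choice added for the sketch.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def resolve_first_open(report_path, resolution):
    """Mark the first open row resolved, keeping all of its original fields.

    Returns the resolved row, or None when the report has no open row.
    """
    with open(report_path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]

    resolved = None
    for row in rows:
        if row.get("status") == "open":
            row["status"] = "resolved"
            row["resolved_ts"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            row["resolution"] = resolution  # one sentence, per the schema
            resolved = row
            break
    if resolved is None:
        return None

    # Write a sibling temp file and swap it in, so a crash mid-write
    # cannot leave a truncated queue behind.
    directory = os.path.dirname(os.path.abspath(report_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    os.replace(tmp_path, report_path)
    return resolved
```

The row count is the same before and after, which is what distinguishes an in-place resolve from appending a separate resolution row.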

Cron

0,30 * * * * state/bin/autopilot/break.sh  >> logs/snappy-os-breaker.log 2>&1
15,45 * * * * state/bin/autopilot/fix.sh   >> logs/snappy-os-fixer.log 2>&1

Each script fires every 30 minutes: 48 ticks/day per script × 2 scripts = 96 cron fires/day. Each fire is capped at $1, so worst-case daily spend is $96; realistic spend with the engagement gate and the clean-row idle path is $10-25/day.
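The spend arithmetic can be checked in a few lines. The constants come from the text above; the per-fire averages at the end are simply the quoted $10-25/day range divided by the fire count, not a measured figure.

```python
MINUTES_PER_DAY = 24 * 60
CADENCE_MINUTES = 30      # each script fires every 30 minutes
SCRIPTS = 2               # break.sh and fix.sh
BUDGET_CAP_USD = 1.00     # per-invocation cap

ticks_per_script = MINUTES_PER_DAY // CADENCE_MINUTES
cron_fires = ticks_per_script * SCRIPTS
worst_case_usd = cron_fires * BUDGET_CAP_USD

print(ticks_per_script, cron_fires, worst_case_usd)  # 48 96 96.0

# Average cost per fire implied by the realistic $10-25/day range.
implied_low = round(10 / cron_fires, 2)
implied_high = round(25 / cron_fires, 2)
print(implied_low, implied_high)  # 0.1 0.26
```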

Engagement gate

Both scripts check state/engaged.json for the autopilot recipe. If disengaged, exit 0 silently. Disengage by removing "autopilot" from the recipes array in state/engaged.json (no CLI - the file is the source of truth).
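A minimal sketch of the gate, assuming state/engaged.json is a JSON object with a top-level "recipes" array (the text names the array but does not show the file). The function names are hypothetical, and treating a missing or unreadable file as disengaged is an assumption made for the sketch.

```python
import json
import sys
from pathlib import Path


def is_engaged(engaged_path="state/engaged.json", recipe="autopilot"):
    """Return True only if `recipe` is listed in the file's recipes array."""
    try:
        data = json.loads(Path(engaged_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # assumption: no readable file means disengaged
    recipes = data.get("recipes") if isinstance(data, dict) else None
    return isinstance(recipes, list) and recipe in recipes


def gate_or_exit(engaged_path="state/engaged.json"):
    """Exit 0 silently when disengaged, before any model budget is spent."""
    if not is_engaged(engaged_path):
        sys.exit(0)
```

Calling `gate_or_exit()` as the first statement of a tick keeps the disengaged path free: nothing is spawned and nothing is written.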

Why two crons, not one

Prior design was a single autopilot --tick that tried to do everything. Two problems:

  1. Conflated detection and remediation → fixer had to redo detection work and the subagent lost focus trying to hold both roles.
  2. No clean signal for "did we actually close a gap" - because the same agent that found the problem was grading its own fix.

Splitting gives:

  • Adversarial pressure on the system from the breaker (it's incentivized to find cracks, not forgive them).
  • Focused fixes from the fixer (one row = one PR-sized change).
  • PID signal from the delta between open-vs-resolved row ratios over time.
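The open-vs-resolved signal in the last bullet is not defined precisely in this skill. One plausible reading, sketched here as an assumption, is the share of finding rows that are resolved, compared across two snapshots of the report. Both function names are hypothetical, and clean rows are ignored because they are neither open nor resolved.

```python
import json


def resolved_ratio(ndjson_lines):
    """Share of finding rows that are resolved; None if there are no findings."""
    open_n = resolved_n = 0
    for line in ndjson_lines:
        if not line.strip():
            continue
        status = json.loads(line).get("status")
        if status == "open":
            open_n += 1
        elif status == "resolved":
            resolved_n += 1
    total = open_n + resolved_n
    return resolved_n / total if total else None


def ratio_delta(earlier_lines, later_lines):
    """Change in resolved ratio between two snapshots of the report."""
    before = resolved_ratio(earlier_lines)
    after = resolved_ratio(later_lines)
    if before is None or after is None:
        return None  # no findings in one snapshot: the delta is undefined
    return after - before
```

A positive delta means the fixer is closing rows faster than the breaker opens them. A run that appends nothing leaves both snapshots identical, which is why silent runs make the signal unreadable.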

Eval

Shape-gate:

  • break.sh must append at least one row per successful tick (either a finding or a clean row - never silent).
  • fix.sh must never append new p0/p1 rows; lint enforces breaker-only writes for those severities.
  • Every tick logs an eval row to state/log/evals.ndjson with skill: "snappy-autopilot-twocron", verb: "break"|"fix", score = 1 if exit 0 else 0.
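The three checks above could be combined into one scoring function, sketched below. Everything here is illustrative: the function names are hypothetical, the lint is approximated by comparing p0/p1 row counts before and after a fix tick, scoring a silent breaker run as 0 even on exit 0 follows the critical rules rather than the literal exit-code formula, and the `ts` field name in the eval row is an assumption.

```python
import json

SKILL = "snappy-autopilot-twocron"


def count_high_sev(rows):
    """Number of p0/p1 rows in a parsed report."""
    return sum(1 for r in rows if r.get("sev") in ("p0", "p1"))


def shape_gate(verb, exit_code, rows_before, rows_after):
    """Score one tick: 1 on pass, 0 on fail."""
    if exit_code != 0:
        return 0
    if verb == "break":
        # Never silent: the report must have grown by at least one row.
        return 1 if len(rows_after) > len(rows_before) else 0
    if verb == "fix":
        # The fixer may resolve rows but must not introduce new p0/p1 rows.
        return 1 if count_high_sev(rows_after) <= count_high_sev(rows_before) else 0
    raise ValueError(f"unknown verb: {verb!r}")


def eval_row(verb, score, ts):
    """One line for state/log/evals.ndjson (the `ts` key is an assumption)."""
    return json.dumps({"ts": ts, "skill": SKILL, "verb": verb, "score": score})
```

Because the gate only looks at the report before and after the tick, it can run separately from the breaker and fixer, which keeps the checker independent of the work it is checking.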

Graduation

When break.sh consistently finds high-signal breakages (not noise) and fix.sh closes them with low revert rate (<10% regressions per week), the skill graduates to having its own PID-tuned prompts - the system rewrites its own breaker/fixer prompts based on the eval log.
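Only the revert-rate half of that bar is stated numerically, so only that half is sketched here. The function names and the idea of counting closed and reverted fixes per week are assumptions. How "high-signal" is judged is left open by the text and is not modeled.

```python
def revert_rate(fixes_closed, fixes_reverted):
    """Fraction of a week's closed fixes that were later reverted."""
    if fixes_closed == 0:
        return None  # nothing closed this week: no rate to report
    return fixes_reverted / fixes_closed


def meets_revert_bar(fixes_closed, fixes_reverted, threshold=0.10):
    """True when the weekly revert rate is strictly below the 10% bar."""
    rate = revert_rate(fixes_closed, fixes_reverted)
    return rate is not None and rate < threshold
```

For example, 1 revert out of 20 closed fixes (5%) clears the bar, while 1 out of 10 (exactly 10%) does not.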

AGENTS.md - what the AI loads when this skill comes up

snappy-autopilot-twocron - loader

Per-turn rules for the snappy-autopilot-twocron skill. Full reference: state/skills/snappy-autopilot-twocron.md. Do not skip these.

STATUS (2026-04-23): cron is OFF

The breaker and fixer cron lines were removed from the crontab on 2026-04-19 (removal note: "REMOVED 2026-04-19: PID loop carries load via evals; /snappy-fix is on-demand path"). "autopilot" was also dropped from state/engaged.json on 2026-04-23 during a cron-drift audit. The skill is not running autonomously. Both scripts below still work when invoked by hand; they are the on-demand path, not a continuous loop.

Critical Rules

  • Breaker NEVER fixes; fixer NEVER adds new p0/p1 rows - actor ≠ auditor at the autopilot loop level (lint enforces breaker-only writes for p0/p1)
  • The shared report (state/log/breakage-report.ndjson) IS the queue - append-only from breaker; in-place row rewrite (status open → resolved) from fixer
  • Breaker MUST append at least one row per invocation - either a finding or a clean row ({sev:"clean", status:"none-found"}); silent runs score 0
  • Both scripts still check state/engaged.json for the autopilot recipe; if absent (current default), they exit 0 silently - to re-engage for a manual session, add "autopilot" back to the recipes array
  • Per-invocation budget cap $1 / timeout 5min still applies for safety when run by hand
  • Fixer rewrites resolved row in place: {...original fields..., status:"resolved", resolved_ts, resolution:"<one-sentence>"}

Commands

|breaker: state/bin/autopilot/break.sh - on-demand only (cron REMOVED 2026-04-19) → logs/snappy-os-breaker.log
|fixer: state/bin/autopilot/fix.sh - on-demand only (cron REMOVED 2026-04-19) → logs/snappy-os-fixer.log
|queue: state/log/breakage-report.ndjson
|engagement gate: state/engaged.json - autopilot NOT in recipes as of 2026-04-23; add it back before invoking if you want the engagement-gate path
|eval log: state/log/evals.ndjson (skill: "snappy-autopilot-twocron", verb: "break"|"fix")
|replacement: PID loop via evals + /snappy-fix for on-demand remediation

Known Pitfalls

  • A single autopilot --tick script that does both detection AND remediation conflates the roles: the fixer ends up redoing detection work and grading its own fix. Keep the two scripts split, even now that they're on-demand.
  • Silent runs (no row appended) make the PID delta unmeasurable - always emit either a finding or a clean row
  • Do NOT restore the cron lines without operator sign-off. The removal on 2026-04-19 was deliberate; PID-via-evals is the current load-bearing replacement.

Self-Test

After reading this, would an agent correctly:

  1. [ ] Refuse to let the fixer append new p0/p1 findings, even when it notices something while fixing?
  2. [ ] Always emit a row (finding OR clean) per breaker tick, never silent?
  3. [ ] Check state/engaged.json and exit 0 when disengaged, before spending any model budget?
  4. [ ] Rewrite the original row in place when resolving (not append a separate "resolution" row that loses the original context)?

Self-report

If this loader fell short, append a line:

echo "[$(date -u +%FT%TZ)] snappy-autopilot-twocron: <what was missing>" >> ~/.claude/logs/snappy-os-loader-feedback.log

api.ts - the code it can call

⚠ no api.ts - this skill has no typed action surface

scripts - helper scripts it can run

prose-only skill - 5 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

how we check it - the checks, plus the last 10 runs

rubric: auto-shape (no rubric declared)
recent mean: 1.00 · 10 runs
actor/auditor: unverifiable
deps: sweep, commit-report

timestamp          verb  score  primary_issue  artifact
2026-04-25 04:11Z  -     1.00   -              -
2026-04-21 15:59Z  -     1.00   -              -
2026-04-21 15:57Z  -     1.00   -              -
2026-04-21 03:53Z  -     1.00   -              -
2026-04-25 04:11Z  -     1.00   -              -
2026-04-21 15:59Z  -     1.00   -              -
2026-04-21 15:57Z  -     1.00   -              -
2026-04-21 03:53Z  -     1.00   -              -
2026-04-25 04:11Z  -     1.00   -              -
2026-04-21 15:59Z  -     1.00   -              -