drop another .md file to compare - side-by-side diff against snappy-fix

snappy-fix

Triage and fix the top open p0/p1 issue in snappy-os. Triggered when the user says or asks "fix the top issue". Reads +, picks the highest-severity oldest open row, shows its JSON (especially repro/expected/actual/fix), and either applies the fix or routes the operator to the file that owns it.

description: "Triggers on prompt mention of 'snappy-fix' or '/snappy-fix'."

personal 2 files 10 recent evals

Export

What it does for you

What it produces

A recent result, so you can see the kind of work it returns.

loading…

How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

Work with me

For developers how this skill is built, graded, and how it runs

at a glance- the short version

eval modeauto-shape

categoryOps

stages2

dependslog

what's inside - the parts that make up a skill 2/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill

state/skills/snappy-fix.md present

the skill itself, in plain text

The main file. It says what the skill is and lays out the steps in plain English.

Code

state/lib/snappy-fix.ts not present

code the skill can run

Optional. Many skills are just words and need no code at all.

Scripts

state/bin/snappy-fix/ not present

helper scripts

Optional. Added when a skill has a few commands to run.

Loader

state/skills/snappy-fix.agents.md present

what the AI loads on the fly

Loaded automatically the moment this skill is needed. Kept short on purpose.

how it runs - the shared frame every skill uses 3/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, on purpose, not by accident.

makes the work The worker

inferred

prose skill — from the run command

No worker is named directly, so the command this skill runs is treated as the worker.

checks the work The reviewer

inferred

shape gate an automatic check

The check is an automatic pass or fail on the shape of the result, run separately from the work itself.

frame

learns Self-correction

present

fixes itself learns from gaps

When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.

tidies up Background fixes

present

queued for rewrite runs in the background

Bigger fixes that can't be made on the spot get queued and rewritten in the background later.

remembers Run history

present

state/log/evals.ndjson unknown runs

Every run is written down here, so the next time this skill is used it already knows how the last runs went.

Critical rules the things this skill must not get wrong

RECURRING AREAS DO NOT GET PATCHED — if pid-trends.ts shows the row's area was resolved >=3 times in the last 7d, the underlying skill needs a regen, not another patch. The fixer's G4c gate auto-escalates by writing a brief to state/log/regen-queue/<ts>-<area>.md, appending a resolved_by:"fixer-escalate" row, and exiting WITHOUT invoking Claude. If you find yourself patching the same area for the third time, STOP and write the brief by hand instead of fixing again
ALWAYS dedupe by area BEFORE filtering by status — both logs are append-only; an "open" row may be followed by a newer "resolved" row two lines below
ALWAYS take the LATEST row per area first, then ask "is it open?" — grep '"status":"open"' will surface stale rows whose closer already exists
A row is closed if ANY of these fire: (1) latest status resolved, (2) frictions row has verified field, (3) later eval row for same skill scored 1 with newer ts, (4) later same-area row with msg starting SOLVED|PASS|RESOLVED|✅, (5) cross-file area-match, (6) resolved_by:"fixer-escalate" row with a brief_path (escalation = closure)
When closure marker #3 or #4 fires, ALSO append a closing friction row with verified: "self-healed: <evidence>" so source of truth catches up
NEVER mark a row resolved unless the repro confirms it — breakage-report is ground truth, lying makes the byline lie
+2 more in AGENTS.md →

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.

Loading feedback rows…

how the work flows- step by step

inputs log

1 generator

invoke

prose skill — follow steps in `state/skills/snappy-fix.md`

+ eval for this step

→

2 data

eval log

`state/log/evals.ndjson` (skill: "snappy-fix")

+ eval for this step

SKILL.md- the skill, written out in plain English

snappy-fix

One job: turn the byline's 🚨 N open issues into a concrete next move. The byline is allowed to be terse; this skill is where the detail lives.

Usage

/snappy-fix              # auto-pick top open p0/p1
/snappy-fix <area>       # pick the top open row matching <area>

Steps

Fast path: bash state/bin/snappy-fix/pick.sh [area] dedupes both logs by area (latest row wins), filters to open p0/p1, and prints the top row's key fields (area, sev, surface, expected, actual, repro, fix) as JSON. The script handles the bookkeeping deterministically - stale-row false positives documented in the gotchas below are impossible. After pick.sh prints a row, jump to step 5 below to review + apply the fix.

Manual path (below) - use only if the script is unavailable or you need to audit its logic.

Read ~/projects/snappy-os/state/log/breakage-report.ndjson and

~/projects/snappy-os/state/log/frictions.ndjson. Dedupe by area first, then filter - both files are append-only, so a single area can have an "open" row followed by a newer "resolved" row. Per-row filters like grep '"status":"open"' will surface stale rows whose closer is sitting two lines below; you must take the LATEST row per area before asking "is it open?". Paste-ready awk:

   awk '/"area":"/ { match($0,/"area":"[^"]+"/); a=substr($0,RSTART+8,RLENGTH-9); last[a]=$0 }
        END { for (a in last) print last[a] }' state/log/breakage-report.ndjson

After dedupe, keep rows where sev/severity ∈ {p0, p1} AND none of the resolution markers below fire:

breakage-report: latest row's status == "resolved".
frictions (append-only): latest row has a verified field.
frictions (implicit, recurring agents): a later eval row in

~/projects/snappy-os/state/log/evals.ndjson for the same skill has score: 1 AND a ts newer than the friction's ts. The next cycle succeeded, so the wall is gone whether or not anyone closed the row by hand.

frictions (implicit, one-shots / smoke tests): a later same-area

row exists whose msg starts with SOLVED, PASS, RESOLVED, or ✅. This is the convention the breaker uses for areas that have no eval row (one-off smoke runs, manual investigations). Don't count those resolution-rows themselves as new walls either.

When marker #3 or #4 fires, also append a closing friction row carrying verified: "self-healed: <evidence>" so the source of truth catches up to reality and future filter runs are cheaper.

Sort: p0 before p1, then oldest ts first (most-overdue first).
If <area> argument given, narrow to rows whose area matches

case-insensitively.

Pick row 0. If none → reply "no open p0/p1; byline lied" and exit.
Show the operator the row's key fields, NOT the full JSON dump:

area + sev as the headline
surface (where the break is observable)
expected (what should happen)
actual (what's happening - this is the explanation)
repro (the exact command to reproduce - paste-ready)
fix (if present - the proposed remedy)

Run the repro to confirm the issue is still real (some auto-resolve

between when the breaker logged them and now).

If still broken AND fix is unambiguous (e.g. "rename X to Y", "add line N

to file Z"), apply it. Otherwise present the operator with two-three surgical options and let them pick.

After applying a fix:

Re-run the repro to confirm closure.
Append to state/log/breakage-report.ndjson:

{ts, area, status:"resolved", resolved_by:"snappy-fix", note:"<one-liner>"}

Log eval row to state/log/evals.ndjson (use the append() helper in

state/lib/log.ts if invoking from TS, or shell printf … | tee -a): {ts, skill:"snappy-fix", verb:"triage" or "fix", score, area}.

Post-fix typecheck gate

When the row just resolved is area: "typecheck-broken" (or the fix touched state/lib/**/*.ts), run the structural gate before declaring closure:

bash state/bin/snappy-fix/typecheck-gate.sh

The gate does two things: runs npm run typecheck once (authoritative) and sweeps state/lib/ for the narrow patterns that keep re-opening this class of friction - specifically duplicate import ... from "fs" / "node:fs" pairs in one file. On a non-zero exit, do NOT close the row yet - surface the offenders to the operator and treat each as its own sub-fix.

If the same area == "typecheck-broken" has resolved ≥2 times in the last 7 days (grep state/log/breakage-report.ndjson), the recurrence is structural, not per-instance. Escalate by opening a p1 row with area: "typecheck-recurrence" whose fix field points the breaker at the pre-commit hook gap, not at the current TS errors. Symptom-level fixes have stopped sticking; the class of error needs a structural block upstream.

Why

The byline is two lines of pixels. Even named (🚨 9 open issues p0 claude-cron · p0 meeting-followup · +7), the operator still doesn't know which is loudest, what the symptom looks like, or what the breaker thinks the fix is. This skill bridges that gap: pull the top JSON, show the four fields that matter, run the repro, route or fix.

Eval

Outcome-gate (replaces the prior shape-only gate that scored 1.0 on every run regardless of whether the picked row reflected reality - flagged by state/lint/evals-integrity.ts on 2026-04-18, see state/log/eval-cheaters.ndjson).

pick.sh runs the picked row's repro field and scores by whether the ground truth on this machine matches what the row claims:

Picked row state	Repro exit	Score	Note in eval row
open	non-zero	1	`open-verified-real` (byline honest)
open	zero	0	`open-but-repro-passes-stale` (false alarm - picker should have closed it)
resolved	zero	1	`resolved-verified` (genuinely closed)
resolved	non-zero	0	`resolved-but-repro-fails` (regression / lying-resolved)
missing `repro`	n/a	0	`missing-repro` (cannot ground-truth)
repro hits deny-list (`rm -rf`, `sudo`, piped curl-to-shell, `git push`, `git reset --hard`)	n/a	0	`repro-unsafe-skipped`
no rows in either log	n/a	1	`no-rows` / `no-area-rows` (legit empty)
no open p0/p1 after dedupe	n/a	1	`no open p0/p1; byline lied`

Repros run with bash -c from the repo root, 30s wall-clock timeout, output discarded (only the exit code is ground truth). The eval row carries repro_exit, area, and row_status so the PID loop can audit which classes of failure are recurring.

Override (testing only): SNAPPY_FIX_SKIP_REPRO=1 reverts to shape-only scoring. Use this sparingly - every shape-only run will read in evals-integrity.ts as cheater-style data on the next sweep.

Actor ≠ auditor: the actor is the picker logic that selects the top open row; the auditor is the repro shell process running with its own exit code. They cannot collude - the repro string is operator-authored, not generated alongside the picker.

Gotchas

Field-shape drift between the two logs: breakage-report uses sev,

frictions sometimes uses severity (uppercase P0). Normalize.

A row's repro may itself be broken (depends on a path that no longer

exists). Treat that as a 2nd-order issue and report - do not silently swallow.

fix is sometimes a list of options ("either backfill X or loosen Y or

normalize Z"). Show all three; do not pick for the operator unless the choice is obvious.

Do NOT mark a row resolved unless the repro confirms it. The breakage-report

is the ground truth; lying to it makes the byline lie.

new-skills-unregistered fix: DO NOT set eval: shape on a prose-only

skill - state/lint/library-shape.ts will fail because shape requires a backing state/lib/<name>.ts. The 2026-04-18 fixer tripped on this (see shape-contract-mismatch re-open). Always run bash state/bin/skill/validate.sh <name> to pick the correct value from the table in state/skills/skill.md#required-frontmatter, and bump the Skills (N total · ...) header in state/index.md in the same commit so check.ts doesn't reopen index-skill-count-drift on the next tick.

AGENTS.md- what the AI loads when this skill comes up

snappy-fix - loader

Per-turn rules for the snappy-fix skill. Full reference: state/skills/snappy-fix.md. Do not skip these.

Critical Rules

RECURRING AREAS DO NOT GET PATCHED - if pid-trends.ts shows the row's area was resolved >=3 times in the last 7d, the underlying skill needs a regen, not another patch. The fixer's G4c gate auto-escalates by writing a brief to state/log/regen-queue/<ts>-<area>.md, appending a resolved_by:"fixer-escalate" row, and exiting WITHOUT invoking Claude. If you find yourself patching the same area for the third time, STOP and write the brief by hand instead of fixing again
ALWAYS dedupe by area BEFORE filtering by status - both logs are append-only; an "open" row may be followed by a newer "resolved" row two lines below
ALWAYS take the LATEST row per area first, then ask "is it open?" - grep '"status":"open"' will surface stale rows whose closer already exists
A row is closed if ANY of these fire: (1) latest status resolved, (2) frictions row has verified field, (3) later eval row for same skill scored 1 with newer ts, (4) later same-area row with msg starting SOLVED|PASS|RESOLVED|✅, (5) cross-file area-match, (6) resolved_by:"fixer-escalate" row with a brief_path (escalation = closure)
When closure marker #3 or #4 fires, ALSO append a closing friction row with verified: "self-healed: <evidence>" so source of truth catches up
NEVER mark a row resolved unless the repro confirms it - breakage-report is ground truth, lying makes the byline lie
Field-shape drifts: breakage-report uses lowercase sev (p0/p1/p2); frictions sometimes uses uppercase severity (P0/P1/P2). Normalize before comparing
breakage-report rows: {ts, sev, area, surface, expected, actual, repro, fix?, status?}; frictions rows: {ts, skill, friction, fix, severity, impact, verified?} - same concept, two shapes, merge in your head

Commands

|invoke: prose skill - follow steps in state/skills/snappy-fix.md |dedupe awk: awk '/"area":"/ { match($0,/"area":"[^"]+"/); a=substr($0,RSTART+8,RLENGTH-9); last[a]=$0 } END { for (a in last) print last[a] }' state/log/breakage-report.ndjson |sources: state/log/breakage-report.ndjson, state/log/frictions.ndjson, state/log/evals.ndjson |eval log: state/log/evals.ndjson (skill: "snappy-fix")

Known Pitfalls

Wall-of-record 2026-04-17: five "open" breakage rows ALL had newer "resolved" rows for same area; /snappy-fix kept showing already-closed work and the operator burned a session on it
A row's repro may itself be broken (depends on a path that no longer exists) - treat as 2nd-order issue and report; do not silently swallow
fix is sometimes a list of options ("either backfill X or loosen Y or normalize Z") - show all three, do not pick for the operator unless choice is obvious

Self-Test

An agent reading this should correctly:

[ ] Dedupe by area BEFORE filtering by status, never the reverse?
[ ] Treat a verified field on a frictions row as a closure marker even if status field is absent?
[ ] Refuse to mark a row resolved without re-running the repro to confirm?
[ ] Normalize sev vs severity when comparing across the two log files?

Self-report

If this loader fell short, append a line:

echo "[$(date -u +%FT%TZ)] snappy-fix: <what was missing>" >> ~/.claude/logs/snappy-os-loader-feedback.log

api.ts- the code it can call

⚠ no api.ts - this skill has no typed action surface

scripts- helper scripts it can run

prose-only skill - 2 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

how we check it- the checks, plus the last 10 runs

rubric auto-shape no rubric declared

recent mean 1.00 · 10 runs actor/auditor: unverifiable

deps log

timestamp	verb	score	primary_issue	artifact
2026-04-25 04:45Z	-	1.00	-	-
2026-04-25 04:31Z	-	1.00	-	-
2026-04-25 04:11Z	-	1.00	-	-
2026-04-25 03:16Z	-	1.00	-	-
2026-04-25 01:09Z	-	1.00	-	-
2026-04-24 22:43Z	-	1.00	-	-
2026-04-24 22:30Z	-	1.00	-	-
2026-04-24 16:42Z	-	1.00	-	-
2026-04-24 16:41Z	-	1.00	-	-
2026-04-24 16:30Z	-	1.00	-	-