snappy-hello
at a glance: dimensions parsed from snappy-hello.md
on-disk anatomy: the four parts a skill can have on disk (stack.md §1e) 3/4 present
Every row below is a file path that either exists in the repo or doesn't. Only SKILL.md is required. The other three are optional; add them as the work demands. This is the physical shape; the harness panel below is the operational shape.
| Path | Status |
|---|---|
| state/skills/snappy-hello/SKILL.md | present |
| state/lib/snappy-hello.ts | present |
| state/bin/snappy-hello/ | not present |
| state/skills/snappy-hello/AGENTS.md | present |
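One way a "3/4 present" count like the one above could be derived, sketched under the assumption that the dashboard simply checks each of the four paths for existence. The helper names `anatomyPaths` and `anatomy` are hypothetical and not part of snappy-os; the demo builds a throwaway directory instead of touching a real repo.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical mapping of the four on-disk parts to repo-relative paths.
function anatomyPaths(skill: string): Record<string, string> {
  return {
    skill: `state/skills/${skill}/SKILL.md`,
    lib: `state/lib/${skill}.ts`,
    bin: `state/bin/${skill}/`,
    loader: `state/skills/${skill}/AGENTS.md`,
  };
}

// Check each part for existence under `root` and summarise as "<n>/4 present".
function anatomy(root: string, skill: string) {
  const parts = Object.entries(anatomyPaths(skill)).map(([name, rel]) => ({
    name,
    path: rel,
    present: existsSync(join(root, rel)),
  }));
  const count = parts.filter((p) => p.present).length;
  return { parts, summary: `${count}/${parts.length} present` };
}

// Demo: a temp "repo" holding three of the four parts, like snappy-hello.
const root = mkdtempSync(join(tmpdir(), "anatomy-"));
for (const rel of [
  "state/skills/snappy-hello/SKILL.md",
  "state/lib/snappy-hello.ts",
  "state/skills/snappy-hello/AGENTS.md",
]) {
  mkdirSync(join(root, dirname(rel)), { recursive: true });
  writeFileSync(join(root, rel), "");
}
const report = anatomy(root, "snappy-hello");
console.log(report.summary); // 3/4 present
```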
rubric: the analytic "definition of good" (program.md §6.1) 5 criteria · 5 deterministic
Each row is a named criterion in the SKILL.md’s ## Rubric block. deterministic = a script or lint decides; judge = a model reads the run output and grades it. An eval row may include criteria.<name> = { score: 0|0.5|1, rationale }; the row's scalar score is the mean of its criterion scores. Single-scalar grading hides where failures are; per-criterion scores are what make the regen loop’s signal actionable.
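A minimal sketch of the scoring rule above, assuming criteria arrive as a plain object keyed by criterion name. `scalarScore` and `weakest` are illustrative names, not the real eval.ts API, and the grades in `example` are made up.

```typescript
// One criterion entry, shaped like criteria.<name> above.
type CriterionScore = { score: 0 | 0.5 | 1; rationale: string };

// Scalar score = arithmetic mean of the criterion scores (0 when there are none).
function scalarScore(criteria: Record<string, CriterionScore>): number {
  const scores = Object.values(criteria).map((c) => c.score);
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Names of criteria scoring below 1, worst first: the signal a single scalar hides.
function weakest(criteria: Record<string, CriterionScore>): string[] {
  return Object.entries(criteria)
    .filter(([, c]) => c.score < 1)
    .sort(([, a], [, b]) => a.score - b.score)
    .map(([name]) => name);
}

// Made-up grades against the five snappy-hello criteria.
const example: Record<string, CriterionScore> = {
  greeting_sent_to_stdout: { score: 1, rationale: "stdout starts with 'hello, '" },
  eval_row_written: { score: 1, rationale: "one new row appended" },
  actor_id_in_eval_row: { score: 1, rationale: "actor_session_id present" },
  auditor_updates_eval_row: { score: 0.5, rationale: "field present but empty" },
  actor_auditor_ids_differ: { score: 0, rationale: "ids are equal" },
};

console.log(scalarScore(example)); // 0.7
console.log(weakest(example));
```

The scalar 0.7 alone would not say which check failed; `weakest` points straight at the two criteria the regen loop should target.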
harness: the operating frame every snappy-os skill runs inside 3/5 present
Every skill ships with the same five-part frame: an actor produces, a different auditor grades (program.md §5), a loader footer heals loader gaps on every turn, a regen drain rewrites loaders asynchronously, and every run lands a row in the eval log. Cells marked not present carry a one-line note explaining the gap; for snappy-hello, absence is load-bearing.
No auditor surfaced. Without a separate grader, the producer can rubber-stamp itself — flag in review (program.md §5).
state/log/evals.ndjson
No ## Critical Rules section in AGENTS.md. Hard-won invariants, if any, live inline in the prose below.
PID feedback: how this loader taught itself (mock; real endpoint pending)
Rows from state/log/agents-md-feedback.log, filtered to snappy-hello. Every agent run that hits an unhandled case appends one line: FIXED when the loader was edited inline (P-fix), LOGGED when the gap is too big for an inline edit and the state/regen/drain.sh queue will rewrite it asynchronously.
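A sketch of the filtering this panel performs, assuming each log line follows the `[timestamp] skill: message [FIXED|LOGGED]` shape that the echo command in the AGENTS.md loader below writes. `feedbackFor` is a hypothetical helper, and the sample lines are invented for illustration.

```typescript
// One parsed feedback line; status is the trailing [FIXED|LOGGED] tag.
type FeedbackRow = {
  ts: string;
  skill: string;
  message: string;
  status: "FIXED" | "LOGGED";
};

// Assumed line shape: "[<ISO-8601 UTC>] <skill>: <message> [FIXED|LOGGED]".
const LINE = /^\[([^\]]+)\] ([^:\s]+): (.*) \[(FIXED|LOGGED)\]$/;

// Keep only well-formed lines that belong to `skill`, in file order.
function feedbackFor(logText: string, skill: string): FeedbackRow[] {
  const rows: FeedbackRow[] = [];
  for (const raw of logText.split("\n")) {
    const m = LINE.exec(raw.trim());
    if (!m || m[2] !== skill) continue; // blank, malformed, or another skill
    rows.push({ ts: m[1], skill: m[2], message: m[3], status: m[4] as FeedbackRow["status"] });
  }
  return rows;
}

// Invented sample lines for illustration only.
const sample = [
  "[2026-01-01T00:00:00Z] snappy-hello: greeting check undocumented [FIXED]",
  "[2026-01-01T00:01:00Z] other-skill: selector drifted [LOGGED]",
  "not a feedback line",
].join("\n");

const rows = feedbackFor(sample, "snappy-hello");
console.log(rows.length, rows[0]?.status); // 1 FIXED
```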
pipeline: parsed stages
snappy-os run snappy-hello # real run; writes one row to evals.ndjson
SKILL.md: the prose file
snappy-hello
The hello-world skill. Read this to see the minimum shape of a snappy-os skill before you write your own. Every section below is the minimum; yours can grow.
When to use
- First run after npx snappy-os init. Confirms the loop works end-to-end.
- Smoke test for new installs: one round-trip through dispatch → eval → log.
Commands
snappy-os run snappy-hello # real run; writes one row to evals.ndjson
snappy-os run snappy-hello --dry-run # scope-only; greeting → stderr, apply:false
Self-test
snappy-os run snappy-hello
tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'
# → true
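The same self-test sketched in TypeScript, assuming evals.ndjson holds one JSON object per line. It is deliberately stricter than the jq one-liner: for a last row with no auditor_session_id, jq evaluates `"a1" != null` and prints true, whereas this version returns false. `lastRowIdsDiffer` and `selfTest` are illustrative names, not snappy-os APIs.

```typescript
import { readFileSync } from "node:fs";

// Only the two fields the self-test compares; other eval-row fields are ignored.
type SessionIds = { actor_session_id?: string; auditor_session_id?: string };

// Like `tail -1 | jq '.actor_session_id != .auditor_session_id'`, but it also
// requires both ids to be present and non-empty.
function lastRowIdsDiffer(ndjson: string): boolean {
  const lines = ndjson.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return false; // empty log: nothing to check
  const row = JSON.parse(lines[lines.length - 1]) as SessionIds;
  if (!row.actor_session_id || !row.auditor_session_id) return false;
  return row.actor_session_id !== row.auditor_session_id;
}

// Reads the eval log (repo-relative path from the self-test above).
function selfTest(path = "state/log/evals.ndjson"): boolean {
  return lastRowIdsDiffer(readFileSync(path, "utf8"));
}

// In-memory examples; the ids are made up.
console.log(lastRowIdsDiffer('{"actor_session_id":"a1","auditor_session_id":"b2"}\n')); // true
console.log(lastRowIdsDiffer('{"actor_session_id":"a1"}\n')); // false
```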
Eval
Actor prints the greeting to stdout and writes a pending row with actor_session_id set. An auditor child process re-reads stdout, grades the greeting shape, and writes auditor_session_id back onto the same row. Actor ≠ auditor holds because the auditor runs in a separate subprocess.
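A minimal sketch of why the actor/auditor split holds structurally: any child process gets its own process identity. It uses `process.pid` as a stand-in for `sessionId()` (whose implementation in state/lib/eval.ts isn't shown here) and runs `node -e` instead of re-invoking the skill file through `npx tsx`.

```typescript
import { spawnSync } from "node:child_process";

// Stand-in for sessionId(): illustration only, not the real helper.
const actorId = `pid-${process.pid}`;

// Spawn a separate Node process (the "auditor") and let it report its own id.
const child = spawnSync(
  process.execPath,
  ["-e", "process.stdout.write('pid-' + process.pid)"],
  { encoding: "utf8" },
);
if (child.status !== 0) throw new Error(`auditor failed: ${child.stderr}`);
const auditorId = child.stdout.trim();

console.log(actorId !== auditorId); // true
```

Because the parent is still alive while the child runs, the two ids cannot collide, which is the property the self-test checks on every eval row.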
Rubric
criteria:
  - name: greeting_sent_to_stdout
    kind: deterministic
    check: "The command 'snappy-os run snappy-hello' prints a greeting to stdout."
  - name: eval_row_written
    kind: deterministic
    check: "A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes."
  - name: actor_id_in_eval_row
    kind: deterministic
    check: "The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field."
  - name: auditor_updates_eval_row
    kind: deterministic
    check: "The same eval row is subsequently updated to include an 'auditor_session_id' field."
  - name: actor_auditor_ids_differ
    kind: deterministic
    check: "The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by `tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'`."
AGENTS.md: the loader file
snappy-hello — per-turn loader
UI Resources
state/skills/snappy-hello/resources/ui.openui
The hello-world skill. Proves the loop works end-to-end on a fresh install.
Command index
| Action | Command |
|---|---|
| greet (real) | npx tsx state/lib/snappy-hello.ts |
| greet (scope-only) | npx tsx state/lib/snappy-hello.ts --dry-run |
| via CLI | snappy-os run snappy-hello [--dry-run] |
Invariants
- Exit 0 on success. Non-zero exit is a bug; fix, don't retry.
- Real run: greeting to stdout, eval row apply: true.
- Dry-run: greeting to stderr, eval row apply: false. No side effects.
- actor_session_id ≠ auditor_session_id on every row (actor is this process; auditor is a child process the actor spawns after greeting to grade the greeting).
Gotchas
- .env.cache is optional for this skill: env("USER", false) falls back to $USER from the process environment. Skills that need real credentials document that on their own .md page.
- The greeting format is load-bearing: the auditor subprocess checks that it starts with "hello, " and contains "snappy-os " (the prefix of the version string). Don't "improve" the phrasing without updating the auditor.
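A small sketch of what "load-bearing" means in practice. `greetingShapeOk` copies the pass condition from `audit()` in state/lib/snappy-hello.ts; the sample strings, user name, and version number are made up. Note that the check does not verify the version number itself, only the "snappy-os " prefix.

```typescript
// Same pass condition as audit() in state/lib/snappy-hello.ts.
function greetingShapeOk(greeting: string): boolean {
  return greeting.startsWith("hello, ") && greeting.includes("snappy-os ");
}

// Example strings; the user name and version are made up.
const samples: Record<string, string> = {
  current: "hello, ada \u2014 snappy-os 1.2.3 reporting in.",
  capitalised: "Hello, ada! snappy-os 1.2.3 is ready.",
  rebranded: "hello, ada \u2014 SnappyOS 1.2.3 reporting in.",
  noVersion: "hello, ada (snappy-os ready)",
};

for (const [label, text] of Object.entries(samples)) {
  console.log(label, greetingShapeOk(text));
}
// expected: current true, capitalised false, rebranded false, noVersion true
```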
<!-- kernel-ok: historical attribution for the PID-loader pattern. -->
Self-correcting loader (PID feedback)
This is the rule that makes loaders learn. It was ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.
Before you finish, do two things:
1. Fix gaps (P — proportional). MANDATORY. If this loader didn't cover your case — if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk — you MUST attempt an Edit to this AGENTS.md before you log.
- Default action: edit this file. Surgical. Add the missing line, table
row, or code example where it belongs. No restructuring.
- LOGGED is allowed only when the fix needs more than 10 lines, spans multiple files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.
- "I didn't have time" / "it's minor" / "the next agent will figure it out"
are NOT valid reasons. Minor gaps compound — that's the whole point of this loop.
- Concrete example of a valid P-fix: you discover that agent-browser screenshot takes the path positionally, not as --output <path>. That is one line in a table. Edit it in. Do not just log it.
- The goal: the next agent never has to leave the loader.
2. Log the result.
echo "[$(date -u +%FT%TZ)] snappy-hello: <what was missing or fixed> [FIXED|LOGGED]" >> state/log/loader-feedback.log
FIXED = you patched this loader inline (P-fix). LOGGED = too large for inline; the PostToolUse enqueue + Stop-hook drain will rewrite the loader from scratch on next session-end.
Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.
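The FIXED-versus-LOGGED rule in step 1 can be sketched as a pure decision function, assuming an agent can estimate the size of the edit that would close a gap. The `Gap` fields are hypothetical. FIXED still means the agent actually makes the edit; this only picks which path applies.

```typescript
// Hypothetical description of a loader gap; these field names are illustrative.
type Gap = {
  linesNeeded: number; // size of the edit that would close the gap
  filesTouched: number; // how many files the fix spans
  structuralRewrite: boolean; // would closing it restructure the loader?
};

type Outcome = "FIXED" | "LOGGED";

// Inline edit by default; LOGGED only for >10 lines, multiple files, or a
// structural rewrite. Time pressure and "it's minor" are deliberately not inputs.
function decide(gap: Gap): Outcome {
  const tooBigForInline =
    gap.linesNeeded > 10 || gap.filesTouched > 1 || gap.structuralRewrite;
  return tooBigForInline ? "LOGGED" : "FIXED";
}

// The agent-browser screenshot example: one table row in one file.
console.log(decide({ linesNeeded: 1, filesTouched: 1, structuralRewrite: false })); // FIXED
```

Leaving "no time" and "it's minor" out of the type is the point: the loop only accepts size and scope as reasons to defer.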
OpenUI Resource
- Skill-owned OpenUI Lang resource: state/skills/snappy-hello/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
- Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
- System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.
api.ts: typed surface
#!/usr/bin/env -S npx tsx
/**
* state/lib/snappy-hello.ts — the hello-world skill.
*
* G1 of the 1.0.7 release gate: on a fresh install, this is the first skill
* a new user pulls + runs. It must:
* 1. Print a greeting with the user's name and the snappy-os version.
* 2. Write one row to state/log/evals.ndjson with actor_session_id and
* auditor_session_id populated and non-identical (actor ≠ auditor).
* 3. Exit 0.
*
* Actor/auditor split: the actor runs in this process and prints the greeting.
* Before returning, it spawns a child process via `npx tsx` (this same file in
* --audit mode), passing the greeting, its own session id, and the dry-run
* flag as argv. The child re-checks the greeting's shape, and its own
* sessionId() becomes auditor_session_id. Only the child writes the eval row;
* the actor writes none, so neither process writes two rows.
*
* Invoked:
* - directly: `npx tsx state/lib/snappy-hello.ts [--dry-run]`
* - via CLI: `snappy-os run snappy-hello [--dry-run]`
*/
import { spawnSync } from "child_process";
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { env } from "./env.ts";
import { score, sessionId, type EvalRow } from "./eval.ts";
const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");
function readVersion(): string {
try {
const pkg = JSON.parse(readFileSync(join(ROOT, "package.json"), "utf8"));
return pkg.version ?? "?";
} catch {
return "?";
}
}
function userName(): string {
return env("USER", false) || process.env.USER || process.env.LOGNAME || "friend";
}
function greeting(): string {
return `hello, ${userName()} — snappy-os ${readVersion()} reporting in.`;
}
export async function hello(opts: { dryRun: boolean } = { dryRun: false }): Promise<void> {
const msg = greeting();
const stream = opts.dryRun ? process.stderr : process.stdout;
stream.write(msg + "\n");
// Actor is *this* process. The auditor is a child process that grades the
// greeting we just emitted. The only contract between them: the greeting
// string is passed explicitly as an arg (not re-read from stdout) so the
// auditor can verify shape without relying on subprocess stdout plumbing.
const actorSid = sessionId();
const auditorResult = spawnSync(
"npx",
["--yes", "tsx", join(HERE, "snappy-hello.ts"), "--audit", msg, actorSid, opts.dryRun ? "true" : "false"],
{ stdio: ["ignore", "inherit", "inherit"], encoding: "utf8" },
);
if (auditorResult.status !== 0) {
process.stderr.write("[snappy-hello] auditor failed; eval row not written\n");
process.exit(auditorResult.status ?? 1);
}
}
/**
* Audit mode. Called by the actor as a child process. Re-derives what the
* greeting *should* look like, compares, writes the eval row with the
* child's own sessionId() as auditor_session_id. Exits 0 on pass, 1 on fail.
*/
function audit(actorGreeting: string, actorSid: string, dryRunFlag: string): never {
const expected = greeting();
const pass = actorGreeting.startsWith("hello, ") && actorGreeting.includes("snappy-os ");
const scoreVal = pass ? 1.0 : 0.0;
const primary_issue = pass ? null : "greeting-shape-mismatch";
const row: Omit<EvalRow, "skill" | "run_id" | "mode"> & { mode?: "auto" | "manual" } = {
score: scoreVal,
actor_session_id: actorSid,
auditor_session_id: sessionId(),
apply: dryRunFlag !== "true",
primary_issue,
verb: "greet",
notes: pass
? undefined
: `actor: ${actorGreeting}; expected prefix "hello, " and contains "snappy-os "; got: ${actorGreeting}; expected sample: ${expected}`,
};
score("snappy-hello", actorSid, row);
process.exit(pass ? 0 : 1);
}
// CLI. Two modes: normal (greet + spawn auditor) and --audit (child process).
if (import.meta.url.startsWith("file:") && process.argv[1] && import.meta.url.endsWith(process.argv[1].split("/").pop()!)) {
const args = process.argv.slice(2);
if (args[0] === "--audit") {
// --audit <greeting> <actor_sid> <dry_run_flag>
audit(args[1] ?? "", args[2] ?? "", args[3] ?? "false");
}
const dryRun = args.includes("--dry-run");
await hello({ dryRun });
}
scripts: sidecar under state/bin/snappy-hello/
prose-only skill — 3 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).
last run: most recent artifact at state/log/artifacts/snappy-hello/
eval contract: rubric + last 10 runs + deps
| timestamp | verb | score | primary_issue | artifact |
|---|---|---|---|---|
| 2026-05-02 16:51Z | — | 1.00 | — | — |
| 2026-04-25 04:11Z | — | 1.00 | — | — |
| 2026-04-25 02:19Z | — | 1.00 | — | — |
| 2026-04-21 15:58Z | — | 1.00 | — | — |
| 2026-04-21 15:57Z | — | 1.00 | — | — |
| 2026-04-21 03:53Z | — | 1.00 | — | — |
| 2026-04-21 03:49Z | — | 1.00 | — | — |
| 2026-04-21 03:42Z | — | 1.00 | — | — |
| 2026-04-21 03:05Z | — | 1.00 | — | — |
| 2026-04-21 03:03Z | — | 1.00 | — | — |