snappy-hello

reading primer for what a skill looks like; prints a greeting and writes an eval row
public 2 files 10 recent evals

at a glance— dimensions parsed from snappy-hello.md

eval mode: auto
category: System
stages: 2
depends: settings

on-disk anatomy — the four parts a skill can have on disk (stack.md §1e) 3/4 present

Every row below is a file path that either exists in the repo or doesn't. Only Skill is required. The other three are optional — add them as the work demands. This is the physical shape; the harness panel below is the operational shape.

Skill
state/skills/snappy-hello/SKILL.md present
Anthropic-spec markdown + YAML frontmatter
the atom — frontmatter declares name/description/eval; body is the prose steps
api.ts
state/lib/snappy-hello.ts present
TypeScript library module, shebanged, dual-purpose import/CLI
exports typed functions; also runnable via npx tsx for scripting
scripts
state/bin/snappy-hello/ not present
shell or standalone files for ops
optional: extract executable blocks here once SKILL.md holds 3 or more command lines
Loader
state/skills/snappy-hello/AGENTS.md present
short keyword-gated per-turn context
injected by the hook when triggers match; kept short on purpose

rubric — the analytic "definition of good" (program.md §6.1) 5 criteria · 5 deterministic

Each row is a named criterion in the SKILL.md’s ## Rubric block. deterministic = a script or lint decides; judge = a model reads the run output and grades. An eval row may include criteria.<name> = { score: 0|0.5|1, rationale }; the scalar score is the mean of the criterion scores. Single-scalar grading hides where failures are; the per-criterion breakdown is what makes the regen loop’s signal actionable.
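To make the scoring rule concrete, here is a minimal sketch of how a scalar score could be derived from per-criterion results. The type and function names (`CriterionResult`, `scalarScore`, `failingCriteria`) and the sample rationales are hypothetical; only the `{ score: 0|0.5|1, rationale }` shape and the "mean of criteria" rule come from the text above. Scoring an empty rubric as 0 is an assumption.

```typescript
// Hypothetical shapes: only `score` and `rationale` come from the rubric text.
type CriterionScore = 0 | 0.5 | 1;

interface CriterionResult {
  score: CriterionScore;
  rationale: string;
}

type Criteria = Record<string, CriterionResult>;

// Scalar score = arithmetic mean of the per-criterion scores.
function scalarScore(criteria: Criteria): number {
  const results = Object.values(criteria);
  if (results.length === 0) return 0; // assumption: an empty rubric scores 0
  const total = results.reduce((sum, c) => sum + c.score, 0);
  return total / results.length;
}

// Names of criteria that did not earn full marks, i.e. where to look first.
function failingCriteria(criteria: Criteria): string[] {
  return Object.entries(criteria)
    .filter(([, c]) => c.score < 1)
    .map(([name]) => name);
}

// Illustrative data, not taken from a real run.
const example: Criteria = {
  greeting_sent_to_stdout: { score: 1, rationale: "greeting seen on stdout" },
  eval_row_written: { score: 1, rationale: "one new row appended" },
  actor_id_in_eval_row: { score: 1, rationale: "field present" },
  auditor_updates_eval_row: { score: 0, rationale: "auditor_session_id missing" },
  actor_auditor_ids_differ: { score: 0.5, rationale: "could not compare ids" },
};

console.log(scalarScore(example)); // 0.7
console.log(failingCriteria(example));
```

The scalar alone (0.7) says a run was imperfect; the list of failing criterion names says which check to fix.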

name | kind | check
greeting_sent_to_stdout | deterministic | The command 'snappy-os run snappy-hello' prints a greeting to stdout.
eval_row_written | deterministic | A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes.
actor_id_in_eval_row | deterministic | The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field.
auditor_updates_eval_row | deterministic | The same eval row is subsequently updated to include an 'auditor_session_id' field.
actor_auditor_ids_differ | deterministic | The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by 'tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id''.

harness — the operating frame every snappy-os skill runs inside 3/5 present

Every skill ships with the same five-part frame. An actor produces, a different auditor grades (program.md §5), a loader footer heals loader gaps on every turn, a regen drain rewrites loaders asynchronously, and every run lands a row in the eval log. Cells marked not present carry a one-line teach — absence is load-bearing for snappy-hello.

produces Actor
inferred
snappy-os run snappy-hello from code block
No actor model named and no AGENTS.md invoke command. The first shell block in SKILL.md is the closest thing to a producer.
grades Auditor
not present

No auditor surfaced. Without a separate grader, the producer can rubber-stamp itself — flag in review (program.md §5).

frame
learns Loader footer
present
PID self-correction P-fix footer
Loader = setpoint, agent = sensor, gap = error signal. Every agent run must either Edit this loader inline [FIXED] or log [LOGGED] for async regen.
rewrites Regen drain
present
regen-pending.txt async rewriter
Edits to this AGENTS.md get enqueued for regeneration by the PostToolUse hook; the Stop hook drains (`state/regen/drain.sh`).
records Eval log
present
state/log/evals.ndjson auto eval
Every run appends one row here — schema in `state/lib/eval.ts`. No scheduled reader; next-agent turn reads via the inject hook.
Critical rules pulled from AGENTS.md · hard-won invariants, not footnotes
No ## Critical Rules section in AGENTS.md. Hard-won invariants, if any, live inline in the prose below.

PID feedback — how this loader taught itself mock — real endpoint pending

Rows from state/log/agents-md-feedback.log filtered by snappy-hello. Every agent run that hits an unhandled case appends one line — FIXED when the loader was edited inline (P-fix), LOGGED when the gap is too big for an inline edit and the state/regen/drain.sh queue will rewrite it asynchronously.


pipeline— parsed stages

inputs settings
1 generator
snappy-os
snappy-os run snappy-hello            # real run; writes one row to evals.ndjson
2 auditor
snappy-os
snappy-os run snappy-hello

SKILL.md— the prose file

snappy-hello

The hello-world skill. Read this to see the minimum shape of a snappy-os skill before you write your own. Every section below is the minimum; yours can grow.

When to use

  • First run after npx snappy-os init. Confirms the loop works end-to-end.
  • Smoke test for new installs — one round-trip through dispatch → eval → log.

Commands

snappy-os run snappy-hello            # real run; writes one row to evals.ndjson
snappy-os run snappy-hello --dry-run  # scope-only; greeting → stderr, apply:false

Self-test

snappy-os run snappy-hello
tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'
# → true
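
The same self-test can be sketched in TypeScript for use inside a script or test runner. This is an illustrative equivalent of the `tail | jq` pipeline, not part of the skill: the function names and sample rows are made up, and only the two field names come from the rubric. Unlike the jq expression, this version also returns false when either id is missing.

```typescript
// Field names actor_session_id / auditor_session_id come from the rubric;
// everything else here (function names, sample rows) is illustrative.
interface SessionIds {
  actor_session_id?: string;
  auditor_session_id?: string;
}

// Return the last non-empty line of an NDJSON document, parsed as JSON.
function lastRow(ndjson: string): SessionIds | undefined {
  const lines = ndjson.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return undefined;
  return JSON.parse(lines[lines.length - 1]) as SessionIds;
}

// True only when both ids are present and different.
function actorAuditorDiffer(row: SessionIds | undefined): boolean {
  if (!row?.actor_session_id || !row.auditor_session_id) return false;
  return row.actor_session_id !== row.auditor_session_id;
}

// Made-up sample log with a trailing newline, as an append-only file would have.
const sample = [
  '{"skill":"snappy-hello","actor_session_id":"a-1","auditor_session_id":"b-1"}',
  '{"skill":"snappy-hello","actor_session_id":"a-2","auditor_session_id":"b-2"}',
  "",
].join("\n");

console.log(actorAuditorDiffer(lastRow(sample))); // true
```

To run it against the real log, pass the file contents instead of `sample`, for example `readFileSync("state/log/evals.ndjson", "utf8")` from `node:fs`.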

Eval

Actor prints the greeting to stdout, then spawns an auditor child process and passes it the greeting and its actor_session_id as arguments. The auditor grades the greeting shape and writes the eval row, with its own session id as auditor_session_id. Actor ≠ auditor holds because the auditor runs in a separate subprocess.

Rubric

criteria:
  - name: greeting_sent_to_stdout
    kind: deterministic
    check: "The command 'snappy-os run snappy-hello' prints a greeting to stdout."
  - name: eval_row_written
    kind: deterministic
    check: "A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes."
  - name: actor_id_in_eval_row
    kind: deterministic
    check: "The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field."
  - name: auditor_updates_eval_row
    kind: deterministic
    check: "The same eval row is subsequently updated to include an 'auditor_session_id' field."
  - name: actor_auditor_ids_differ
    kind: deterministic
    check: "The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by 'tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id''."

AGENTS.md— the loader file

snappy-hello — per-turn loader

UI Resources

  • state/skills/snappy-hello/resources/ui.openui

The hello-world skill. Proves the loop works end-to-end on a fresh install.

Command index

Action              Command
greet (real)        npx tsx state/lib/snappy-hello.ts
greet (scope-only)  npx tsx state/lib/snappy-hello.ts --dry-run
via CLI             snappy-os run snappy-hello [--dry-run]

Invariants

  • Exit 0 on success. Non-zero exit is a bug; fix, don't retry.
  • Real run: greeting to stdout, eval row apply: true.
  • Dry-run: greeting to stderr, eval row apply: false. No side effects.
  • actor_session_id ≠ auditor_session_id on every row (actor is this process; auditor is a child process the actor spawns after greeting to grade stdout).

Gotchas

  • .env.cache is optional for this skill: env("USER", false) falls back to $USER from the process environment. Skills that need real credentials document that on their own .md page.

  • The greeting format is load-bearing: the auditor subprocess checks that it starts with "hello, " and contains the snappy-os version. Don't "improve" the phrasing without updating the auditor.
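
As an illustration of the shape contract described in the last gotcha, here is a small sketch of a check that looks for both the prefix and a specific version string. `greetingHasShape` is a hypothetical helper, not the auditor's actual code, and it assumes the caller supplies the version (for example, read from package.json). The sample greeting follows the format used in state/lib/snappy-hello.ts with a made-up user name.

```typescript
// Hypothetical helper: checks the prefix and a specific version substring.
// Assumption: the caller supplies the version (e.g. read from package.json).
function greetingHasShape(greeting: string, version: string): boolean {
  return greeting.startsWith("hello, ") && greeting.includes(`snappy-os ${version}`);
}

// Made-up sample in the greeting format the skill emits.
const ok = "hello, ada — snappy-os 1.0.7 reporting in.";
const rephrased = "Hi ada, welcome to snappy-os 1.0.7!";

console.log(greetingHasShape(ok, "1.0.7"));        // true
console.log(greetingHasShape(rephrased, "1.0.7")); // false: prefix changed
console.log(greetingHasShape(ok, "2.0.0"));        // false: version mismatch
```

The second case shows why rephrasing is risky: a friendlier greeting that drops the "hello, " prefix fails the shape check even though the version is still present.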


<!-- kernel-ok: historical attribution for the PID-loader pattern. -->

Self-correcting loader (PID feedback)

This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.

Before you finish, do two things:

1. Fix gaps (P — proportional). MANDATORY. If this loader didn't cover your case — if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk — you MUST attempt an Edit to this AGENTS.md before you log.

  • Default action: edit this file. Surgical. Add the missing line, table row, or code example where it belongs. No restructuring.

  • LOGGED is allowed only when the fix needs >10 lines, spans multiple files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.

  • "I didn't have time" / "it's minor" / "the next agent will figure it out" are NOT valid reasons. Minor gaps compound; that's the whole point of this loop.

  • Concrete example of a valid P-fix: you discover that agent-browser screenshot takes the path positionally, not as --output <path>. One line in a table. Edit it in. Do not just log it.

  • The goal: the next agent never has to leave the loader.

2. Log the result.

echo "[$(date -u +%FT%TZ)] snappy-hello: <what was missing or fixed> [FIXED|LOGGED]" >> state/log/loader-feedback.log
  • FIXED = you patched this loader inline (P-fix).
  • LOGGED = too large for inline; the PostToolUse enqueue + Stop-hook drain will rewrite the loader from scratch on next session-end.

Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.
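
The log line format in step 2 is regular enough to parse. The sketch below is illustrative only: `parseFeedbackLine` and `tally` are hypothetical names, the sample lines are made up, and the regular expression assumes lines are written exactly as the echo command above produces them.

```typescript
type Outcome = "FIXED" | "LOGGED";

interface FeedbackEntry {
  timestamp: string;
  skill: string;
  message: string;
  outcome: Outcome;
}

// Matches lines shaped like: [2026-04-21T03:03:00Z] snappy-hello: some text [FIXED]
const LINE = /^\[([^\]]+)\] ([^:]+): (.*) \[(FIXED|LOGGED)\]$/;

// Parse one log line; returns null for blank or malformed lines.
function parseFeedbackLine(line: string): FeedbackEntry | null {
  const m = LINE.exec(line.trim());
  if (!m) return null;
  const [, timestamp, skill, message, outcome] = m;
  return { timestamp, skill, message, outcome: outcome as Outcome };
}

// Count FIXED vs LOGGED entries for one skill.
function tally(log: string, skill: string): Record<Outcome, number> {
  const counts: Record<Outcome, number> = { FIXED: 0, LOGGED: 0 };
  for (const line of log.split("\n")) {
    const entry = parseFeedbackLine(line);
    if (entry && entry.skill === skill) counts[entry.outcome] += 1;
  }
  return counts;
}

// Made-up sample lines in the documented format.
const sampleLog = [
  "[2026-04-21T03:03:00Z] snappy-hello: added dry-run row to command index [FIXED]",
  "[2026-04-21T03:05:00Z] snappy-hello: loader needs structural rewrite [LOGGED]",
  "[2026-04-21T03:06:00Z] other-skill: unrelated entry [FIXED]",
  "not a feedback line",
].join("\n");

console.log(tally(sampleLog, "snappy-hello")); // { FIXED: 1, LOGGED: 1 }
```

A rising share of LOGGED entries for one skill would suggest its loader needs a rewrite rather than more inline patches.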

OpenUI Resource

  • Skill-owned OpenUI Lang resource: state/skills/snappy-hello/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
  • Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
  • System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

api.ts— typed surface

#!/usr/bin/env -S npx tsx
/**
 * state/lib/snappy-hello.ts — the hello-world skill.
 *
 * G1 of the 1.0.7 release gate: on a fresh install, this is the first skill
 * a new user pulls + runs. It must:
 *   1. Print a greeting with the user's name and the snappy-os version.
 *   2. Write one row to state/log/evals.ndjson with actor_session_id and
 *      auditor_session_id populated and non-identical (actor ≠ auditor).
 *   3. Exit 0.
 *
 * Actor/auditor split: the actor runs in this process and prints the
 * greeting. Before returning, the actor spawns a child process via `tsx`
 * and passes it the greeting, the actor session id, and the dry-run flag
 * as arguments. The child grades the greeting; its own session id becomes
 * auditor_session_id. The child writes the single eval row; the actor
 * writes none.
 *
 * Invoked:
 *   - directly: `npx tsx state/lib/snappy-hello.ts [--dry-run]`
 *   - via CLI:  `snappy-os run snappy-hello [--dry-run]`
 */

import { spawnSync } from "child_process";
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { env } from "./env.ts";
import { score, sessionId, type EvalRow } from "./eval.ts";

const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");

function readVersion(): string {
  try {
    const pkg = JSON.parse(readFileSync(join(ROOT, "package.json"), "utf8"));
    return pkg.version ?? "?";
  } catch {
    return "?";
  }
}

function userName(): string {
  return env("USER", false) || process.env.USER || process.env.LOGNAME || "friend";
}

function greeting(): string {
  return `hello, ${userName()} — snappy-os ${readVersion()} reporting in.`;
}

export async function hello(opts: { dryRun: boolean } = { dryRun: false }): Promise<void> {
  const msg = greeting();
  const stream = opts.dryRun ? process.stderr : process.stdout;
  stream.write(msg + "\n");

  // Actor is *this* process. The auditor is a child process that grades the
  // greeting we just emitted. The only contract between them: the greeting
  // string is passed explicitly as an arg (not re-read from stdout) so the
  // auditor can verify shape without relying on subprocess stdout plumbing.
  const actorSid = sessionId();
  const auditorResult = spawnSync(
    "npx",
    ["--yes", "tsx", join(HERE, "snappy-hello.ts"), "--audit", msg, actorSid, opts.dryRun ? "true" : "false"],
    { stdio: ["ignore", "inherit", "inherit"], encoding: "utf8" },
  );
  if (auditorResult.status !== 0) {
    process.stderr.write("[snappy-hello] auditor failed; eval row not written\n");
    process.exit(auditorResult.status ?? 1);
  }
}

/**
 * Audit mode. Called by the actor as a child process. Re-derives what the
 * greeting *should* look like, compares, writes the eval row with the
 * child's own sessionId() as auditor_session_id. Exits 0 on pass, 1 on fail.
 */
function audit(actorGreeting: string, actorSid: string, dryRunFlag: string): never {
  const expected = greeting();
  const pass = actorGreeting.startsWith("hello, ") && actorGreeting.includes("snappy-os ");
  const scoreVal = pass ? 1.0 : 0.0;
  const primary_issue = pass ? null : "greeting-shape-mismatch";

  const row: Omit<EvalRow, "skill" | "run_id" | "mode"> & { mode?: "auto" | "manual" } = {
    score: scoreVal,
    actor_session_id: actorSid,
    auditor_session_id: sessionId(),
    apply: dryRunFlag !== "true",
    primary_issue,
    verb: "greet",
    notes: pass
      ? undefined
      : `expected prefix "hello, " and substring "snappy-os "; got: ${actorGreeting}; expected sample: ${expected}`,
  };

  score("snappy-hello", actorSid, row);
  process.exit(pass ? 0 : 1);
}

// CLI. Two modes: normal (greet + spawn auditor) and --audit (child process).
if (import.meta.url.startsWith("file:") && process.argv[1] && import.meta.url.endsWith(process.argv[1].split("/").pop()!)) {
  const args = process.argv.slice(2);
  if (args[0] === "--audit") {
    // --audit <greeting> <actor_sid> <dry_run_flag>
    audit(args[1] ?? "", args[2] ?? "", args[3] ?? "false");
  }
  const dryRun = args.includes("--dry-run");
  await hello({ dryRun });
}

scripts— sidecar under state/bin/snappy-hello/

prose-only skill — 3 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

last run— most recent artifact at state/log/artifacts/snappy-hello/


eval contract— rubric + last 10 runs + deps

rubric auto no rubric declared
recent mean 1.00 · 10 runs actor/auditor: unverifiable
deps settings
timestamp verb score primary_issue artifact
2026-05-02 16:51Z 1.00
2026-04-25 04:11Z 1.00
2026-04-25 02:19Z 1.00
2026-04-21 15:58Z 1.00
2026-04-21 15:57Z 1.00
2026-04-21 03:53Z 1.00
2026-04-21 03:49Z 1.00
2026-04-21 03:42Z 1.00
2026-04-21 03:05Z 1.00
2026-04-21 03:03Z 1.00