
snappy-hello

reading primer for what a skill looks like; prints a greeting and writes an eval row

public · 2 files · 10 recent evals


at a glance — dimensions parsed from snappy-hello.md

eval mode: auto

category: System

stages: 2

depends: settings

on-disk anatomy — the four parts a skill can have on disk (stack.md §1e) 3/4 present

Every row below is a file path that either exists in the repo or doesn't. Only Skill is required. The other three are optional — add them as the work demands. This is the physical shape; the harness panel below is the operational shape.

Part	Path	Status	Format	Role
Skill	state/skills/snappy-hello/SKILL.md	present	Anthropic-spec markdown + YAML frontmatter	the atom — frontmatter declares name/description/eval; body is the prose steps
api.ts	state/lib/snappy-hello.ts	present	TypeScript library module, shebanged, dual-purpose import/CLI	exports typed functions; also runnable via npx tsx for scripting
scripts	state/bin/snappy-hello/	not present	shell or standalone files for ops	optional — extract executable blocks here once SKILL.md reaches 3 or more command lines
Loader	state/skills/snappy-hello/AGENTS.md	present	short keyword-gated per-turn context	injected by the hook when triggers match; kept short on purpose

rubric — the analytic "definition of good" (program.md §6.1) 5 criteria · 5 deterministic

Each row is a named criterion in SKILL.md’s ## Rubric block. deterministic = a script or lint decides; judge = a model reads the run output and grades it. An eval row may include criteria.<name> = { score: 0|0.5|1, rationale }; the scalar score is the mean of the criterion scores. A single scalar hides where failures are; per-criterion scores are what make the regen loop’s signal actionable.
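The following is a minimal sketch of how a scalar score could be derived from per-criterion grades, assuming the `criteria.<name> = { score, rationale }` shape described above. The type and function names (`GradedRow`, `scalarScore`, `failingCriteria`) are hypothetical and are not taken from `state/lib/eval.ts`.

```typescript
// Hypothetical types mirroring criteria.<name> = { score: 0|0.5|1, rationale }.
type CriterionScore = 0 | 0.5 | 1;

interface CriterionResult {
  score: CriterionScore;
  rationale: string;
}

interface GradedRow {
  criteria: Record<string, CriterionResult>;
}

// Scalar score = mean of the criterion scores.
// Returns null for an empty rubric instead of dividing by zero.
function scalarScore(row: GradedRow): number | null {
  const scores = Object.values(row.criteria).map((c) => c.score);
  if (scores.length === 0) return null;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Names of the criteria that did not fully pass: the actionable part of the signal.
function failingCriteria(row: GradedRow): string[] {
  return Object.entries(row.criteria)
    .filter(([, c]) => c.score < 1)
    .map(([name]) => name);
}

const example: GradedRow = {
  criteria: {
    greeting_sent_to_stdout: { score: 1, rationale: "greeting on stdout" },
    eval_row_written: { score: 1, rationale: "row appended" },
    actor_auditor_ids_differ: { score: 0.5, rationale: "ids present, one empty" },
    auditor_updates_eval_row: { score: 0, rationale: "no auditor_session_id" },
  },
};

console.log(scalarScore(example)); // 0.625
console.log(failingCriteria(example));
```

In this example the mean of 0.625 alone does not tell you what broke. The two names returned by `failingCriteria` do.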

name	kind	check
greeting_sent_to_stdout	deterministic	The command 'snappy-os run snappy-hello' prints a greeting to stdout.
eval_row_written	deterministic	A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes.
actor_id_in_eval_row	deterministic	The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field.
auditor_updates_eval_row	deterministic	The same eval row is subsequently updated to include an 'auditor_session_id' field.
actor_auditor_ids_differ	deterministic	The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by `tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'`.

harness — the operating frame every snappy-os skill runs inside 3/5 present

Every skill ships with the same five-part frame. An actor produces, a different auditor grades (program.md §5), a loader footer heals loader gaps on every turn, a regen drain rewrites loaders asynchronously, and every run lands a row in the eval log. Cells marked not present carry a one-line note explaining why the absence matters for snappy-hello.

produces Actor

inferred

snappy-os run snappy-hello from code block

No actor model named and no AGENTS.md invoke command. The first shell block in SKILL.md is the closest thing to a producer.

grades Auditor

not present

No auditor surfaced. Without a separate grader, the producer can rubber-stamp itself — flag in review (program.md §5).

frame

learns Loader footer

present

PID self-correction P-fix footer

Loader = setpoint, agent = sensor, gap = error signal. Every agent run must either Edit this loader inline [FIXED] or log [LOGGED] for async regen.

rewrites Regen drain

present

regen-pending.txt async rewriter

Edits to this AGENTS.md get enqueued for regeneration by the PostToolUse hook; the Stop hook drains (`state/regen/drain.sh`).

records Eval log

present

state/log/evals.ndjson auto eval

Every run appends one row here; the schema is in `state/lib/eval.ts`. There is no scheduled reader. The next agent turn reads new rows via the inject hook.
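The eval log is newline-delimited JSON: one object per line, only ever appended. The sketch below shows that append/read pattern with Node's `fs` module. `EvalRowSketch` is a hypothetical subset of the real row type, whose schema lives in `state/lib/eval.ts`. The demo writes to a scratch file, not the real `state/log/evals.ndjson`.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical subset of an eval row: only fields this page mentions.
interface EvalRowSketch {
  skill: string;
  score: number;
  actor_session_id: string;
  auditor_session_id: string;
  apply: boolean;
}

// NDJSON writer: serialize one row and append it as a single line.
function appendEvalRow(path: string, row: EvalRowSketch): void {
  appendFileSync(path, JSON.stringify(row) + "\n", "utf8");
}

// NDJSON reader: every non-empty line is one row.
function readEvalRows(path: string): EvalRowSketch[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as EvalRowSketch);
}

// Demo against a scratch file in the OS temp directory.
const logPath = join(mkdtempSync(join(tmpdir(), "evals-")), "evals.ndjson");
appendEvalRow(logPath, { skill: "snappy-hello", score: 1, actor_session_id: "a1", auditor_session_id: "b1", apply: true });
appendEvalRow(logPath, { skill: "snappy-hello", score: 1, actor_session_id: "a2", auditor_session_id: "b2", apply: false });
console.log(readEvalRows(logPath).length); // 2
```

Append-only lines keep each run's row independent. A reader such as the inject hook only needs the last few lines.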

Critical rules pulled from AGENTS.md · hard-won invariants, not footnotes

No ## Critical Rules section in AGENTS.md. Hard-won invariants, if any, live inline in the prose below.

PID feedback — how this loader taught itself (mock; real endpoint pending)

Rows from state/log/agents-md-feedback.log filtered by snappy-hello. Every agent run that hits an unhandled case appends one line — FIXED when the loader was edited inline (P-fix), LOGGED when the gap is too big for an inline edit and the state/regen/drain.sh queue will rewrite it asynchronously.
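The sketch below shows one way such a filter could work: it parses feedback lines, keeps only one skill's rows, and tallies FIXED against LOGGED. It assumes the line format written by the AGENTS.md footer, `[<UTC timestamp>] <skill>: <note> [FIXED|LOGGED]`. The function names are hypothetical, and the sample lines are illustrative only, not real log data.

```typescript
// Hypothetical parser for feedback-log lines of the form:
//   [<ISO-8601 UTC>] <skill>: <what was missing or fixed> [FIXED|LOGGED]
type Outcome = "FIXED" | "LOGGED";

interface FeedbackRow {
  timestamp: string;
  skill: string;
  message: string;
  outcome: Outcome;
}

// Timestamp may contain colons, so it is matched up to the closing bracket.
const LINE = /^\[([^\]]+)\] ([^:]+): (.*) \[(FIXED|LOGGED)\]$/;

function parseFeedback(lines: string[], skill: string): FeedbackRow[] {
  const rows: FeedbackRow[] = [];
  for (const line of lines) {
    const m = LINE.exec(line.trim());
    if (!m || m[2] !== skill) continue; // skip malformed lines and other skills
    rows.push({ timestamp: m[1], skill: m[2], message: m[3], outcome: m[4] as Outcome });
  }
  return rows;
}

function tally(rows: FeedbackRow[]): Record<Outcome, number> {
  const counts: Record<Outcome, number> = { FIXED: 0, LOGGED: 0 };
  for (const r of rows) counts[r.outcome] += 1;
  return counts;
}

const sample = [
  "[2026-01-01T00:00:00Z] snappy-hello: documented that --dry-run writes to stderr [FIXED]",
  "[2026-01-01T01:00:00Z] other-skill: unrelated gap [LOGGED]",
  "[2026-01-02T00:00:00Z] snappy-hello: auditor flow needs a rewrite [LOGGED]",
  "not a feedback line",
];
const feedback = parseFeedback(sample, "snappy-hello");
console.log(feedback.length, tally(feedback));
```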


pipeline — parsed stages

inputs: settings

1 generator

snappy-os

snappy-os run snappy-hello            # real run; writes one row to evals.ndjson


→

2 auditor

snappy-os

snappy-os run snappy-hello


SKILL.md — the prose file

snappy-hello

The hello-world skill. Read this to see the minimum shape of a snappy-os skill before you write your own. Every section below is the minimum; yours can grow.

When to use

First run after npx snappy-os init. Confirms the loop works end-to-end.
Smoke test for new installs — one round-trip through dispatch → eval → log.

Commands

snappy-os run snappy-hello            # real run; writes one row to evals.ndjson
snappy-os run snappy-hello --dry-run  # scope-only; greeting → stderr, apply:false

Self-test

snappy-os run snappy-hello
tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'
# → true
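If `jq` is not installed, the same self-test can be done in TypeScript. The sketch below reads the last non-empty line of an NDJSON file and compares the two session ids. The function name is hypothetical. It is deliberately stricter than the jq one-liner: in jq, `null != "x"` is `true`, so a row with no `auditor_session_id` would pass. This version returns `false` when either id is missing.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Rough equivalent of:
//   tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'
// but a missing id counts as a failure rather than a pass.
function lastRowHasDistinctIds(ndjsonPath: string): boolean {
  const lines = readFileSync(ndjsonPath, "utf8")
    .split("\n")
    .filter((l) => l.trim() !== "");
  if (lines.length === 0) return false;
  const row = JSON.parse(lines[lines.length - 1]) as {
    actor_session_id?: string;
    auditor_session_id?: string;
  };
  const actor = row.actor_session_id;
  const auditor = row.auditor_session_id;
  return Boolean(actor) && Boolean(auditor) && actor !== auditor;
}

// Demo on a scratch file: only the last row matters.
const scratch = join(mkdtempSync(join(tmpdir(), "selftest-")), "evals.ndjson");
writeFileSync(
  scratch,
  [
    JSON.stringify({ actor_session_id: "a", auditor_session_id: "a" }),
    JSON.stringify({ actor_session_id: "a", auditor_session_id: "b" }),
  ].join("\n") + "\n",
);
console.log(lastRowHasDistinctIds(scratch)); // true
```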

Eval

The actor prints the greeting to stdout, then spawns an auditor child process and passes it the greeting and the actor_session_id as arguments. The auditor grades the greeting's shape and writes the eval row with both actor_session_id and its own auditor_session_id set. Actor ≠ auditor holds because the auditor runs in a separate subprocess with its own session id.

Rubric

criteria:
  - name: greeting_sent_to_stdout
    kind: deterministic
    check: "The command 'snappy-os run snappy-hello' prints a greeting to stdout."
  - name: eval_row_written
    kind: deterministic
    check: "A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes."
  - name: actor_id_in_eval_row
    kind: deterministic
    check: "The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field."
  - name: auditor_updates_eval_row
    kind: deterministic
    check: "The same eval row is subsequently updated to include an 'auditor_session_id' field."
  - name: actor_auditor_ids_differ
    kind: deterministic
    check: "The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by 'tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id''."

AGENTS.md — the loader file

snappy-hello — per-turn loader

UI Resources

state/skills/snappy-hello/resources/ui.openui

The hello-world skill. Proves the loop works end-to-end on a fresh install.

Command index

Action	Command
greet (real)	`npx tsx state/lib/snappy-hello.ts`
greet (scope-only)	`npx tsx state/lib/snappy-hello.ts --dry-run`
via CLI	`snappy-os run snappy-hello [--dry-run]`

Invariants

Exit 0 on success. Non-zero exit is a bug; fix, don't retry.
Real run: greeting to stdout, eval row apply: true.
Dry-run: greeting to stderr, eval row apply: false. No side effects.
actor_session_id ≠ auditor_session_id on every row (the actor is this process; the auditor is a child process the actor spawns after greeting to grade the greeting).
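The invariants above can be checked mechanically against a single row. The sketch below is a hypothetical checker and is not part of snappy-os. Its field names follow the row that `state/lib/snappy-hello.ts` writes (`apply`, `actor_session_id`, `auditor_session_id`). The caller supplies whether the run was a dry run.

```typescript
// Hypothetical subset of the row written by snappy-hello's auditor.
interface HelloRow {
  apply: boolean;
  actor_session_id: string;
  auditor_session_id: string;
}

// Returns a list of violated invariants; an empty list means the row is valid.
function invariantViolations(row: HelloRow, dryRun: boolean): string[] {
  const problems: string[] = [];
  // Real run => apply: true; dry-run => apply: false (no side effects).
  if (row.apply !== !dryRun) {
    problems.push(`apply should be ${!dryRun} for a ${dryRun ? "dry" : "real"} run`);
  }
  // Actor and auditor must be different sessions on every row.
  if (row.actor_session_id === row.auditor_session_id) {
    problems.push("actor_session_id equals auditor_session_id");
  }
  return problems;
}

console.log(invariantViolations({ apply: true, actor_session_id: "a", auditor_session_id: "b" }, false)); // []
console.log(invariantViolations({ apply: true, actor_session_id: "a", auditor_session_id: "a" }, true));
```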

Gotchas

.env.cache is optional for this skill — env("USER", false) falls back to $USER from the process environment. Skills that need real credentials document that on their own .md page.

The greeting format is load-bearing: the auditor subprocess checks that it starts with `hello, ` and contains `snappy-os ` (the prefix of the version stamp). Don't "improve" the phrasing without updating the auditor.

Self-correcting loader (PID feedback)

This is the rule that makes loaders learn. It was ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.

Before you finish, do two things:

1. Fix gaps (P — proportional). MANDATORY. If this loader didn't cover your case — if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk — you MUST attempt an Edit to this AGENTS.md before you log.

Default action: edit this file. Surgical. Add the missing line, table row, or code example where it belongs. No restructuring.

LOGGED alone is allowed only when the fix needs >10 lines, spans multiple files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.

"I didn't have time" / "it's minor" / "the next agent will figure it out" are NOT valid reasons. Minor gaps compound; that's the whole point of this loop.

Concrete example of a valid P-fix: discovering that agent-browser screenshot takes the path positionally, not as --output <path>. That is one line in a table. Edit it in. Do not just log it.

The goal: the next agent never has to leave the loader.

2. Log the result.

echo "[$(date -u +%FT%TZ)] snappy-hello: <what was missing or fixed> [FIXED|LOGGED]" >> state/log/loader-feedback.log

FIXED = you patched this loader inline (P-fix).
LOGGED = too large for inline; the PostToolUse enqueue + Stop-hook drain will rewrite the loader from scratch at the next session end.

Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.
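The footer's decision rule and log line can be sketched in TypeScript. The thresholds come from the rule above: LOGGED only when the fix needs more than 10 lines, spans multiple files, or needs a structural rewrite. `GapFix`, `outcomeFor`, and `feedbackLine` are hypothetical names. The timestamp is trimmed to match `date -u +%FT%TZ`, which has no milliseconds.

```typescript
type Outcome = "FIXED" | "LOGGED";

// Hypothetical description of the fix an agent is about to make.
interface GapFix {
  linesNeeded: number;
  filesTouched: number;
  structuralRewrite: boolean;
}

// LOGGED only for large fixes; everything else must be patched inline (P-fix).
function outcomeFor(fix: GapFix): Outcome {
  const tooBig = fix.linesNeeded > 10 || fix.filesTouched > 1 || fix.structuralRewrite;
  return tooBig ? "LOGGED" : "FIXED";
}

// Same shape as the shell echo: [<UTC timestamp>] <skill>: <note> [OUTCOME]
function feedbackLine(skill: string, note: string, outcome: Outcome, now = new Date()): string {
  const ts = now.toISOString().replace(/\.\d{3}Z$/, "Z"); // drop milliseconds
  return `[${ts}] ${skill}: ${note} [${outcome}]`;
}

const outcome = outcomeFor({ linesNeeded: 1, filesTouched: 1, structuralRewrite: false });
console.log(feedbackLine("snappy-hello", "screenshot takes path positionally", outcome, new Date(Date.UTC(2026, 0, 1))));
// [2026-01-01T00:00:00Z] snappy-hello: screenshot takes path positionally [FIXED]
```

Under this sketch's assumptions, a one-line table fix always comes out FIXED. That matches the rule that minor gaps get edited in rather than logged.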

OpenUI Resource

Skill-owned OpenUI Lang resource: state/skills/snappy-hello/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

api.ts — typed surface

#!/usr/bin/env -S npx tsx
/**
 * state/lib/snappy-hello.ts — the hello-world skill.
 *
 * G1 of the 1.0.7 release gate: on a fresh install, this is the first skill
 * a new user pulls + runs. It must:
 *   1. Print a greeting with the user's name and the snappy-os version.
 *   2. Write one row to state/log/evals.ndjson with actor_session_id and
 *      auditor_session_id populated and non-identical (actor ≠ auditor).
 *   3. Exit 0.
 *
 * Actor/auditor split: the actor runs in this process and writes the greeting
 * to stdout (stderr under --dry-run). Before returning, the actor spawns a
 * child process via `tsx` in --audit mode, passing the greeting, the actor's
 * session id, and the dry-run flag as arguments. The child grades the greeting;
 * its own session id becomes auditor_session_id. Only the child writes the
 * eval row, so each run produces exactly one row.
 *
 * Invoked:
 *   - directly: `npx tsx state/lib/snappy-hello.ts [--dry-run]`
 *   - via CLI:  `snappy-os run snappy-hello [--dry-run]`
 */

import { spawnSync } from "child_process";
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { env } from "./env.ts";
import { score, sessionId, type EvalRow } from "./eval.ts";

const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");

function readVersion(): string {
  try {
    const pkg = JSON.parse(readFileSync(join(ROOT, "package.json"), "utf8"));
    return pkg.version ?? "?";
  } catch {
    return "?";
  }
}

function userName(): string {
  return env("USER", false) || process.env.USER || process.env.LOGNAME || "friend";
}

function greeting(): string {
  return `hello, ${userName()} — snappy-os ${readVersion()} reporting in.`;
}

export async function hello(opts: { dryRun: boolean } = { dryRun: false }): Promise<void> {
  const msg = greeting();
  const stream = opts.dryRun ? process.stderr : process.stdout;
  stream.write(msg + "\n");

  // Actor is *this* process. The auditor is a child process that grades the
  // greeting we just emitted. The only contract between them: the greeting
  // string is passed explicitly as an arg (not re-read from stdout) so the
  // auditor can verify shape without relying on subprocess stdout plumbing.
  const actorSid = sessionId();
  const auditorResult = spawnSync(
    "npx",
    ["--yes", "tsx", join(HERE, "snappy-hello.ts"), "--audit", msg, actorSid, opts.dryRun ? "true" : "false"],
    { stdio: ["ignore", "inherit", "inherit"], encoding: "utf8" },
  );
  if (auditorResult.status !== 0) {
    process.stderr.write("[snappy-hello] auditor failed; eval row not written\n");
    process.exit(auditorResult.status ?? 1);
  }
}

/**
 * Audit mode. Called by the actor as a child process. Re-derives what the
 * greeting *should* look like, compares, writes the eval row with the
 * child's own sessionId() as auditor_session_id. Exits 0 on pass, 1 on fail.
 */
function audit(actorGreeting: string, actorSid: string, dryRunFlag: string): never {
  const expected = greeting();
  const pass = actorGreeting.startsWith("hello, ") && actorGreeting.includes("snappy-os ");
  const scoreVal = pass ? 1.0 : 0.0;
  const primary_issue = pass ? null : "greeting-shape-mismatch";

  const row: Omit<EvalRow, "skill" | "run_id" | "mode"> & { mode?: "auto" | "manual" } = {
    score: scoreVal,
    actor_session_id: actorSid,
    auditor_session_id: sessionId(),
    apply: dryRunFlag !== "true",
    primary_issue,
    verb: "greet",
    notes: pass
      ? undefined
      : `actor: ${actorGreeting}; expected prefix "hello, " and contains "snappy-os "; got: ${actorGreeting}; expected sample: ${expected}`,
  };

  score("snappy-hello", actorSid, row);
  process.exit(pass ? 0 : 1);
}

// CLI. Two modes: normal (greet + spawn auditor) and --audit (child process).
if (import.meta.url.startsWith("file:") && process.argv[1] && import.meta.url.endsWith(process.argv[1].split("/").pop()!)) {
  const args = process.argv.slice(2);
  if (args[0] === "--audit") {
    // --audit <greeting> <actor_sid> <dry_run_flag>
    audit(args[1] ?? "", args[2] ?? "", args[3] ?? "false");
  }
  const dryRun = args.includes("--dry-run");
  await hello({ dryRun });
}

scripts — sidecar under state/bin/snappy-hello/

prose-only skill — 3 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

last run — most recent artifact at state/log/artifacts/snappy-hello/


eval contract — rubric + last 10 runs + deps

rubric auto no rubric declared

recent mean 1.00 · 10 runs actor/auditor: unverifiable

deps: settings

timestamp	verb	score	primary_issue	artifact
2026-05-02 16:51Z	—	1.00	—	—
2026-04-25 04:11Z	—	1.00	—	—
2026-04-25 02:19Z	—	1.00	—	—
2026-04-21 15:58Z	—	1.00	—	—
2026-04-21 15:57Z	—	1.00	—	—
2026-04-21 03:53Z	—	1.00	—	—
2026-04-21 03:49Z	—	1.00	—	—
2026-04-21 03:42Z	—	1.00	—	—
2026-04-21 03:05Z	—	1.00	—	—
2026-04-21 03:03Z	—	1.00	—	—