snappy-hello
What it does for you
A friendly first run that proves your assistant is set up and working.
What it produces
A recent result, so you can see the kind of work it returns.
How to get it
These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.
For developers - how this skill is built, graded, and how it runs
at a glance - the short version
what's inside - the parts that make up a skill 3/4 present
A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.
state/skills/snappy-hello/SKILL.md
present
state/lib/snappy-hello.ts
present
state/bin/snappy-hello/
not present
state/skills/snappy-hello/AGENTS.md
present
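As a rough sketch of the present / not present listing above, the four locations can be derived from the skill name and checked on disk. The helper names `skillParts` and `presence`, and using the current directory as the workspace root, are assumptions made for illustration; they are not part of snappy-os.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// The four locations listed above, for a given skill name.
function skillParts(skill: string): string[] {
  return [
    `state/skills/${skill}/SKILL.md`,
    `state/lib/${skill}.ts`,
    `state/bin/${skill}/`,
    `state/skills/${skill}/AGENTS.md`,
  ];
}

// Report which parts exist under a workspace root (the root is an assumption).
function presence(root: string, skill: string): { path: string; present: boolean }[] {
  return skillParts(skill).map((p) => ({ path: p, present: existsSync(join(root, p)) }));
}

for (const part of presence(process.cwd(), "snappy-hello")) {
  console.log(part.present ? "present    " : "not present", part.path);
}
```

Only SKILL.md is required, so a missing entry for the other three is not an error by itself.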
how it's graded - what counts as a good run 5 criteria · 5 deterministic
Each row is one thing a good run has to get right. deterministic means a quick check decides, pass or fail. judge means the AI reads the result and rates it. Grading each piece on its own (instead of one overall score) shows exactly where a run fell short, so the fix is obvious.
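A minimal sketch of per-criterion grading, to show why separate rows make the failing piece obvious. The `Criterion` and `CriterionResult` shapes and the `gradeRun` helper are hypothetical; the real rubric loader is not shown on this page.

```typescript
// Hypothetical shapes; the real rubric loader and result types are not shown here.
type Kind = "deterministic" | "judge";

interface Criterion {
  name: string;
  kind: Kind;
}

interface CriterionResult {
  name: string;
  kind: Kind;
  pass: boolean;
}

// Grade each criterion on its own so a failing run points at a specific row.
function gradeRun(
  criteria: Criterion[],
  check: (c: Criterion) => boolean,
): CriterionResult[] {
  return criteria.map((c) => ({ name: c.name, kind: c.kind, pass: check(c) }));
}

// Names of the criteria that fell short, in rubric order.
function failures(results: CriterionResult[]): string[] {
  return results.filter((r) => !r.pass).map((r) => r.name);
}

const rubric: Criterion[] = [
  { name: "greeting_sent_to_stdout", kind: "deterministic" },
  { name: "eval_row_written", kind: "deterministic" },
  { name: "actor_auditor_ids_differ", kind: "deterministic" },
];

// Pretend one check failed, to show how the per-row result localizes the problem.
const results = gradeRun(rubric, (c) => c.name !== "eval_row_written");
console.log(failures(results)); // logs the single failing criterion name
```

A single overall score would only say the run was imperfect; the list of failed names says which check to fix.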
how it runs - the shared frame every skill uses 3/5 present
Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, so the omission is deliberate rather than accidental.
No separate check found. Without one, the part that makes the work could end up approving its own work, which is worth a closer look.
state/log/evals.ndjson what it has learned - fixes written back in over time sample
When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.
how the work flows - step by step
snappy-os run snappy-hello # real run; writes one row to evals.ndjson
SKILL.md - the skill, written out in plain English
snappy-hello
The hello-world skill. Read this to see the minimum shape of a snappy-os skill before you write your own. Every section below is the minimum; yours can grow.
When to use
- First run after npx snappy-os init. Confirms the loop works end-to-end.
- Smoke test for new installs - one round-trip through dispatch → eval → log.
Commands
snappy-os run snappy-hello # real run; writes one row to evals.ndjson
snappy-os run snappy-hello --dry-run # scope-only; greeting → stderr, apply:false
Self-test
snappy-os run snappy-hello
tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id'
# → true
Eval
Actor prints the greeting to stdout, then spawns an auditor child process and passes it the greeting and actor_session_id as arguments. The auditor grades the greeting shape and writes the eval row with its own auditor_session_id alongside the actor's. Actor ≠ auditor holds because the auditor runs in a separate subprocess.
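The self-test's jq check can be sketched in TypeScript as follows. Only the two session-id field names come from this page; `lastRow` and `actorDiffersFromAuditor` are hypothetical helpers, and the sample rows are made up for illustration.

```typescript
// Only the two session-id fields are taken from the page; the rest of the
// row shape is left open.
interface EvalRowIds {
  actor_session_id?: string;
  auditor_session_id?: string;
}

// Parse the last non-empty line of an NDJSON log.
function lastRow(ndjson: string): EvalRowIds | undefined {
  const lines = ndjson.split("\n").filter((l) => l.trim() !== "");
  const last = lines[lines.length - 1];
  if (last === undefined) return undefined;
  return JSON.parse(last) as EvalRowIds;
}

// True only when both ids are present and differ. This is stricter than the
// jq one-liner, which would also report true if exactly one id were missing.
function actorDiffersFromAuditor(row: EvalRowIds | undefined): boolean {
  if (!row || !row.actor_session_id || !row.auditor_session_id) return false;
  return row.actor_session_id !== row.auditor_session_id;
}

// Made-up sample log with two rows.
const sample =
  [
    '{"actor_session_id":"a-1","auditor_session_id":"b-1","score":1}',
    '{"actor_session_id":"a-2","auditor_session_id":"b-2","score":1}',
  ].join("\n") + "\n";

console.log(actorDiffersFromAuditor(lastRow(sample)));
```

Requiring both ids to be present is a design choice in this sketch: a row with a missing auditor id should count as a failed audit, not as proof of separation.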
Rubric
criteria:
  - name: greeting_sent_to_stdout
    kind: deterministic
    check: "The command 'snappy-os run snappy-hello' prints a greeting to stdout."
  - name: eval_row_written
    kind: deterministic
    check: "A new row is appended to 'state/log/evals.ndjson' after 'snappy-os run snappy-hello' completes."
  - name: actor_id_in_eval_row
    kind: deterministic
    check: "The new eval row in 'state/log/evals.ndjson' contains an 'actor_session_id' field."
  - name: auditor_updates_eval_row
    kind: deterministic
    check: "The same eval row is subsequently updated to include an 'auditor_session_id' field."
  - name: actor_auditor_ids_differ
    kind: deterministic
    check: "The 'actor_session_id' and 'auditor_session_id' in the eval row are different, as verified by 'tail -1 state/log/evals.ndjson | jq '.actor_session_id != .auditor_session_id''."
AGENTS.md - what the AI loads when this skill comes up
snappy-hello - per-turn loader
UI Resources
state/skills/snappy-hello/resources/ui.openui
The hello-world skill. Proves the loop works end-to-end on a fresh install.
Command index
| Action | Command |
|---|---|
| greet (real) | npx tsx state/lib/snappy-hello.ts |
| greet (scope-only) | npx tsx state/lib/snappy-hello.ts --dry-run |
| via CLI | snappy-os run snappy-hello [--dry-run] |
Invariants
- Exit 0 on success. Non-zero exit is a bug; fix, don't retry.
- Real run: greeting to stdout, eval row apply: true.
- Dry-run: greeting to stderr, eval row apply: false. No side effects.
- actor_session_id ≠ auditor_session_id on every row (actor is this process; auditor is a child process the actor spawns after greeting to grade stdout).
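The invariants above can be read as a small table from the --dry-run flag to two observable effects. The sketch below encodes that table and checks one observed run against it; `planRun` and `violations` are hypothetical helpers, not part of the skill's code.

```typescript
// Hypothetical helper types: the two observable effects named in the invariants.
interface RunPlan {
  stream: "stdout" | "stderr";
  apply: boolean;
}

// Real run: stdout + apply true. Dry-run: stderr + apply false.
function planRun(dryRun: boolean): RunPlan {
  return dryRun
    ? { stream: "stderr", apply: false }
    : { stream: "stdout", apply: true };
}

// Check one observed run against the invariants; returns the violations found.
function violations(
  dryRun: boolean,
  observed: { exitCode: number; stream: "stdout" | "stderr"; apply: boolean },
): string[] {
  const expected = planRun(dryRun);
  const out: string[] = [];
  if (observed.exitCode !== 0) out.push("non-zero exit");
  if (observed.stream !== expected.stream) out.push(`greeting went to ${observed.stream}`);
  if (observed.apply !== expected.apply) out.push(`apply was ${observed.apply}`);
  return out;
}

// A dry-run that wrongly printed to stdout is reported as one violation.
console.log(violations(true, { exitCode: 0, stream: "stdout", apply: false }));
```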
Gotchas
- .env.cache is optional for this skill - env("USER", false) falls back to $USER from the process environment. Skills that need real credentials document that on their own .md page.
- The greeting format is load-bearing: the auditor subprocess checks it starts with hello, and contains the snappy-os version. Don't "improve" the phrasing without updating the auditor.
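To see why the phrasing is load-bearing, here is a small sketch that mirrors the two conditions used by the audit() function in state/lib/snappy-hello.ts (a "hello, " prefix and a "snappy-os " substring). The sample strings, including the name and version number, are invented for illustration.

```typescript
// Mirrors the two conditions in the audit() function shown on this page.
function hasGreetingShape(s: string): boolean {
  return s.startsWith("hello, ") && s.includes("snappy-os ");
}

// Invented samples: one accepted shape and two well-meant rewrites.
const samples = [
  "hello, ada - snappy-os 1.0.7 reporting in.", // accepted shape
  "Hello, ada - snappy-os 1.0.7 reporting in.", // capitalised: prefix check fails
  "hello, ada - SnappyOS 1.0.7 reporting in.", // renamed product: substring check fails
];

for (const s of samples) {
  console.log(hasGreetingShape(s), s);
}
```

Both rewrites look harmless to a reader, which is why the auditor has to change together with the greeting.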
<!-- kernel-ok: historical attribution for the PID-loader pattern. -->
Self-correcting loader (PID feedback)
This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.
Before you finish, do two things:
1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case - if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk - you MUST attempt an Edit to this AGENTS.md before you log.
- Default action: edit this file. Surgical. Add the missing line, table row, or code example where it belongs. No restructuring.
- LOGGED is allowed only when the fix needs >10 lines, spans multiple files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.
- "I didn't have time" / "it's minor" / "the next agent will figure it out" are NOT valid reasons. Minor gaps compound - that's the whole point of this loop.
- Concrete example of a valid P-fix: discover that agent-browser screenshot takes its path positionally, not as --output <path>. One line in a table. Edit it in. Do not just log it.
- The goal: the next agent never has to leave the loader.
2. Log the result.
echo "[$(date -u +%FT%TZ)] snappy-hello: <what was missing or fixed> [FIXED|LOGGED]" >> state/log/loader-feedback.log
- FIXED = you patched this loader inline (P-fix).
- LOGGED = too large for inline; the PostToolUse enqueue + Stop-hook drain will rewrite the loader from scratch on next session-end.
Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.
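The FIXED / LOGGED decision and the log line format can be sketched as two small functions. The `Gap` shape and the names `classify` and `feedbackLine` are assumptions for illustration; the thresholds are the ones stated above (more than 10 lines, multiple files, or a structural rewrite).

```typescript
type Outcome = "FIXED" | "LOGGED";

// Hypothetical description of a gap found during a run.
interface Gap {
  linesNeeded: number;
  filesTouched: number;
  structuralRewrite: boolean;
}

// LOGGED only when the fix is too large for an inline edit; otherwise FIXED.
function classify(gap: Gap): Outcome {
  const tooLarge = gap.linesNeeded > 10 || gap.filesTouched > 1 || gap.structuralRewrite;
  return tooLarge ? "LOGGED" : "FIXED";
}

// Same layout as the echo command: [UTC timestamp] skill: note [OUTCOME]
function feedbackLine(skill: string, note: string, outcome: Outcome, now: Date): string {
  // Drop milliseconds so the timestamp matches `date -u +%FT%TZ`.
  const ts = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return `[${ts}] ${skill}: ${note} [${outcome}]`;
}

// A one-line table fix in a single file counts as FIXED.
const gap: Gap = { linesNeeded: 1, filesTouched: 1, structuralRewrite: false };
const line = feedbackLine(
  "snappy-hello",
  "screenshot takes path positionally",
  classify(gap),
  new Date(Date.UTC(2026, 4, 2, 16, 51, 0)),
);
console.log(line);
```

Passing the clock in as a parameter keeps the formatter easy to test; a real caller would pass `new Date()` and append the line to state/log/loader-feedback.log.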
OpenUI Resource
- Skill-owned OpenUI Lang resource: state/skills/snappy-hello/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
- Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
- System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.
api.ts - the code it can call
#!/usr/bin/env npx tsx
/**
 * state/lib/snappy-hello.ts - the hello-world skill.
 *
 * G1 of the 1.0.7 release gate: on a fresh install, this is the first skill
 * a new user pulls + runs. It must:
 *   1. Print a greeting with the user's name and the snappy-os version.
 *   2. Write one row to state/log/evals.ndjson with actor_session_id and
 *      auditor_session_id populated and non-identical (actor ≠ auditor).
 *   3. Exit 0.
 *
 * Actor/auditor split: the actor runs in this process and prints the greeting.
 * Before returning, the actor spawns a child process via `tsx` and passes it
 * the greeting, the actor's session id, and the dry-run flag as arguments.
 * The child grades the greeting; its session id becomes auditor_session_id.
 * The child writes the final eval row. Neither process writes two rows.
 *
 * Invoked:
 *   - directly: `npx tsx state/lib/snappy-hello.ts [--dry-run]`
 *   - via CLI:  `snappy-os run snappy-hello [--dry-run]`
 */
import { spawnSync } from "child_process";
import { readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import { env } from "./env.ts";
import { score, sessionId, type EvalRow } from "./eval.ts";
const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");
function readVersion(): string {
  try {
    const pkg = JSON.parse(readFileSync(join(ROOT, "package.json"), "utf8"));
    return pkg.version ?? "?";
  } catch {
    return "?";
  }
}
function userName(): string {
  return env("USER", false) || process.env.USER || process.env.LOGNAME || "friend";
}
function greeting(): string {
  return `hello, ${userName()} — snappy-os ${readVersion()} reporting in.`;
}
export async function hello(opts: { dryRun: boolean } = { dryRun: false }): Promise<void> {
  const msg = greeting();
  const stream = opts.dryRun ? process.stderr : process.stdout;
  stream.write(msg + "\n");
  // Actor is *this* process. The auditor is a child process that grades the
  // greeting we just emitted. The only contract between them: the greeting
  // string is passed explicitly as an arg (not re-read from stdout) so the
  // auditor can verify shape without relying on subprocess stdout plumbing.
  const actorSid = sessionId();
  const auditorResult = spawnSync(
    "npx",
    ["--yes", "tsx", join(HERE, "snappy-hello.ts"), "--audit", msg, actorSid, opts.dryRun ? "true" : "false"],
    { stdio: ["ignore", "inherit", "inherit"], encoding: "utf8" },
  );
  if (auditorResult.status !== 0) {
    process.stderr.write("[snappy-hello] auditor failed; eval row not written\n");
    process.exit(auditorResult.status ?? 1);
  }
}
/**
 * Audit mode. Called by the actor as a child process. Re-derives what the
 * greeting *should* look like, compares, writes the eval row with the
 * child's own sessionId() as auditor_session_id. Exits 0 on pass, 1 on fail.
 */
function audit(actorGreeting: string, actorSid: string, dryRunFlag: string): never {
  const expected = greeting();
  const pass = actorGreeting.startsWith("hello, ") && actorGreeting.includes("snappy-os ");
  const scoreVal = pass ? 1.0 : 0.0;
  const primary_issue = pass ? null : "greeting-shape-mismatch";
  const row: Omit<EvalRow, "skill" | "run_id" | "mode"> & { mode?: "auto" | "manual" } = {
    score: scoreVal,
    actor_session_id: actorSid,
    auditor_session_id: sessionId(),
    apply: dryRunFlag !== "true",
    primary_issue,
    verb: "greet",
    notes: pass
      ? undefined
      : `actor: ${actorGreeting}; expected prefix "hello, " and contains "snappy-os "; got: ${actorGreeting}; expected sample: ${expected}`,
  };
  score("snappy-hello", actorSid, row);
  process.exit(pass ? 0 : 1);
}
// CLI. Two modes: normal (greet + spawn auditor) and --audit (child process).
if (import.meta.url.startsWith("file:") && process.argv[1] && import.meta.url.endsWith(process.argv[1].split("/").pop()!)) {
  const args = process.argv.slice(2);
  if (args[0] === "--audit") {
    // --audit <greeting> <actor_sid> <dry_run_flag>
    audit(args[1] ?? "", args[2] ?? "", args[3] ?? "false");
  }
  const dryRun = args.includes("--dry-run");
  await hello({ dryRun });
}
scripts - helper scripts it can run
prose-only skill - 3 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).
how we check it - the checks, plus the last 10 runs
| timestamp | verb | score | primary_issue | artifact |
|---|---|---|---|---|
| 2026-05-02 16:51Z | - | 1.00 | - | - |
| 2026-04-25 04:11Z | - | 1.00 | - | - |
| 2026-04-25 02:19Z | - | 1.00 | - | - |
| 2026-04-21 15:58Z | - | 1.00 | - | - |
| 2026-04-21 15:57Z | - | 1.00 | - | - |
| 2026-04-21 03:53Z | - | 1.00 | - | - |
| 2026-04-21 03:49Z | - | 1.00 | - | - |
| 2026-04-21 03:42Z | - | 1.00 | - | - |
| 2026-04-21 03:05Z | - | 1.00 | - | - |
| 2026-04-21 03:03Z | - | 1.00 | - | - |