drop another .md file to compare - side-by-side diff against agent-recap

agent-recap

Recaps everything your assistant did today in one quick read.

description: "Triggers on prompt mention of 'agent-recap', 'what did the agents do today', 'what ran today', 'today\\'s agent activity', 'agent fleet', 'currently running', 'happening right now', 'summarize today'."

personal 2 files 6 recent evals

Export

What it does for you

Recaps everything your assistant did today in one quick read.

What it produces

A recent result, so you can see the kind of work it returns.

loading…

How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

Work with me

For developers how this skill is built, graded, and how it runs

at a glance- the short version

actorSummarizeRange() in state/lib/agent-recap.ts.

auditorState/lint/library-shape.ts (the lib must export

eval modeshape

what's inside - the parts that make up a skill 3/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill

state/skills/agent-recap/SKILL.md present

the skill itself, in plain text

The main file. It says what the skill is and lays out the steps in plain English.

Code

state/lib/agent-recap.ts present

code the skill can run

Reusable code this skill can call when it needs to.

Scripts

state/bin/agent-recap/ not present

helper scripts

Optional. Added when a skill has a few commands to run.

Loader

state/skills/agent-recap/AGENTS.md present

what the AI loads on the fly

Loaded automatically the moment this skill is needed. Kept short on purpose.

how it's graded - what counts as a good run 3 criteria · 2 deterministic · 1 judge

Each row is one thing a good run has to get right. deterministic means a quick check decides, pass or fail. judge means the AI reads the result and rates it. Grading each piece on its own (instead of one overall score) shows exactly where a run fell short, so the fix is obvious.

name

kind

check

shape_matches_parser

deterministic

summarizeRange() returns { progressList: { steps: [...] } } where each step has id, label, status in {done|running|pending|error}, optional note — exactly the shape parseProgressListArgs in web/src/tool-args.ts accepts.

dispatcher_intent_match

deterministic

Trigger phrases route to a ProgressList TOOL_CALL emission, not text fallback.

classification_correct

judge

Status per skill matches the rule: error if any score=0 in window, running if last_ts within 5min, done otherwise.

how it runs - the shared frame every skill uses 5/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, on purpose, not by accident.

makes the work The worker

present

SummarizeRange() in state/lib/agent-recap.ts. the worker

Does the actual work. Whatever it produces is what gets checked next.

checks the work The reviewer

present

State/lint/library-shape.ts (the lib must export the checker

A separate checker grades the work, so the part that made it can't approve its own work.

frame

learns Self-correction

present

fixes itself learns from gaps

When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.

tidies up Background fixes

present

queued for rewrite runs in the background

Bigger fixes that can't be made on the spot get queued and rewritten in the background later.

remembers Run history

present

state/log/evals.ndjson shape runs

Every run is written down here, so the next time this skill is used it already knows how the last runs went.

Critical rules the things this skill must not get wrong

No must-not-break rules called out for this skill. Anything important lives in the writeup below.

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.

Loading feedback rows…

how the work flows- who makes it, who checks it

actor SummarizeRange() in state/lib/agent-recap.ts.

auditor State/lint/library-shape.ts (the lib must export

SKILL.md- the skill, written out in plain English

agent-recap

Producer skill that closes the catalog-to-cockpit drift gap: when the cockpit asks "what did the agents do today", the dispatcher used to fall back to plain text because no skill emitted the right TOOL_CALL_ARGS shape. agent-recap reads state/log/evals.ndjson, groups rows by skill, derives a per-skill status, and shapes the result into the ProgressList contract that snappy-chat/web/src/tool-args.ts:parseProgressListArgs accepts.

The lib at state/lib/agent-recap.ts is the action scope. The dispatcher at state/bin/head-screen/server.ts wires the intent matcher and emits the TOOL_CALL_START / TOOL_CALL_ARGS / TOOL_CALL_END triple after the assistant text streams.

Pure read. Never writes. Scope-only by snappy-os convention.

Steps

summarizeToday() - wrapper for summarizeRange({}).
summarizeRange({ since?, until?, nowMs?, evalsPath? }):

Default range is [startOfTodayUtcMs(now), now).
Read state/log/evals.ndjson line-by-line, parse JSON, drop rows

missing ts or skill or with timestamps outside the range.

Bucket by skill. Track runs, sum/count of score, minScore,

maxTs.

Per skill, status is:
error if minScore <= 0 (any zero-score row in window)
running if now - maxTs <= 5 * 60_000 (last 5 minutes)
done otherwise
Sort skills alphabetically. Emit `{ id: "recap-<n>", label: <slug>,

status, note: "<runs> runs, mean <score>" }`.

currentStepId is the first running step, if any.

Dispatcher wiring

state/bin/head-screen/server.ts adds a regex matcher to the chat dispatch intent loop (alongside the existing ProgressList / ContextPanel / AgentDetail regex emitters). Trigger phrases: "what did the agents do today", "what ran today", "today's agent activity", "summarize today".

When matched and llmEmittedNames does not already contain ProgressList, the dispatcher imports summarizeToday() from this lib and calls emitTriple("ProgressList", payload.progressList).

Eval

Actor: summarizeRange() in state/lib/agent-recap.ts. Auditor: state/lint/library-shape.ts (the lib must export summarizeToday and summarizeRange) plus the dogfood loop screenshot that proves the card rendered (not text fallback).

Outcome	Score
Card renders in chat with one step per skill, status correct	1.0
Card renders but classification is wrong (e.g. running missing)	0.5
Plain text fallback (intent didn't match)	0.0

Rubric

criteria:
  - name: shape_matches_parser
    kind: deterministic
    check: "summarizeRange() returns { progressList: { steps: [...] } } where each step has id, label, status in {done|running|pending|error}, optional note — exactly the shape parseProgressListArgs in web/src/tool-args.ts accepts."
  - name: dispatcher_intent_match
    kind: deterministic
    check: "Trigger phrases route to a ProgressList TOOL_CALL emission, not text fallback."
  - name: classification_correct
    kind: judge
    check: "Status per skill matches the rule: error if any score=0 in window, running if last_ts within 5min, done otherwise."

AGENTS.md- what the AI loads when this skill comes up

agent-recap - loader

Per-turn rules. Full reference: state/skills/agent-recap/SKILL.md. Producer skill that reads state/log/evals.ndjson, groups by skill, derives status, and emits a ProgressList payload that snappy-chat renders inline. Lib: state/lib/agent-recap.ts. Dispatcher: state/bin/head-screen/server.ts (regex + emitTriple("ProgressList", payload.progressList)).

Per-skill genui pilot: state/skills/agent-recap/ui.tsx was created 2026-04-29T21:43:02Z as the pilot for per-skill co-located genui (CONSTITUTION invariant #5). emits_cards: true is set in SKILL.md frontmatter. This is the reference implementation for the co-location pattern.

Critical Rules

Pure read. NEVER write inside summarizeRange. Drop corrupt rows silently. The function is a sensor, not an actor.
Status enum is exactly error|running|done|pending. parsePhaseDisclosureArgs validates this - off-enum status kills card render entirely.
No double-emit. Dispatcher must NOT re-emit ProgressList if llmEmittedNames already contains it - duplicate cards in chat.
Sort alphabetically by skill - output is pre-sorted; do not resort downstream. Deterministic position across reloads.
UTC start-of-day boundary. Default range: [startOfTodayUtcMs(now), now). A skill that ran at 23:55 UTC yesterday does NOT appear in today's recap.
Dispatcher intent guard. "RUN THIS" drain briefs for agent-recap are valid - the lib is runnable. Call npx tsx state/lib/agent-recap.ts to produce an eval row. (This is a producer skill, not auto-shape.)

Status logic

status	condition
error	`minScore <= 0` (any zero-score row in window)
running	`now - maxTs <= 5 * 60_000` (last 5 min)
done	otherwise

Step shape: { id: "recap-<n>", label: <slug>, status, note: "<runs> runs, mean <score>" }. currentStepId = first running step if any.

Commands

| ui dashboard | state/skills/agent-recap/resources/ui.openui |

step	command
invoke	`import { summarizeToday, summarizeRange } from "../lib/agent-recap.ts"`
cli	`npx tsx state/lib/agent-recap.ts` → JSON to stdout; eval row lands in `evals.ndjson`
dispatcher wiring	`state/bin/head-screen/server.ts` regex + `emitTriple("ProgressList", ...)`
trigger phrases	`"what did the agents do today"` / `"what ran today"` / `"today's agent activity"` / `"summarize today"` / `"agent fleet"` / `"currently running"` / `"happening right now"`
reads	`state/log/evals.ndjson`
eval log	`state/log/evals.ndjson` (skill: `"agent-recap"`, eval_mode: `shape`)
writeback	`state/log/loader-feedback.log`
ui pilot	`state/skills/agent-recap/ui.tsx` (per-skill co-located genui pilot)

OpenUI Resource

Skill-owned OpenUI Lang resource: state/skills/agent-recap/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

OpenUI Resource

Skill-owned OpenUI Lang resource: state/skills/agent-recap/resources/today.openui. Read it before rendering or editing this skill's generated component surface.
Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
This named surface must stay discoverable through its top comment metadata: // surface, // intents, and // response.
System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

Known Pitfalls

Corrupt eval row drops silently. Missing skills in recap → read source log directly.
5-minute "running" window can mark one-shot skills as running right after their eval row lands. Corrects within 5 minutes.
status: "error" only fires on score = 0. Score 0.5 → done with low-score note, not error.
agent-recap enrichment regex in server.ts handles "agent fleet" (commit ea108f2) and "currently running" (recapRegexEnrich at L6493). Both paths must remain wired.

Self-Test

An agent reading this should correctly:

[ ] Confirm summarizeRange() returns statuses only in error|running|done|pending
[ ] Trace trigger phrase through dispatcher (regex match, not text fallback)
[ ] Apply UTC start-of-day boundary for default "today" range
[ ] Verify summarizeRange performs zero writes - pure read of evals.ndjson
[ ] Confirm single ProgressList emit per turn via llmEmittedNames guard
[ ] Know state/skills/agent-recap/ui.tsx is the per-skill co-located genui pilot (emits_cards: true)

Self-correcting loader (PID feedback)

This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped Skool admin UI once, every later ship-agent inherited the full map for free.

Before you finish, do two things:

1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case - if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk - you MUST attempt an Edit to this AGENTS.md before you log.

Default action: edit this file. Surgical. Add the missing line, table

row, or code example where it belongs. No restructuring.

Only LOGGED is allowed when: the fix needs >10 lines, spans multiple

files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.

"I didn't have time" / "it's minor" / "the next agent will figure it out"

are NOT valid reasons. Minor gaps compound - that's the whole point of this loop.

Concrete example of a valid P-fix: discover agent-browser screenshot

takes path positionally not as --output <path>. One line in a table. Edit it in. Do not just log it.

The goal: the next agent never has to leave the loader.

2. Log the result.

echo "[$(date -u +%FT%TZ)] agent-recap: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>" >> state/log/loader-feedback.log

FIXED = you patched this loader inline (P-fix).
LOGGED = too large for inline; the PostToolUse enqueue + Stop-hook drain

will rewrite the loader from scratch on next session-end.

action_kind: shape-ok | skill-ran | loader-rewritten | pattern-elevated

Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.

api.ts- the code it can call

#!/usr/bin/env npx tsx
/**
 * state/lib/agent-recap.ts -- read state/log/evals.ndjson and produce a
 * ProgressList payload for the snappy-chat generative UI.
 *
 * Closes the catalog→cockpit drift gap surfaced by the dogfood loop: the
 * intent "what did the agents do today" used to fall back to plain text
 * because no producer skill emitted the right TOOL_CALL shape. This lib
 * groups eval rows by skill, derives a status per skill (running if a row
 * landed in the last 5 minutes, error if any score = 0, done otherwise),
 * and shapes them into the ProgressList contract that
 * web/src/tool-args.ts:parseProgressListArgs accepts.
 *
 *   import { summarizeToday, summarizeRange } from "../lib/agent-recap.ts";
 *   const { progressList } = await summarizeToday();
 *
 * Pure read. Never writes. Scope-only by snappy-os convention.
 */

import { existsSync, readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";

const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");
const EVALS_PATH = join(ROOT, "state", "log", "evals.ndjson");

const RUNNING_WINDOW_MS = 5 * 60 * 1000;

export type ProgressListStatus = "done" | "running" | "pending" | "error";

export interface ProgressListStep {
  id: string;
  label: string;
  status: ProgressListStatus;
  note?: string;
}

export interface ProgressListPayload {
  steps: ProgressListStep[];
  currentStepId?: string;
}

export interface SummarizeRangeOpts {
  /** ISO timestamp lower bound (inclusive). Default = today 00:00 UTC. */
  since?: string;
  /** ISO timestamp upper bound (exclusive). Default = now. */
  until?: string;
  /** "now" override for tests — ms since epoch. */
  nowMs?: number;
  /** Override evals.ndjson path (mostly for tests). */
  evalsPath?: string;
}

interface EvalRow {
  ts?: string;
  skill?: string;
  score?: number;
  primary_issue?: string | null;
}

function parseRow(line: string): EvalRow | null {
  if (!line) return null;
  try {
    const o = JSON.parse(line);
    if (!o || typeof o !== "object") return null;
    return o as EvalRow;
  } catch {
    return null;
  }
}

function startOfTodayUtcMs(nowMs: number): number {
  const d = new Date(nowMs);
  return Date.UTC(
    d.getUTCFullYear(),
    d.getUTCMonth(),
    d.getUTCDate(),
    0, 0, 0, 0,
  );
}

function classifyStatus(args: {
  hasZero: boolean;
  hasRecent: boolean;
}): ProgressListStatus {
  if (args.hasZero) return "error";
  if (args.hasRecent) return "running";
  return "done";
}

/**
 * Read evals.ndjson and return a ProgressList payload covering one row per
 * skill that ran in [since, until). Steps are sorted alphabetically by
 * skill so the UI is stable across runs.
 */
export async function summarizeRange(
  opts: SummarizeRangeOpts = {},
): Promise<{ progressList: ProgressListPayload }> {
  const nowMs = opts.nowMs ?? Date.now();
  const sinceMs = opts.since
    ? Date.parse(opts.since)
    : startOfTodayUtcMs(nowMs);
  const untilMs = opts.until ? Date.parse(opts.until) : nowMs;
  const path = opts.evalsPath ?? EVALS_PATH;

  if (!existsSync(path)) {
    return { progressList: { steps: [] } };
  }

  const lines = readFileSync(path, "utf8").split(/\r?\n/);

  type Bucket = {
    runs: number;
    sumScore: number;
    countScored: number;
    minScore: number;
    maxTs: number;
  };
  const bySkill = new Map<string, Bucket>();

  for (const line of lines) {
    const row = parseRow(line);
    if (!row) continue;
    if (typeof row.skill !== "string" || row.skill.length === 0) continue;
    if (typeof row.ts !== "string") continue;
    const tsMs = Date.parse(row.ts);
    if (!Number.isFinite(tsMs)) continue;
    if (tsMs < sinceMs || tsMs >= untilMs) continue;

    const cur = bySkill.get(row.skill) ?? {
      runs: 0,
      sumScore: 0,
      countScored: 0,
      minScore: Number.POSITIVE_INFINITY,
      maxTs: 0,
    };
    cur.runs += 1;
    if (typeof row.score === "number" && Number.isFinite(row.score)) {
      cur.sumScore += row.score;
      cur.countScored += 1;
      if (row.score < cur.minScore) cur.minScore = row.score;
    }
    if (tsMs > cur.maxTs) cur.maxTs = tsMs;
    bySkill.set(row.skill, cur);
  }

  const slugs = [...bySkill.keys()].sort((a, b) => a.localeCompare(b));
  const steps: ProgressListStep[] = slugs.map((slug, i) => {
    const b = bySkill.get(slug)!;
    const hasRecent = nowMs - b.maxTs <= RUNNING_WINDOW_MS;
    const hasZero =
      b.countScored > 0 && b.minScore !== Number.POSITIVE_INFINITY && b.minScore <= 0;
    const status = classifyStatus({ hasZero, hasRecent });
    const meanScore =
      b.countScored > 0 ? b.sumScore / b.countScored : null;
    const noteParts: string[] = [`${b.runs} run${b.runs === 1 ? "" : "s"}`];
    if (meanScore !== null) noteParts.push(`mean ${meanScore.toFixed(2)}`);
    return {
      id: `recap-${i + 1}`,
      label: slug,
      status,
      note: noteParts.join(" · "),
    };
  });

  // Surface "currently active" — the most recent running step, if any.
  const runningStep = steps.find(s => s.status === "running");
  const payload: ProgressListPayload = { steps };
  if (runningStep) payload.currentStepId = runningStep.id;
  return { progressList: payload };
}

/** Convenience wrapper — today (UTC) up to now. */
export async function summarizeToday(): Promise<{ progressList: ProgressListPayload }> {
  return summarizeRange({});
}

/**
 * Same scan as summarizeRange, but emits a PhaseDisclosure payload via the
 * phase-trace builder. One disclosure row per skill; the collapsed summary
 * is the run-count + mean score, the expanded body lists individual run
 * timestamps + their primary issues. Used by the chat dispatcher to answer
 * "what is X doing right now" / "trace the agent run" intents.
 */
export async function summarizeRangeAsPhases(
  opts: SummarizeRangeOpts = {},
): Promise<{ phaseDisclosure: { phases: Array<{ label: string; summary: string; body?: string; status?: ProgressListStatus }>; title?: string } }> {
  const { startTrace } = await import("./phase-trace.ts");
  const nowMs = opts.nowMs ?? Date.now();
  const sinceMs = opts.since
    ? Date.parse(opts.since)
    : startOfTodayUtcMs(nowMs);
  const untilMs = opts.until ? Date.parse(opts.until) : nowMs;
  const path = opts.evalsPath ?? EVALS_PATH;

  const trace = startTrace("Agent activity today");

  if (!existsSync(path)) {
    return trace.emit();
  }

  const lines = readFileSync(path, "utf8").split(/\r?\n/);

  type Bucket = {
    runs: number;
    sumScore: number;
    countScored: number;
    minScore: number;
    maxTs: number;
    issues: Array<{ ts: string; score?: number; issue?: string | null }>;
  };
  const bySkill = new Map<string, Bucket>();

  for (const line of lines) {
    const row = parseRow(line);
    if (!row) continue;
    if (typeof row.skill !== "string" || row.skill.length === 0) continue;
    if (typeof row.ts !== "string") continue;
    const tsMs = Date.parse(row.ts);
    if (!Number.isFinite(tsMs)) continue;
    if (tsMs < sinceMs || tsMs >= untilMs) continue;

    const cur = bySkill.get(row.skill) ?? {
      runs: 0, sumScore: 0, countScored: 0,
      minScore: Number.POSITIVE_INFINITY, maxTs: 0, issues: []
    };
    cur.runs += 1;
    if (typeof row.score === "number" && Number.isFinite(row.score)) {
      cur.sumScore += row.score;
      cur.countScored += 1;
      if (row.score < cur.minScore) cur.minScore = row.score;
    }
    if (tsMs > cur.maxTs) cur.maxTs = tsMs;
    cur.issues.push({ ts: row.ts, score: row.score, issue: row.primary_issue ?? null });
    bySkill.set(row.skill, cur);
  }

  const slugs = [...bySkill.keys()].sort((a, b) => a.localeCompare(b));
  for (const slug of slugs) {
    const b = bySkill.get(slug)!;
    const hasRecent = nowMs - b.maxTs <= RUNNING_WINDOW_MS;
    const hasZero =
      b.countScored > 0 && b.minScore !== Number.POSITIVE_INFINITY && b.minScore <= 0;
    const status: ProgressListStatus = classifyStatus({ hasZero, hasRecent });
    const meanScore = b.countScored > 0 ? b.sumScore / b.countScored : null;
    const summary = meanScore !== null
      ? `${b.runs} run${b.runs === 1 ? "" : "s"} · mean ${meanScore.toFixed(2)}`
      : `${b.runs} run${b.runs === 1 ? "" : "s"}`;
    const lastN = b.issues.slice(-5).reverse();
    const bodyLines = lastN.map(r => {
      const score = typeof r.score === "number" ? r.score.toFixed(2) : "—";
      const issue = r.issue ? ` ${r.issue}` : "";
      return `${r.ts} · score ${score}${issue}`;
    });
    trace.phase(slug, { summary, status }).body(bodyLines.join("\n"));
  }

  return trace.emit();
}

/** Convenience: today's activity as a PhaseDisclosure payload. */
export async function summarizeTodayAsPhases() {
  return summarizeRangeAsPhases({});
}

// CLI smoke-test path: `npx tsx state/lib/agent-recap.ts`
const isMain = (() => {
  try {
    const argv1 = process.argv[1];
    if (!argv1) return false;
    return import.meta.url === new URL(`file://${argv1}`).href
      || import.meta.url.endsWith(argv1);
  } catch { return false; }
})();
if (isMain) {
  summarizeToday().then(out => {
    process.stdout.write(JSON.stringify(out, null, 2) + "\n");
  }).catch(err => {
    console.error(`agent-recap: ${err}`);
    process.exit(1);
  });
}

scripts- helper scripts it can run

prose-only skill - 1 inline code block live in SKILL.md above (no state/bin/ sidecar yet).

how we check it- the checks, plus the last 6 runs

rubric shape schema-shape check (no inline rubric)

recent mean 1.00 · 6 runs actor/auditor: unverifiable

deps none declared

timestamp	verb	score	primary_issue	artifact
2026-04-30 04:36Z	-	1.00	-	-
2026-04-29 04:31Z	-	1.00	-	-
2026-04-30 04:36Z	-	1.00	-	-
2026-04-29 04:31Z	-	1.00	-	-
2026-04-30 04:36Z	-	1.00	-	-
2026-04-29 04:31Z	-	1.00	-	-