SummarizeRange() in state/lib/agent-recap.ts. .md file to compare - side-by-side diff against agent-recap
agent-recap
description: "Triggers on prompt mention of 'agent-recap', 'what did the agents do today', 'what ran today', 'today\\'s agent activity', 'agent fleet', 'currently running', 'happening right now', 'summarize today'."
What it does for you
Recaps everything your assistant did today in one quick read.
What it produces
A recent result, so you can see the kind of work it returns.
loading…
How to get it
These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.
For developers how this skill is built, graded, and how it runs
at a glance- the short version
what's inside - the parts that make up a skill 3/4 present
A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.
state/skills/agent-recap/SKILL.md
present
state/lib/agent-recap.ts
present
state/bin/agent-recap/
not present
state/skills/agent-recap/AGENTS.md
present
how it's graded - what counts as a good run 3 criteria · 2 deterministic · 1 judge
Each row is one thing a good run has to get right. deterministic means a quick check decides, pass or fail. judge means the AI reads the result and rates it. Grading each piece on its own (instead of one overall score) shows exactly where a run fell short, so the fix is obvious.
how it runs - the shared frame every skill uses 5/5 present
Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, on purpose, not by accident.
State/lint/library-shape.ts (the lib must export state/log/evals.ndjson what it has learned - fixes written back in over time sample
When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.
- Loading feedback rows…
how the work flows- who makes it, who checks it
SKILL.md- the skill, written out in plain English
agent-recap
Producer skill that closes the catalog-to-cockpit drift gap: when the cockpit asks "what did the agents do today", the dispatcher used to fall back to plain text because no skill emitted the right TOOL_CALL_ARGS shape. agent-recap reads state/log/evals.ndjson, groups rows by skill, derives a per-skill status, and shapes the result into the ProgressList contract that snappy-chat/web/src/tool-args.ts:parseProgressListArgs accepts.
The lib at state/lib/agent-recap.ts is the action scope. The dispatcher at state/bin/head-screen/server.ts wires the intent matcher and emits the TOOL_CALL_START / TOOL_CALL_ARGS / TOOL_CALL_END triple after the assistant text streams.
Pure read. Never writes. Scope-only by snappy-os convention.
Steps
summarizeToday()- wrapper forsummarizeRange({}).summarizeRange({ since?, until?, nowMs?, evalsPath? }):
- Default range is
[startOfTodayUtcMs(now), now). - Read
state/log/evals.ndjsonline-by-line, parse JSON, drop rows
missing ts or skill or with timestamps outside the range.
- Bucket by
skill. Trackruns, sum/count ofscore,minScore,
maxTs.
- Per skill, status is:
errorifminScore <= 0(any zero-score row in window)runningifnow - maxTs <= 5 * 60_000(last 5 minutes)doneotherwise- Sort skills alphabetically. Emit `{ id: "recap-<n>", label: <slug>,
status, note: "<runs> runs, mean <score>" }`.
currentStepIdis the first running step, if any.
Dispatcher wiring
state/bin/head-screen/server.ts adds a regex matcher to the chat dispatch intent loop (alongside the existing ProgressList / ContextPanel / AgentDetail regex emitters). Trigger phrases: "what did the agents do today", "what ran today", "today's agent activity", "summarize today".
When matched and llmEmittedNames does not already contain ProgressList, the dispatcher imports summarizeToday() from this lib and calls emitTriple("ProgressList", payload.progressList).
Eval
Actor: summarizeRange() in state/lib/agent-recap.ts. Auditor: state/lint/library-shape.ts (the lib must export summarizeToday and summarizeRange) plus the dogfood loop screenshot that proves the card rendered (not text fallback).
| Outcome | Score |
|---|---|
| Card renders in chat with one step per skill, status correct | 1.0 |
| Card renders but classification is wrong (e.g. running missing) | 0.5 |
| Plain text fallback (intent didn't match) | 0.0 |
Rubric
criteria:
- name: shape_matches_parser
kind: deterministic
check: "summarizeRange() returns { progressList: { steps: [...] } } where each step has id, label, status in {done|running|pending|error}, optional note — exactly the shape parseProgressListArgs in web/src/tool-args.ts accepts."
- name: dispatcher_intent_match
kind: deterministic
check: "Trigger phrases route to a ProgressList TOOL_CALL emission, not text fallback."
- name: classification_correct
kind: judge
check: "Status per skill matches the rule: error if any score=0 in window, running if last_ts within 5min, done otherwise."AGENTS.md- what the AI loads when this skill comes up
agent-recap - loader
Per-turn rules. Full reference: state/skills/agent-recap/SKILL.md. Producer skill that reads state/log/evals.ndjson, groups by skill, derives status, and emits a ProgressList payload that snappy-chat renders inline. Lib: state/lib/agent-recap.ts. Dispatcher: state/bin/head-screen/server.ts (regex + emitTriple("ProgressList", payload.progressList)).
Per-skill genui pilot: state/skills/agent-recap/ui.tsx was created 2026-04-29T21:43:02Z as the pilot for per-skill co-located genui (CONSTITUTION invariant #5). emits_cards: true is set in SKILL.md frontmatter. This is the reference implementation for the co-location pattern.
Critical Rules
- Pure read. NEVER write inside
summarizeRange. Drop corrupt rows silently. The function is a sensor, not an actor. - Status enum is exactly
error|running|done|pending.parsePhaseDisclosureArgsvalidates this - off-enum status kills card render entirely. - No double-emit. Dispatcher must NOT re-emit
ProgressListifllmEmittedNamesalready contains it - duplicate cards in chat. - Sort alphabetically by skill - output is pre-sorted; do not resort downstream. Deterministic position across reloads.
- UTC start-of-day boundary. Default range:
[startOfTodayUtcMs(now), now). A skill that ran at 23:55 UTC yesterday does NOT appear in today's recap. - Dispatcher intent guard. "RUN THIS" drain briefs for
agent-recapare valid - the lib is runnable. Callnpx tsx state/lib/agent-recap.tsto produce an eval row. (This is a producer skill, not auto-shape.)
Status logic
| status | condition |
|---|---|
| error | minScore <= 0 (any zero-score row in window) |
| running | now - maxTs <= 5 * 60_000 (last 5 min) |
| done | otherwise |
Step shape: { id: "recap-<n>", label: <slug>, status, note: "<runs> runs, mean <score>" }. currentStepId = first running step if any.
Commands
| ui dashboard | state/skills/agent-recap/resources/ui.openui |
| step | command |
|---|---|
| invoke | import { summarizeToday, summarizeRange } from "../lib/agent-recap.ts" |
| cli | npx tsx state/lib/agent-recap.ts → JSON to stdout; eval row lands in evals.ndjson |
| dispatcher wiring | state/bin/head-screen/server.ts regex + emitTriple("ProgressList", ...) |
| trigger phrases | "what did the agents do today" / "what ran today" / "today's agent activity" / "summarize today" / "agent fleet" / "currently running" / "happening right now" |
| reads | state/log/evals.ndjson |
| eval log | state/log/evals.ndjson (skill: "agent-recap", eval_mode: shape) |
| writeback | state/log/loader-feedback.log |
| ui pilot | state/skills/agent-recap/ui.tsx (per-skill co-located genui pilot) |
OpenUI Resource
- Skill-owned OpenUI Lang resource:
state/skills/agent-recap/resources/ui.openui. Read it before rendering or editing this skill's generated component surface. - Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
- System resources compose OpenUI primitives and inherit SnappyChat tokens. Use
ui_contract: brandedin SKILL.md only for deliberate platform or client visuals.
OpenUI Resource
- Skill-owned OpenUI Lang resource:
state/skills/agent-recap/resources/today.openui. Read it before rendering or editing this skill's generated component surface. - Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
- This named surface must stay discoverable through its top comment metadata:
// surface,// intents, and// response. - System resources compose OpenUI primitives and inherit SnappyChat tokens. Use
ui_contract: brandedin SKILL.md only for deliberate platform or client visuals.
Known Pitfalls
- Corrupt eval row drops silently. Missing skills in recap → read source log directly.
- 5-minute "running" window can mark one-shot skills as running right after their eval row lands. Corrects within 5 minutes.
status: "error"only fires onscore = 0. Score 0.5 →donewith low-score note, noterror.agent-recapenrichment regex in server.ts handles"agent fleet"(commitea108f2) and"currently running"(recapRegexEnrichat L6493). Both paths must remain wired.
Self-Test
An agent reading this should correctly:
- [ ] Confirm
summarizeRange()returns statuses only inerror|running|done|pending - [ ] Trace trigger phrase through dispatcher (regex match, not text fallback)
- [ ] Apply UTC start-of-day boundary for default "today" range
- [ ] Verify
summarizeRangeperforms zero writes - pure read ofevals.ndjson - [ ] Confirm single
ProgressListemit per turn viallmEmittedNamesguard - [ ] Know
state/skills/agent-recap/ui.tsxis the per-skill co-located genui pilot (emits_cards: true)
<!-- kernel-ok: historical attribution for the PID-loader pattern. -->
Self-correcting loader (PID feedback)
This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped Skool admin UI once, every later ship-agent inherited the full map for free.
Before you finish, do two things:
1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case - if you had to read another file, run an undocumented command, work around a wrong selector, or discover a quirk - you MUST attempt an Edit to this AGENTS.md before you log.
- Default action: edit this file. Surgical. Add the missing line, table
row, or code example where it belongs. No restructuring.
- Only
LOGGEDis allowed when: the fix needs >10 lines, spans multiple
files, or requires a structural rewrite. In that case the state/regen/drain.sh queue picks it up asynchronously.
- "I didn't have time" / "it's minor" / "the next agent will figure it out"
are NOT valid reasons. Minor gaps compound - that's the whole point of this loop.
- Concrete example of a valid P-fix: discover
agent-browser screenshot
takes path positionally not as --output <path>. One line in a table. Edit it in. Do not just log it.
- The goal: the next agent never has to leave the loader.
2. Log the result.
echo "[$(date -u +%FT%TZ)] agent-recap: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>" >> state/log/loader-feedback.log
FIXED= you patched this loader inline (P-fix).LOGGED= too large for inline; the PostToolUse enqueue + Stop-hook drain
will rewrite the loader from scratch on next session-end.
action_kind:shape-ok|skill-ran|loader-rewritten|pattern-elevated
Do not skip this. Every agent run must leave the system better than it found it. The loader is the setpoint; you are the sensor; the gap is the error signal; closing the gap is the correction.
api.ts- the code it can call
#!/usr/bin/env npx tsx
/**
* state/lib/agent-recap.ts -- read state/log/evals.ndjson and produce a
* ProgressList payload for the snappy-chat generative UI.
*
* Closes the catalog→cockpit drift gap surfaced by the dogfood loop: the
* intent "what did the agents do today" used to fall back to plain text
* because no producer skill emitted the right TOOL_CALL shape. This lib
* groups eval rows by skill, derives a status per skill (running if a row
* landed in the last 5 minutes, error if any score = 0, done otherwise),
* and shapes them into the ProgressList contract that
* web/src/tool-args.ts:parseProgressListArgs accepts.
*
* import { summarizeToday, summarizeRange } from "../lib/agent-recap.ts";
* const { progressList } = await summarizeToday();
*
* Pure read. Never writes. Scope-only by snappy-os convention.
*/
import { existsSync, readFileSync } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
const HERE = dirname(fileURLToPath(import.meta.url));
const ROOT = join(HERE, "..", "..");
const EVALS_PATH = join(ROOT, "state", "log", "evals.ndjson");
const RUNNING_WINDOW_MS = 5 * 60 * 1000;
export type ProgressListStatus = "done" | "running" | "pending" | "error";
export interface ProgressListStep {
id: string;
label: string;
status: ProgressListStatus;
note?: string;
}
export interface ProgressListPayload {
steps: ProgressListStep[];
currentStepId?: string;
}
export interface SummarizeRangeOpts {
/** ISO timestamp lower bound (inclusive). Default = today 00:00 UTC. */
since?: string;
/** ISO timestamp upper bound (exclusive). Default = now. */
until?: string;
/** "now" override for tests — ms since epoch. */
nowMs?: number;
/** Override evals.ndjson path (mostly for tests). */
evalsPath?: string;
}
interface EvalRow {
ts?: string;
skill?: string;
score?: number;
primary_issue?: string | null;
}
function parseRow(line: string): EvalRow | null {
if (!line) return null;
try {
const o = JSON.parse(line);
if (!o || typeof o !== "object") return null;
return o as EvalRow;
} catch {
return null;
}
}
function startOfTodayUtcMs(nowMs: number): number {
const d = new Date(nowMs);
return Date.UTC(
d.getUTCFullYear(),
d.getUTCMonth(),
d.getUTCDate(),
0, 0, 0, 0,
);
}
function classifyStatus(args: {
hasZero: boolean;
hasRecent: boolean;
}): ProgressListStatus {
if (args.hasZero) return "error";
if (args.hasRecent) return "running";
return "done";
}
/**
* Read evals.ndjson and return a ProgressList payload covering one row per
* skill that ran in [since, until). Steps are sorted alphabetically by
* skill so the UI is stable across runs.
*/
export async function summarizeRange(
opts: SummarizeRangeOpts = {},
): Promise<{ progressList: ProgressListPayload }> {
const nowMs = opts.nowMs ?? Date.now();
const sinceMs = opts.since
? Date.parse(opts.since)
: startOfTodayUtcMs(nowMs);
const untilMs = opts.until ? Date.parse(opts.until) : nowMs;
const path = opts.evalsPath ?? EVALS_PATH;
if (!existsSync(path)) {
return { progressList: { steps: [] } };
}
const lines = readFileSync(path, "utf8").split(/\r?\n/);
type Bucket = {
runs: number;
sumScore: number;
countScored: number;
minScore: number;
maxTs: number;
};
const bySkill = new Map<string, Bucket>();
for (const line of lines) {
const row = parseRow(line);
if (!row) continue;
if (typeof row.skill !== "string" || row.skill.length === 0) continue;
if (typeof row.ts !== "string") continue;
const tsMs = Date.parse(row.ts);
if (!Number.isFinite(tsMs)) continue;
if (tsMs < sinceMs || tsMs >= untilMs) continue;
const cur = bySkill.get(row.skill) ?? {
runs: 0,
sumScore: 0,
countScored: 0,
minScore: Number.POSITIVE_INFINITY,
maxTs: 0,
};
cur.runs += 1;
if (typeof row.score === "number" && Number.isFinite(row.score)) {
cur.sumScore += row.score;
cur.countScored += 1;
if (row.score < cur.minScore) cur.minScore = row.score;
}
if (tsMs > cur.maxTs) cur.maxTs = tsMs;
bySkill.set(row.skill, cur);
}
const slugs = [...bySkill.keys()].sort((a, b) => a.localeCompare(b));
const steps: ProgressListStep[] = slugs.map((slug, i) => {
const b = bySkill.get(slug)!;
const hasRecent = nowMs - b.maxTs <= RUNNING_WINDOW_MS;
const hasZero =
b.countScored > 0 && b.minScore !== Number.POSITIVE_INFINITY && b.minScore <= 0;
const status = classifyStatus({ hasZero, hasRecent });
const meanScore =
b.countScored > 0 ? b.sumScore / b.countScored : null;
const noteParts: string[] = [`${b.runs} run${b.runs === 1 ? "" : "s"}`];
if (meanScore !== null) noteParts.push(`mean ${meanScore.toFixed(2)}`);
return {
id: `recap-${i + 1}`,
label: slug,
status,
note: noteParts.join(" · "),
};
});
// Surface "currently active" — the most recent running step, if any.
const runningStep = steps.find(s => s.status === "running");
const payload: ProgressListPayload = { steps };
if (runningStep) payload.currentStepId = runningStep.id;
return { progressList: payload };
}
/** Convenience wrapper — today (UTC) up to now. */
export async function summarizeToday(): Promise<{ progressList: ProgressListPayload }> {
return summarizeRange({});
}
/**
* Same scan as summarizeRange, but emits a PhaseDisclosure payload via the
* phase-trace builder. One disclosure row per skill; the collapsed summary
* is the run-count + mean score, the expanded body lists individual run
* timestamps + their primary issues. Used by the chat dispatcher to answer
* "what is X doing right now" / "trace the agent run" intents.
*/
export async function summarizeRangeAsPhases(
opts: SummarizeRangeOpts = {},
): Promise<{ phaseDisclosure: { phases: Array<{ label: string; summary: string; body?: string; status?: ProgressListStatus }>; title?: string } }> {
const { startTrace } = await import("./phase-trace.ts");
const nowMs = opts.nowMs ?? Date.now();
const sinceMs = opts.since
? Date.parse(opts.since)
: startOfTodayUtcMs(nowMs);
const untilMs = opts.until ? Date.parse(opts.until) : nowMs;
const path = opts.evalsPath ?? EVALS_PATH;
const trace = startTrace("Agent activity today");
if (!existsSync(path)) {
return trace.emit();
}
const lines = readFileSync(path, "utf8").split(/\r?\n/);
type Bucket = {
runs: number;
sumScore: number;
countScored: number;
minScore: number;
maxTs: number;
issues: Array<{ ts: string; score?: number; issue?: string | null }>;
};
const bySkill = new Map<string, Bucket>();
for (const line of lines) {
const row = parseRow(line);
if (!row) continue;
if (typeof row.skill !== "string" || row.skill.length === 0) continue;
if (typeof row.ts !== "string") continue;
const tsMs = Date.parse(row.ts);
if (!Number.isFinite(tsMs)) continue;
if (tsMs < sinceMs || tsMs >= untilMs) continue;
const cur = bySkill.get(row.skill) ?? {
runs: 0, sumScore: 0, countScored: 0,
minScore: Number.POSITIVE_INFINITY, maxTs: 0, issues: []
};
cur.runs += 1;
if (typeof row.score === "number" && Number.isFinite(row.score)) {
cur.sumScore += row.score;
cur.countScored += 1;
if (row.score < cur.minScore) cur.minScore = row.score;
}
if (tsMs > cur.maxTs) cur.maxTs = tsMs;
cur.issues.push({ ts: row.ts, score: row.score, issue: row.primary_issue ?? null });
bySkill.set(row.skill, cur);
}
const slugs = [...bySkill.keys()].sort((a, b) => a.localeCompare(b));
for (const slug of slugs) {
const b = bySkill.get(slug)!;
const hasRecent = nowMs - b.maxTs <= RUNNING_WINDOW_MS;
const hasZero =
b.countScored > 0 && b.minScore !== Number.POSITIVE_INFINITY && b.minScore <= 0;
const status: ProgressListStatus = classifyStatus({ hasZero, hasRecent });
const meanScore = b.countScored > 0 ? b.sumScore / b.countScored : null;
const summary = meanScore !== null
? `${b.runs} run${b.runs === 1 ? "" : "s"} · mean ${meanScore.toFixed(2)}`
: `${b.runs} run${b.runs === 1 ? "" : "s"}`;
const lastN = b.issues.slice(-5).reverse();
const bodyLines = lastN.map(r => {
const score = typeof r.score === "number" ? r.score.toFixed(2) : "—";
const issue = r.issue ? ` ${r.issue}` : "";
return `${r.ts} · score ${score}${issue}`;
});
trace.phase(slug, { summary, status }).body(bodyLines.join("\n"));
}
return trace.emit();
}
/** Convenience: today's activity as a PhaseDisclosure payload. */
export async function summarizeTodayAsPhases() {
return summarizeRangeAsPhases({});
}
// CLI smoke-test path: `npx tsx state/lib/agent-recap.ts`
const isMain = (() => {
try {
const argv1 = process.argv[1];
if (!argv1) return false;
return import.meta.url === new URL(`file://${argv1}`).href
|| import.meta.url.endsWith(argv1);
} catch { return false; }
})();
if (isMain) {
summarizeToday().then(out => {
process.stdout.write(JSON.stringify(out, null, 2) + "\n");
}).catch(err => {
console.error(`agent-recap: ${err}`);
process.exit(1);
});
}
scripts- helper scripts it can run
prose-only skill - 1 inline code block live in SKILL.md above (no state/bin/ sidecar yet).
how we check it- the checks, plus the last 6 runs
| timestamp | verb | score | primary_issue | artifact |
|---|---|---|---|---|
| 2026-04-30 04:36Z | - | 1.00 | - | - |
| 2026-04-29 04:31Z | - | 1.00 | - | - |
| 2026-04-30 04:36Z | - | 1.00 | - | - |
| 2026-04-29 04:31Z | - | 1.00 | - | - |
| 2026-04-30 04:36Z | - | 1.00 | - | - |
| 2026-04-29 04:31Z | - | 1.00 | - | - |