
chat-drive

Lets you kick off a task by simply typing what you want into chat.

personal 2 files 10 recent evals

What it does for you

Lets you kick off a task by simply typing what you want into chat.

What it produces

A recent result, so you can see the kind of work it returns.

How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

For developers - how this skill is built, graded, and how it runs

at a glance - the short version

actor: dispatchInChatUI(text) - pushes onto the queue.

auditor: the audit harness (re-reads the lib's exported function shape)

eval mode: shape

stages: 1

what's inside - the parts that make up a skill 3/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill

state/skills/chat-drive/SKILL.md present

the skill itself, in plain text

The main file. It says what the skill is and lays out the steps in plain English.

Code

state/lib/chat-drive.ts present

code the skill can run

Reusable code this skill can call when it needs to.

Scripts

state/bin/chat-drive/ not present

helper scripts

Optional. Added when a skill has a few commands to run.

Loader

state/skills/chat-drive/AGENTS.md present

what the AI loads on the fly

Loaded automatically the moment this skill is needed. Kept short on purpose.

how it runs - the shared frame every skill uses 5/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, so an omission reads as deliberate rather than accidental.

The worker - makes the work

present

dispatchInChatUI(text) - pushes onto the queue.

Does the actual work. Whatever it produces is what gets checked next.

The reviewer - checks the work

present

The audit harness re-reads the lib's exported function shape.

A separate checker grades the work, so the part that made it can't approve its own work.

frame

Self-correction - learns from gaps

present

When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.

Background fixes - tidies up in the background

present

Bigger fixes that can't be made on the spot get queued and rewritten in the background later.

Run history - remembers past runs

present

state/log/evals.ndjson (eval kind: shape)

Every run is written down here, so the next time this skill is used it already knows how the last runs went.

Critical rules the things this skill must not get wrong

No must-not-break rules called out for this skill. Anything important lives in the writeup below.

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.

how the work flows - who makes it, who checks it

actor: dispatchInChatUI(text) - pushes onto the queue.

auditor: the audit harness (re-reads the lib's exported function shape)

1 stage

npx tsx state/lib/chat-drive.ts "say hello in three words"

SKILL.md - the skill, written out in plain English

chat-drive

Push text into the snappy-chat composer from any agent, anywhere. The text flows through the same OpenUI submit path the human types into: processMessage → /dispatch/chat → AG-UI stream → generative-UI cards rendered in the live React tree. Same store, same surface, same eyes.

This is the missing primitive for closed-loop snappy-chat dogfood: an agent can now type intent and watch real cards stream in, then audit by screenshot.

What it's for

Dogfood loops. A subagent pushes a stress-test intent, screenshots the result, grades the rendered card. The actor (push) and the auditor (read the screenshot) are necessarily distinct - the contract holds for free.

Automated UX QA. Verify the welcome surface unmounts on first message, user-pill alignment, dispatch-card variants, etc., end-to-end through the rendered DOM.

Recursive subagent dispatch. A long-running agent can re-enter the chat surface mid-task by pushing a follow-up intent. The chat is the agent's outbox.

When NOT to use it

Anything that doesn't need the rendered UI. If you don't care about the React tree, call the head-screen server's /dispatch/chat directly, or use the dispatch skill. Running through the chat surface adds streaming latency for no reason.

As a synthesis transport. The bridge is a queue, not an RPC channel - there's no callback when streaming finishes. Use /dispatch/chat directly when you need the response programmatically.
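
When the rendered UI is not needed, the direct call mentioned above can be sketched as follows. This is a minimal sketch, not the project's own code: `buildDirectDispatch` is a hypothetical helper, the body shape `{intent, threadId}` is assumed from the loader's "direct dispatch" command, and the base URL is the documented default. Reading the streamed response is left to the caller.

```typescript
// Hypothetical helper: builds a direct POST to /dispatch/chat, bypassing the chat-inject queue.
// Assumption: body shape {intent, threadId}, as in the loader's "direct dispatch" command.
interface DirectDispatch {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildDirectDispatch(
  intent: string,
  threadId?: string,
  base: string = "http://127.0.0.1:3147", // documented default for HEAD_SCREEN_URL
): DirectDispatch {
  if (intent.length === 0) throw new Error("intent (non-empty string) required");
  const body: { intent: string; threadId?: string } = { intent };
  if (threadId) body.threadId = threadId; // omitted when no thread is targeted
  return {
    url: `${base}/dispatch/chat`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage (needs a running head-screen server):
//   const { url, init } = buildDirectDispatch("what did the agents do today");
//   const res = await fetch(url, init);
```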

Steps

Verify the head-screen server is up. The bridge endpoints live on it.

   bash ~/projects/snappy-os/state/bin/head-screen/launch.sh   # idempotent

Verify snappy-chat is running and on screen so the polled push lands somewhere.

   pgrep -af "/Applications/SnappyChat.app/Contents/MacOS/SnappyChat"

If it's not, build + install:

   cd ~/projects/snappy-chat && bash scripts/build-app.sh --install

Push the intent.

   npx tsx -e "
   import { dispatchInChatUI } from './state/lib/chat-drive.ts';
   await dispatchInChatUI('what did the agents do today', { waitForFirstFrame: 12000 });
   "

The default waitForFirstFrame is 8000ms. Pass a larger value when the target backend is slow (Claude Code: 12-15s; openrouter/gemini: 6-10s).

Audit by screenshot. The bridge has no completion callback - actor ≠ auditor.

   npx tsx -e "
   import { captureScreen } from './state/lib/desktop.ts';
   const path = await captureScreen('/tmp/chat-drive-verify.png');
   console.log(path);
   "

Then Read the PNG. Welcome surface unmounted + user pill on the right + assistant card streaming = bridge working.

Library API

state/lib/chat-drive.ts exports three functions. They are importable from any TS agent code, and the file also runs as a CLI smoke test.

export async function dispatchInChatUI(
  text: string,
  opts?: { waitForFirstFrame?: number; agentId?: string }   // defaults: 8000ms, "ui"
): Promise<void>;

export async function resetChatUI(
  opts?: { waitMs?: number; agentId?: string }               // defaults: 1500ms, "ui"
): Promise<void>;

export async function chatDriveAvailable(): Promise<boolean>;

resetChatUI is for multi-scenario dogfood loops: it clears the thread and brings the welcome surface back so the next dispatchInChatUI lands in a clean state. Reset items share the FIFO with text pushes, so ordering is preserved, but they are posted to a distinct /chat-inject-control endpoint so the wire shape is unambiguous.
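
A multi-scenario loop built on these exports might look like the sketch below. The dependencies are injected so the block runs without the real lib; `runScenarios` and the `audit` callback are hypothetical names, and the auditor is assumed to be a screenshot-based grader supplied by the caller.

```typescript
// Injected so the sketch runs standalone. In real use, pass resetChatUI and
// dispatchInChatUI from state/lib/chat-drive.ts plus a screenshot-based auditor.
interface ScenarioDeps {
  resetChatUI: (opts?: { waitMs?: number }) => Promise<void>;
  dispatchInChatUI: (text: string, opts?: { waitForFirstFrame?: number }) => Promise<void>;
  audit: (intent: string) => Promise<boolean>; // hypothetical: screenshot, then grade
}

// Hypothetical helper: runs intents one at a time, resetting to the welcome
// surface before each so thread state does not bleed between scenarios.
async function runScenarios(
  deps: ScenarioDeps,
  intents: string[],
  waitForFirstFrame: number = 8_000,
): Promise<Map<string, boolean>> {
  const results = new Map<string, boolean>();
  for (const intent of intents) {
    await deps.resetChatUI();
    await deps.dispatchInChatUI(intent, { waitForFirstFrame });
    // The push resolves after a sleep, not after streaming completes,
    // so the verdict has to come from the separate auditor.
    results.set(intent, await deps.audit(intent));
  }
  return results;
}
```

Running scenarios strictly in sequence keeps the actor and auditor separate while avoiding overlapping pushes.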

CLI:

npx tsx state/lib/chat-drive.ts "say hello in three words"

HEAD_SCREEN_URL env var overrides the default http://127.0.0.1:3147.

Architecture (one paragraph)

dispatchInChatUI POSTs to the head-screen server's /chat-inject-push endpoint, which appends the text to a 50-slot in-memory FIFO that evicts the oldest item on overflow. The snappy-chat React app mounts a polling effect that hits GET /chat-inject-pop every 500ms; on a hit, it locates whichever OpenUI composer is currently visible (welcome OR thread variant), writes through React's native value setter to trigger the textarea's onChange, then clicks the submit button. From there, the real processMessage path takes over - same code as a human typing.
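
The queue behaviour described above (bounded, oldest item dropped on overflow) can be illustrated with a small class. This is only a model under the stated capacity of 50; the real queue lives in state/bin/head-screen/server.ts and may be implemented differently. `BoundedFifo` is a hypothetical name.

```typescript
// Illustrative model only; not the server's actual implementation.
class BoundedFifo<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 50) {
    this.capacity = capacity;
  }

  // Append an item. When the queue is full, the oldest item is dropped first
  // and returned so the caller can see what was evicted.
  push(item: T): T | undefined {
    const evicted = this.items.length >= this.capacity ? this.items.shift() : undefined;
    this.items.push(item);
    return evicted;
  }

  // Remove and return the oldest item, or undefined when empty (a poll "miss").
  pop(): T | undefined {
    return this.items.shift();
  }

  get depth(): number {
    return this.items.length;
  }
}
```

Evicting the oldest item keeps the newest intent deliverable, at the cost of silently losing pushes when nothing is polling.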

For QA probes and dogfood agents, include "newThread": true in the push body. The app resets to a fresh chat before dispatching that item so probe traffic does not land in Robert's active thread. Omit it only when the intent is deliberately a follow-up in the current conversation.
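
A probe push with `newThread` set could be built as in the sketch below. The lib code shown on this page sends only `text` and `agentId`, so this assumes the raw /chat-inject-push endpoint is called directly; `buildProbePush` and the `ChatInjectPush` type are hypothetical.

```typescript
// Hypothetical body type for /chat-inject-push; only fields named in this document.
interface ChatInjectPush {
  text: string;
  newThread?: boolean;
  agentId?: string;
}

// Hypothetical helper: always sets newThread so probe traffic starts a fresh chat.
function buildProbePush(text: string, agentId?: string): ChatInjectPush {
  if (text.length === 0) throw new Error("text (non-empty string) required");
  const body: ChatInjectPush = { text, newThread: true };
  if (agentId) body.agentId = agentId;
  return body;
}

// Usage (needs a running head-screen server):
//   await fetch("http://127.0.0.1:3147/chat-inject-push", {
//     method: "POST",
//     headers: { "content-type": "application/json" },
//     body: JSON.stringify(buildProbePush("say hello in three words")),
//   });
```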

Eval

Actor: dispatchInChatUI(text) - pushes onto the queue. Auditor: the audit harness re-reads the lib's exported function shape (present, async, two parameters); the user-facing audit is "is the rendered card correct" via screenshot, deliberately outside the lib.

Eval kind: shape. Mechanical: import the lib, assert dispatchInChatUI and chatDriveAvailable exist as functions, type-check passes. Logged as the skill's eval row in state/log/evals.ndjson.
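
The shape check described above might be sketched like this. The real audit harness is not shown on this page, so `checkShape` is a hypothetical stand-in that only covers the "exists as a function" part; type-checking would be a separate step.

```typescript
// Hypothetical shape check: reports exports that are missing or not functions.
type ModuleLike = Record<string, unknown>;

function checkShape(mod: ModuleLike, required: string[]): string[] {
  // An empty result means every required name is present and callable.
  return required.filter((name) => typeof mod[name] !== "function");
}

// Usage against the real lib (path as given in this document):
//   const lib = await import("./state/lib/chat-drive.ts");
//   const missing = checkShape(lib, ["dispatchInChatUI", "chatDriveAvailable"]);
//   if (missing.length > 0) throw new Error(`shape eval failed: ${missing.join(", ")}`);
```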

Pitfalls

The head-screen server must be alive. The bridge IS the head-screen server. If /healthz doesn't answer, push will throw. chatDriveAvailable() is the cheap precheck.

No completion callback. waitForFirstFrame is the only synchronization knob. Tune it per backend, then screenshot.

Restart drops queued pushes. The queue is in-memory by design - restart = empty. If a dogfood loop relies on durability across restarts, you're using the wrong primitive.

The bridge is loopback only. No external network exposure. The CORS headers are wide so file:// origins (WKWebView) work; the listener is bound to 127.0.0.1.
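
Since waitForFirstFrame is the only synchronization knob, a small budget helper can keep the per-backend tuning in one place. This is a sketch: the backend labels and `firstFrameBudget` are hypothetical, the values are the upper ends of the ranges quoted in this document, and the poll lag is a parameter because the poll interval is quoted differently in SKILL.md (500ms) and the loader (1000ms).

```typescript
// Upper ends of the ranges quoted in this document; treat them as starting points.
const BACKEND_FIRST_FRAME_MS = {
  "claude-code": 15_000,
  "openrouter-gemini": 10_000,
} as const;

type Backend = keyof typeof BACKEND_FIRST_FRAME_MS;

// Hypothetical helper: backend estimate plus one poll interval of lag,
// since a push can sit in the queue for up to one poll before the composer sees it.
function firstFrameBudget(backend: Backend, pollLagMs: number): number {
  if (pollLagMs < 0) throw new Error("pollLagMs must be non-negative");
  return BACKEND_FIRST_FRAME_MS[backend] + pollLagMs;
}

// Usage:
//   await dispatchInChatUI(intent, { waitForFirstFrame: firstFrameBudget("claude-code", 1000) });
```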

Files

state/lib/chat-drive.ts - the API (importable + CLI).
state/bin/head-screen/server.ts - owns the queue endpoints (POST /chat-inject-push, POST /chat-inject-control, GET /chat-inject-pop).
~/projects/snappy-chat/web/src/App.tsx - the React polling effect and composer-injection helper that drains the queue.

AGENTS.md - what the AI loads when this skill comes up

chat-drive - loader

Per-turn rules for chat-drive. Push text via queue to the live React chat. Full reference: state/skills/chat-drive/SKILL.md. Lib: state/lib/chat-drive.ts. Server: state/bin/head-screen/server.ts. Consumer: ~/projects/snappy-chat/web/src/App.tsx (poll 1000ms).

Critical Rules

Actor (push) ≠ auditor (screenshot). dispatchInChatUI() queues text. Lib never reports "did card render" - audit via captureScreen + Read. No callback when streaming finishes.
waitForFirstFrame is ONLY sync knob. Default 8000ms. Tune per backend: Claude Code 12-15s, openrouter/gemini 6-10s, plus 1000ms React poll lag. After push, screenshot to audit.
Head-screen server MUST be alive. Pre-flight: chatDriveAvailable() or bash state/bin/head-screen/launch.sh (idempotent). Verify: /healthz answers.
snappy-chat must be running and visible. Push → in-memory 30-slot FIFO. No React polling = silent eviction. Confirm: pgrep -af "/Applications/SnappyChat.app".
Queue is in-memory only, 30s TTL per item; a server restart wipes it. Never depend on it for durability. Drain before testing: curl -XPOST /chat-inject-flush returns the flushed count.
React polls every 1000ms (App.tsx:337). Each push incurs up to 1000ms before composer sees it. Factor into waitForFirstFrame budget.
tsx never hot-reloads server.ts. After any server.ts edit, restart the server process. Verify running code: pgrep -af server.ts + git log match.
Don't push faster than dispatcher streams. Wait for RUN_FINISHED before next text push. Control actions (open-cowork, stop, theme:*, view-*) must stay live while a stream is running; the React poller calls /chat-inject-pop?busy=1 during active streams so controls pop but text stays queued server-side until RUN_FINISHED. Use resetChatUI() (NOT peekaboo) between scenarios.
resetChatUI() does NOT flush. It sends control items for nav reset (welcome). Pre-flush with /chat-inject-flush if stale pushes queued.
Activate app before screenshotting. SnappyChat on secondary Space = wallpaper screenshot. Run osascript -e 'tell application "Snappy Chat" to activate' first (1-2s wait).
React pre-fetches queue items. App polls /chat-inject-pop before dispatch. Use endpoint /chat-inject-flush, never manual curl drain (competes with React).
Queue inspection is non-consuming. GET /chat-inject?agentId=ui returns {items, depth, controlDepth, textDepth} without shifting the queue. Never use GET /chat-inject-pop for inspection; pop consumes and can steal React's next item.
Concurrent claude -p / claude --continue compete for queue. Symptom: your pushes hit queued:N, then pop returns empty, no matching intent_chars in dispatch-chat.ndjson. Mitigation: serialize QA or use direct /dispatch/chat POST (isolated SSE stream, independent thread).
Server crashes wipe in-memory queue. Failure mode (2026-04-30+): FATAL evalLeaderboardRegex undefined from server.ts. Verify uptime: ps -p $(pgrep -f server.ts | head -1) -o etime. If elapsed time reset between push and audit, queue is gone. Check log: tail state/log/head-screen.log | grep FATAL.
Force fresh crypto.randomUUID() for each request messageId. Reusing any id (including OpenUI's optimistic user-message id) causes the assistant message to collide in the store reducer and produces a duplicate-render bug. This is in App.tsx:processMessage - do not remove the crypto.randomUUID() call.
Audit with window-targeted screenshot. After dispatch, capture: peekaboo screenshot --app "Snappy Chat" --window-index 1 --output /tmp/after-dispatch.png. App name has a space. Window index 1 is the real chat window (index 0 is the helper).
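
The busy-aware pop rule above (controls pop during a stream, text waits for RUN_FINISHED) can be modelled as below. This is an illustrative model only: the real logic sits in server.ts and is not shown here, and `QueueItem` and `popNext` are hypothetical names.

```typescript
// Illustrative model of /chat-inject-pop with and without ?busy=1.
type QueueItem =
  | { kind: "text"; text: string }
  | { kind: "control"; action: string };

// Pop the next eligible item. While busy, only control items are eligible;
// text items stay queued in their original order until the stream finishes.
function popNext(queue: QueueItem[], busy: boolean): QueueItem | undefined {
  const index = busy ? queue.findIndex((item) => item.kind === "control") : 0;
  if (index < 0 || index >= queue.length) return undefined;
  return queue.splice(index, 1)[0];
}
```

Under this model a "stop" control queued behind a text push is still delivered mid-stream, while the text push is held back.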

Commands

operation	command
ui dashboard	`state/skills/chat-drive/resources/ui.openui`
push (TS)	`import { dispatchInChatUI, resetChatUI, chatDriveAvailable } from "./state/lib/chat-drive.ts"`
push (CLI)	`npx tsx state/lib/chat-drive.ts "<intent>"` (all args = text, no control)
preflight	`chatDriveAvailable()` (pings `/healthz`, returns bool)
server	`bash state/bin/head-screen/launch.sh` (idempotent)
verify chat running	`pgrep -af "/Applications/SnappyChat.app"`
verify server PID	`pgrep -af server.ts` (match git log for running code)
activate app	`osascript -e 'tell application "Snappy Chat" to activate'` (1-2s before screenshot)
drain queue	`curl -XPOST 127.0.0.1:3147/chat-inject-flush` (returns `{flushed:N}`)
reset to welcome	`curl -XPOST /chat-inject-control -d '{"action":"reset"}'` (does NOT flush)
navigate sidebar	`curl -XPOST /chat-inject-control -d '{"action":"view-artifacts|view-files|view-chat|view-scheduled|view-customize|view-projects"}'` - Skills tab = view-files (NOT view-skills)
switch to thread	`curl -XPOST /chat-inject-control -H "Content-Type: application/json" -d '{"action":"select-thread","threadId":"<uuid>"}'` - full message history replays immediately. Use `curl -s http://127.0.0.1:3147/threads` to list thread IDs + titles.
direct dispatch	`curl -XPOST /dispatch/chat -d '{"intent":"<text>","threadId":"<id>"}'` (bypass queue)
check competing consumers	`pgrep -af "claude.*continue|claude.*-p" | wc -l` (>1 = contention)
verify push dispatched	`tail state/log/dispatch-chat.ndjson | grep intent_chars:<N>` (match intent length)
screenshot	`npx tsx state/lib/desktop.ts capture-screen /tmp/path.png`
server uptime	`ps -p $(pgrep -f server.ts | head -1) -o etime` (reset = queue wiped)
server log	`tail state/log/head-screen.log | grep FATAL` (spot crash)
env override	`HEAD_SCREEN_URL` (default `http://127.0.0.1:3147`)
reference	`state/skills/chat-drive/SKILL.md`
eval log	`state/log/evals.ndjson` (skill: chat-drive)

Self-Test

An agent reading this should correctly:

[ ] Pre-flight chatDriveAvailable() before push?
[ ] Tune waitForFirstFrame for backend + 1000ms React poll lag?
[ ] Audit by screenshot Read, NOT lib return values?
[ ] Keep both snappy-chat and head-screen alive?
[ ] Use resetChatUI() (not peekaboo) between scenarios?
[ ] Restart server after any server.ts edit (tsx no hot-reload)?
[ ] Serialize pushes; wait for RUN_FINISHED before next?
[ ] Drain with /chat-inject-flush before QA tests?
[ ] Know React pre-fetches (no callback on dispatch complete)?
[ ] Activate app via osascript before screenshot on secondary Space?
[ ] Know resetChatUI() does NOT flush (pre-flush if stale items queued)?
[ ] Check pgrep claude.*-p for competing queue consumers?

Self-correcting loader (PID feedback)

This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.

Before you finish, do two things:

1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case, you MUST attempt an Edit to this AGENTS.md before you log.

Default action: edit this file. Surgical. Add the missing line, table row, or code example.
LOGGED is allowed only when the fix needs >10 lines or spans multiple files.
The goal: the next agent never has to leave the loader.

2. Log the result.

echo "[$(date -u +%FT%TZ)] chat-drive: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>" >> state/log/loader-feedback.log

Do not skip this. Every agent run must leave the system better than it found it.

OpenUI Resource

Skill-owned OpenUI Lang resource: state/skills/chat-drive/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

api.ts - the code it can call

// snappy-chat-drive/api.ts
//
// Push text into the snappy-chat composer programmatically. The bridge is
// the head-screen server's chat-inject FIFO: this lib POSTs to /chat-inject-push,
// the snappy-chat WKWebView polls /chat-inject-pop on a 500ms interval and
// runs the text through the real OpenUI submit path (processMessage →
// /dispatch/chat). The result: dogfood loops, automated UX QA, and recursive
// subagent dispatch all flow through the actual chat surface — same React
// store, same generative-UI cards — instead of trying to drive WKWebView
// with synthetic clicks (peekaboo's clickAt does not fire React onClick on
// WKWebView).
//
// Sync contract: there is NO callback when the chat finishes streaming. The
// caller is the actor (push); the auditor is whatever reads a screenshot
// afterward. `waitForFirstFrame` is a coarse sleep so the dispatcher has
// time to start streaming before the auditor captures.

const HEAD_SCREEN_BASE = process.env.HEAD_SCREEN_URL ?? "http://127.0.0.1:3147";
const DEFAULT_FIRST_FRAME_MS = 8_000;

export interface DispatchInChatUIOpts {
  /**
   * Sleep duration after the push so the dispatcher has time to start
   * streaming. Default 8000ms. Pass 0 to return immediately.
   */
  waitForFirstFrame?: number;
  /**
   * Per-agent queue isolation key. The server keeps a Map<agentId, queue>
   * so parallel QA subagents don't share a single FIFO. Default "ui" matches
   * the snappy-chat React poll loop — so omitting this routes pushes to the
   * actual cockpit. Pass a stable identifier (e.g. "qa-broad-smoke",
   * "dogfood-loop2") to isolate from the cockpit and from each other.
   */
  agentId?: string;
}

/**
 * Push `text` onto the snappy-chat input bridge. Resolves once the queue
 * has accepted the push and (optionally) `waitForFirstFrame` ms have passed.
 *
 * Throws if the head-screen server is unreachable or the push is rejected.
 */
export async function dispatchInChatUI(
  text: string,
  opts: DispatchInChatUIOpts = {},
): Promise<void> {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("dispatchInChatUI: text (non-empty string) required");
  }
  const wait = opts.waitForFirstFrame ?? DEFAULT_FIRST_FRAME_MS;
  const body: { text: string; agentId?: string } = { text };
  if (typeof opts.agentId === "string" && opts.agentId.length > 0) {
    body.agentId = opts.agentId;
  }

  const res = await fetch(`${HEAD_SCREEN_BASE}/chat-inject-push`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    let detail = "";
    try { detail = await res.text(); } catch {}
    throw new Error(
      `chat-inject-push ${res.status}: ${detail.slice(0, 240) || res.statusText}`,
    );
  }

  if (wait > 0) {
    await new Promise(r => setTimeout(r, wait));
  }
}

export interface ResetChatUIOpts {
  /**
   * Sleep duration after the control push so the React app has time to
   * pop the control message, unmount FullScreen, and remount the welcome
   * surface. Default 1500ms — enough for the 500ms poll cadence + a remount.
   */
  waitMs?: number;
  /**
   * Per-agent queue isolation key. See `DispatchInChatUIOpts.agentId`.
   * Default "ui". Parallel QA agents pass their own ID so a reset on one
   * thread doesn't drop the queue another agent is filling.
   */
  agentId?: string;
}

/**
 * Push a control message that resets the snappy-chat UI to the welcome
 * surface. Equivalent to the user clicking "+ New chat" in the sidebar.
 * Use between dogfood scenarios so a single subagent can run multiple
 * intents end-to-end without thread state bleeding between them.
 *
 * Throws if the head-screen server is unreachable or the push is rejected.
 */
export async function resetChatUI(opts: ResetChatUIOpts = {}): Promise<void> {
  const wait = opts.waitMs ?? 1500;
  const body: { action: string; agentId?: string } = { action: "reset" };
  if (typeof opts.agentId === "string" && opts.agentId.length > 0) {
    body.agentId = opts.agentId;
  }
  const res = await fetch(`${HEAD_SCREEN_BASE}/chat-inject-control`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    let detail = "";
    try { detail = await res.text(); } catch {}
    throw new Error(
      `chat-inject-control ${res.status}: ${detail.slice(0, 240) || res.statusText}`,
    );
  }
  if (wait > 0) {
    await new Promise(r => setTimeout(r, wait));
  }
}

/**
 * Cheap reachability check for the head-screen server. Returns true iff the
 * server answers any 2xx-ish response on `/healthz`. Use to gate dogfood
 * loops so they fail fast when the bridge is down rather than timing out
 * mid-push.
 */
export async function chatDriveAvailable(): Promise<boolean> {
  try {
    const res = await fetch(`${HEAD_SCREEN_BASE}/healthz`, { method: "GET" });
    return res.ok;
  } catch {
    return false;
  }
}

// CLI smoke: `npx tsx state/lib/chat-drive.ts "say hello in three words"`
// Set CHAT_INJECT_AGENT_ID=<id> to isolate from the cockpit's "ui" queue
// (e.g. parallel QA subagents).
if (import.meta.url === `file://${process.argv[1]}`) {
  const text = process.argv.slice(2).join(" ").trim();
  if (!text) {
    console.error('usage: tsx state/lib/chat-drive.ts "<intent>"');
    process.exit(2);
  }
  const agentId = process.env.CHAT_INJECT_AGENT_ID;
  (async () => {
    const up = await chatDriveAvailable();
    if (!up) {
      console.error("head-screen server unreachable at", HEAD_SCREEN_BASE);
      process.exit(1);
    }
    await dispatchInChatUI(text, { waitForFirstFrame: 0, agentId });
    console.log("OK pushed:", text, agentId ? `(agentId=${agentId})` : "");
  })().catch(e => { console.error("FAIL:", e?.message ?? e); process.exit(1); });
}

scripts - helper scripts it can run

prose-only skill - 2 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

how we check it - the checks, plus the last 10 runs

rubric: shape - schema-shape check (no inline rubric)

recent mean: 0.83 · 10 runs · actor/auditor: unverifiable

deps: none declared

timestamp	verb	score	primary_issue	artifact
2026-04-30 07:51Z	-	0.80	-	-
2026-04-30 07:39Z	-	0.70	-	-
2026-04-30 07:38Z	-	0.72	-	-
2026-04-30 07:33Z	-	0.70	-	-
2026-04-30 07:15Z	-	0.85	-	-
2026-04-30 03:10Z	-	0.85	-	-
2026-04-30 06:55Z	-	0.67	-	-
2026-04-29 04:43Z	-	1.00	-	-
2026-04-29 04:14Z	-	1.00	-	-
2026-04-29 04:01Z	-	1.00	-	-