
chat-drive

Lets you kick off a task by simply typing what you want into chat.
personal · 2 files · 10 recent evals

What it does for you

Lets you kick off a task by simply typing what you want into chat.

What it produces

A recent result, so you can see the kind of work it returns.


How to get it

These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.

For developers - how this skill is built, graded, and how it runs

at a glance - the short version

actor: dispatchInChatUI(text) - pushes onto the queue.
auditor: the audit harness re-reads the lib's exported function shape
eval mode: shape
stages: 1

what's inside - the parts that make up a skill 3/4 present

A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.

The skill
state/skills/chat-drive/SKILL.md present
the skill itself, in plain text
The main file. It says what the skill is and lays out the steps in plain English.
Code
state/lib/chat-drive.ts present
code the skill can run
Reusable code this skill can call when it needs to.
Scripts
state/bin/chat-drive/ not present
helper scripts
Optional. Added when a skill has a few commands to run.
Loader
state/skills/chat-drive/AGENTS.md present
what the AI loads on the fly
Loaded automatically the moment this skill is needed. Kept short on purpose.

how it runs - the shared frame every skill uses 5/5 present

Every skill runs the same way. One part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note saying why, on purpose, not by accident.

makes the work - The worker
present
dispatchInChatUI(text) - pushes onto the queue.
Does the actual work. Whatever it produces is what gets checked next.
checks the work - The reviewer
present
The audit harness re-reads the lib's exported function shape.
A separate checker grades the work, so the part that made it can't approve its own work.
learns - Self-correction
present
Fixes itself and learns from gaps.
When a run hits a gap, the skill gets edited on the spot [FIXED] or queued for a bigger rewrite [LOGGED], so it keeps getting better.
tidies up - Background fixes
present
Queued rewrites run in the background.
Bigger fixes that can't be made on the spot get queued and rewritten in the background later.
remembers - Run history
present
state/log/evals.ndjson (shape runs)
Every run is written down here, so the next time this skill is used it already knows how the last runs went.
Critical rules - the things this skill must not get wrong
No must-not-break rules called out for this skill. Anything important lives in the writeup below.

what it has learned - fixes written back in over time sample

When a run hits something this skill didn't handle, the fix gets written back into the skill so it doesn't happen again. FIXED means it was corrected on the spot. LOGGED means it's queued for a bigger rewrite. Either way, the skill gets a little better and never makes the same mistake twice.


how the work flows - who makes it, who checks it

actor: dispatchInChatUI(text) - pushes onto the queue.
auditor: the audit harness re-reads the lib's exported function shape
1 stage
npx tsx state/lib/chat-drive.ts "say hello in three words"

SKILL.md - the skill, written out in plain English

chat-drive

Push text into the snappy-chat composer from any agent, anywhere. The text flows through the same OpenUI submit path the human types into: processMessage → /dispatch/chat → AG-UI stream → generative-UI cards rendered in the live React tree. Same store, same surface, same eyes.

This is the missing primitive for closed-loop snappy-chat dogfood: an agent can now type intent and watch real cards stream in, then audit by screenshot.

What it's for

  • Dogfood loops. A subagent pushes a stress-test intent, screenshots the result, grades the rendered card. The actor (push) and the auditor (read the screenshot) are necessarily distinct - the contract holds for free.

  • Automated UX QA. Verify the welcome surface unmounts on first message, user-pill alignment, dispatch-card variants, etc., end-to-end through the rendered DOM.

  • Recursive subagent dispatch. A long-running agent can re-enter the chat surface mid-task by pushing a follow-up intent. The chat is the agent's outbox.

When NOT to use it

  • Anything that doesn't need the rendered UI. If you don't care about the React tree, call the head-screen server's /dispatch/chat directly, or use the dispatch skill. Running through the chat surface adds streaming latency for no reason.

  • As a synthesis transport. The bridge is a queue, not an RPC channel - there's no callback when streaming finishes. Use /dispatch/chat directly when you need the response programmatically.

Steps

  1. Verify the head-screen server is up. The bridge endpoints live on it.
   bash ~/projects/snappy-os/state/bin/head-screen/launch.sh   # idempotent
  2. Verify snappy-chat is running and on screen so the polled push lands somewhere.
   pgrep -af "/Applications/SnappyChat.app/Contents/MacOS/SnappyChat"

If it's not, build + install:

   cd ~/projects/snappy-chat && bash scripts/build-app.sh --install
  3. Push the intent.
   npx tsx -e "
   import { dispatchInChatUI } from './state/lib/chat-drive.ts';
   await dispatchInChatUI('what did the agents do today', { waitForFirstFrame: 12000 });
   "

The default waitForFirstFrame is 8000ms. Pass a larger value when the target backend is slow (Claude Code: 12-15s; openrouter/gemini: 6-10s).

  4. Audit by screenshot. The bridge has no completion callback - actor ≠ auditor.
   npx tsx -e "
   import { captureScreen } from './state/lib/desktop.ts';
   const path = await captureScreen('/tmp/chat-drive-verify.png');
   console.log(path);
   "

Then Read the PNG. Welcome surface unmounted + user pill on the right + assistant card streaming = bridge working.
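
The steps above can be folded into one function. This is a minimal sketch, not the skill's own code: `runProbe` and `ProbeDeps` are hypothetical names, and the three dependencies are injected so the sketch runs on its own. In real use they would be `chatDriveAvailable`, `dispatchInChatUI` and `captureScreen` as imported in the steps. Step 2 (snappy-chat running and on screen) is assumed to have been checked separately.

```typescript
// Hypothetical wrapper around the steps above. Dependencies are injected so
// this file is self-contained; in practice pass the real lib functions.
interface ProbeDeps {
  chatDriveAvailable: () => Promise<boolean>;
  dispatchInChatUI: (
    text: string,
    opts?: { waitForFirstFrame?: number },
  ) => Promise<void>;
  captureScreen: (path: string) => Promise<string>;
}

async function runProbe(
  deps: ProbeDeps,
  intent: string,
  waitForFirstFrame: number = 8000, // the documented default
  screenshotPath: string = "/tmp/chat-drive-verify.png",
): Promise<string> {
  // Step 1: fail fast when the head-screen server is down.
  if (!(await deps.chatDriveAvailable())) {
    throw new Error("head-screen server unreachable");
  }
  // Step 3: push the intent, then sleep until the first frame should be up.
  await deps.dispatchInChatUI(intent, { waitForFirstFrame });
  // Step 4: the push says nothing about rendering, so capture a screenshot
  // for a separate auditor to read.
  return deps.captureScreen(screenshotPath);
}
```

The function returns only the screenshot path. Grading the image stays with a separate auditor, which keeps actor and auditor distinct.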

Library API

state/lib/chat-drive.ts exports three functions. Importable from any TS agent code; also runnable as a CLI smoke.

export async function dispatchInChatUI(
  text: string,
  opts?: { waitForFirstFrame?: number; agentId?: string }   // defaults: 8000ms, "ui"
): Promise<void>;

export async function resetChatUI(
  opts?: { waitMs?: number; agentId?: string }               // defaults: 1500ms, "ui"
): Promise<void>;

export async function chatDriveAvailable(): Promise<boolean>;

resetChatUI is for multi-scenario dogfood loops: it clears the thread and brings the welcome surface back so the next dispatchInChatUI lands in a clean state. Control items share the FIFO with text pushes, so ordering is preserved, but they go through the separate /chat-inject-control endpoint so the wire shape is unambiguous.
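
A multi-scenario loop built on that reset might look like the sketch below. `runScenarios`, `LoopDeps` and the `audit` callback are hypothetical names introduced for illustration; the reset and dispatch functions are injected so the sketch is self-contained. It assumes `audit` does not return until the run has finished, since the lib itself gives no completion signal.

```typescript
// Hypothetical loop: reset, push, audit, once per scenario.
interface LoopDeps {
  resetChatUI: () => Promise<void>;
  dispatchInChatUI: (text: string) => Promise<void>;
  // Assumed to screenshot and grade, and to return only after the run is done.
  audit: (scenario: string) => Promise<void>;
}

async function runScenarios(deps: LoopDeps, scenarios: string[]): Promise<void> {
  for (const scenario of scenarios) {
    // Start every scenario from the welcome surface.
    await deps.resetChatUI();
    await deps.dispatchInChatUI(scenario);
    // Strictly sequential: the next push waits for this audit.
    await deps.audit(scenario);
  }
}
```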

CLI:

npx tsx state/lib/chat-drive.ts "say hello in three words"

HEAD_SCREEN_URL env var overrides the default http://127.0.0.1:3147.

Architecture (one paragraph)

dispatchInChatUI POSTs to the head-screen server's /chat-inject-push endpoint, which appends the text to a 50-slot in-memory FIFO that evicts the oldest entry on overflow. The snappy-chat React app mounts a polling effect that hits GET /chat-inject-pop every 500ms; when an item comes back, it locates whichever OpenUI composer is currently visible (welcome OR thread variant), writes the text through React's native value setter to trigger the textarea's onChange, then clicks the submit button. From there, the real processMessage path takes over - same code as a human typing.
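
To make the queue behaviour concrete, here is a minimal sketch of a bounded FIFO that drops its oldest entry on overflow. It is not the code in server.ts; `BoundedFifo` is a hypothetical name, and the sketch assumes that "FIFO eviction" means the oldest queued item is the one discarded.

```typescript
// Illustrative bounded FIFO. Assumption: overflow evicts the oldest entry.
class BoundedFifo<T> {
  private items: T[] = [];
  private capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new Error("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Appends an item; returns the evicted item if the queue overflowed.
  push(item: T): T | undefined {
    this.items.push(item);
    return this.items.length > this.capacity ? this.items.shift() : undefined;
  }

  // Removes and returns the oldest item, or undefined when empty.
  pop(): T | undefined {
    return this.items.shift();
  }

  get depth(): number {
    return this.items.length;
  }
}
```

With a capacity of 2, pushing a third item returns the first one as evicted, which is the silent loss a pusher risks when nothing is polling.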

For QA probes and dogfood agents, include "newThread": true in the push body. The app resets to a fresh chat before dispatching that item so probe traffic does not land in Robert's active thread. Omit it only when the intent is deliberately a follow-up in the current conversation.
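
A probe that wants a fresh thread has to build the push body itself, since the lib's options shown here do not include newThread. The helper below is a sketch under that assumption: `buildPushBody` and `PushBody` are hypothetical names, and the field set (text, newThread, agentId) is taken from this page rather than from the server's schema.

```typescript
// Hypothetical builder for the JSON body sent to POST /chat-inject-push.
interface PushBody {
  text: string;
  newThread?: boolean;
  agentId?: string;
}

function buildPushBody(
  text: string,
  opts: { newThread?: boolean; agentId?: string } = {},
): PushBody {
  if (text.length === 0) {
    throw new Error("text (non-empty string) required");
  }
  const body: PushBody = { text };
  // Only set the flag when asked, so follow-ups stay in the current thread.
  if (opts.newThread) body.newThread = true;
  if (opts.agentId) body.agentId = opts.agentId;
  return body;
}
```

`JSON.stringify(buildPushBody("probe", { newThread: true }))` gives `{"text":"probe","newThread":true}`, which can then be POSTed with a JSON content type.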

Eval

Actor: dispatchInChatUI(text) - pushes onto the queue. Auditor: the audit harness re-reads the lib's exported function shape (present, async, two parameters); the user-facing audit is "is the rendered card correct" via screenshot, deliberately outside the lib.

Eval kind: shape. Mechanical: import the lib, assert dispatchInChatUI and chatDriveAvailable exist as functions, type-check passes. Logged as the skill's eval row in state/log/evals.ndjson.
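
The mechanical part of that check can be sketched as below. This is an illustration, not the audit harness itself: `missingExports` is a hypothetical helper, and it covers only the "exists as a function" assertion, leaving type-checking to the compiler.

```typescript
// Hypothetical shape check: which required names are not exported functions?
function missingExports(
  mod: Record<string, unknown>,
  required: string[],
): string[] {
  return required.filter((name) => typeof mod[name] !== "function");
}
```

Against the real module it would be called with the result of a dynamic import of state/lib/chat-drive.ts and the names `dispatchInChatUI` and `chatDriveAvailable`; an empty array means the shape check passes.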

Pitfalls

  • The head-screen server must be alive. The bridge IS the head-screen server. If /healthz doesn't answer, push will throw. chatDriveAvailable() is the cheap precheck.

  • No completion callback. waitForFirstFrame is the only synchronization knob. Tune it per backend, then screenshot.

  • Restart drops queued pushes. The queue is in-memory by design - restart = empty. If a dogfood loop relies on durability across restarts, you're using the wrong primitive.

  • The bridge is loopback only. No external network exposure. The CORS headers are wide so file:// origins (WKWebView) work; the listener is bound to 127.0.0.1.
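
Since waitForFirstFrame is the only knob, it helps to compute it rather than guess. The sketch below is one possible budget rule, not something the lib provides: `firstFrameBudget` is a hypothetical helper that takes the upper end of the per-backend ranges quoted on this page and adds one poll interval, which the caller passes in.

```typescript
// Hypothetical budget helper. The table holds the upper bounds of the
// backend ranges quoted on this page, in milliseconds.
type Backend = "claude-code" | "openrouter-gemini";

const FIRST_FRAME_UPPER_MS: Record<Backend, number> = {
  "claude-code": 15_000,
  "openrouter-gemini": 10_000,
};

// pollLagMs is the React poll interval: a push can wait that long in the
// queue before the composer sees it.
function firstFrameBudget(backend: Backend, pollLagMs: number): number {
  if (pollLagMs < 0) throw new Error("pollLagMs must be non-negative");
  return FIRST_FRAME_UPPER_MS[backend] + pollLagMs;
}
```

For example, `firstFrameBudget("claude-code", 1000)` gives 16000, which can be passed as waitForFirstFrame before screenshotting.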

Files

  • state/lib/chat-drive.ts - the API (importable + CLI).
  • state/bin/head-screen/server.ts - owns the queue endpoints

(POST /chat-inject-push, POST /chat-inject-control, GET /chat-inject-pop).

  • ~/projects/snappy-chat/web/src/App.tsx - the React polling effect and

composer-injection helper that drains the queue.

AGENTS.md - what the AI loads when this skill comes up

chat-drive - loader

Per-turn rules for chat-drive. Push text via queue to the live React chat. Full reference: state/skills/chat-drive/SKILL.md. Lib: state/lib/chat-drive.ts. Server: state/bin/head-screen/server.ts. Consumer: ~/projects/snappy-chat/web/src/App.tsx (poll 1000ms).

Critical Rules

  1. Actor (push) ≠ auditor (screenshot). dispatchInChatUI() queues text. Lib never reports "did card render" - audit via captureScreen + Read. No callback when streaming finishes.
  2. waitForFirstFrame is ONLY sync knob. Default 8000ms. Tune per backend: Claude Code 12-15s, openrouter/gemini 6-10s, plus 1000ms React poll lag. After push, screenshot to audit.
  3. Head-screen server MUST be alive. Pre-flight: chatDriveAvailable() or bash state/bin/head-screen/launch.sh (idempotent). Verify: /healthz answers.
  4. snappy-chat must be running and visible. Push → in-memory 30-slot FIFO. No React polling = silent eviction. Confirm: pgrep -af "/Applications/SnappyChat.app".
  5. Queue is in-memory only, 30s TTL per item; a server restart wipes it. Never depend on durability. Drain before testing: curl -XPOST /chat-inject-flush returns the flushed count.
  6. React polls every 1000ms (App.tsx:337). Each push incurs up to 1000ms before composer sees it. Factor into waitForFirstFrame budget.
  7. tsx never hot-reloads server.ts. After any server.ts edit, restart the server process. Verify running code: pgrep -af server.ts + git log match.
  8. Don't push faster than dispatcher streams. Wait for RUN_FINISHED before next text push. Control actions (open-cowork, stop, theme:*, view-*) must stay live while a stream is running; the React poller calls /chat-inject-pop?busy=1 during active streams so controls pop but text stays queued server-side until RUN_FINISHED. Use resetChatUI() (NOT peekaboo) between scenarios.
  9. resetChatUI() does NOT flush. It sends control items for nav reset (welcome). Pre-flush with /chat-inject-flush if stale pushes queued.
  10. Activate app before screenshotting. SnappyChat on secondary Space = wallpaper screenshot. Run osascript -e 'tell application "Snappy Chat" to activate' first (1-2s wait).
  11. React pre-fetches queue items. App polls /chat-inject-pop before dispatch. Use endpoint /chat-inject-flush, never manual curl drain (competes with React).
  12. Queue inspection is non-consuming. GET /chat-inject?agentId=ui returns {items, depth, controlDepth, textDepth} without shifting the queue. Never use GET /chat-inject-pop for inspection; pop consumes and can steal React's next item.
  13. Concurrent claude -p / claude --continue compete for queue. Symptom: your pushes hit queued:N, then pop returns empty, no matching intent_chars in dispatch-chat.ndjson. Mitigation: serialize QA or use direct /dispatch/chat POST (isolated SSE stream, independent thread).
  14. Server crashes wipe in-memory queue. Failure mode (2026-04-30+): FATAL evalLeaderboardRegex undefined from server.ts. Verify uptime: ps -p $(pgrep -f server.ts | head -1) -o etime. If elapsed time reset between push and audit, queue is gone. Check log: tail state/log/head-screen.log | grep FATAL.
  15. Force fresh crypto.randomUUID() for each request messageId. Reusing any id (including OpenUI's optimistic user-message id) causes the assistant message to collide in the store reducer and produces a duplicate-render bug. This is in App.tsx:processMessage - do not remove the crypto.randomUUID() call.
  16. Audit with window-targeted screenshot. After dispatch, capture: peekaboo screenshot --app "Snappy Chat" --window-index 1 --output /tmp/after-dispatch.png. App name has a space. Window index 1 is the real chat window (index 0 is the helper).
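
Rule 14's uptime check can be automated by comparing two `ps -o etime` readings, one taken before the push and one before the audit. The sketch below is an assumption-labelled helper, not part of the lib: `etimeToSeconds` and `restartedBetween` are hypothetical names, and the parser expects the `[[dd-]hh:]mm:ss` layout that ps uses for elapsed time.

```typescript
// Parses ps elapsed time in the form [[dd-]hh:]mm:ss into seconds.
function etimeToSeconds(etime: string): number {
  const m = /^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$/.exec(etime.trim());
  if (!m) throw new Error(`unrecognized etime: ${etime}`);
  const [, days = "0", hours = "0", minutes, seconds] = m;
  return (
    ((Number(days) * 24 + Number(hours)) * 60 + Number(minutes)) * 60 +
    Number(seconds)
  );
}

// If elapsed time went down between the two readings, the process was
// restarted in between and any queued pushes are gone.
function restartedBetween(etimeBefore: string, etimeAfter: string): boolean {
  return etimeToSeconds(etimeAfter) < etimeToSeconds(etimeBefore);
}
```

A reading of `10:00` before the push and `00:05` before the audit means the server restarted and the queue was wiped.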

Commands

| operation | command |
| --- | --- |
| ui dashboard | state/skills/chat-drive/resources/ui.openui |
| push (TS) | import { dispatchInChatUI, resetChatUI, chatDriveAvailable } from "./state/lib/chat-drive.ts" |
| push (CLI) | npx tsx state/lib/chat-drive.ts "<intent>" (all args = text, no control) |
| preflight | chatDriveAvailable() (pings /healthz, returns bool) |
| server | bash state/bin/head-screen/launch.sh (idempotent) |
| verify chat running | pgrep -af "/Applications/SnappyChat.app" |
| verify server PID | pgrep -af server.ts (match git log for running code) |
| activate app | osascript -e 'tell application "Snappy Chat" to activate' (1-2s before screenshot) |
| drain queue | curl -XPOST 127.0.0.1:3147/chat-inject-flush (returns {flushed:N}) |
| reset to welcome | curl -XPOST /chat-inject-control -d '{"action":"reset"}' (does NOT flush) |
| navigate sidebar | curl -XPOST /chat-inject-control -d '{"action":"<view>"}' where <view> is one of view-artifacts, view-files, view-chat, view-scheduled, view-customize, view-projects - Skills tab = view-files (NOT view-skills) |
| switch to thread | curl -XPOST /chat-inject-control -H "Content-Type: application/json" -d '{"action":"select-thread","threadId":"<uuid>"}' - full message history replays immediately. Use curl -s http://127.0.0.1:3147/threads to list thread IDs + titles. |
| direct dispatch | curl -XPOST /dispatch/chat -d '{"intent":"<text>","threadId":"<id>"}' (bypass queue) |
| check competing consumers | `pgrep -af "claude.*continue\|claude.*-p" \| wc -l` (>1 = contention) |
| verify push dispatched | `tail state/log/dispatch-chat.ndjson \| grep intent_chars:<N>` (match intent length) |
| screenshot | npx tsx state/lib/desktop.ts capture-screen /tmp/path.png |
| server uptime | `ps -p $(pgrep -f server.ts \| head -1) -o etime` (reset = queue wiped) |
| server log | `tail state/log/head-screen.log \| grep FATAL` (spot crash) |
| env override | HEAD_SCREEN_URL (default http://127.0.0.1:3147) |
| reference | state/skills/chat-drive/SKILL.md |
| eval log | state/log/evals.ndjson (skill: chat-drive) |
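
Because the Skills tab maps to view-files rather than view-skills, a small guard can catch a wrong sidebar action before it is sent. The sketch below is illustrative only: `isViewAction` and `viewControlBody` are hypothetical helpers, and the action list is the one from the navigate-sidebar command above, not an exhaustive list of control actions.

```typescript
// Sidebar view actions as listed in the commands above.
const VIEW_ACTIONS = [
  "view-artifacts",
  "view-files",
  "view-chat",
  "view-scheduled",
  "view-customize",
  "view-projects",
] as const;

type ViewAction = (typeof VIEW_ACTIONS)[number];

function isViewAction(action: string): action is ViewAction {
  return (VIEW_ACTIONS as readonly string[]).includes(action);
}

// Builds the JSON body for POST /chat-inject-control, rejecting unknown views.
function viewControlBody(action: string): string {
  if (!isViewAction(action)) {
    throw new Error(`unknown view action: ${action}`);
  }
  return JSON.stringify({ action });
}
```

`viewControlBody("view-files")` returns `{"action":"view-files"}`, while `viewControlBody("view-skills")` throws instead of silently sending an action the app does not know.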

Self-Test

An agent reading this should correctly:

  1. [ ] Pre-flight chatDriveAvailable() before push?
  2. [ ] Tune waitForFirstFrame for backend + 1000ms React poll lag?
  3. [ ] Audit by screenshot Read, NOT lib return values?
  4. [ ] Keep both snappy-chat and head-screen alive?
  5. [ ] Use resetChatUI() (not peekaboo) between scenarios?
  6. [ ] Restart server after any server.ts edit (tsx no hot-reload)?
  7. [ ] Serialize pushes; wait for RUN_FINISHED before next?
  8. [ ] Drain with /chat-inject-flush before QA tests?
  9. [ ] Know React pre-fetches (no callback on dispatch complete)?
  10. [ ] Activate app via osascript before screenshot on secondary Space?
  11. [ ] Know resetChatUI() does NOT flush (pre-flush if stale items queued)?
  12. [ ] Check pgrep claude.*-p for competing queue consumers?

<!-- kernel-ok: historical attribution for the PID-loader pattern. -->

Self-correcting loader (PID feedback)

This is the rule that makes loaders learn. Ported from snappy-kernel, where it demonstrably worked: the snappy-dom-cartographer skill mapped the Skool admin UI once, and every later ship-agent inherited the full map for free.

Before you finish, do two things:

1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case, you MUST attempt an Edit to this AGENTS.md before you log.

  • Default action: edit this file. Surgical. Add the missing line, table row, or code example.
  • LOGGED is allowed only when the fix needs >10 lines or spans multiple files.
  • The goal: the next agent never has to leave the loader.

2. Log the result.

echo "[$(date -u +%FT%TZ)] chat-drive: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>" >> state/log/loader-feedback.log

Do not skip this. Every agent run must leave the system better than it found it.
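
For agents that log from TypeScript rather than the shell, the same line can be produced as in the sketch below. `feedbackLine` is a hypothetical helper; it mirrors the echo format above, including a UTC timestamp without milliseconds. Any action_kind value passed in is the caller's choice, since the allowed values are not listed here.

```typescript
type Outcome = "FIXED" | "LOGGED";

// Formats one loader-feedback line in the same shape as the echo command.
function feedbackLine(
  what: string,
  outcome: Outcome,
  actionKind: string,
  now: Date = new Date(),
): string {
  // toISOString() includes milliseconds; strip them to match `date -u +%FT%TZ`.
  const stamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return `[${stamp}] chat-drive: ${what} [${outcome}] action_kind=${actionKind}`;
}
```

The returned string would then be appended to state/log/loader-feedback.log.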

OpenUI Resource

  • Skill-owned OpenUI Lang resource: state/skills/chat-drive/resources/ui.openui. Read it before rendering or editing this skill's generated component surface.
  • Treat this resource as a first-class artifact of the skill, not a generic chat response. Improve it when the skill's user-facing output needs to become richer.
  • System resources compose OpenUI primitives and inherit SnappyChat tokens. Use ui_contract: branded in SKILL.md only for deliberate platform or client visuals.

api.ts - the code it can call

// snappy-chat-drive/api.ts
//
// Push text into the snappy-chat composer programmatically. The bridge is
// the head-screen server's chat-inject FIFO: this lib POSTs to /chat-inject-push,
// the snappy-chat WKWebView polls /chat-inject-pop on a 500ms interval and
// runs the text through the real OpenUI submit path (processMessage →
// /dispatch/chat). The result: dogfood loops, automated UX QA, and recursive
// subagent dispatch all flow through the actual chat surface — same React
// store, same generative-UI cards — instead of trying to drive WKWebView
// with synthetic clicks (peekaboo's clickAt does not fire React onClick on
// WKWebView).
//
// Sync contract: there is NO callback when the chat finishes streaming. The
// caller is the actor (push); the auditor is whatever reads a screenshot
// afterward. `waitForFirstFrame` is a coarse sleep so the dispatcher has
// time to start streaming before the auditor captures.

const HEAD_SCREEN_BASE = process.env.HEAD_SCREEN_URL ?? "http://127.0.0.1:3147";
const DEFAULT_FIRST_FRAME_MS = 8_000;

export interface DispatchInChatUIOpts {
  /**
   * Sleep duration after the push so the dispatcher has time to start
   * streaming. Default 8000ms. Pass 0 to return immediately.
   */
  waitForFirstFrame?: number;
  /**
   * Per-agent queue isolation key. The server keeps a Map<agentId, queue>
   * so parallel QA subagents don't share a single FIFO. Default "ui" matches
   * the snappy-chat React poll loop — so omitting this routes pushes to the
   * actual cockpit. Pass a stable identifier (e.g. "qa-broad-smoke",
   * "dogfood-loop2") to isolate from the cockpit and from each other.
   */
  agentId?: string;
}

/**
 * Push `text` onto the snappy-chat input bridge. Resolves once the queue
 * has accepted the push and (optionally) `waitForFirstFrame` ms have passed.
 *
 * Throws if the head-screen server is unreachable or the push is rejected.
 */
export async function dispatchInChatUI(
  text: string,
  opts: DispatchInChatUIOpts = {},
): Promise<void> {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("dispatchInChatUI: text (non-empty string) required");
  }
  const wait = opts.waitForFirstFrame ?? DEFAULT_FIRST_FRAME_MS;
  const body: { text: string; agentId?: string } = { text };
  if (typeof opts.agentId === "string" && opts.agentId.length > 0) {
    body.agentId = opts.agentId;
  }

  const res = await fetch(`${HEAD_SCREEN_BASE}/chat-inject-push`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    let detail = "";
    try { detail = await res.text(); } catch {}
    throw new Error(
      `chat-inject-push ${res.status}: ${detail.slice(0, 240) || res.statusText}`,
    );
  }

  if (wait > 0) {
    await new Promise(r => setTimeout(r, wait));
  }
}

export interface ResetChatUIOpts {
  /**
   * Sleep duration after the control push so the React app has time to
   * pop the control message, unmount FullScreen, and remount the welcome
   * surface. Default 1500ms — enough for the 500ms poll cadence + a remount.
   */
  waitMs?: number;
  /**
   * Per-agent queue isolation key. See `DispatchInChatUIOpts.agentId`.
   * Default "ui". Parallel QA agents pass their own ID so a reset on one
   * thread doesn't drop the queue another agent is filling.
   */
  agentId?: string;
}

/**
 * Push a control message that resets the snappy-chat UI to the welcome
 * surface. Equivalent to the user clicking "+ New chat" in the sidebar.
 * Use between dogfood scenarios so a single subagent can run multiple
 * intents end-to-end without thread state bleeding between them.
 *
 * Throws if the head-screen server is unreachable or the push is rejected.
 */
export async function resetChatUI(opts: ResetChatUIOpts = {}): Promise<void> {
  const wait = opts.waitMs ?? 1500;
  const body: { action: string; agentId?: string } = { action: "reset" };
  if (typeof opts.agentId === "string" && opts.agentId.length > 0) {
    body.agentId = opts.agentId;
  }
  const res = await fetch(`${HEAD_SCREEN_BASE}/chat-inject-control`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    let detail = "";
    try { detail = await res.text(); } catch {}
    throw new Error(
      `chat-inject-control ${res.status}: ${detail.slice(0, 240) || res.statusText}`,
    );
  }
  if (wait > 0) {
    await new Promise(r => setTimeout(r, wait));
  }
}

/**
 * Cheap reachability check for the head-screen server. Returns true iff the
 * server answers any 2xx-ish response on `/healthz`. Use to gate dogfood
 * loops so they fail fast when the bridge is down rather than timing out
 * mid-push.
 */
export async function chatDriveAvailable(): Promise<boolean> {
  try {
    const res = await fetch(`${HEAD_SCREEN_BASE}/healthz`, { method: "GET" });
    return res.ok;
  } catch {
    return false;
  }
}

// CLI smoke: `npx tsx state/lib/chat-drive.ts "say hello in three words"`
// Set CHAT_INJECT_AGENT_ID=<id> to isolate from the cockpit's "ui" queue
// (e.g. parallel QA subagents).
if (import.meta.url === `file://${process.argv[1]}`) {
  const text = process.argv.slice(2).join(" ").trim();
  if (!text) {
    console.error('usage: tsx state/lib/chat-drive.ts "<intent>"');
    process.exit(2);
  }
  const agentId = process.env.CHAT_INJECT_AGENT_ID;
  (async () => {
    const up = await chatDriveAvailable();
    if (!up) {
      console.error("head-screen server unreachable at", HEAD_SCREEN_BASE);
      process.exit(1);
    }
    await dispatchInChatUI(text, { waitForFirstFrame: 0, agentId });
    console.log("OK pushed:", text, agentId ? `(agentId=${agentId})` : "");
  })().catch(e => { console.error("FAIL:", e?.message ?? e); process.exit(1); });
}

scripts - helper scripts it can run

prose-only skill - 2 inline code blocks live in SKILL.md above (no state/bin/ sidecar yet).

how we check it - the checks, plus the last 10 runs

rubric: shape - schema-shape check (no inline rubric)
recent mean: 0.83 · 10 runs
actor/auditor: unverifiable
deps: none declared

| timestamp | verb | score | primary_issue | artifact |
| --- | --- | --- | --- | --- |
| 2026-04-30 07:51Z | - | 0.80 | - | - |
| 2026-04-30 07:39Z | - | 0.70 | - | - |
| 2026-04-30 07:38Z | - | 0.72 | - | - |
| 2026-04-30 07:33Z | - | 0.70 | - | - |
| 2026-04-30 07:15Z | - | 0.85 | - | - |
| 2026-04-30 03:10Z | - | 0.85 | - | - |
| 2026-04-30 06:55Z | - | 0.67 | - | - |
| 2026-04-29 04:43Z | - | 1.00 | - | - |
| 2026-04-29 04:14Z | - | 1.00 | - | - |
| 2026-04-29 04:01Z | - | 1.00 | - | - |
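
The recent mean can be re-derived from the ten scores listed above. The snippet is a small worked check rather than the dashboard's own code; `mean` is a hypothetical helper.

```typescript
// The ten scores from the run table, newest first.
const scores: number[] = [0.8, 0.7, 0.72, 0.7, 0.85, 0.85, 0.67, 1.0, 1.0, 1.0];

function mean(values: number[]): number {
  if (values.length === 0) throw new Error("no values");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// The scores sum to 8.29, so the mean is 0.829.
console.log(mean(scores).toFixed(2)); // prints "0.83"
```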