No work step here. This is probably a skill that reads or coordinates, not one that produces something.
prompt-improver
What it does for you
Improves how a skill works based on where it's been falling short.
What it produces
A recent result, so you can see the kind of work it returns.
How to get it
These run inside the Snappy workspace. Want this working in your business? I set skills like this up with you, in one focused week.
For developers: how this skill is built, graded, and run
at a glance - the short version
what's inside - the parts that make up a skill (3/4 present)
A skill is just a few plain-text files. Only the main one is required. The rest are optional, added as the work needs them. This is what the skill is made of; how it runs is just below.
state/skills/prompt-improver/SKILL.md - present
state/lib/prompt-improver.ts - present
state/bin/prompt-improver/ - not present
state/skills/prompt-improver/AGENTS.md - present
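The present/not-present list above amounts to an existence check over four conventional paths. A minimal sketch of how such a tally could be computed, assuming the page simply checks whether each path exists; `skillParts` and `inventory` are hypothetical names, not the page's actual code.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical: the four parts listed above, relative to the repo root.
function skillParts(slug: string): string[] {
  return [
    `state/skills/${slug}/SKILL.md`,
    `state/lib/${slug}.ts`,
    `state/bin/${slug}/`,
    `state/skills/${slug}/AGENTS.md`,
  ];
}

interface Inventory {
  present: string[];
  missing: string[];
  summary: string; // e.g. "3/4 present"
}

function inventory(repoRoot: string, slug: string): Inventory {
  const present: string[] = [];
  const missing: string[] = [];
  for (const rel of skillParts(slug)) {
    (existsSync(join(repoRoot, rel)) ? present : missing).push(rel);
  }
  const total = present.length + missing.length;
  return { present, missing, summary: `${present.length}/${total} present` };
}

// Usage against a throwaway directory that mirrors this skill's layout.
const root = mkdtempSync(join(tmpdir(), "skill-inv-"));
mkdirSync(join(root, "state/skills/prompt-improver"), { recursive: true });
mkdirSync(join(root, "state/lib"), { recursive: true });
writeFileSync(join(root, "state/skills/prompt-improver/SKILL.md"), "# prompt-improver\n");
writeFileSync(join(root, "state/skills/prompt-improver/AGENTS.md"), "# loader\n");
writeFileSync(join(root, "state/lib/prompt-improver.ts"), "export {};\n");
const inv = inventory(root, "prompt-improver");
console.log(inv.summary, inv.missing);
rmSync(root, { recursive: true, force: true });
```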
how it runs - the shared frame every skill uses (3/5 present)
Every skill runs the same way: one part does the work, a separate part checks it, and a short loader hands the AI exactly what it needs for the job. Anything this skill doesn't use shows a one-line note explaining why, so the omission is visibly deliberate rather than accidental.
No separate check found. Without one, the part that does the work could end up approving its own output, which is worth a closer look.
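One way to add the missing separate check is a deterministic grader that scores a proposed fragment against the FORMAT RULES written into PROMPT_IMPROVER_SYSTEM in api.ts, independently of the model that wrote it. A sketch under that assumption; `checkFragment` is hypothetical and only covers the mechanically checkable rules (header, bullet count, word limit, no fences).

```typescript
// Hypothetical grader, kept separate from the code that writes the fragment.
interface CheckResult {
  pass: boolean;
  problems: string[];
}

const FENCE = "`".repeat(3); // avoids a literal fence inside this example

function checkFragment(slug: string, fragment: string): CheckResult {
  const problems: string[] = [];
  const lines = fragment.split("\n").map((l) => l.trim()).filter(Boolean);

  // Rule: start with a single-line header naming the slug.
  if (!lines[0]?.startsWith(`# ${slug}`)) problems.push("missing '# <slug>' header");

  // Rule: 2-10 bullet points (only "- " and "* " lines are counted here).
  const bullets = lines.filter((l) => l.startsWith("- ") || l.startsWith("* "));
  if (bullets.length < 2 || bullets.length > 10) {
    problems.push(`expected 2-10 bullets, found ${bullets.length}`);
  }

  // Rule: under 200 words total.
  const words = fragment.split(/\s+/).filter(Boolean).length;
  if (words >= 200) problems.push(`too long: ${words} words`);

  // Rule: output only the fragment text, no fences.
  if (fragment.includes(FENCE)) problems.push("contains a code fence");

  return { pass: problems.length === 0, problems };
}

const good = "# dogfood-loop - prompt fragment\n- DO cite the failing eval row.\n- DO NOT invent scores.";
const bad = "Here is your fragment:\n- one rule";
console.log(checkFragment("dogfood-loop", good));
console.log(checkFragment("dogfood-loop", bad).problems);
```

Because the grader never calls the model, a failing fragment can be rejected before the Apply/Discard card is shown.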
what it has learned - fixes written back in over time (sample from state/log/evals.ndjson)
When a run hits something this skill didn't handle, the fix is written back into the skill so the same gap doesn't recur. FIXED means it was corrected on the spot; LOGGED means it is queued for a larger rewrite. Either way, the skill improves with each gap it hits.
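The FIXED/LOGGED markers end up in state/log/loader-feedback.log via the echo template in the loader's self-correcting section. A minimal sketch that parses such lines and tallies the two outcomes, assuming every line follows that template exactly; `parseFeedback`, `tally`, and the sample lines are made up for illustration.

```typescript
// Hypothetical parser for lines written by the loader's echo template:
//   [<ts>] <skill>: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>
type Outcome = "FIXED" | "LOGGED";

interface FeedbackEntry {
  ts: string;
  skill: string;
  note: string;
  outcome: Outcome;
  actionKind: string;
}

const FEEDBACK_LINE = /^\[([^\]]+)\] ([^:]+): (.*) \[(FIXED|LOGGED)\] action_kind=(\S+)$/;

function parseFeedback(line: string): FeedbackEntry | null {
  const m = FEEDBACK_LINE.exec(line.trim());
  if (!m) return null; // not in the expected format
  return { ts: m[1], skill: m[2], note: m[3], outcome: m[4] as Outcome, actionKind: m[5] };
}

function tally(lines: string[]): Record<Outcome, number> {
  const counts: Record<Outcome, number> = { FIXED: 0, LOGGED: 0 };
  for (const line of lines) {
    const entry = parseFeedback(line);
    if (entry) counts[entry.outcome] += 1;
  }
  return counts;
}

// Made-up sample lines, for illustration only.
const sample = [
  "[2000-01-01T00:00:00Z] prompt-improver: example gap patched in place [FIXED] action_kind=example",
  "[2000-01-01T00:01:00Z] prompt-improver: example gap queued for rewrite [LOGGED] action_kind=example",
  "not a feedback line",
];
console.log(tally(sample)); // { FIXED: 1, LOGGED: 1 }
```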
SKILL.md - the skill, written out in plain English
prompt-improver
Extends the system's self-correction loop. Given a skill slug, it reads the skill's SKILL.md, AGENTS.md, and recent eval failures, then calls the LLM to compose a small prompt-fragment patch. It returns {slug, old, proposed, reason} so the caller can show a before/after card with Apply and Discard.
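The before/after card needs some way to show what changed between old and proposed. A minimal sketch using a line-level longest-common-subsequence diff; `diffLines` is hypothetical, and the real card may simply render the two versions side by side. An empty old (no fragment yet) renders as all additions.

```typescript
type DiffLine = { op: " " | "-" | "+"; text: string };

// Hypothetical line-level LCS diff, enough for short prompt fragments.
function diffLines(before: string, after: string): DiffLine[] {
  const a = before ? before.split("\n") : [];
  const b = after ? after.split("\n") : [];
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array<number>(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table: keep shared lines, emit removals and additions otherwise.
  const out: DiffLine[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ op: " ", text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "-", text: a[i++] });
    } else {
      out.push({ op: "+", text: b[j++] });
    }
  }
  while (i < a.length) out.push({ op: "-", text: a[i++] });
  while (j < b.length) out.push({ op: "+", text: b[j++] });
  return out;
}

const card = diffLines("# x\n- DO A", "# x\n- DO A\n- DO NOT B");
console.log(card.map((d) => `${d.op} ${d.text}`).join("\n"));
```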
Per-skill prompt fragments land at state/skills/<slug>/prompt-fragment.md. Server injection of these fragments is Phase 2 work - writing the file is the current artifact.
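Phase 2 injection is not built yet. The system prompt in api.ts describes a fragment as text that gets prepended to a skill agent's system prompt, so a sketch of that step might look like the following; `fragmentPath` and `withFragment` are hypothetical, not an existing server API.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical Phase 2 hook; nothing in the server calls this today.
function fragmentPath(repoRoot: string, slug: string): string {
  return join(repoRoot, "state/skills", slug, "prompt-fragment.md");
}

// Prepend the skill's fragment (if any) to its base system prompt.
function withFragment(repoRoot: string, slug: string, baseSystemPrompt: string): string {
  const path = fragmentPath(repoRoot, slug);
  if (!existsSync(path)) return baseSystemPrompt; // no fragment yet: unchanged
  const fragment = readFileSync(path, "utf8").trim();
  return fragment ? `${fragment}\n\n${baseSystemPrompt}` : baseSystemPrompt;
}

// Usage against a throwaway repo root.
const root = mkdtempSync(join(tmpdir(), "fragment-demo-"));
console.log(withFragment(root, "dogfood-loop", "BASE PROMPT")); // BASE PROMPT
mkdirSync(join(root, "state/skills/dogfood-loop"), { recursive: true });
writeFileSync(fragmentPath(root, "dogfood-loop"), "# dogfood-loop - prompt fragment\n- DO cite the failing eval.\n");
console.log(withFragment(root, "dogfood-loop", "BASE PROMPT"));
```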
Apply/Discard events from the UI write eval rows back to state/log/evals.ndjson with skill: "prompt-improver" and kind: "improvement-feedback", feeding the prompt-tuner reinforcement loop.
Steps
- Accept a skill slug argument.
- Read state/skills/<slug>/SKILL.md and state/skills/<slug>/AGENTS.md.
- Tail state/log/evals.ndjson for recent rows where skill === slug and score < 0.7.
- Call the LLM (Sonnet) with the skill prose + failure context, requesting a focused prompt patch.
- Return {slug, old, proposed, reason} to the caller.
- Caller emits a before/after card. User clicks Apply or Discard.
- On Apply: write state/skills/<slug>/prompt-fragment.md with the proposed patch. Write eval row {skill: "prompt-improver", target: slug, score: 0.9, kind: "improvement-feedback"}.
- On Discard: write eval row {skill: "prompt-improver", target: slug, score: 0.3, kind: "improvement-feedback"}.
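The Apply/Discard branch can be kept free of I/O by first computing what should be written. A sketch following the steps above; `decide` is hypothetical, and the ts field is an assumption borrowed from the EvalRow shape in api.ts (the steps themselves don't list it).

```typescript
// Hypothetical helper: compute the side effects of an Apply/Discard click
// without performing any I/O, so the caller decides how to write them.
type Decision = "apply" | "discard";

interface FeedbackRow {
  ts: string; // assumption: borrowed from EvalRow in api.ts
  skill: "prompt-improver";
  target: string;
  score: number;
  kind: "improvement-feedback";
}

interface Effects {
  fragmentWrite: { path: string; content: string } | null; // null on Discard
  evalLine: string; // one NDJSON line to append to state/log/evals.ndjson
}

function decide(slug: string, proposed: string, decision: Decision, now = new Date()): Effects {
  const row: FeedbackRow = {
    ts: now.toISOString(),
    skill: "prompt-improver",
    target: slug,
    score: decision === "apply" ? 0.9 : 0.3,
    kind: "improvement-feedback",
  };
  return {
    // Discard has no side effects beyond the eval row.
    fragmentWrite:
      decision === "apply" ? { path: `state/skills/${slug}/prompt-fragment.md`, content: proposed } : null,
    evalLine: JSON.stringify(row) + "\n",
  };
}

const when = new Date("2000-01-01T00:00:00Z");
const applied = decide("dogfood-loop", "# dogfood-loop - prompt fragment\n- DO cite the eval row.", "apply", when);
const discarded = decide("dogfood-loop", "(ignored)", "discard", when);
console.log(applied.fragmentWrite?.path); // state/skills/dogfood-loop/prompt-fragment.md
console.log(discarded.fragmentWrite); // null
```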
AGENTS.md - what the AI loads when this skill comes up
prompt-improver - loader
Per-turn rules. Full reference: state/skills/prompt-improver/SKILL.md.
Critical Rules
- Slug required. Always pass an explicit skill slug. Never guess or infer it; read it from the caller's context.
- Tail eval rows for the target slug. Read state/log/evals.ndjson and filter for skill === slug (or target === slug) where score < 0.7. Use the last 50 rows max.
- LLM call is required. Do not return a patch without calling the LLM (Sonnet). The patch must be grounded in the actual eval failures, not fabricated.
- Return shape is {slug, old, proposed, reason}. All four fields are required: old = the current problematic fragment, verbatim; proposed = the replacement; reason = a one-sentence explanation of why this change addresses the observed failures.
- Apply/Discard writes eval rows. Apply → score: 0.9, Discard → score: 0.3. Both use skill: "prompt-improver", target: <slug>, kind: "improvement-feedback". These rows feed the prompt-tuner reinforcement digest.
- No side effects on Discard. On Discard, only write the eval row; do NOT write the prompt-fragment.md file.
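The return-shape rule can be enforced at the boundary with a type guard. A sketch, assuming old may be an empty string when no prompt-fragment.md exists yet (which is what api.ts returns in that case); `isPromptPatch` is hypothetical.

```typescript
interface PromptPatch {
  slug: string;
  old: string;
  proposed: string;
  reason: string;
}

// Hypothetical type guard for the loader's return-shape rule.
// All four fields must be strings; `old` may be "" when no fragment exists yet.
function isPromptPatch(value: unknown): value is PromptPatch {
  if (typeof value !== "object" || value === null) return false;
  const { slug, old, proposed, reason } = value as Record<string, unknown>;
  return (
    typeof slug === "string" && slug.length > 0 &&
    typeof old === "string" &&
    typeof proposed === "string" && proposed.trim().length > 0 &&
    typeof reason === "string" && reason.trim().length > 0
  );
}

console.log(isPromptPatch({ slug: "dogfood-loop", old: "", proposed: "# dogfood-loop - prompt fragment\n- DO X", reason: "Addresses the top failure." })); // true
console.log(isPromptPatch({ old: "", proposed: "- DO X", reason: "No slug." })); // false
```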
Commands
# CLI usage
npx tsx state/lib/prompt-improver.ts <slug>
# Read recent failures for a slug
grep '"skill":"<slug>"' state/log/evals.ndjson | python3 -c "import sys,json; rows=[json.loads(l) for l in sys.stdin]; bad=[r for r in rows if r.get('score',1)<0.7]; print(len(bad), 'failures')"
# Check if prompt-fragment exists
ls state/skills/<slug>/prompt-fragment.md 2>/dev/null || echo "no fragment yet"
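The grep in the second command only matches rows serialized exactly as "skill":"<slug>" with no space after the colon, and it ignores target. A TypeScript sketch that parses each line instead and follows the loader rule as written (skill or target matches, score < 0.7, last 50 rows); `recentFailures` is hypothetical.

```typescript
// Hypothetical helper following the loader rule as written:
// rows where skill or target equals the slug and score < threshold,
// keeping only the most recent `maxRows` of them.
type Row = Record<string, unknown>;

function recentFailures(ndjson: string, slug: string, threshold = 0.7, maxRows = 50): Row[] {
  const failures: Row[] = [];
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line); // tolerant of any whitespace inside the JSON
    } catch {
      continue; // skip malformed lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const row = parsed as Row;
    const score = row.score;
    const matches = row.skill === slug || row.target === slug;
    if (matches && typeof score === "number" && score < threshold) failures.push(row);
  }
  return maxRows > 0 ? failures.slice(-maxRows) : [];
}

// In practice: recentFailures(readFileSync("state/log/evals.ndjson", "utf8"), "<slug>")
const sample = [
  '{"ts":"t1","skill":"dogfood-loop","score":0.4}',
  '{"ts": "t2", "skill": "dogfood-loop", "score": 0.95}',
  '{"ts":"t3","skill":"prompt-improver","target":"dogfood-loop","score":0.3}',
  "not json",
].join("\n");
console.log(recentFailures(sample, "dogfood-loop").map((r) => r.ts)); // [ 't1', 't3' ]
```

Matching on target also pulls in prompt-improver's own Discard rows (score 0.3) for that skill, as the third sample row shows.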
Self-correcting loader (PID feedback)
Before you finish, do two things:
1. Fix gaps (P - proportional). MANDATORY. If this loader didn't cover your case, edit this file surgically.
2. Log the result.
echo "[$(date -u +%FT%TZ)] prompt-improver: <what was missing or fixed> [FIXED|LOGGED] action_kind=<kind>" >> state/log/loader-feedback.log
api.ts - the code it can call
#!/usr/bin/env npx tsx
/**
 * snappy-prompt-improver/api.ts -- Generate a prompt patch for a skill based on eval failures.
 *
 * The system extends itself: given a skill slug, read its SKILL.md + AGENTS.md + recent
 * eval failures, then use the configured snappy-os dispatch runtime to compose
 * a small prompt fragment patch.
 * Returns {slug, old, proposed, reason} for the caller to emit a before/after card.
 *
 * Per-skill prompt fragments live at: state/skills/<slug>/prompt-fragment.md
 * (new convention). The server does not auto-load these yet: writing the file is
 * the artifact, and Phase 2 will wire server injection.
 *
 * Usage (import):
 *   import { generatePromptPatch } from "./prompt-improver.ts";
 *   const patch = await generatePromptPatch("dogfood-loop");
 *
 * Usage (CLI):
 *   npx tsx state/lib/prompt-improver.ts <slug>
 */
import { readFileSync, existsSync } from "fs";
import { join, resolve, dirname } from "path";
import { fileURLToPath } from "url";
import { dispatchFor, readDefaultModel, readDispatchConfig } from "./dispatch.ts";

const HERE = dirname(fileURLToPath(import.meta.url));
const REPO_ROOT = resolve(HERE, "..", "..");

export interface PromptPatch {
  slug: string;
  old: string;
  proposed: string;
  reason: string;
}

interface EvalRow {
  ts: string;
  skill?: string;
  verb?: string;
  score: number;
  primary_issue?: string | null;
  note?: string | null;
}

// --- Read helpers ---

function readSkillMd(slug: string): string {
  const path = join(REPO_ROOT, "state/skills", slug, "SKILL.md");
  if (!existsSync(path)) throw new Error(`SKILL.md not found for slug: ${slug}`);
  return readFileSync(path, "utf8");
}

function readAgentsMd(slug: string): string {
  const path = join(REPO_ROOT, "state/skills", slug, "AGENTS.md");
  if (!existsSync(path)) return "";
  return readFileSync(path, "utf8");
}

function readPromptFragment(slug: string): string {
  const path = join(REPO_ROOT, "state/skills", slug, "prompt-fragment.md");
  if (!existsSync(path)) return "";
  return readFileSync(path, "utf8");
}

function readRecentFailures(slug: string, limit = 20): EvalRow[] {
  const evalsPath = join(REPO_ROOT, "state/log/evals.ndjson");
  if (!existsSync(evalsPath)) return [];
  const lines = readFileSync(evalsPath, "utf8").trim().split("\n").filter(Boolean);
  // Parse only the last 200 lines (the file itself is still read in full)
  const tail = lines.slice(-200);
  const failures: EvalRow[] = [];
  for (const line of tail) {
    try {
      const row = JSON.parse(line) as EvalRow;
      const rowSlug = row.skill || row.verb || "";
      if (rowSlug !== slug) continue;
      if (typeof row.score !== "number") continue;
      if (row.score < 0.5) failures.push(row);
    } catch {
      // skip malformed lines
    }
  }
  // Return most recent failures first, up to limit
  return failures.slice(-limit).reverse();
}

function summarizeFailureMode(failures: EvalRow[]): string {
  if (failures.length === 0) return "No recent failures found (no eval rows with score < 0.5 in the last 200).";
  // Count primary_issue occurrences
  const issueCounts: Record<string, number> = {};
  for (const f of failures) {
    const issue = (f.primary_issue || "").trim();
    if (issue) {
      issueCounts[issue] = (issueCounts[issue] || 0) + 1;
    }
  }
  const topIssues = Object.entries(issueCounts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([issue, count]) => `"${issue}" (${count}x)`);
  const noteSnippets = failures
    .slice(0, 3)
    .map(f => f.note)
    .filter(Boolean)
    .map(n => `- ${(n || "").slice(0, 120)}`);
  let summary = `${failures.length} failures in last 200 evals (score < 0.5).`;
  if (topIssues.length) summary += ` Top issues: ${topIssues.join(", ")}.`;
  if (noteSnippets.length) summary += `\nRecent notes:\n${noteSnippets.join("\n")}`;
  return summary;
}

// --- LLM calls ---
const PROMPT_IMPROVER_SYSTEM = `You write concise, surgical prompt fragments for snappy-os skills.
A "prompt fragment" is a short block of text (typically 3-15 lines) that gets prepended to a skill agent's system prompt to address observed failure modes. It is NOT a full system prompt rewrite — it is a targeted patch.
WHAT YOU ARE GIVEN:
- The skill's SKILL.md (its purpose and behavior spec)
- Its current prompt-fragment.md (empty string if none exists yet)
- Its recent eval failures (score < 0.5): primary_issue + notes
YOUR JOB:
Write a NEW prompt-fragment.md that either:
a) Creates a new fragment (if none exists) that directly addresses the top failure modes.
b) Patches the existing fragment to fix the gaps the failures reveal.
FORMAT RULES:
- Start with a single-line header: # <slug> — prompt fragment
- Then 2-10 bullet points or short rules. No paragraphs.
- Each bullet addresses one specific failure mode, stated as a DO or DO NOT rule.
- Be direct and prescriptive, not abstract.
- No greetings, no "This fragment...", no meta-commentary.
- Under 200 words total.
- Output ONLY the fragment text. No explanation, no fences.`;
function buildUserPrompt(slug: string, skillBody: string, agentsBody: string, oldFragment: string, failures: EvalRow[]): string {
  const failureLines = failures.slice(0, 10).map(f => {
    const parts = [`score=${f.score}`];
    if (f.primary_issue) parts.push(`issue="${f.primary_issue}"`);
    if (f.note) parts.push(`note="${(f.note || "").slice(0, 100)}"`);
    return `- ${parts.join(" ")}`;
  }).join("\n") || "- (no failures found in recent evals)";
  return `SKILL SLUG: ${slug}
SKILL.md (truncated to 2000 chars):
${skillBody.slice(0, 2000)}
AGENTS.md (first 800 chars, if any):
${agentsBody.slice(0, 800) || "(no AGENTS.md)"}
CURRENT PROMPT FRAGMENT (empty = none exists):
${oldFragment || "(none)"}
RECENT FAILURES (last ${failures.length}, score < 0.5):
${failureLines}
Write the new prompt-fragment.md that addresses these failures. Output ONLY the fragment text.`;
}

function normalizeFragment(output: string): string {
  const trimmed = output.trim();
  const fenceMatch = trimmed.match(/```(?:markdown|md)?\s*([\s\S]*?)```/i);
  return (fenceMatch?.[1] || trimmed).trim();
}

async function callConfiguredRuntime(slug: string, systemPrompt: string, userPrompt: string): Promise<string> {
  const axis = readDispatchConfig().subagent;
  const modelLabel = axis.model === "auto" ? readDefaultModel().slug : axis.model;
  process.stderr.write(`[prompt-improver] composing ${slug} via ${axis.backend}/${modelLabel}\n`);
  const result = await dispatchFor("subagent", {
    prompt: userPrompt,
    systemPrompt,
    cwd: REPO_ROOT,
    tools: ["read", "grep", "ls"],
    timeoutMs: 120_000,
    interviewMode: false,
  });
  if (!result.ok || !result.output.trim()) {
    const detail = result.error || result.stderr || `exit ${result.exitCode}`;
    throw new Error(`${result.provider}/${result.model} failed: ${detail}`);
  }
  return normalizeFragment(result.output);
}

// --- Main export ---

export async function generatePromptPatch(slug: string): Promise<PromptPatch> {
  const skillMd = readSkillMd(slug); // throws if not found
  const agentsMd = readAgentsMd(slug);
  const oldFragment = readPromptFragment(slug);
  const failures = readRecentFailures(slug);
  const reason = summarizeFailureMode(failures);

  // Strip frontmatter from SKILL.md to get body prose
  const skillBody = skillMd.replace(/^---\n[\s\S]+?\n---\n?/, "").trim();
  const userPrompt = buildUserPrompt(slug, skillBody, agentsMd, oldFragment, failures);

  let proposed: string;
  try {
    proposed = await callConfiguredRuntime(slug, PROMPT_IMPROVER_SYSTEM, userPrompt);
  } catch (e) {
    throw new Error(`prompt runtime failed for ${slug}: ${(e as Error).message}`);
  }

  // Basic sanity check: warn (but still return the output) if it doesn't start with # or a bullet
  if (!proposed || (!proposed.startsWith("#") && !proposed.startsWith("-") && !proposed.startsWith("*"))) {
    process.stderr.write(`[prompt-improver] LLM output for ${slug} looks off, using as-is\n`);
  }
  return { slug, old: oldFragment, proposed, reason };
}

// --- CLI ---

// `require` is not defined in ES modules, so compare filesystem paths using the
// helpers already imported above. Symlinked entry paths are not resolved.
const isMain = (() => {
  const entry = process.argv[1];
  if (!entry) return false;
  return fileURLToPath(import.meta.url) === resolve(entry);
})();

if (isMain) {
  (async () => {
    const slug = process.argv[2];
    if (!slug) {
      console.error("Usage: npx tsx state/lib/prompt-improver.ts <slug>");
      process.exit(1);
    }
    try {
      const patch = await generatePromptPatch(slug);
      console.log("=== OLD FRAGMENT ===");
      console.log(patch.old || "(none)");
      console.log("\n=== PROPOSED ===");
      console.log(patch.proposed);
      console.log("\n=== REASON ===");
      console.log(patch.reason);
    } catch (e) {
      console.error("Error:", (e as Error).message);
      process.exit(1);
    }
  })();
}
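readRecentFailures keeps only the last 200 lines, but it gets them by reading the whole log with readFileSync, so its cost still grows with the file. A sketch of a true tail read that walks backward from the end in fixed-size chunks, assuming a UTF-8, newline-delimited log; `tailLines` is hypothetical and not part of api.ts.

```typescript
import { closeSync, fstatSync, mkdtempSync, openSync, readSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical helper: return the last `maxLines` non-empty lines of a file,
// reading fixed-size chunks backward from the end instead of the whole file.
// Blank lines count toward the newline budget, so a log with many blank lines
// may return fewer than `maxLines` entries.
function tailLines(path: string, maxLines: number, chunkSize = 64 * 1024): string[] {
  if (maxLines <= 0) return [];
  const fd = openSync(path, "r");
  try {
    let pos = fstatSync(fd).size;
    const chunks: Buffer[] = [];
    let newlines = 0;
    // One newline more than maxLines guarantees the oldest kept line is complete.
    while (pos > 0 && newlines <= maxLines) {
      const len = Math.min(chunkSize, pos);
      pos -= len;
      const buf = Buffer.alloc(len);
      readSync(fd, buf, 0, len, pos);
      chunks.unshift(buf);
      for (const byte of buf) if (byte === 0x0a) newlines++;
    }
    let lines = Buffer.concat(chunks).toString("utf8").split("\n");
    // Unless we reached the start of the file, the first piece may be a partial
    // line (possibly cut mid-character), so drop it.
    if (pos > 0) lines = lines.slice(1);
    return lines.filter(Boolean).slice(-maxLines);
  } finally {
    closeSync(fd);
  }
}

const dir = mkdtempSync(join(tmpdir(), "tail-demo-"));
const file = join(dir, "evals.ndjson");
writeFileSync(file, Array.from({ length: 300 }, (_, i) => `line${i}`).join("\n") + "\n");
const last = tailLines(file, 200, 1024); // small chunks to exercise the loop
console.log(last.length, last[0], last[last.length - 1]); // 200 line100 line299
```

Swapping this in for the readFileSync + slice(-200) pair would keep the same 200-line window while reading only the end of the log.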
scripts - helper scripts it can run
prose-only skill - no sidecar under state/bin/ yet. Steps, if any, are described in SKILL.md.
how we check it - the checks, plus the last 3 runs
| timestamp | verb | score | primary_issue | artifact |
|---|---|---|---|---|
| 2026-05-01 06:08Z | - | 0.90 | - | - |
| 2026-05-01 06:08Z | - | 0.90 | - | - |
| 2026-05-01 06:08Z | - | 0.90 | - | - |