Shared eval endpoint
This is the contract that state/lib/eval.ts and state/bin/pid-detect.ts speak when SNAPPY_EVAL_ENDPOINT is set. Point that variable at any HTTP service (Xano is the reference target) that honors this shape and you get a multi-machine PID loop for free.
Why it exists
Each machine runs its own skills and writes state/log/evals.ndjson locally. That ndjson is gitignored — it's per-machine ephemeral state. Without a shared destination, each machine only sees its own slice of history, and pid-detect can't find trends that span devices.
The endpoint is the opposite of git-based sync: append-only, eventually consistent, and offline-tolerant. Local ndjson remains authoritative; the endpoint is additive aggregation.
Environment
| Var | Purpose |
|---|---|
| SNAPPY_EVAL_ENDPOINT | Base URL, e.g. https://xano.snappy.ai/api:evals. Trailing slash trimmed. Unset = disabled, local-only mode (identical to pre-Phase-4 behavior). |
| SNAPPY_EVAL_TOKEN | Optional bearer token. Sent as Authorization: Bearer <token> on both POST and GET. |
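As a rough illustration of the two variables above, the sketch below resolves them into a base URL and request headers. The helper name resolveEvalConfig and the EvalConfig shape are hypothetical. The real logic lives in state/lib/eval.ts and may differ.

```typescript
// Hypothetical helper, not the actual code in state/lib/eval.ts.
interface EvalConfig {
  baseUrl: string;
  headers: Record<string, string>;
}

function resolveEvalConfig(
  env: Record<string, string | undefined>,
): EvalConfig | null {
  const raw = (env["SNAPPY_EVAL_ENDPOINT"] ?? "").trim();
  if (raw === "") return null; // unset or empty: disabled, local-only mode

  // Trim the trailing slash (this sketch also tolerates several of them).
  const baseUrl = raw.replace(/\/+$/, "");

  const headers: Record<string, string> = {};
  const token = (env["SNAPPY_EVAL_TOKEN"] ?? "").trim();
  if (token !== "") headers["Authorization"] = `Bearer ${token}`;

  return { baseUrl, headers };
}
```

Calling resolveEvalConfig(process.env) would return null on a machine with no endpoint configured, which is the signal to stay in local-only mode.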
POST /eval
Append a single eval row. Called from state/lib/eval.ts#score() in fire-and-forget mode — the caller never awaits the response, so a network hiccup cannot block a skill run.
Request body (application/json, one row):
{
"ts": "2026-04-16T19:42:00.000Z",
"run_id": "a1b2c3d4e5f6",
"skill": "content-polish",
"score": 1.0,
"mode": "auto",
"primary_issue": null,
"fix_applied": false,
"notes": "",
"host": "mac-mini-1"
}
Additional fields are caller-defined and should be preserved by the endpoint (Xano: use a json column extra). The four required fields are ts, run_id, skill, score.
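A server-side check for the four required fields could look like the sketch below. The type guard isEvalRow is a hypothetical name, and the specific type checks (string timestamps that parse as dates, finite numeric scores) are assumptions inferred from the sample body rather than rules stated by the contract.

```typescript
// Minimal row shape: four required fields plus caller-defined extras.
interface EvalRow {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
}

// Hypothetical validator; the accepted value types are assumptions.
function isEvalRow(value: unknown): value is EvalRow {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const v = value as Record<string, unknown>;
  const ts = v["ts"];
  const runId = v["run_id"];
  const skill = v["skill"];
  const score = v["score"];
  return (
    typeof ts === "string" &&
    !Number.isNaN(Date.parse(ts)) &&
    typeof runId === "string" &&
    runId.length > 0 &&
    typeof skill === "string" &&
    skill.length > 0 &&
    typeof score === "number" &&
    Number.isFinite(score)
  );
}
```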
Response: 200 OK with any body. The client ignores it.
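Because the response is ignored, the client side can be sketched as a call that starts the request and returns immediately. This is an illustration only: postEvalFireAndForget and the PostFn type are hypothetical, and the transport is injected so the sketch does not assume how state/lib/eval.ts actually issues the request.

```typescript
// Minimal transport signature; a fetch-compatible function fits this shape.
type PostFn = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<unknown>;

// Hypothetical sketch: start the POST, never await it, never throw.
function postEvalFireAndForget(
  post: PostFn,
  baseUrl: string,
  row: Record<string, unknown>,
  headers: Record<string, string> = {},
): void {
  try {
    post(`${baseUrl}/eval`, {
      method: "POST",
      headers: { "Content-Type": "application/json", ...headers },
      body: JSON.stringify(row),
    }).catch(() => {
      // Swallow async failures (network error, timeout): local ndjson
      // remains authoritative, so nothing is lost.
    });
  } catch {
    // Swallow synchronous failures too, e.g. a transport that throws.
  }
}
```

On Node 18 or later the global fetch could be passed as the post argument. The caller gets control back before the request completes, so a slow or dead endpoint cannot delay a skill run.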
Semantics:
- Idempotent by (run_id, skill, ts): a row inserted twice must be stored once. Implementation choice: unique index or upsert-on-conflict.
- Append-only: rows are never edited or deleted through this endpoint. Manual cleanup is fine, but not a regular operation.
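Both semantics can be illustrated with a small in-memory store keyed on the (run_id, skill, ts) triple. EvalStore is a hypothetical class written for this example. A real deployment would more likely rely on a unique index in the database.

```typescript
interface EvalRow {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
}

// Hypothetical append-only store: no update or delete methods exist.
class EvalStore {
  private readonly byKey = new Map<string, EvalRow>();

  // Returns true when the row is newly stored, false for a duplicate.
  append(row: EvalRow): boolean {
    // JSON-encode the triple so field values cannot collide across a separator.
    const key = JSON.stringify([row.run_id, row.skill, row.ts]);
    if (this.byKey.has(key)) return false; // insert-or-skip
    this.byKey.set(key, row);
    return true;
  }

  all(): EvalRow[] {
    return Array.from(this.byKey.values());
  }
}
```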
GET /evals
Read recent eval rows for trend analysis. Called from state/lib/eval.ts#fetchEvals() which is awaited by state/bin/pid-detect.ts before scanning for regen candidates.
Query params:
| Param | Type | Purpose |
|---|---|---|
| skill | string, optional | Filter to a single skill. |
| days | int, optional | Only rows with ts >= now() - days. Default: unlimited. |
| limit | int, optional | Cap on rows returned. Default: implementation-defined (recommend 5000). |
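The three parameters can be read as a filter over stored rows. The sketch below is one possible server-side interpretation, not the reference implementation: selectRows and EvalQuery are hypothetical names, the default limit of 5000 follows the recommendation in the table, and sorting newest-first before applying the cap is an assumption made so that the limit keeps the most recent rows.

```typescript
interface EvalRow {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
}

interface EvalQuery {
  skill?: string;
  days?: number;
  limit?: number;
}

const DAY_MS = 86400000;

// Hypothetical filter. Rows whose ts does not parse as a date are dropped.
function selectRows(
  rows: EvalRow[],
  query: EvalQuery,
  nowMs: number = Date.now(),
): EvalRow[] {
  // No days parameter means no time window at all.
  const cutoff =
    query.days === undefined ? -Infinity : nowMs - query.days * DAY_MS;
  const limit = query.limit ?? 5000;
  return rows
    .filter((r) => query.skill === undefined || r.skill === query.skill)
    .filter((r) => Date.parse(r.ts) >= cutoff)
    .sort((a, b) => Date.parse(b.ts) - Date.parse(a.ts)) // newest first
    .slice(0, limit);
}
```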
Response (either shape is accepted):
{
"rows": [
{ "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... },
...
]
}
or the bare array:
[
{ "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... }
]
Ordering is unspecified — pid-detect sorts by ts itself.
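Accepting both response shapes and sorting locally might look like the following. normalizeEvalsResponse is a hypothetical helper, and the ascending sort direction is an assumption, since the contract only says that pid-detect sorts by ts itself.

```typescript
interface EvalRow {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
}

// Hypothetical normalizer: returns null when the body matches neither shape.
function normalizeEvalsResponse(body: unknown): EvalRow[] | null {
  let candidate: unknown[] | null = null;
  if (Array.isArray(body)) {
    candidate = body; // bare array
  } else if (typeof body === "object" && body !== null) {
    const rows = (body as { rows?: unknown }).rows;
    if (Array.isArray(rows)) candidate = rows; // { rows: [...] } wrapper
  }
  if (candidate === null) return null;

  // Keep only objects that carry a string ts, then sort oldest first.
  const rows = candidate.filter(
    (r): r is EvalRow =>
      typeof r === "object" &&
      r !== null &&
      typeof (r as { ts?: unknown }).ts === "string",
  );
  return rows.sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));
}
```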
Failure modes
The client treats every failure (non-2xx, network error, malformed JSON) as "endpoint unavailable" and silently falls back to local ndjson. No throws escape score() or fetchEvals(). This is by design: a dead endpoint must never prevent a skill from running or a PID scan from surfacing local regressions.
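The "every failure is unavailability" rule can be sketched as a function that maps a raw HTTP result to either rows or null, plus a fallback to local data. Both function names are hypothetical, and using remote rows exclusively when they are available (rather than merging them with local rows) is an assumption of this sketch. A network error would be handled the same way by catching the rejected request and passing null.

```typescript
type Row = Record<string, unknown>;

// Hypothetical classifier: anything other than a 2xx status with a
// recognizable JSON body counts as "endpoint unavailable" (null).
function remoteRowsOrNull(status: number, bodyText: string): Row[] | null {
  if (status < 200 || status > 299) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return null; // malformed JSON
  }
  if (Array.isArray(parsed)) return parsed as Row[];
  if (typeof parsed === "object" && parsed !== null) {
    const rows = (parsed as { rows?: unknown }).rows;
    if (Array.isArray(rows)) return rows as Row[];
  }
  return null;
}

// Fall back to local ndjson rows whenever the remote side is unavailable.
function rowsForScan(remote: Row[] | null, local: Row[]): Row[] {
  return remote ?? local;
}
```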
Deployment modes
| Mode | SNAPPY_EVAL_TOKEN | Endpoint auth | When to use |
|---|---|---|---|
| Personal | unset | open | Single-operator setups — the default for our install. Trust boundary is your Xano workspace. |
| Multi-tenant / published | required | bearer | Anyone standing up snappy-os for others. Set SNAPPY_EVAL_TOKEN on every producing machine and configure the Xano API group to require it. |
The live lint (npm run contract:live) warns when no token is configured, so the deployed mode is always visible. The reference server honors the token automatically — unauthenticated requests get 401 whenever SNAPPY_EVAL_TOKEN is set in its environment.
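The two modes reduce to a single server-side check. The sketch below is illustrative and is not taken from state/bin/eval-server/server.ts. The function names are hypothetical.

```typescript
// Hypothetical check: open when no token is configured, bearer otherwise.
function isAuthorized(
  authorizationHeader: string | undefined,
  serverToken: string | undefined,
): boolean {
  if (serverToken === undefined || serverToken === "") {
    return true; // Personal mode: endpoint is open
  }
  return authorizationHeader === `Bearer ${serverToken}`;
}

// Map the check to the status code the contract describes.
function authStatus(
  authorizationHeader: string | undefined,
  serverToken: string | undefined,
): 200 | 401 {
  return isAuthorized(authorizationHeader, serverToken) ? 200 : 401;
}
```

A hardened server would probably compare the token in constant time. Plain string equality is used here only to keep the example short.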
Portable reference server
For local development or self-hosting off Xano, state/bin/eval-server/server.ts is a zero-dependency Node implementation of the contract:
npm run eval-server # listens on 127.0.0.1 (ephemeral port)
PORT=4717 npm run eval-server # pinned port
SNAPPY_EVAL_TOKEN=secret npm run eval-server # bearer auth
Persistence is in-memory — swap the rows array for SQLite/Postgres to make it durable. The contract lint (npm run contract:stub) runs against this same server, so the same implementation that documents the contract also verifies client behavior.
Xano reference implementation
Suggested table eval_rows:
| Column | Type | Notes |
|---|---|---|
| id | int, PK | |
| ts | timestamp | index |
| run_id | text | index |
| skill | text | index |
| score | decimal | |
| mode | text | nullable |
| primary_issue | text | nullable |
| fix_applied | bool | default false |
| notes | text | nullable |
| host | text | nullable |
| extra | json | everything else in the body |
Unique index: (run_id, skill, ts).
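To show how a request body might map onto this table, the sketch below splits known columns from everything else and collects the remainder under extra. It is written in TypeScript for illustration only. The deployed logic is xanoscript, and toEvalRowRecord is a hypothetical name.

```typescript
// Columns from the suggested table, excluding the auto-generated id.
const KNOWN_COLUMNS = [
  "ts",
  "run_id",
  "skill",
  "score",
  "mode",
  "primary_issue",
  "fix_applied",
  "notes",
  "host",
] as const;

// Hypothetical mapper: known fields become columns, the rest goes to extra.
function toEvalRowRecord(
  body: Record<string, unknown>,
): Record<string, unknown> {
  const known = new Set<string>(KNOWN_COLUMNS);
  const record: Record<string, unknown> = {};
  const extra: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(body)) {
    if (known.has(key)) record[key] = value;
    else extra[key] = value;
  }
  // Mirror the column default from the table.
  record["fix_applied"] = record["fix_applied"] ?? false;
  record["extra"] = extra;
  return record;
}
```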
Endpoints:
- POST /api:evals/eval → insert-or-skip on duplicate key, return { ok: true }.
- GET /api:evals/evals?skill=&days=&limit= → filter + order by ts desc, default limit 5000, return { rows: [...] }.
Auth: optional bearer on an API group — set SNAPPY_EVAL_TOKEN to the issued token and the client appends the header automatically.
Wiring a machine
One-time provisioning (any machine, run once)
npx tsx state/bin/provision-eval-endpoint.ts \
--workspace 5 \
--group "Snappy OS ⚡"
This is idempotent — it creates the eval_rows table, reconciles any missing fields, creates the API group, and creates the two endpoint shells. Running it twice is a no-op.
Deploying the function stacks
The canonical xanoscript lives in the repo:
- state/xano/eval/post-eval.xs: POST /eval (idempotent on run_id+skill+ts)
- state/xano/eval/get-evals.xs: GET /evals (skill/days/limit filters)
Redeploy after editing either .xs file:
npx tsx state/bin/deploy-eval-endpoint.ts # push both
npx tsx state/bin/deploy-eval-endpoint.ts --only get # push one
npx tsx state/bin/deploy-eval-endpoint.ts --dry # scope only
The script resolves the endpoint ids by name inside API group 1639 (Snappy OS ⚡) and pushes via the raw Xano Metadata API (PUT /api:meta/workspace/{ws}/apigroup/{group}/api/{id} with xanoscript in the body). No xano-mcp round-trip, no editor click. To target a different install, pass --workspace N --group-id M.
Notes for anyone authoring new stacks against eval_rows:
- |count and similar filters must be parenthesised before a comparison operator: (($x|count) > 0), not ($x|count > 0). Xano returns "Invalid syntax. Please wrap your filter with parentheses." otherwise.
- now is a timestamp type, not a raw int. now - (days * 86400000) dies with "Not numeric." Use now|add_secs_to_timestamp:$offset (seconds, negative for backward) and compare against the int created_at column, not the ts timestamp column.
- The Metadata API accepts xanoscript on PUT /api/{id} as long as the body also includes verb. GET-back surfaces the script as null in the main response, so read it via the Xano editor (or preserve the source in git, which is what the .xs files are for).
Wiring this machine
echo 'SNAPPY_EVAL_ENDPOINT=https://xnwv-v1z6-dvnr.n7c.xano.io/api:o8IHA3To' \
>> .env.cache
# SNAPPY_EVAL_TOKEN=... optional
state/lib/eval.ts reads the endpoint from process.env first and then falls back to .env.cache via env("SNAPPY_EVAL_ENDPOINT", false), so either channel works.
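The lookup order described above can be sketched as follows. This is not the actual env() helper, whose signature is not reproduced here. parseEnvFile and lookupEnv are hypothetical, and the sketch assumes .env.cache holds plain KEY=VALUE lines without quoting, as the echo command above would produce.

```typescript
// Hypothetical parser for KEY=VALUE lines. Comments and blanks are skipped.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("="); // split on the first "=" only
    if (eq <= 0) continue;
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

// process.env wins. The cache file is consulted only as a fallback.
function lookupEnv(
  name: string,
  processEnv: Record<string, string | undefined>,
  cacheText: string,
): string | undefined {
  const fromProcess = processEnv[name];
  if (fromProcess !== undefined && fromProcess !== "") return fromProcess;
  const fromCache = parseEnvFile(cacheText)[name];
  return fromCache === undefined || fromCache === "" ? undefined : fromCache;
}
```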
Smoke test
# 1. Run any skill that calls score() — or hand-fire one:
npx tsx -e '
import { score } from "./state/lib/eval.ts";
import { newRunId } from "./state/lib/log.ts";
score("smoke-test", newRunId(), { score: 1.0, primary_issue: null });
'
# 2. Check Xano → eval_rows — the row should appear within a second.
# 3. Run pid-detect with remote data:
npx tsx state/bin/pid-detect.ts --stats
No machine-level coordination required. Every machine POSTs; pid-detect on any machine sees the whole picture.