Shared eval endpoint

This is the contract that state/lib/eval.ts and state/bin/pid-detect.ts speak when SNAPPY_EVAL_ENDPOINT is set. Point it at any HTTP service (Xano is the reference target) that honors this shape, and you get a multi-machine PID loop for free.

Why it exists

Each machine runs its own skills and writes state/log/evals.ndjson locally. That ndjson is gitignored — it's per-machine ephemeral state. Without a shared destination, each machine only sees its own slice of history, and pid-detect can't find trends that span devices.

The endpoint is the opposite of git-based sync: append-only, eventually consistent, and offline-tolerant. Local ndjson remains authoritative; the endpoint is additive aggregation.

Environment

SNAPPY_EVAL_ENDPOINT: Base URL, e.g. https://xano.snappy.ai/api:evals. Trailing slash trimmed. Unset = disabled, local-only mode (identical to pre-Phase-4 behavior).

SNAPPY_EVAL_TOKEN: Optional bearer token. Sent as Authorization: Bearer <token> on both POST and GET.

POST /eval

Append a single eval row. Called from state/lib/eval.ts#score() in fire-and-forget mode — the caller never awaits the response, so a network hiccup cannot block a skill run.

Request body (application/json, one row):

{
  "ts": "2026-04-16T19:42:00.000Z",
  "run_id": "a1b2c3d4e5f6",
  "skill": "content-polish",
  "score": 1.0,
  "mode": "auto",
  "primary_issue": null,
  "fix_applied": false,
  "notes": "",
  "host": "mac-mini-1"
}

Additional fields are caller-defined and should be preserved by the endpoint (Xano: store them in a json column named extra). The four required fields are ts, run_id, skill, and score.

Response: 200 OK with any body. The client ignores it.
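The fire-and-forget call can be sketched as below. buildEvalRequest and postEval are illustrative names, not the actual eval.ts internals:

```typescript
// A row carries the four required fields; anything else is caller-defined.
type EvalRow = {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
};

// Build the POST /eval request: trim the trailing slash, attach the
// optional bearer token, serialize one row.
function buildEvalRequest(endpoint: string, row: EvalRow, token?: string) {
  const headers: Record<string, string> = { "content-type": "application/json" };
  if (token) headers.authorization = `Bearer ${token}`;
  return {
    url: `${endpoint.replace(/\/+$/, "")}/eval`,
    init: { method: "POST", headers, body: JSON.stringify(row) },
  };
}

function postEval(endpoint: string, row: EvalRow, token?: string): void {
  const { url, init } = buildEvalRequest(endpoint, row, token);
  // Never awaited, never throws: a dead endpoint cannot block a skill run.
  fetch(url, init).catch(() => {});
}
```

The `.catch(() => {})` is the whole resilience story on the write path: the promise is dropped, so neither latency nor failure reaches the caller.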

Semantics:

Idempotent append: a row with the same (run_id, skill, ts) is stored once. Implementation choice: unique index or upsert-on-conflict.

Append-only: the contract has no update or delete route. Manual cleanup is fine, but not a regular operation.
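A minimal sketch of the stored-once semantics, using an in-memory map keyed the same way as the (run_id, skill, ts) unique index suggested for the Xano table. A real endpoint would enforce this with the index or an upsert-on-conflict instead:

```typescript
type StoredRow = { ts: string; run_id: string; skill: string; score: number };

const stored = new Map<string, StoredRow>();

// Returns true if the row was new, false if it was a duplicate POST.
// Either way the endpoint still answers 200 — the client ignores the body.
function insertOnce(row: StoredRow): boolean {
  const key = `${row.run_id}\u0000${row.skill}\u0000${row.ts}`;
  if (stored.has(key)) return false; // duplicate: dropped silently
  stored.set(key, row);
  return true;
}
```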

GET /evals

Read recent eval rows for trend analysis. Called from state/lib/eval.ts#fetchEvals() which is awaited by state/bin/pid-detect.ts before scanning for regen candidates.

Query params:

skill (string, optional): Filter to a single skill.
days (int, optional): Only rows with ts >= now() - days. Default: unlimited.
limit (int, optional): Cap on rows returned. Default: implementation-defined (recommend 5000).
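An illustrative helper (not part of eval.ts) that assembles a GET /evals URL from those params:

```typescript
// Build the read URL: trim the trailing slash, append only the params
// that were actually supplied.
function evalsUrl(
  base: string,
  opts: { skill?: string; days?: number; limit?: number } = {},
): string {
  const u = new URL(`${base.replace(/\/+$/, "")}/evals`);
  if (opts.skill) u.searchParams.set("skill", opts.skill);
  if (opts.days !== undefined) u.searchParams.set("days", String(opts.days));
  if (opts.limit !== undefined) u.searchParams.set("limit", String(opts.limit));
  return u.toString();
}
```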

Response (either shape is accepted):

{
  "rows": [
    { "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... },
    ...
  ]
}

or the bare array:

[
  { "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... }
]

Ordering is unspecified — pid-detect sorts by ts itself.
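A client-side normalizer that accepts either response shape and then sorts by ts can be sketched as follows. This is a hypothetical helper, not the actual fetchEvals() code:

```typescript
type TsRow = { ts: string; [k: string]: unknown };

// Accept { rows: [...] }, a bare array, or anything else (treated as empty),
// then sort by ts — the contract leaves ordering to the client.
function normalizeRows(body: unknown): TsRow[] {
  const rows: TsRow[] = Array.isArray(body)
    ? body
    : body !== null && typeof body === "object" && Array.isArray((body as any).rows)
      ? (body as any).rows
      : [];
  // ISO-8601 timestamps sort correctly as plain strings.
  return [...rows].sort((a, b) => String(a.ts).localeCompare(String(b.ts)));
}
```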

Failure modes

The client treats every failure (non-2xx, network error, malformed JSON) as "endpoint unavailable" and silently falls back to local ndjson. No throws escape score() or fetchEvals(). This is by design: a dead endpoint must never prevent a skill from running or a PID scan from surfacing local regressions.
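The collapse of every failure mode into one answer can be sketched like this (illustrative names; the real code lives in state/lib/eval.ts):

```typescript
// Map a response to rows, or null for anything the client should treat
// as "endpoint unavailable".
function classifyResponse(status: number, body: unknown): unknown[] | null {
  if (status < 200 || status >= 300) return null;  // non-2xx
  if (Array.isArray(body)) return body;            // bare-array shape
  const rows = (body as { rows?: unknown })?.rows;
  return Array.isArray(rows) ? rows : null;        // wrapped shape, or malformed
}

// Nothing thrown in here ever escapes; the caller sees rows or null.
async function safeFetchEvals(url: string): Promise<unknown[] | null> {
  try {
    const res = await fetch(url);
    return classifyResponse(res.status, await res.json());
  } catch {
    return null; // network error or malformed JSON: same answer, no throw
  }
}
```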

Deployment modes

Personal: SNAPPY_EVAL_TOKEN unset, endpoint open. For single-operator setups — the default for our install. Trust boundary is your Xano workspace.

Multi-tenant / published: SNAPPY_EVAL_TOKEN required, endpoint enforces bearer auth. For anyone standing up snappy-os for others. Set SNAPPY_EVAL_TOKEN on every producing machine and configure the Xano API group to require it.

The live lint (npm run contract:live) warns when no token is configured, so the deployed mode is always visible. The reference server honors the token automatically — unauthenticated requests get 401 whenever SNAPPY_EVAL_TOKEN is set in its environment.

Portable reference server

For local development or self-hosting off Xano, state/bin/eval-server/server.ts is a zero-dependency Node implementation of the contract:

npm run eval-server                   # listens on 127.0.0.1 (ephemeral port)
PORT=4717 npm run eval-server         # pinned port
SNAPPY_EVAL_TOKEN=secret npm run eval-server    # bearer auth

Persistence is in-memory — swap the rows array for SQLite/Postgres to make it durable. The contract lint (npm run contract:stub) runs against this same server, so the same implementation that documents the contract also verifies client behavior.
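The server's read path can be sketched as an in-memory array plus the three query-param filters. Names here are illustrative, not the actual server.ts:

```typescript
type SrvRow = { ts: string; run_id: string; skill: string; score: number };

// In-memory persistence — swap this array for SQLite/Postgres for durability.
const srvRows: SrvRow[] = [];

// Apply skill / days / limit, mirroring GET /evals. `now` is injectable
// so the days cutoff is testable.
function queryEvals(
  opts: { skill?: string; days?: number; limit?: number },
  now: number = Date.now(),
): SrvRow[] {
  let out = srvRows;
  if (opts.skill) out = out.filter((r) => r.skill === opts.skill);
  if (opts.days !== undefined) {
    const cutoff = now - opts.days * 86_400_000; // days → ms
    out = out.filter((r) => Date.parse(r.ts) >= cutoff);
  }
  return out.slice(0, opts.limit ?? 5000); // recommended default cap
}
```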

Xano reference implementation

Suggested table eval_rows:

id: int, primary key
ts: timestamp, indexed
run_id: text, indexed
skill: text, indexed
score: decimal
mode: text, nullable
primary_issue: text, nullable
fix_applied: bool, default false
notes: text, nullable
host: text, nullable
extra: json, holds everything else in the body

Unique index: (run_id, skill, ts).

Endpoints:

POST /eval: insert one row into eval_rows, respond { ok: true }.

GET /evals: apply the skill / days / limit query params, default limit 5000, return { rows: [...] }.

Auth: optional bearer on an API group — set SNAPPY_EVAL_TOKEN to the issued token and the client appends the header automatically.

Wiring a machine

One-time provisioning (any machine, run once)

npx tsx state/bin/provision-eval-endpoint.ts \
    --workspace 5 \
    --group "Snappy OS ⚡"

This is idempotent — it creates the eval_rows table, reconciles any missing fields, creates the API group, and creates the two endpoint shells. Running it twice is a no-op.

Deploying the function stacks

The canonical xanoscript lives in the repo as .xs files. Redeploy after editing either one:

npx tsx state/bin/deploy-eval-endpoint.ts            # push both
npx tsx state/bin/deploy-eval-endpoint.ts --only get # push one
npx tsx state/bin/deploy-eval-endpoint.ts --dry      # scope only

The script resolves the endpoint ids by name inside API group 1639 (Snappy OS ⚡) and pushes via the raw Xano Metadata API (PUT /api:meta/workspace/{ws}/apigroup/{group}/api/{id} with xanoscript in the body). No xano-mcp round-trip, no editor click. To target a different install pass --workspace N --group-id M.

Notes for anyone authoring new stacks against eval_rows:

Wrap filter expressions in parentheses before applying a comparison operator: (($x|count) > 0), not ($x|count > 0). Xano returns "Invalid syntax. Please wrap your filter with parentheses." otherwise.

Arithmetic on the ts timestamp column fails with "Not numeric." Use now|add_secs_to_timestamp:$offset (seconds, negative for backward) and compare against the int created_at column, not the ts timestamp column.

The Metadata API PUT body also includes the endpoint's verb. GET-back surfaces the script as null in the main response — read it via the Xano editor (or preserve the source in git, which is what the .xs files are for).

Wiring this machine

echo 'SNAPPY_EVAL_ENDPOINT=https://xnwv-v1z6-dvnr.n7c.xano.io/api:o8IHA3To' \
    >> .env.cache
# SNAPPY_EVAL_TOKEN=... optional

state/lib/eval.ts reads the endpoint from process.env first and then falls back to .env.cache via env("SNAPPY_EVAL_ENDPOINT", false), so either channel works.
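That two-channel lookup can be sketched as below, assuming .env.cache holds simple KEY=VALUE lines. parseEnvCache and resolveEndpoint are illustrative, not the actual env() helper:

```typescript
// Parse simple KEY=VALUE lines from a .env-style cache file.
function parseEnvCache(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^\s*([A-Za-z0-9_]+)=(.*)$/);
    if (m) out[m[1]] = m[2].trim();
  }
  return out;
}

// process.env wins; .env.cache is the fallback channel.
function resolveEndpoint(
  env: Record<string, string | undefined>,
  cacheText: string,
): string | undefined {
  return env.SNAPPY_EVAL_ENDPOINT ?? parseEnvCache(cacheText).SNAPPY_EVAL_ENDPOINT;
}
```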

Smoke test

# 1. Run any skill that calls score() — or hand-fire one:
npx tsx -e '
import { score } from "./state/lib/eval.ts";
import { newRunId } from "./state/lib/log.ts";
score("smoke-test", newRunId(), { score: 1.0, primary_issue: null });
'

# 2. Check Xano → eval_rows — the row should appear within a second.

# 3. Run pid-detect with remote data:
npx tsx state/bin/pid-detect.ts --stats

No machine-level coordination required. Every machine POSTs; pid-detect on any machine sees the whole picture.