Shared eval endpoint

This is the contract that state/lib/eval.ts and state/bin/pid-detect.ts speak when SNAPPY_EVAL_ENDPOINT is set. Point it at any HTTP service (Xano is the reference target) that honors this shape, and you get a multi-machine PID loop for free.

Why it exists

Each machine runs its own skills and writes state/log/evals.ndjson locally. That ndjson is gitignored — it's per-machine ephemeral state. Without a shared destination, each machine only sees its own slice of history, and pid-detect can't find trends that span devices.

The endpoint is the opposite of git-based sync: append-only, eventually consistent, and offline-tolerant. Local ndjson remains authoritative; the endpoint is additive aggregation.

Environment

SNAPPY_EVAL_ENDPOINT: Base URL, e.g. https://xano.snappy.ai/api:evals. Trailing slash trimmed. Unset = disabled, local-only mode (identical to pre-Phase-4 behavior).

SNAPPY_EVAL_TOKEN: Optional bearer token. Sent as Authorization: Bearer <token> on both POST and GET.

POST /eval

Append a single eval row. Called from state/lib/eval.ts#score() in fire-and-forget mode — the caller never awaits the response, so a network hiccup cannot block a skill run.

Request body (application/json, one row):

{
  "ts": "2026-04-16T19:42:00.000Z",
  "run_id": "a1b2c3d4e5f6",
  "skill": "content-polish",
  "score": 1.0,
  "mode": "auto",
  "primary_issue": null,
  "fix_applied": false,
  "notes": "",
  "host": "mac-mini-1"
}

Additional fields are caller-defined and should be preserved by the endpoint (Xano: store them in a json column named extra). The four required fields are ts, run_id, skill, and score.

Response: 200 OK with any body. The client ignores it.
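The fire-and-forget call can be sketched as below. buildEvalRequest and postEval are illustrative names, not the actual eval.ts internals:

```typescript
// A row carries the four required fields; anything else is caller-defined.
type EvalRow = {
  ts: string;
  run_id: string;
  skill: string;
  score: number;
  [extra: string]: unknown;
};

// Build the POST /eval request: trim the trailing slash, attach the
// optional bearer token, serialize one row.
function buildEvalRequest(endpoint: string, row: EvalRow, token?: string) {
  const headers: Record<string, string> = { "content-type": "application/json" };
  if (token) headers.authorization = `Bearer ${token}`;
  return {
    url: `${endpoint.replace(/\/+$/, "")}/eval`,
    init: { method: "POST", headers, body: JSON.stringify(row) },
  };
}

function postEval(endpoint: string, row: EvalRow, token?: string): void {
  const { url, init } = buildEvalRequest(endpoint, row, token);
  // Never awaited, never throws: a dead endpoint cannot block a skill run.
  fetch(url, init).catch(() => {});
}
```

The `.catch(() => {})` is the whole resilience story on the write path: the promise is dropped, so neither latency nor failure reaches the caller.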

Semantics:

Idempotent append: a row with the same (run_id, skill, ts) is stored once. Implementation choice: unique index or upsert-on-conflict.

Append-only: the contract has no update or delete route. Manual cleanup is fine, but not a regular operation.
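A minimal sketch of the stored-once semantics, using an in-memory map keyed the same way as the (run_id, skill, ts) unique index suggested for the Xano table. A real endpoint would enforce this with the index or an upsert-on-conflict instead:

```typescript
type StoredRow = { ts: string; run_id: string; skill: string; score: number };

const stored = new Map<string, StoredRow>();

// Returns true if the row was new, false if it was a duplicate POST.
// Either way the endpoint still answers 200 — the client ignores the body.
function insertOnce(row: StoredRow): boolean {
  const key = `${row.run_id}\u0000${row.skill}\u0000${row.ts}`;
  if (stored.has(key)) return false; // duplicate: dropped silently
  stored.set(key, row);
  return true;
}
```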

GET /evals

Read recent eval rows for trend analysis. Called from state/lib/eval.ts#fetchEvals() which is awaited by state/bin/pid-detect.ts before scanning for regen candidates.

Query params:

skill (string, optional): Filter to a single skill.
days (int, optional): Only rows with ts >= now() - days. Default: unlimited.
limit (int, optional): Cap on rows returned. Default: implementation-defined (recommend 5000).
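An illustrative helper (not part of eval.ts) that assembles a GET /evals URL from those params:

```typescript
// Build the read URL: trim the trailing slash, append only the params
// that were actually supplied.
function evalsUrl(
  base: string,
  opts: { skill?: string; days?: number; limit?: number } = {},
): string {
  const u = new URL(`${base.replace(/\/+$/, "")}/evals`);
  if (opts.skill) u.searchParams.set("skill", opts.skill);
  if (opts.days !== undefined) u.searchParams.set("days", String(opts.days));
  if (opts.limit !== undefined) u.searchParams.set("limit", String(opts.limit));
  return u.toString();
}
```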

Response (either shape is accepted):

{
  "rows": [
    { "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... },
    ...
  ]
}

or the bare array:

[
  { "ts": "...", "run_id": "...", "skill": "...", "score": 1.0, ... }
]

Ordering is unspecified — pid-detect sorts by ts itself.
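A client-side normalizer that accepts either response shape and then sorts by ts can be sketched as follows. This is a hypothetical helper, not the actual fetchEvals() code:

```typescript
type TsRow = { ts: string; [k: string]: unknown };

// Accept { rows: [...] }, a bare array, or anything else (treated as empty),
// then sort by ts — the contract leaves ordering to the client.
function normalizeRows(body: unknown): TsRow[] {
  const rows: TsRow[] = Array.isArray(body)
    ? body
    : body !== null && typeof body === "object" && Array.isArray((body as any).rows)
      ? (body as any).rows
      : [];
  // ISO-8601 timestamps sort correctly as plain strings.
  return [...rows].sort((a, b) => String(a.ts).localeCompare(String(b.ts)));
}
```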

Failure modes

The client treats every failure (non-2xx, network error, malformed JSON) as "endpoint unavailable" and silently falls back to local ndjson. No throws escape score() or fetchEvals(). This is by design: a dead endpoint must never prevent a skill from running or a PID scan from surfacing local regressions.
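The collapse of every failure mode into one answer can be sketched like this (illustrative names; the real code lives in state/lib/eval.ts):

```typescript
// Map a response to rows, or null for anything the client should treat
// as "endpoint unavailable".
function classifyResponse(status: number, body: unknown): unknown[] | null {
  if (status < 200 || status >= 300) return null;  // non-2xx
  if (Array.isArray(body)) return body;            // bare-array shape
  const rows = (body as { rows?: unknown })?.rows;
  return Array.isArray(rows) ? rows : null;        // wrapped shape, or malformed
}

// Nothing thrown in here ever escapes; the caller sees rows or null.
async function safeFetchEvals(url: string): Promise<unknown[] | null> {
  try {
    const res = await fetch(url);
    return classifyResponse(res.status, await res.json());
  } catch {
    return null; // network error or malformed JSON: same answer, no throw
  }
}
```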

Deployment modes

Personal: SNAPPY_EVAL_TOKEN unset, endpoint open. For single-operator setups — the default for our install. Trust boundary is your Xano workspace.

Multi-tenant / published: SNAPPY_EVAL_TOKEN required, endpoint enforces bearer auth. For anyone standing up snappy-os for others. Set SNAPPY_EVAL_TOKEN on every producing machine and configure the Xano API group to require it.

The live lint (npm run contract:live) warns when no token is configured, so the deployed mode is always visible. The reference server honors the token automatically — unauthenticated requests get 401 whenever SNAPPY_EVAL_TOKEN is set in its environment.

Portable reference server

For local development or self-hosting off Xano, state/bin/eval-server/server.ts is a zero-dependency Node implementation of the contract:

npm run eval-server                   # listens on 127.0.0.1 (ephemeral port)
PORT=4717 npm run eval-server         # pinned port
SNAPPY_EVAL_TOKEN=secret npm run eval-server    # bearer auth

Persistence is in-memory — swap the rows array for SQLite/Postgres to make it durable. The contract lint (npm run contract:stub) runs against this same server, so the same implementation that documents the contract also verifies client behavior.
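The server's read path can be sketched as an in-memory array plus the three query-param filters. Names here are illustrative, not the actual server.ts:

```typescript
type SrvRow = { ts: string; run_id: string; skill: string; score: number };

// In-memory persistence — swap this array for SQLite/Postgres for durability.
const srvRows: SrvRow[] = [];

// Apply skill / days / limit, mirroring GET /evals. `now` is injectable
// so the days cutoff is testable.
function queryEvals(
  opts: { skill?: string; days?: number; limit?: number },
  now: number = Date.now(),
): SrvRow[] {
  let out = srvRows;
  if (opts.skill) out = out.filter((r) => r.skill === opts.skill);
  if (opts.days !== undefined) {
    const cutoff = now - opts.days * 86_400_000; // days → ms
    out = out.filter((r) => Date.parse(r.ts) >= cutoff);
  }
  return out.slice(0, opts.limit ?? 5000); // recommended default cap
}
```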

Xano reference implementation

Suggested table eval_rows:

id: int, primary key
ts: timestamp, indexed
run_id: text, indexed
skill: text, indexed
score: decimal
mode: text, nullable
primary_issue: text, nullable
fix_applied: bool, default false
notes: text, nullable
host: text, nullable
extra: json, holds everything else in the body

Unique index: (run_id, skill, ts).

Endpoints:

POST /eval: insert one row into eval_rows, respond { ok: true }.

GET /evals: apply the skill / days / limit query params, default limit 5000, return { rows: [...] }.

Auth: optional bearer on an API group — set SNAPPY_EVAL_TOKEN to the issued token and the client appends the header automatically.

Wiring a machine

One-time provisioning (any machine, run once)

npx tsx state/bin/provision-eval-endpoint.ts \
    --workspace 5 \
    --group "Snappy OS ⚡"

This is idempotent — it creates the eval_rows table, reconciles any missing fields, creates the API group, and creates the two endpoint shells. Running it twice is a no-op.

Deploying the function stacks

The canonical xanoscript lives in the repo as .xs files. Redeploy after editing either one:

npx tsx state/bin/deploy-eval-endpoint.ts            # push both
npx tsx state/bin/deploy-eval-endpoint.ts --only get # push one
npx tsx state/bin/deploy-eval-endpoint.ts --dry      # scope only

The script resolves the endpoint ids by name inside API group 1639 (Snappy OS ⚡) and pushes via the raw Xano Metadata API (PUT /api:meta/workspace/{ws}/apigroup/{group}/api/{id} with xanoscript in the body). No xano-mcp round-trip, no editor click. To target a different install pass --workspace N --group-id M.

Notes for anyone authoring new stacks against eval_rows:

Wrap filter expressions in parentheses before applying a comparison operator: (($x|count) > 0), not ($x|count > 0). Xano returns "Invalid syntax. Please wrap your filter with parentheses." otherwise.

Arithmetic on the ts timestamp column fails with "Not numeric." Use now|add_secs_to_timestamp:$offset (seconds, negative for backward) and compare against the int created_at column, not the ts timestamp column.

The Metadata API PUT body also includes the endpoint's verb. GET-back surfaces the script as null in the main response — read it via the Xano editor (or preserve the source in git, which is what the .xs files are for).

Wiring this machine

echo 'SNAPPY_EVAL_ENDPOINT=https://xnwv-v1z6-dvnr.n7c.xano.io/api:o8IHA3To' \
    >> .env.cache
# SNAPPY_EVAL_TOKEN=... optional

state/lib/eval.ts reads the endpoint from process.env first and then falls back to .env.cache via env("SNAPPY_EVAL_ENDPOINT", false), so either channel works.
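That two-channel lookup can be sketched as below, assuming .env.cache holds simple KEY=VALUE lines. parseEnvCache and resolveEndpoint are illustrative, not the actual env() helper:

```typescript
// Parse simple KEY=VALUE lines from a .env-style cache file.
function parseEnvCache(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^\s*([A-Za-z0-9_]+)=(.*)$/);
    if (m) out[m[1]] = m[2].trim();
  }
  return out;
}

// process.env wins; .env.cache is the fallback channel.
function resolveEndpoint(
  env: Record<string, string | undefined>,
  cacheText: string,
): string | undefined {
  return env.SNAPPY_EVAL_ENDPOINT ?? parseEnvCache(cacheText).SNAPPY_EVAL_ENDPOINT;
}
```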

Smoke test

# 1. Run any skill that calls score() — or hand-fire one:
npx tsx -e '
import { score } from "./state/lib/eval.ts";
import { newRunId } from "./state/lib/log.ts";
score("smoke-test", newRunId(), { score: 1.0, primary_issue: null });
'

# 2. Check Xano → eval_rows — the row should appear within a second.

# 3. Run pid-detect with remote data:
npx tsx state/bin/pid-detect.ts --stats

No machine-level coordination required. Every machine POSTs; pid-detect on any machine sees the whole picture.