Schedule architecture audit — 2026-04-18
Robert's ask: "the agents being scheduled and the cron and all of that and overall the architecture too needs to be elegant and simple." This page is the end-to-end inventory, the grading, and a single consolidating move.
1. Inventory of dispatch paths
Eight distinct paths reach claude -p (or its equivalent) on this machine. Column order: path → purpose → trigger → lock → log.
| # | Path | Purpose | Trigger | Lock | Log |
|---|---|---|---|---|---|
| 1 | ~/robot-rob/claude-cron.sh <job> | One-shot Claude headless by skill name. Clocks in/out to Xano, routes notify. | Native crontab (9 live entries: memory-consolidation, skool-daily-post, linkedin-post-scraper, polish-digest, meeting-followup, linkedin-likes-only, ray-update, scrape-competitor-pricing-weekly, plus ad-hoc via byline/ctl.sh run) | /tmp/claude-cron-<job>.lock | ~/robot-rob/logs/claude-cron.log + ~/robot-rob/logs/claude-cron-<job>.<ts>.log |
| 2 | state/bin/agents/tick.sh | Long-running multi-turn agents. Reads state/agents/*.json, fires one tick per status=running agent per 10 min (cap 3). | Native crontab */10 * * * * | /tmp/snappy-agent-<id>.lock | ~/robot-rob/logs/claude-cron.log + ~/robot-rob/logs/claude-cron-agent-<id>.<ts>.log |
| 3 | state/bin/autopilot/break.sh | Breaker pod. Files findings to breakage-report.ndjson. | Native crontab (PAUSED as of 2026-04-18). Also manual via byline run. | none | ~/.claude/logs/snappy-os-breaker.log |
| 4 | state/bin/autopilot/fix.sh | Fixer pod. Reads breakage report, fixes one row, commits. | Native crontab (PAUSED). Also manual via byline run. | none | ~/.claude/logs/snappy-os-fixer.log |
| 5 | state/bin/autopilot/regen.sh | PID-loop re-test gate after skill rewrite. | Called by auto-regen.sh stop-hook (not cron). | none | stdout |
| 6 | state/bin/meeting-triage/watch.sh | 60-sec commitment surfacing after Krisp hangup. run.ts is idempotent; wrapper exits after one run. | Native crontab */2 * * * * | none (idempotency via followups.ndjson dedupe) | ~/robot-rob/logs/meeting-triage.log |
| 7 | state/bin/browser/keepalive.sh | Agent-browser session refresh. | Native crontab 0 */12 * * * | none | ~/.claude/logs/browser-keepalive.log |
| 8 | state/bin/ray-update/friday.sh | Friday auto-draft for Ray. | Native crontab 0 18 * * 5 | none | ~/robot-rob/logs/ray-update-friday.log |
Plus three non-Claude utility crons (lint eval-coverage every 30m, snappy-os doctor every 6h, ray-sheet Freshbooks refresh daily) and one MISSING script:
| # | Path | Status |
|---|---|---|
| - | state/bin/inbox-sweep/linkedin-sweep-cron.sh (crontab line 33, schedule 0 7,9,11,13,15,17,19 * * *) | DEAD. Neither state/bin/inbox-sweep/ nor ~/.claude/skills/snappy-inbox-sweep/linkedin-sweep-cron.sh exists. Dashboard flags it as SCRIPT MISSING. |
| - | snappy-os doctor --silent (crontab 30 */6) | CLI snappy-os is installed; fine. |
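The dead-entry check above can be sketched as a small filter over crontab text. This is a minimal sketch, not the dashboard's actual detection logic: the function name `find_dead_cron_entries` and the output wording are assumptions, and it only inspects entries whose command is an absolute path.

```bash
# Hypothetical helper: reads crontab text on stdin and reports active entries
# whose command (field 6) is an absolute path that does not exist on disk.
find_dead_cron_entries() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blanks and comments, including "# PAUSED" lines
    esac
    set -f                   # keep the schedule's "*" fields from glob-expanding
    set -- $line             # split on whitespace: $1..$5 schedule, $6 command
    set +f
    [ "$#" -ge 6 ] || continue   # not a schedule line (e.g. VAR=value)
    case "$6" in
      /*) [ -e "$6" ] || printf 'SCRIPT MISSING: %s\n' "$6" ;;
    esac
  done
}
```

Fed from `crontab -l | find_dead_cron_entries`, a check like this would surface the linkedin-sweep-cron.sh line without anyone maintaining a hand-written list of scripts.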
Worker-side scheduled handlers (cf-worker lives in separate repo ~/projects/snappy-skills/, verified via worker-architecture.md):
* * * * * → runQuorumPromotion(env) (never promoted — needs ≥3 tenants)
*/5 * * * * → rebuildCatalog(env); syncSkoolThreads(env)
Status probe at https://skills.snappy.ai/_status returns worker_version: 0.4.2, tenants_total: 0, do_spaces_reachable: true, last_quorum_promotion: null.
2. The two parallel "agent" systems
The single biggest elegance debt on this machine: there are two conceptually different "agents" with parallel pause/resume/list surfaces.
System A, the byline cron catalog (state/bin/byline/agents.sh): 12 agents defined as pipe-delimited rows (abbr, emoji, pretty name, cron label, run command, cadence, optional live log). Pause semantics as designed: append the label to state/bin/byline/paused.txt, which the cron path is meant to read and skip. Section 7 shows that claude-cron.sh never actually reads it. Controls: /snappy-run, /snappy-pause (via byline ctl.sh, a different codepath than the skill name implies).
System B, the long-running multi-turn agents (state/agents/*.json + state/bin/agents/ctl.ts): go/pause/resume/stop/list verbs write JSON state files; tick.sh reads them every 10 min. Pause semantics: set status=paused in the JSON file.
Skills /snappy-pause, /snappy-resume, /snappy-list are wired exclusively to System B. /snappy-run is wired to System A. Operator-facing pages do NOT explain this split: the skill loader for snappy-pause says "pauses the default agent" while the operator probably means "pause the content-mine cron." That mismatch is a footgun.
3. Lockfile convention
Three patterns in use across one repo:
| Pattern | Used by |
|---|---|
| /tmp/claude-cron-<label>.lock | claude-cron.sh (9 agents) |
| /tmp/snappy-agent-<id>.lock | agents/tick.sh (multi-turn) |
| /tmp/snappy-run-<label>-<ts>.log (log, not lock) | /snappy-run spawn |
| (no lock) | autopilot break/fix, meeting-triage, browser keepalive, ray-update-friday |
Four dispatch paths have no lockfile at all. Collision protection is either documented-idempotent (meeting-triage) or relies on cadence being slower than runtime (browser-keepalive). No convention.
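A single lock helper shared by every dispatch path would remove the prefix drift. The sketch below assumes the canonical `/tmp/snappy-<label>.lock` path that section 9 proposes. The function names are hypothetical, and it uses a lock directory (atomic `mkdir`) rather than whatever mechanism the existing scripts use, which this audit does not record. Stale-lock cleanup is deliberately left out.

```bash
# Hypothetical shared lock helpers; path convention taken from section 9.
lock_path_for() {
  printf '/tmp/snappy-%s.lock\n' "$1"
}

# mkdir is atomic: when two dispatchers race, exactly one creates the directory.
acquire_lock() {
  mkdir "$(lock_path_for "$1")" 2>/dev/null
}

# Remove the lock directory; returns non-zero if the lock was not held.
release_lock() {
  rmdir "$(lock_path_for "$1")" 2>/dev/null
}
```

A dispatcher would wrap each run as `acquire_lock "$label" || exit 0`, then call `release_lock "$label"` from an exit trap, so a second fire of the same label exits quietly instead of double-dispatching.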
4. Pause surface inconsistency
Three ways to pause one agent:
- Comment the crontab line with `# PAUSED <date> by Robert` (used for 5 existing entries; requires crontab -e).
- Append label to `state/bin/byline/paused.txt` (used for 4 labels currently: linkedin-post-scraper, scrape-competitor-pricing-weekly, __breaker__, __fixer__).
- `/snappy-pause <name>`, which writes `state/agents/<name>.json status=paused` (different entity class entirely).
Method 1 requires editing cron. Methods 2 and 3 are engaged by different dispatch paths and are NOT transitive — pausing content-mine via paused.txt does nothing for a manual /snappy-run content-mine spawn, which bypasses the file entirely.
5. Single source of truth for "what's scheduled"
There is ONE dashboard (state/bin/agents-dashboard.sh) that scans every surface:
- claude-cron agents (by grepping the log for last-fire)
- autopilot breaker/fixer (by `engaged.json` + last-log)
- snappy-os doctor (by convention)
- snappy-browse keepalive (by log)
- Worker (`_status` curl)
- Local state (`default` agent, content-mine overdue, open frictions, Ray draft status)
Operator sees it via /snappy-ops. This is the one elegant piece already — glance, see truth. It already caught the missing linkedin-sweep-cron.sh script.
But the dashboard is read-only. To actually change state (pause, resume, run-now) the operator hops across 3 surfaces (crontab, paused.txt, state/agents). No single control plane.
6. Inconsistencies with file/line refs
- `~/robertboulos/robot-rob/crontab` line 33 (0 7,9,11,13,15,17,19 * * * /Users/robertboulos/projects/snappy-os/state/bin/inbox-sweep/linkedin-sweep-cron.sh) → references missing script. Dashboard P1. Neither snappy-os nor kernel skills dir has the file.
- `state/bin/byline/agents.sh:18`: the `inbox-sweep` agent in the byline catalog references a DIFFERENT path ($HOME/.claude/skills/snappy-inbox-sweep/linkedin-sweep-cron.sh) which also doesn't exist. One broken reference is a bug; two diverged broken references is the design flaw.
- `state/skills/snappy-pause.md:26-27`: the skill targets `state/agents/<name>.json`, but the operator's mental model (informed by byline) is the 12-agent cron catalog. Loader sidecar says "pause the default agent", not "this only applies to multi-turn /snappy-go agents, NOT to cron-scheduled content-mine / polish-digest."
- `state/bin/agents/tick.sh:41` hardcodes lock at /tmp/snappy-agent-${ID}.lock; claude-cron.sh:15 hardcodes /tmp/claude-cron-$JOB.lock. Different prefixes for the same job (double-dispatch protection).
- Crontab carries 5 PAUSED lines as commented entries. Pause-by-comment is invisible to the dashboard unless the tracked list is hand-synced (it is: agents-dashboard.sh hardcodes __breaker__ / __fixer__ pause detection). Dashboard fidelity depends on whoever edits cron remembering to tell the dashboard.
- `state/bin/browser/keepalive.sh` is not in the byline agents catalog even though it IS a scheduled agent. Dashboard hardcodes its detection path. Adding a new cron agent requires 4 edits: crontab, agents.sh, agents-dashboard.sh, optionally settings.tsv.
7. Can Robert pause/resume/start content-mine from the byline without editing cron?
Pause: nominally yes, via byline ⌥⌘P → ctl.sh pause content-mine → appends to paused.txt. But claude-cron.sh does NOT read paused.txt; it is a one-shot script that runs on every cron fire, and cron fires unconditionally. So the appended label has zero effect on the cron line; it only stops the TUI-initiated re-runs. The crontab entry still fires on its schedule. This is a broken abstraction: the pause surface looks like it paused the agent, but the agent keeps running.
Verified by: grep paused.txt in claude-cron.sh → 0 matches. claude-cron.sh has no engagement/paused check. Confirmed the line 0 15 * * * claude-cron.sh content-mine is currently commented # PAUSED 2026-04-18 — by Robert manually editing crontab, not via byline.
Resume: same story, inverse.
Run: yes, via /snappy-run content-mine or byline ⌥⌘R. Works.
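The missing piece is a guard of roughly this shape near the top of claude-cron.sh. This is a sketch only: the function name `is_paused` is hypothetical, and it assumes paused.txt holds one label per line, which is how the "append label" description reads.

```bash
# Hypothetical guard: succeeds (exit 0) only when the label appears in the
# pause file as a whole line. -F treats the label as a literal string, so
# names such as __breaker__ need no escaping; -x demands a full-line match,
# so "breaker" does not match "__breaker__".
is_paused() {
  label=$1
  file=$2
  [ -f "$file" ] && grep -Fxq -- "$label" "$file"
}
```

With that in place, a line such as `is_paused "$JOB" "$PAUSED_FILE" && exit 0` (where `PAUSED_FILE` is an assumed variable pointing at state/bin/byline/paused.txt) would make the byline pause effective for cron fires as well as TUI re-runs.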
8. Elegance score: 2/5
Justification:
- +1 the ops dashboard is a single-pane-of-glass for "is it alive"
- +1 skills (`/snappy-run`, `/snappy-list`) exist and are prose-documented; the 12-agent catalog IS centralized in agents.sh
- -1 two parallel agent systems (cron catalog vs multi-turn JSON) share verb names but not mechanisms
- -1 three separate pause surfaces that don't interoperate
- -1 lockfile convention is 3-way inconsistent, 4 paths have no lock at all
- -1 dead cron entry referencing missing script (P1 in dashboard but not auto-healed)
Net: 2/5, mixed. The nice parts (dashboard, catalog file) are undermined by the non-composing control verbs and a pause abstraction that fails silently.
9. Proposed consolidating move (ONE diagram, ONE paragraph)
                    ┌───────────────────────────────┐
                    │ state/bin/byline/agents.sh    │  ← ONE catalog, adds `paused: 0/1` column
                    │ (12 agents, +keepalive)       │    and `lock: /tmp/snappy-<label>.lock`
                    └─────────┬─────────────────────┘
                              │ sourced by
            ┌─────────────────┼─────────────────┐
            │                 │                 │
            ▼                 ▼                 ▼
    agents-dashboard   tick.sh (cron)     byline/ctl.sh
     (reads column)    reads `paused`     flips `paused`
                       skips row          column in agents.sh
                                          (or a sibling .tsv)
The move: promote state/bin/byline/agents.sh from a byline-only catalog to THE schedule catalog. Add two columns: paused (0/1) and lock_path (canonical /tmp/snappy-<label>.lock). Replace every native crontab line for a Claude job with ONE cron entry — a state/bin/dispatch/tick.sh that iterates the catalog, checks paused, checks cadence vs last-fire in state/log/notify.ndjson, and dispatches via the row's run_cmd. Autopilot break/fix/regen become regular rows in the catalog. /snappy-pause flips the column, which every path respects; claude-cron.sh / the sub-scripts stay unchanged as the actual executors. Net: one cron line instead of 10, one pause surface instead of three, one lock convention instead of four, dead entries self-heal because the catalog IS the source (crontab can't reference a script the catalog doesn't know about).
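The core loop of such a dispatcher might look like the sketch below. It is illustrative only: it assumes a simplified three-column row (`label|paused|run_cmd`) rather than the real agents.sh layout, the function name `dispatch_tick` is hypothetical, and the cadence check against state/log/notify.ndjson and the lock acquisition are omitted to keep the shape visible.

```bash
# Hypothetical dispatcher core. Assumed row format: label|paused|run_cmd
# (the real catalog has more columns; cadence and locking are not shown).
dispatch_tick() {
  catalog=$1
  while IFS='|' read -r label paused run_cmd; do
    case "$label" in
      ''|'#'*) continue ;;          # skip blank and comment rows
    esac
    if [ "$paused" = "1" ]; then
      printf 'skip %s (paused)\n' "$label"
      continue
    fi
    printf 'run %s\n' "$label"
    # Detach stdin so the job cannot swallow the remaining catalog rows.
    sh -c "$run_cmd" </dev/null || printf 'fail %s\n' "$label"
  done < "$catalog"
}
```

Because `run_cmd` is the last field, it may itself contain `|` characters. Pausing becomes a one-character edit in the catalog, which the dashboard can read from the same file.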
Do NOT implement in this audit — Pod 34 scope is audit only. The move is named; the implementation is a follow-on commit.