Schedule architecture audit — 2026-04-18

Robert's ask: "the agents being scheduled and the cron and all of that and overall the architecture too needs to be elegant and simple." This page is the end-to-end inventory, the grading, and a single consolidating move.

1. Inventory of dispatch paths

Eight distinct paths reach claude -p (or its equivalent) on this machine. Column order: path → purpose → trigger → lock → log.

| # | Path | Purpose | Trigger | Lock | Log |
|---|------|---------|---------|------|-----|
| 1 | ~/robot-rob/claude-cron.sh <job> | One-shot Claude headless by skill name. Clocks in/out to Xano, routes notify. | Native crontab (9 live entries: memory-consolidation, skool-daily-post, linkedin-post-scraper, polish-digest, meeting-followup, linkedin-likes-only, ray-update, scrape-competitor-pricing-weekly, plus ad-hoc via byline/ctl.sh run) | /tmp/claude-cron-<job>.lock | ~/robot-rob/logs/claude-cron.log + ~/robot-rob/logs/claude-cron-<job>.<ts>.log |
| 2 | state/bin/agents/tick.sh | Long-running multi-turn agents. Reads state/agents/*.json, fires one tick per status=running agent per 10 min (cap 3). | Native crontab */10 * * * * | /tmp/snappy-agent-<id>.lock | ~/robot-rob/logs/claude-cron.log + ~/robot-rob/logs/claude-cron-agent-<id>.<ts>.log |
| 3 | state/bin/autopilot/break.sh | Breaker pod. Files findings to breakage-report.ndjson. | Native crontab (PAUSED as of 2026-04-18). Also manual via byline run. | none | ~/.claude/logs/snappy-os-breaker.log |
| 4 | state/bin/autopilot/fix.sh | Fixer pod. Reads breakage report, fixes one row, commits. | Native crontab (PAUSED). Also manual via byline run. | none | ~/.claude/logs/snappy-os-fixer.log |
| 5 | state/bin/autopilot/regen.sh | PID-loop re-test gate after skill rewrite. | Called by auto-regen.sh stop-hook (not cron). | none | stdout |
| 6 | state/bin/meeting-triage/watch.sh | 60-sec commitment surfacing after Krisp hangup. run.ts is idempotent; wrapper exits after one run. | Native crontab */2 * * * * | none (idempotency via followups.ndjson dedupe) | ~/robot-rob/logs/meeting-triage.log |
| 7 | state/bin/browser/keepalive.sh | Agent-browser session refresh. | Native crontab 0 */12 * * * | none | ~/.claude/logs/browser-keepalive.log |
| 8 | state/bin/ray-update/friday.sh | Friday auto-draft for Ray. | Native crontab 0 18 * * 5 | none | ~/robot-rob/logs/ray-update-friday.log |

Plus three non-Claude utility crons (lint eval-coverage every 30m, snappy-os doctor every 6h, ray-sheet Freshbooks refresh daily) and one MISSING script:

| Path | Status |
|------|--------|
| state/bin/inbox-sweep/linkedin-sweep-cron.sh (crontab 0 7,9,11,13,15,17,19 * * *) | DEAD. Neither state/bin/inbox-sweep/ nor ~/.claude/skills/snappy-inbox-sweep/linkedin-sweep-cron.sh exists. Dashboard flags it as SCRIPT MISSING. |
| snappy-os doctor --silent (crontab 30 */6) | CLI snappy-os is installed; fine. |

Worker-side scheduled handlers (cf-worker lives in separate repo ~/projects/snappy-skills/, verified via worker-architecture.md):

* * * * *   → runQuorumPromotion(env)   (never promoted — needs ≥3 tenants)
*/5 * * * * → rebuildCatalog(env); syncSkoolThreads(env)

Status probe at https://skills.snappy.ai/_status returns worker_version: 0.4.2, tenants_total: 0, do_spaces_reachable: true, last_quorum_promotion: null.

2. The two parallel "agent" systems

The single biggest elegance debt on this machine: there are two conceptually different "agents" with parallel pause/resume/list surfaces.

System A, the byline cron catalog (state/bin/byline/agents.sh): 12 agents defined as pipe-delimited rows (abbr, emoji, pretty name, cron label, run command, cadence, optional live log). Pause semantics: append label to state/bin/byline/paused.txt; the cron is supposed to read the file and skip, but section 7 shows that claude-cron.sh never reads it. Controls: /snappy-run, /snappy-pause (via byline ctl.sh, a different codepath than the skill name implies).

System B, the long-running multi-turn agents (state/agents/*.json + state/bin/agents/ctl.ts): go/pause/resume/stop/list verbs write JSON state files; tick.sh reads them every 10 min. Pause semantics: set status=paused in the JSON file.

Skills /snappy-pause, /snappy-resume, /snappy-list are wired exclusively to System B. /snappy-run is wired to System A. Operator-facing pages do NOT explain this split: the skill loader for snappy-pause says "pauses the default agent" while the operator probably means "pause the content-mine cron." That mismatch is a footgun.

3. Lockfile convention

Three patterns in use across one repo:

| Pattern | Used by |
|---------|---------|
| /tmp/claude-cron-<label>.lock | claude-cron.sh (9 agents) |
| /tmp/snappy-agent-<id>.lock | agents/tick.sh (multi-turn) |
| /tmp/snappy-run-<label>-<ts>.log (log, not lock) | /snappy-run spawn |
| (no lock) | autopilot break/fix, meeting-triage, browser keepalive, ray-update-friday |

Four dispatch paths have no lockfile at all. Collision protection is either documented-idempotent (meeting-triage) or relies on cadence being slower than runtime (browser-keepalive). No convention.
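A shared helper is one way to give every dispatch path the same double-dispatch protection. The sketch below is illustrative only: the function names (lock_path, acquire_lock, release_lock) and the LOCK_DIR variable are hypothetical, and using mkdir as the atomic primitive is an assumption, since this audit does not record how claude-cron.sh or tick.sh create their locks. The snappy-<label>.lock naming follows the proposal in section 9.

```shell
# Hypothetical shared lock helper; not taken from claude-cron.sh or tick.sh.
# LOCK_DIR is an assumed override hook; it defaults to /tmp.
LOCK_DIR="${LOCK_DIR:-/tmp}"

# lock_path <label>: print the canonical lock path for a label.
lock_path() {
  printf '%s/snappy-%s.lock\n' "$LOCK_DIR" "$1"
}

# acquire_lock <label>: return 0 if this caller got the lock, non-zero if it is held.
# mkdir is atomic: only one concurrent caller can create the directory.
acquire_lock() {
  mkdir "$(lock_path "$1")" 2>/dev/null
}

# release_lock <label>: remove the lock directory; harmless if it is already gone.
release_lock() {
  rmdir "$(lock_path "$1")" 2>/dev/null || true
}
```

A wrapper could call acquire_lock "$label" || exit 0 before dispatching and call release_lock "$label" from an EXIT trap. The sketch does not handle stale locks left behind by a crashed run.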

4. Pause surface inconsistency

Three ways to pause one agent:

  1. Comment the crontab line with # PAUSED <date> by Robert (used for 5 existing entries; requires crontab -e).

  2. Append label to state/bin/byline/paused.txt (used for 4 labels currently: linkedin-post-scraper, scrape-competitor-pricing-weekly, __breaker__, __fixer__).

  3. /snappy-pause <name> which writes `state/agents/<name>.json status=paused` (different entity class entirely).

Method 1 requires editing cron. Methods 2 and 3 are engaged by different dispatch paths and are NOT transitive — pausing content-mine via paused.txt does nothing for a manual /snappy-run content-mine spawn, which bypasses the file entirely.
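Because the three surfaces are not transitive, a quick diagnostic that reports where a label is paused can make the mismatch visible. This is a minimal sketch under stated assumptions: pause_surfaces and the three variables are hypothetical names, the JSON files are assumed to carry a "status": "paused" key/value pair, and CRONTAB_FILE is assumed to be a saved copy of the crontab (for example from crontab -l) in which a paused entry is a comment line that still mentions the label.

```shell
# Hypothetical diagnostic: print each pause surface on which <label> is paused.
# Defaults use the file names from this audit; all three are overridable.
PAUSED_FILE="${PAUSED_FILE:-state/bin/byline/paused.txt}"
AGENTS_DIR="${AGENTS_DIR:-state/agents}"
CRONTAB_FILE="${CRONTAB_FILE:-}"   # optional saved crontab dump (assumption)

pause_surfaces() {
  local label="$1"
  # Method 1: a commented crontab line that mentions the label as a whole word.
  if [ -n "$CRONTAB_FILE" ] && [ -f "$CRONTAB_FILE" ] &&
     grep -E '^[[:space:]]*#' "$CRONTAB_FILE" | grep -Fwq -- "$label"; then
    echo "crontab-comment"
  fi
  # Method 2: the label appears as an exact line in paused.txt.
  if [ -f "$PAUSED_FILE" ] && grep -Fxq -- "$label" "$PAUSED_FILE"; then
    echo "paused.txt"
  fi
  # Method 3: the agent JSON says status=paused (assumed JSON key layout).
  if [ -f "$AGENTS_DIR/$label.json" ] &&
     grep -Eq '"status"[[:space:]]*:[[:space:]]*"paused"' "$AGENTS_DIR/$label.json"; then
    echo "agent-json"
  fi
}
```

An empty result means no surface considers the label paused. A result naming only one surface shows which dispatch paths will still fire.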

5. Single source of truth for "what's scheduled"

There is ONE dashboard (state/bin/agents-dashboard.sh) that scans every surface:

Ray draft status)

Operator sees it via /snappy-ops. This is the one elegant piece already — glance, see truth. It already caught the missing linkedin-sweep-cron.sh script.

But the dashboard is read-only. To actually change state (pause, resume, run-now) the operator hops across 3 surfaces (crontab, paused.txt, state/agents). No single control plane.

6. Inconsistencies with file/line refs

  1. ~/robertboulos/robot-rob/crontab line 33 (0 7,9,11,13,15,17,19 * * * /Users/robertboulos/projects/snappy-os/state/bin/inbox-sweep/linkedin-sweep-cron.sh) → references missing script. Dashboard P1. Neither snappy-os nor kernel skills dir has the file.

  2. state/bin/byline/agents.sh:18: the inbox-sweep agent in the byline catalog references a DIFFERENT path ($HOME/.claude/skills/snappy-inbox-sweep/linkedin-sweep-cron.sh) which also doesn't exist. One broken reference is a bug; two diverged broken references is the design flaw.

  3. state/skills/snappy-pause.md:26-27: the skill targets state/agents/<name>.json, but the operator's mental model (informed by byline) is the 12-agent cron catalog. Loader sidecar says "pause the default agent", not "this only applies to multi-turn /snappy-go agents, NOT to cron-scheduled content-mine / polish-digest."

  4. state/bin/agents/tick.sh:41 hardcodes lock at /tmp/snappy-agent-${ID}.lock; claude-cron.sh:15 hardcodes /tmp/claude-cron-$JOB.lock. Different prefixes guard against the same class of defect (double dispatch).

  5. Crontab carries 5 PAUSED lines as commented entries. Pausing by comment is invisible to the dashboard unless the tracked list is hand-synced (it is: agents-dashboard.sh hardcodes __breaker__ / __fixer__ pause detection). Dashboard fidelity depends on whoever edits cron remembering to tell the dashboard.

  6. state/bin/browser/keepalive.sh is not in the byline agents catalog even though it IS a scheduled agent. Dashboard hardcodes its detection path. Adding a new cron agent requires 4 edits: crontab, agents.sh, agents-dashboard.sh, optionally settings.tsv.
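Items 1 and 2 are both dangling script references, a class of defect that can be checked mechanically. The following is a rough sketch, not the dashboard's actual detection logic: missing_cron_scripts is a hypothetical function that reads a saved crontab file, assumes the standard five schedule fields followed by the command, and only checks commands given as absolute paths.

```shell
# Hypothetical check: print the script path of every live crontab entry
# whose command is an absolute path that does not exist on disk.
missing_cron_scripts() {
  local file="$1" m h dom mon dow cmd rest
  while read -r m h dom mon dow cmd rest || [ -n "$m" ]; do
    # Skip blank lines and comments (including commented-out PAUSED entries).
    case "$m" in ''|'#'*) continue ;; esac
    # Only absolute paths are checked; PATH-resolved commands such as
    # "snappy-os doctor" and @reboot-style entries are out of scope here.
    case "$cmd" in /*) ;; *) continue ;; esac
    [ -e "$cmd" ] || printf '%s\n' "$cmd"
  done < "$file"
}
```

Running it against a dump of the live crontab (for example crontab -l > /tmp/crontab.txt) would list entries like the linkedin-sweep-cron.sh line without relying on a hand-maintained list.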

7. Can Robert pause/resume/start content-mine from the byline without editing cron?

Pause: nominally yes, via byline ⌥⌘P → ctl.sh pause content-mine, which appends to paused.txt. But claude-cron.sh does NOT read paused.txt: it is a per-tick script and cron fires it unconditionally. So the appended label has zero effect on the cron line; it only stops the TUI-initiated re-runs. The crontab entry still fires on schedule. This is a broken abstraction: the pause surface looks like it paused the agent but the agent keeps running.

Verified by: grep paused.txt in claude-cron.sh → 0 matches. claude-cron.sh has no engagement/paused check. Confirmed the line 0 15 * * * claude-cron.sh content-mine is currently commented # PAUSED 2026-04-18 — by Robert manually editing crontab, not via byline.

Resume: same story, inverse.

Run: yes, via /snappy-run content-mine or byline ⌥⌘R. Works.
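The pause finding above comes down to a missing guard in the cron wrapper. As a hedged illustration of the kind of check that is absent (not a patch to claude-cron.sh, whose internals are not reproduced in this audit), the sketch below uses hypothetical names is_paused and run_unless_paused and assumes paused.txt holds one label per line.

```shell
# Hypothetical guard a cron wrapper could run before dispatching a job.
# PAUSED_FILE defaults to the path named in this audit; one label per line is assumed.
PAUSED_FILE="${PAUSED_FILE:-state/bin/byline/paused.txt}"

# is_paused <job>: succeed if the job label is an exact line in paused.txt.
is_paused() {
  [ -f "$PAUSED_FILE" ] && grep -Fxq -- "$1" "$PAUSED_FILE"
}

# run_unless_paused <job> <command...>: skip with a log line when paused,
# otherwise run the command and return its exit status.
run_unless_paused() {
  local job="$1"
  shift
  if is_paused "$job"; then
    echo "skip $job: listed in $PAUSED_FILE"
    return 0
  fi
  "$@"
}
```

With a guard of this shape, the label appended by ctl.sh pause would stop the scheduled run as well as the TUI-initiated one.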

8. Elegance score: 2/5

Justification:

documented; the 12-agent catalog IS centralized in agents.sh

share verb names but not mechanisms

lock at all

but not auto-healed)

Net: 2/5 — mixed. The nice parts (dashboard, catalog file) are undermined by the non-composing control verbs and the silent pause abstraction.

9. Proposed consolidating move (ONE diagram, ONE paragraph)

                ┌───────────────────────────────┐
                │  state/bin/byline/agents.sh   │  ← ONE catalog, adds `paused: 0/1` column
                │  (12 agents, +keepalive)      │     and `lock: /tmp/snappy-<label>.lock`
                └─────────┬─────────────────────┘
                          │ sourced by
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
  agents-dashboard   tick.sh (cron)    byline/ctl.sh
   (reads column)   reads `paused`     flips `paused`
                    skips row           column in agents.sh
                                        (or a sibling .tsv)

The move: promote state/bin/byline/agents.sh from a byline-only catalog to THE schedule catalog. Add two columns: paused (0/1) and lock_path (canonical /tmp/snappy-<label>.lock). Replace every native crontab line for a Claude job with ONE cron entry — a state/bin/dispatch/tick.sh that iterates the catalog, checks paused, checks cadence vs last-fire in state/log/notify.ndjson, and dispatches via the row's run_cmd. Autopilot break/fix/regen become regular rows in the catalog. /snappy-pause flips the column, which every path respects; claude-cron.sh / the sub-scripts stay unchanged as the actual executors. Net: one cron line instead of 10, one pause surface instead of three, one lock convention instead of four, dead entries self-heal because the catalog IS the source (crontab can't reference a script the catalog doesn't know about).
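To make the shape of the proposed dispatcher concrete, here is a deliberately simplified sketch; it is not the follow-on implementation. Assumptions: the catalog is reduced to a hypothetical four-column row (label|run_cmd|cadence_seconds|paused) instead of the real agents.sh schema, last-fire time is kept in a per-label stamp file rather than read from state/log/notify.ndjson, jobs run synchronously, and dispatch_tick, CATALOG, STAMP_DIR and LOCK_DIR are invented names.

```shell
# Hypothetical catalog-driven dispatcher. Simplified row schema (assumption):
#   label|run_cmd|cadence_seconds|paused
CATALOG="${CATALOG:-catalog.txt}"
STAMP_DIR="${STAMP_DIR:-/tmp}"
LOCK_DIR="${LOCK_DIR:-/tmp}"

# dispatch_tick [now_epoch]: walk the catalog once and run every row that is
# not paused, is due by cadence, and is not already locked.
dispatch_tick() {
  local now="${1:-$(date +%s)}" label run_cmd cadence paused stamp last lock
  while IFS='|' read -r label run_cmd cadence paused || [ -n "$label" ]; do
    case "$label" in ''|'#'*) continue ;; esac
    if [ "$paused" = "1" ]; then
      echo "skip $label (paused)"
      continue
    fi
    stamp="$STAMP_DIR/snappy-$label.last"
    last=0
    [ -f "$stamp" ] && last="$(cat "$stamp")"
    if [ $((now - last)) -lt "$cadence" ]; then
      echo "skip $label (not due)"
      continue
    fi
    lock="$LOCK_DIR/snappy-$label.lock"
    if ! mkdir "$lock" 2>/dev/null; then
      echo "skip $label (locked)"
      continue
    fi
    echo "$now" > "$stamp"
    echo "run $label"
    # stdin is redirected so the job cannot consume the catalog being read.
    sh -c "$run_cmd" </dev/null || echo "fail $label"
    rmdir "$lock"
  done < "$CATALOG"
}
```

The single cron entry would then invoke dispatch_tick once per minute or per ten minutes; the pause flag, cadence check, and lock all live in one loop, which is the point of the consolidation.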

Do NOT implement in this audit — Pod 34 scope is audit only. The move is named; the implementation is a follow-on commit.