
feat(docker): scheduled reminders via supercronic #6

Merged
jkyberneees merged 3 commits into main from feat/docker-supercronic-reminders
Jun 3, 2026

Conversation

@jkyberneees
Contributor

Summary

  • Adds in-container cron scheduling using supercronic (pinned to v0.2.46, SHA-256 verified) so odek run --deliver reminders can fire on a schedule without a host crontab
  • ODEK_TELEGRAM_DEFAULT_CHAT_ID env var support added (was config-file-only) so the delivery target lives in .env
  • All 6 post-merge code review findings addressed (zombie reaping, SIGTERM forwarding, graceful-restart bypass, wrong-arch fallback, silent startup failure, directory-mount confusion)

Why supercronic over crond

Classic crond scrubs the environment — ODEK_API_KEY and the bot token never reach a cron tick. supercronic runs as a non-root user and passes its own environment to every job.

Changes

| File | What |
| --- | --- |
| docker/Dockerfile | Install supercronic (SHA-256 pinned, arch-aware via TARGETARCH:?); cron-entrypoint.sh wrapper as new ENTRYPOINT |
| docker/cron-entrypoint.sh | Starts supercronic if crontab mounted; exports ODEK_ENTRYPOINT + ODEK_SUPERCRONIC_PID for restart safety; liveness check + directory-mount warning |
| docker/docker-compose.yml | Mount ./crontab into telegram services; init: true for zombie reaping and SIGTERM forwarding |
| docker/crontab | Example reminders file (all commented out) |
| docker/.env.example + README.md | Document the new env var and cron workflow |
| internal/telegram/config.go | Read ODEK_TELEGRAM_DEFAULT_CHAT_ID from env |
| cmd/odek/telegram.go | spawnChild uses ODEK_ENTRYPOINT when set: graceful /restart re-launches through the wrapper so supercronic is restarted |
| *_test.go | 4 tests for ConfigFromEnv (DefaultChatID); 3 tests for spawnChild (ODEK_ENTRYPOINT branch, empty fallback, env isolation); spawnChild coverage 68% → 89% |

Test plan

  • make test passes (unit + telegram package)
  • Docker build succeeds with --build-arg TARGETARCH=arm64
  • Empty TARGETARCH produces a hard build error (not a silent wrong-arch download)
  • Container with mounted crontab: supercronic fires job and env vars are visible
  • /restart in Telegram chat: supercronic restarts (no duplicate instances)
  • docker stop: SIGTERM reaches supercronic (init: true forwards it)

🤖 Generated with Claude Code

jkyberneees and others added 3 commits June 3, 2026 16:35
Add in-container scheduling so the dockerized agent can fire reminders on a
cron schedule and deliver results to Telegram — without a host crontab.

Why supercronic over crond: classic crond scrubs the environment from its
jobs (so env_file vars like ODEK_API_KEY / the bot token never reach a tick)
and wants root to setuid, clashing with the non-root container user.
supercronic runs as the normal user and passes its own environment to each
job, so a scheduled `odek run --deliver` sees exactly what the bot sees.

- Dockerfile: install supercronic v0.2.46, pinned by SHA-256 computed from the
  official release assets (arch-aware via TARGETARCH); add a cron-entrypoint.sh
  wrapper that starts supercronic only when a crontab is mounted, then execs
  odek (PID-1 semantics, signals, and the Telegram lock unchanged).
- compose: mount ./crontab into both telegram profiles.
- telegram config: read ODEK_TELEGRAM_DEFAULT_CHAT_ID from env (was config-file
  only), so --deliver's target chat can live in .env like everything else.
- docs + .env.example: document reminders and the new env var.

E2E verified (build + run): supercronic SHA-256 checks out, a cron job inside
the container inherits an injected env var, and the no-crontab path still runs
odek unchanged. Unit tests cover the new env-var parsing (incl. negative IDs).
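
The env-var parsing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's ConfigFromEnv: `parseChatID` is a hypothetical helper name, and the choice to treat malformed values as "unset" is an assumption. The negative-ID case matters because Telegram uses negative chat IDs for groups and supergroups.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseChatID is a hypothetical helper (not the project's ConfigFromEnv).
// It accepts any base-10 int64, including negative group chat IDs, and
// reports false for empty or malformed input.
func parseChatID(raw string) (int64, bool) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return 0, false
	}
	id, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, false
	}
	return id, true
}

func main() {
	// The variable name comes from the PR; everything else is illustrative.
	id, ok := parseChatID(os.Getenv("ODEK_TELEGRAM_DEFAULT_CHAT_ID"))
	fmt.Println(id, ok)
}
```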

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Six findings from a post-merge review (C2 and C3 share one fix), all verified:

C1 — graceful restart (/restart) re-execs the bare odek binary, bypassing
cron-entrypoint.sh; supercronic was never restarted.
Fix: cron-entrypoint.sh exports ODEK_ENTRYPOINT=$0 and ODEK_SUPERCRONIC_PID.
spawnChild() uses ODEK_ENTRYPOINT when set so the wrapper is re-entered on
restart. The wrapper kills the old supercronic PID before starting a new one,
preventing duplicate scheduler instances.
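
The executable choice described in C1 might look something like the sketch below. `childExecutable` is a hypothetical name and the real spawnChild does more (environment setup, os.StartProcess); this only shows the "wrapper when set, own binary otherwise" decision as it stood in this commit.

```go
package main

import (
	"fmt"
	"os"
)

// childExecutable is a hypothetical helper: it returns the wrapper path when
// ODEK_ENTRYPOINT was exported by the entrypoint script, and the running
// binary otherwise (an empty value must not override the executable).
func childExecutable(entrypoint, self string) string {
	if entrypoint != "" {
		return entrypoint
	}
	return self
}

func main() {
	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve own path:", err)
		os.Exit(1)
	}
	fmt.Println(childExecutable(os.Getenv("ODEK_ENTRYPOINT"), self))
}
```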

C2+C3 — no init process: supercronic zombies on exit; SIGTERM from docker stop
not forwarded to supercronic (in-flight jobs killed abruptly at SIGKILL).
Fix: init: true on both telegram compose services. Docker's built-in init
becomes PID 1, reaping orphaned children and forwarding SIGTERM to the process
group.

C4 — arch="${TARGETARCH:-amd64}" silently installed the wrong-arch supercronic
binary on arm64 hosts building without BuildKit; the SHA check still passed.
Fix: change to ${TARGETARCH:?...} — a hard build failure with an actionable
error message rather than a silent wrong-arch download.

C5 — Docker creates a root-owned directory at ./crontab on the host when the
source path is missing from a bind mount; [ -f ] returned false silently.
Fix: add [ -d ] branch with an explicit warning explaining the cause and fix.

C6 — supercronic backgrounded with &; set -e does not apply to background
processes, so an immediate startup failure was silently swallowed.
Fix: sleep 1 + kill -0 liveness check after launch; emits a clear WARNING if
supercronic exits immediately, rather than proceeding as if cron is running.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The graceful-restart fix in the previous commit added an ODEK_ENTRYPOINT
check to spawnChild: when set by cron-entrypoint.sh, the child is re-executed
through the wrapper so supercronic is restarted. That branch was not covered.

Add three targeted tests:
- TestSpawnChild_UsesODEKENTRYPOINT: exercises the true branch (ODEK_ENTRYPOINT
  set) — spawnChild must call os.StartProcess with the wrapper path, not the
  odek binary. Uses /bin/sh as a universally present stand-in executable.
- TestSpawnChild_ODEKENTRYPOINTEmpty_FallsBackToOdekBinary: empty env var must
  not override the executable (false branch).
- TestSpawnChild_ResolvedAPIKeyInjected: API key is appended to childEnv only,
  not leaked into the current process environment.

spawnChild coverage: 68.4% → 89.5%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit a60e658 into main Jun 3, 2026
6 checks passed
jkyberneees added a commit that referenced this pull request Jun 5, 2026
#7)

* feat(schedule): native in-process scheduler core (phase 1)

Introduce internal/schedule — the engine for odek's native cron capability,
replacing the Docker + supercronic approach. Running in-process means the host
already has resolved config (API key, model, bot token, default chat) in
memory, so a scheduled task sees exactly what an interactive one does — no
environment-inheritance games, no external cron daemon, no container-only
behaviour.

This phase is the standalone core only — no CLI or bot wiring yet.

- types.go: Job / Delivery / RunState. Definitions and runtime state persist
  to separate files so a hand-edit never races a state write.
- cronexpr.go: stdlib-only 5-field cron parser (ranges, steps, lists, names,
  @macros) with correct Vixie dom/dow union semantics, timezone-aware Next()
  via coarse unit-stepping, and a horizon that clears the leap-century gap.
- store.go: atomic (temp+rename, 0600) CRUD for schedules.json and
  schedule-state.json, mirroring session.Store; validates jobs on write.
- scheduler.go: firing engine decoupled from the agent/telegram via Runner and
  Deliverer interfaces. Earliest-fire timer (no per-minute polling), bounded
  concurrency, per-job overlap guard, missed-run skip/catchup policy, mtime
  hot-reload, and graceful drain on context cancellation.
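
The "atomic (temp+rename, 0600)" write pattern named for store.go can be illustrated as below. This is a generic sketch of the technique, not the project's code: `writeFileAtomic` and `roundTrip` are hypothetical names, and the JSON payload is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the same directory, then
// renames it over path, so readers see either the old or the new content.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(0o600); err != nil { // owner read/write only
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip writes a placeholder document into a scratch directory and reads it back.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "schedule-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "schedules.json")
	if err := writeFileAtomic(path, []byte(`{"jobs":[]}`)); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	fmt.Println(roundTrip())
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.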

Tests: 39 cases, 87.9% coverage, green under -race. Parser table tests
(ranges/steps/lists/names/macros/dom-dow union/leap day/timezone/errors);
engine tests drive reconcile/fireDue directly with explicit clocks plus one
real-clock lifecycle test — deterministic, no flaky sleeps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(schedule): odek schedule CLI + headless runner/deliverers (phase 2)

Wire the scheduler core into the CLI and give it a way to actually run tasks.

- cmd/odek/schedule.go: `odek schedule <list|add|rm|enable|disable|run|next|daemon>`.
  * add: flag-parsed (--name/--cron/--deliver/--tz/--catchup/--disabled) with
    a trailing task; validates and shows the next fire.
  * list: tabular view with computed next-fire (local time) and last status.
  * next: previews upcoming fires for a job ID or a raw expression.
  * run: fires one job immediately and delivers (test a job).
  * daemon: foreground scheduler with a singleton pid lock (refuses a second
    instance rather than usurping a live one) and graceful SIGINT/SIGTERM drain.
- runTaskHeadless: builds a fresh agent with a silent (io.Discard) renderer,
  interaction off, and no approver — the resolved danger policy governs what an
  unattended task may do, mirroring non-interactive `odek run`.
- agentRunner / cliDeliverer implement the schedule.Runner / schedule.Deliverer
  interfaces; delivery routes to stdout, ~/.odek/schedule.log, or Telegram
  (honouring a per-job chat ID, falling back to default_chat_id).
- dispatch + printUsage wired for the new command.

Tests cover parseDeliver, deliverString, firstWords, jobSchedule, and the
deliverer branches (log append, telegram misconfig errors, unknown kind).
Smoke-tested end to end: add/list/next/enable/disable/rm, schedules.json at
0600, and daemon start → second-instance refused → clean SIGINT drain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(schedule): run the scheduler inside odek telegram (phase 3)

The Telegram bot now hosts the scheduler in-process, so reminders and the bot
share one runtime — the whole reason to go native. No separate cron daemon, no
environment-inheritance problem.

- startSchedulerForBot: launched after the poller, stopped on ctx cancel. It
  acquires the shared schedule pid-lock; if an external `odek schedule daemon`
  already holds it, the bot defers (logs and skips) rather than double-firing.
- telegramRunner: runs each job headless and accounts token usage against the
  bot's daily budget — pre-flight refuse when exhausted, bill the run after.
- telegramDeliverer: delivers via the LIVE bot for telegram jobs (sharing its
  client and 429 backoff) and falls back to the CLI deliverer for stdout/log.
- runTaskHeadless now captures cumulative tokens via an IterationCallback, so
  the Runner's token count is real (engine logs it; bot bills it).
- Graceful restart releases the schedule lock before os.Exit, mirroring the
  Telegram instance lock, so the restarted child's scheduler re-acquires cleanly.

Tests: embedded deliverer routing — live-bot send, default-chat fallback,
no-chat error, and stdout/log fallback — via the recording test bot. Full
cmd/odek suite green under -race; whole module suite green, vet + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(schedule): schedules config section + docs (phase 4)

Make the scheduler configurable and documented.

- internal/config: new `schedules` section (enabled, max_concurrent, timezone,
  catchup) with the same file→env→default layering as every other section.
  resolveSchedules + ODEK_SCHEDULES_* env overrides + overlayFile handling.
  Defaults: enabled=true, max_concurrent=2, timezone=UTC, catchup=false.
- cmd/odek: the daemon and the embedded (bot) scheduler now build their engine
  Options from resolved.Schedules via a shared schedulerOptions helper
  (max-concurrent, default timezone, catchup). The embedded scheduler is gated
  on schedules.enabled so it can be turned off in favour of a standalone daemon.
- docs: new docs/SCHEDULES.md (canonical guide — runtime models, CLI, cron
  syntax incl. Vixie dom/dow coupling, delivery, the unattended-safety policy,
  config, missed-run behaviour); a Schedules section in CONFIG.md; a feature
  bullet in README.
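
The layering for one setting could be sketched as follows. Several things here are assumptions: that an env override takes precedence over the file value, that the variable would be named ODEK_SCHEDULES_MAX_CONCURRENT (the commit only says ODEK_SCHEDULES_*), and that invalid or non-positive env values are ignored. `resolveMaxConcurrent` is a hypothetical helper, not the project's resolveSchedules.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// resolveMaxConcurrent layers default -> file -> env (assumed precedence).
// fileVal is nil when the config file does not set the field.
func resolveMaxConcurrent(fileVal *int, envVal string) int {
	v := 2 // default stated in the commit message
	if fileVal != nil {
		v = *fileVal
	}
	if n, err := strconv.Atoi(strings.TrimSpace(envVal)); err == nil && n > 0 {
		v = n // env override; unparsable or non-positive values are ignored
	}
	return v
}

func main() {
	// Hypothetical env var name following the ODEK_SCHEDULES_* pattern.
	fmt.Println(resolveMaxConcurrent(nil, os.Getenv("ODEK_SCHEDULES_MAX_CONCURRENT")))
}
```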

Tests: resolveSchedules defaults/overrides/partial, and LoadConfig wiring for
defaults and ODEK_SCHEDULES_* env. Full config + schedule + cmd suites green,
vet + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(docker): retire supercronic, use the native scheduler (phase 5)

The bot now hosts the in-process scheduler (phase 3), so the container needs no
external cron at all. Remove the supercronic scaffolding entirely.

- Dockerfile: drop the supercronic download (and its ARG TARGETARCH/SHA pin),
  the ~/.crontabs dir, and the cron-entrypoint.sh wrapper. ENTRYPOINT is back to
  ["odek"]. The image no longer needs --build-arg TARGETARCH.
- docker-compose: remove the ./crontab bind mounts from both telegram services.
  Keep init: true (now justified generally — reap agent-spawned children and
  forward SIGTERM), with an honest comment.
- Delete docker/cron-entrypoint.sh and docker/crontab.
- spawnChild: remove the now-dead ODEK_ENTRYPOINT re-exec branch (it only
  existed to restart supercronic via the wrapper). A restarted `odek telegram`
  starts its own embedded scheduler goroutine; gracefulRestart still releases
  the schedule lock so the child re-acquires cleanly. Drop the two obsolete
  ODEK_ENTRYPOINT tests.
- docs: docker/README + .env.example now describe the native scheduler
  (`odek schedule`, jobs in ./.odek/schedules.json); TELEGRAM.md points to
  SCHEDULES.md from its OS-cron section.

Validated: image builds without TARGETARCH, supercronic absent from the image,
ENTRYPOINT runs odek, and `odek schedule next` works inside the container.
Compose config valid; full module suite green, vet + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(schedule): address code-review findings

Ten findings from the high-effort review of the native scheduler:

#1 (security) Unattended tasks could silently run dangerous ops: a nil approver
with no TTY falls back to NonInteractiveAction(), which defaults to ALLOW. Set a
"deny" floor in runTaskHeadless when the policy doesn't explicitly choose one
(mirrors sub-agent hardening); explicit allow/deny (godmode/restricted) honoured.

#2 (correctness) cron parseField flagged a dom/dow field as a wildcard whenever
it merely started with "*", so a list like "*/2,15" broke the Vixie union rule
(AND instead of OR). Now star is set only when EVERY comma item is wildcard-based.
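
A rough sketch of the corrected rule, assuming a simplified parser: `fieldIsWildcard` and `dayMatches` are hypothetical names, and a real parser would also validate each item. The point is only that the star flag requires every comma item to be wildcard-based, and that the union (OR) applies when neither field carries the flag.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldIsWildcard reports whether every comma-separated item of a cron field
// is wildcard-based ("*" or "*/n"). An empty field is not a wildcard.
func fieldIsWildcard(field string) bool {
	for _, item := range strings.Split(field, ",") {
		if !strings.HasPrefix(item, "*") {
			return false
		}
	}
	return true
}

// dayMatches applies the Vixie dom/dow rule: if both fields are restricted,
// a day matches when EITHER matches; otherwise both must match.
func dayMatches(domStar, dowStar, domOK, dowOK bool) bool {
	if !domStar && !dowStar {
		return domOK || dowOK
	}
	return domOK && dowOK
}

func main() {
	for _, f := range []string{"*", "*/2", "*/2,15", "1-5"} {
		fmt.Printf("%q wildcard=%v\n", f, fieldIsWildcard(f))
	}
}
```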

#3 (correctness) The Run loop did a blocking `sem <- {}` in fireDue, so once
MaxConcurrent jobs had hung, the next send blocked the loop and wedged
shutdown/reload. Now the sem acquire selects on ctx (clearing the overlap guard
for undispatched jobs), and each run is bounded by Options.RunTimeout (default 15m).
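
The cancellable acquire might be sketched like this, assuming the semaphore is a buffered `chan struct{}`; `acquire` and `demo` are hypothetical names and the overlap-guard bookkeeping is omitted.

```go
package main

import (
	"context"
	"fmt"
)

// acquire takes a semaphore slot, or gives up when ctx is cancelled, so a
// full semaphore can no longer block shutdown.
func acquire(ctx context.Context, sem chan struct{}) bool {
	select {
	case sem <- struct{}{}:
		return true
	case <-ctx.Done():
		return false
	}
}

// demo acquires the only slot, then retries on the now-full semaphore after
// cancelling the context.
func demo() (freeSlot, fullAndCancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	sem := make(chan struct{}, 1)
	freeSlot = acquire(ctx, sem)
	cancel()
	fullAndCancelled = acquire(ctx, sem)
	return freeSlot, fullAndCancelled
}

func main() {
	fmt.Println(demo())
}
```

A caller that gets `false` back would skip the dispatch and release whatever per-job guard it had set.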

#4 (correctness) Budget pre-check used CheckDailyBudget(1), which persists +1 per
fire. Switched to read-only DailyTokenUsage() for the gate; actual cost still
billed after the run.

#5 (robustness) acquireScheduleLock now does a /proc/<pid>/cmdline identity check
so a recycled PID can't make the scheduler refuse to start forever; pid file
tightened to 0600.
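
One way such an identity check could work is sketched below. The matching rule (compare the base name of argv[0] with the expected program name) is an assumption, as is the helper name `cmdlineIsProgram`; the commit only states that /proc/<pid>/cmdline is consulted. The file holds the process arguments separated by NUL bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// cmdlineIsProgram inspects the raw contents of /proc/<pid>/cmdline and
// reports whether argv[0] names the given program.
func cmdlineIsProgram(cmdline []byte, name string) bool {
	argv0, _, _ := bytes.Cut(cmdline, []byte{0})
	if len(argv0) == 0 {
		return false
	}
	return filepath.Base(string(argv0)) == name
}

func main() {
	// Linux-only illustration: check this process against its own name.
	data, err := os.ReadFile(filepath.Join("/proc", strconv.Itoa(os.Getpid()), "cmdline"))
	if err != nil {
		fmt.Println("no /proc here:", err)
		return
	}
	fmt.Println(cmdlineIsProgram(data, filepath.Base(os.Args[0])))
}
```

If the PID in the lock file belongs to some unrelated process, the check fails and the stale lock can be taken over.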

#6 (correctness) Missed-run detection trusted a persisted NextRun even after the
cron changed while down. RunState now records the schedule signature; reconcile
ignores NextRun when the sig differs (no spurious catchup/skip).
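
A schedule signature of this kind could be as simple as a hash over the fields that determine fire times. Which fields the project actually includes is not stated, so the cron-plus-timezone choice here is an assumption, and `scheduleSig` and `trustNextRun` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// scheduleSig derives a short fingerprint from the inputs that determine
// when a job fires (assumed here to be the cron expression and timezone).
func scheduleSig(cron, tz string) string {
	sum := sha256.Sum256([]byte(cron + "\x00" + tz))
	return hex.EncodeToString(sum[:8])
}

// trustNextRun reports whether a persisted NextRun still belongs to the
// current schedule definition.
func trustNextRun(persistedSig, cron, tz string) bool {
	return persistedSig == scheduleSig(cron, tz)
}

func main() {
	saved := scheduleSig("0 9 * * *", "UTC")
	fmt.Println(trustNextRun(saved, "0 9 * * *", "UTC"))  // definition unchanged
	fmt.Println(trustNextRun(saved, "0 18 * * *", "UTC")) // cron edited while down
}
```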

#7 (efficiency) MCP servers were reconnected per fire. They're now connected once
at daemon/bot startup and shared across fires (the MCP client is mutex-safe);
builtin tools stay fresh per fire.

#8 (efficiency) reconcile re-parsed cron + LoadLocation for unchanged jobs every
reload. The sig short-circuit now runs before compile().

#9 (cleanup) Hoisted the repeated `cfg.Schedules == nil` guard in loader.go.

#10 (cleanup) Daemon reuses telegram.NewFileLogger instead of a hand-rolled
stderrLogger (deleted).

Tests: cron union for step-lists + plain-step-still-wildcard; cron-changed-while-
down (no false catchup); fireDue unblocks on ctx cancel with a full semaphore.
Full suite green under -race, vet + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(schedule): address ultrareview findings

Nine findings from the cloud multi-agent review.

bug_005 (normal) Impossible cron expressions (e.g. "0 0 30 2 *", Feb 30) passed
Validate but Next() returns the zero time, which the engine treated as
perpetually due → fired every tick forever, burning tokens. Now Validate rejects
them at add time, and reconcile/fireDue defensively skip a zero next-fire (for
hand-edited files).
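
A validation of this sort can be sketched as a calendar feasibility check over the expanded month and day-of-month sets. This is an illustration under assumptions, not the project's Validate: `domPossible` is a hypothetical name, and the check only applies when day-of-week is a wildcard, since a restricted day-of-week can still match through the union rule.

```go
package main

import "fmt"

// maxDays[m] is the most days month m can ever have (February counts 29).
var maxDays = [13]int{0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}

// domPossible reports whether at least one allowed (month, day) pair can
// exist on a real calendar.
func domPossible(months, days []int) bool {
	for _, m := range months {
		if m < 1 || m > 12 {
			continue
		}
		for _, d := range days {
			if d >= 1 && d <= maxDays[m] {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(domPossible([]int{2}, []int{30})) // "0 0 30 2 *": Feb 30
	fmt.Println(domPossible([]int{2}, []int{29})) // leap day is reachable
}
```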

bug_009 (normal) The embedded scheduler's stop closure tore down shared MCP
connections and the lock without waiting for in-flight jobs to drain, causing
broken-pipe errors persisted as bogus failure state. The closure now waits on a
done channel (20s bound) before cleanup.
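
The bounded wait could look roughly like this; `waitDrained` is a hypothetical name, and the caller would run its cleanup (closing connections, releasing the lock) after it returns either way.

```go
package main

import (
	"fmt"
	"time"
)

// waitDrained blocks until done is closed or the bound elapses, and reports
// whether the drain finished in time.
func waitDrained(done <-chan struct{}, bound time.Duration) bool {
	timer := time.NewTimer(bound)
	defer timer.Stop()
	select {
	case <-done:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for in-flight jobs finishing
		close(done)
	}()
	fmt.Println(waitDrained(done, 20*time.Second)) // 20s is the bound from the commit
}
```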

bug_013 (normal) gracefulRestart calls os.Exit(0), which skips deferred
stopScheduler → mcpCleanup never ran → MCP child processes (Playwright/Chromium)
leaked on every /restart. Added mcpCleanupRef, invoked before os.Exit like
scheduleUnlockRef.

bug_007 (normal) reconcile reseeded s.runs from disk in the unchanged branch,
clobbering an in-flight fire's increment → lost Runs counts. It now skips the
reseed for unchanged/running jobs. Also moved the missed-fire SaveState out of
the s.mu critical section and stopped swallowing its error.

bug_004 (normal) runTaskHeadless used RunWithMessages with a bare system
message, so RuntimeContext (host/cwd/date) never reached the LLM — date-aware
jobs ("summarize today's calendar") had no notion of "today". Switched to
agent.Run, which prepends the engine's runtime-context-inclusive system message.

bug_014 (nit) Deliverer.Deliver took no context, so a stuck Telegram send blocked
the drain. Added ctx to the interface + bot.SendMessageContext; the scheduler
passes the run ctx through.

bug_015 (nit) Concurrent CLI mutations could lose writes (read-modify-write with
only an in-process mutex). Added an flock on ~/.odek/schedules.lock around the
store's write methods.

bug_006 (nit) scheduleNext swallowed store errors → misleading "bad cron" on a
corrupt store. It now returns the store error.

bug_002 (nit) docker/README + SCHEDULES.md misdescribed the lock as symmetric;
reworded to note the bot defers silently while the daemon refuses to start.

Tests: impossible-cron rejected (Validate) + skipped (reconcile); unchanged
reconcile preserves in-memory Runs. Full suite green under -race, vet + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(schedule): cover error paths and edge cases to 99.6%

Add targeted tests for the native scheduler package and its CLI glue,
raising internal/schedule statement coverage from 87.8% to 99.6% (the
only remaining gap is the best-effort flock syscall-error fallback).

internal/schedule/coverage_test.go exercises:
- store error paths: NewStore HOME failure, NewStoreAt mkdir failure,
  corrupt-file loadDoc/loadState propagation across all CRUD methods,
  writeJSONAtomic marshal/write/rename failures, version defaulting,
  null states map, fileLock open failure, and the List ID tiebreak.
- scheduler branches: reload-on-mtime-change, reconcile List/LoadState
  errors, skip- and execute-time SaveState failures, zero next-fire
  drop, timeToNext empty/past/near cases, compile bad-timezone, and
  preview truncation.
- cronexpr branches: nil-location default, empty field, range/empty
  value parse errors, and a month mismatch in Matches.

cmd/odek/schedule_cli_test.go covers the non-LLM CLI surface: list,
add, rm, enable/disable, next, command dispatch, scheduler options,
MCP no-op, schedule lock acquire/release, embedded-scheduler lifecycle,
and the telegram budget gate.

* docs: make docs consistent with the native scheduler

The native scheduler landed with its own docs (SCHEDULES.md, CONFIG.md,
TELEGRAM.md, docker/*) but left a few cross-references stale or missing.
Bring the rest of the docs in line:

- README.md: add the missing Scheduled Tasks row to the docs index
  (the feature section already linked SCHEDULES.md).
- docs/index.html: add a Scheduled Tasks feature card mirroring README.
- docs/CLI.md: list 'odek schedule' (and the previously-omitted
  'odek telegram') in the command table; point '--deliver' at the
  native scheduler for recurring tasks.
- docs/CHEATSHEET.md: add a schedule quick-reference (and telegram).
- docs/DAILY-WORKER.md: correct the comparison table — odek now has
  native, in-process scheduling rather than 'None'.

* feat(schedule): manage schedules from Telegram

Add /schedules and /schedule slash commands so an authorized Telegram
user can list, view, preview, add, enable/disable, remove, and test-run
scheduled tasks without leaving the chat — closing the gap where the
native scheduler was CLI/file-only.

Command layer (cmd/odek/schedule_telegram.go):
- /schedules lists jobs; /schedule <sub> dispatches add|view|next|run|
  enable|disable|rm|help.
- add uses cron's fixed arity (an @macro or 5 fields) so no quoting is
  needed; options follow a literal '|' (deliver=, tz=, name=, catchup,
  disabled). Telegram delivery defaults to the originating chat.
- run returns the job's task for the bot to dispatch through the normal
  agent pipeline (progress + approvals visible), test-running it in chat.
- Replies use the existing MarkdownV2 pipeline; cron/IDs are wrapped in
  code spans to stay literal.
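
The fixed-arity split described for add might be sketched as below. `splitAdd` is a hypothetical helper, the option syntax after '|' is left unparsed because its exact form is not given here, and cutting at the first '|' (so a task cannot itself contain one) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitAdd separates "<cron> <task> | <options>" without quoting: the cron
// part is one field when it starts with "@", otherwise exactly five.
func splitAdd(args string) (string, string, string, error) {
	body, opts, _ := strings.Cut(args, "|")
	fields := strings.Fields(body)
	n := 5
	if len(fields) > 0 && strings.HasPrefix(fields[0], "@") {
		n = 1
	}
	if len(fields) <= n {
		return "", "", "", errors.New("need a cron expression followed by a task")
	}
	cron := strings.Join(fields[:n], " ")
	task := strings.Join(fields[n:], " ")
	return cron, task, strings.TrimSpace(opts), nil
}

func main() {
	fmt.Println(splitAdd("0 9 * * 1-5 summarize my inbox | tz=Europe/Berlin"))
	fmt.Println(splitAdd("@daily water the plants"))
}
```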

Wiring:
- Scheduler gains Reload() (buffered, coalescing) and a select case so
  in-chat edits reconcile immediately instead of waiting for the mtime
  poll; startSchedulerForBot now takes the shared store and publishes
  its Reload via scheduleReloadRef.
- telegram.go creates one schedule.Store, shares it with the embedded
  scheduler, and intercepts the two commands in OnCommand.
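
A buffered, coalescing trigger of the kind described for Reload() is commonly built from a one-slot channel and a non-blocking send. The sketch below assumes that shape; the `reloader` type and the buffer size of 1 are illustrative, not taken from the project.

```go
package main

import "fmt"

// reloader holds a one-slot channel: at most one reload is ever pending.
type reloader struct {
	ch chan struct{}
}

func newReloader() *reloader {
	return &reloader{ch: make(chan struct{}, 1)}
}

// Reload never blocks. If a reload is already pending, the new request is
// coalesced into it.
func (r *reloader) Reload() {
	select {
	case r.ch <- struct{}{}:
	default:
	}
}

func main() {
	r := newReloader()
	r.Reload()
	r.Reload()
	r.Reload()
	fmt.Println("pending reloads:", len(r.ch))
}
```

The scheduler loop would then include `case <-r.ch:` in its select and reconcile once per burst of edits.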

Safety/config:
- New schedules.allow_telegram_management (default true, env
  ODEK_SCHEDULES_ALLOW_TELEGRAM_MANAGEMENT) gates the mutating verbs;
  read-only listing/preview always works. Access is already bounded by
  the bot's allowed_chats/allowed_users.

Docs: SCHEDULES.md gains a 'Managing from Telegram' section; TELEGRAM.md,
CONFIG.md, docker/README.md and .env.example updated. Tests cover the
parser, every subcommand, the management gate, and the Reload trigger.

* fix(schedule): format scheduled results for Telegram MarkdownV2

Scheduled task results were sent raw, so odek markdown like **bold**
arrived as literal asterisks. Route both the CLI and embedded-bot
deliverers through sendTelegramResult, which mirrors the live bot's
SendResponse pipeline: convert to MarkdownV2, chunk via FormatResponse,
and retry each chunk as plain text if Telegram rejects the formatting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees deleted the feat/docker-supercronic-reminders branch June 6, 2026 06:37