feat(cron): auto-disable usercron jobs on success by chaodu-agent · Pull Request #818 · openabdev/openab

chaodu-agent · 2026-05-13T23:08:11Z

Summary

Add usercron-only disable_on_success completion checks that require both exit code 0 and the disable_on_success_match marker in the command output.
Persist scheduler writebacks to $HOME/.openab/cronjob.toml by stable id (enabled = false on success; thread_id after thread creation).
Document the goal-driven usercron fields and add focused cron tests.

Behavior

[[jobs]] entries in usercron may define:

id = "fix-unit-tests"
disable_on_success = "npm test && echo OPENAB_GOAL_SUCCESS"
disable_on_success_match = "OPENAB_GOAL_SUCCESS"
disable_on_success_timeout_secs = 120
disable_on_success_working_dir = "/workspace/my-project"

A goal is complete only if the command exits 0 and its stdout/stderr contains the configured marker. An exit code of 0 without the marker does not count as success, and the normal cron prompt runs as usual.
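The completion rule can be expressed as a small pure function. This is a minimal sketch that assumes the exit code and captured output are already available. The evaluate function and the exact reason strings are hypothetical; only the names DisableOnSuccessResult and NotAchieved appear in the review comments on this PR.

```rust
/// Outcome of a goal check. Loosely modeled on the DisableOnSuccessResult
/// enum mentioned in review; the variant payloads here are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum DisableOnSuccessResult {
    Achieved,
    NotAchieved(&'static str),
}

/// Applies the completion rule: exit code 0 AND the marker appears in
/// stdout or stderr. Either condition alone is not enough.
pub fn evaluate(
    exit_code: Option<i32>,
    stdout: &str,
    stderr: &str,
    marker: &str,
) -> DisableOnSuccessResult {
    if exit_code != Some(0) {
        // Nonzero exit, or killed by a signal (no exit code at all).
        return DisableOnSuccessResult::NotAchieved("nonzero exit");
    }
    if marker.is_empty() {
        // An empty marker would match any output, defeating the check.
        return DisableOnSuccessResult::NotAchieved("empty marker");
    }
    if stdout.contains(marker) || stderr.contains(marker) {
        DisableOnSuccessResult::Achieved
    } else {
        // Plain exit 0 without the marker: the normal cron prompt still runs.
        DisableOnSuccessResult::NotAchieved("marker not found")
    }
}
```

Checking each stream separately (rather than a concatenation) is a design choice of this sketch; it only differs if a marker were split across stdout and stderr.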

Tests

git diff --check
Not run locally: cargo test cron --lib (cargo is not installed in this environment)

Discord thread: https://discord.com/channels/1491295327620169908/1504239931940409587

shaun-agent · 2026-05-14T14:27:57Z

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Title: feat(cron): auto-disable usercron jobs on success
Source: feat(cron): auto-disable usercron jobs on success #818
Status: moved to PR-Screening
Generated at: 2026-05-14T14:27:56.938Z
Discord thread: not available

Screening report

## Intent

PR #818 tries to let usercron jobs automatically turn themselves off once a configured goal has been reached.

The operator-visible problem: today, a goal-oriented scheduled job can keep running even after it has succeeded, causing repeated agent runs, noise, wasted compute, and possibly repeated Discord/thread activity. This PR adds a completion check that lets the scheduler confirm a job's success and then persist enabled = false back to $HOME/.openab/cronjob.toml.

Feat

Feature.

Behaviorally, usercron jobs may define a disable_on_success command plus a required output marker via disable_on_success_match. The job is considered complete only when the command exits 0 and stdout/stderr contains the marker. On success, the scheduler writes back to the user cron TOML by stable id and disables the job.

It also persists thread_id after thread creation and documents the new fields.

Who It Serves

Primary beneficiary: agent runtime operators and deployers running goal-driven scheduled jobs.

Secondary beneficiaries: maintainers and reviewers, because completed recurring work becomes explicit state instead of ambient scheduler behavior.

Rewritten Prompt

Implement goal-completion auto-disable for usercron jobs only.

Add optional usercron fields:

disable_on_success = "command"
disable_on_success_match = "required output marker"
disable_on_success_timeout_secs = 120
disable_on_success_working_dir = "/path"

Before running the normal cron prompt, execute the completion command when configured. Treat the goal as complete only if the command exits 0 and combined stdout/stderr contains the configured marker. If complete, update $HOME/.openab/cronjob.toml for the matching stable id with enabled = false and skip the normal scheduled run.

Persist scheduler writebacks atomically where possible, avoid affecting non-usercron config, and add focused tests for success, missing marker, nonzero exit, timeout, missing id, and TOML writeback behavior.
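The field rules above (stable id required, marker required, usercron only) could be enforced when the config is loaded. This is a minimal, std-only sketch: GoalFields, JobSource, and validate are hypothetical names, TOML deserialization is omitted, and it assumes these checks run at load time rather than at check time.

```rust
use std::path::PathBuf;

/// Where a job was loaded from. Only usercron jobs may use goal checks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobSource {
    Baseline, // [[cron.jobs]] in the main config
    Usercron, // [[jobs]] in $HOME/.openab/cronjob.toml
}

/// Hypothetical mirror of the goal-related fields; other cron fields omitted.
#[derive(Debug, Default, Clone)]
pub struct GoalFields {
    pub id: Option<String>,
    pub disable_on_success: Option<String>,
    pub disable_on_success_match: Option<String>,
    pub disable_on_success_timeout_secs: Option<u64>,
    pub disable_on_success_working_dir: Option<PathBuf>,
}

/// Returns an error message if the goal fields are inconsistent.
pub fn validate(source: JobSource, job: &GoalFields) -> Result<(), String> {
    // Jobs without a completion command need no further checks.
    if job.disable_on_success.is_none() {
        return Ok(());
    }
    if source == JobSource::Baseline {
        return Err("disable_on_success is only supported in usercron jobs".into());
    }
    // Writeback locates the job by stable id, so an id is mandatory.
    if job.id.as_deref().map_or(true, |id| id.trim().is_empty()) {
        return Err("disable_on_success requires a stable id".into());
    }
    // Exit code 0 alone is not enough; a non-empty marker is required.
    if job.disable_on_success_match.as_deref().map_or(true, str::is_empty) {
        return Err("disable_on_success requires disable_on_success_match".into());
    }
    Ok(())
}
```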

Merge Pitch

This is worth advancing because it closes a real scheduler lifecycle gap: goal-driven jobs need a first-class way to stop themselves after success.

Risk profile is moderate. The behavior touches scheduler execution and config persistence, so reviewer concern will likely center on writeback safety, race conditions, TOML preservation, and whether shell-command success checks are too footgun-prone. The PR is directionally useful, but the large src/cron.rs delta needs careful review before merge.

Best-Practice Comparison

OpenClaw principles that apply:

Gateway-owned scheduling: relevant. The scheduler should own the decision to skip or disable completed jobs.
Durable job persistence: relevant. Writing enabled = false back to cron state matches this principle.
Isolated executions: relevant. The completion command should have timeout, cwd handling, bounded output, and no shared mutable execution state.
Explicit delivery routing: partly relevant. Persisting thread_id supports durable routing, but this PR is not mainly about message delivery.
Retry/backoff and run logs: relevant follow-up. Completion checks should be observable when they fail, timeout, or disable a job.
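For the bounded-output point, one option is to cap how much of each captured stream is kept in memory while still draining the pipe. This is a std-only sketch; read_bounded is a hypothetical helper, and capping per stream is an assumption rather than something this PR is known to do.

```rust
use std::io::{self, Read};

/// Keeps at most `cap` bytes from `reader`, then drains and discards the
/// rest so the child process never blocks on a full pipe.
pub fn read_bounded<R: Read>(mut reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `by_ref()` lets `take` borrow the reader instead of consuming it.
    reader.by_ref().take(cap).read_to_end(&mut buf)?;
    // Read the remainder until EOF, throwing it away.
    io::copy(&mut reader, &mut io::sink())?;
    Ok(buf)
}
```

This keeps the head of the output. Because markers are usually echoed last (as in npm test && echo OPENAB_GOAL_SUCCESS), keeping the tail instead may suit marker matching better.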

Hermes Agent principles that apply:

Gateway daemon tick model: relevant. Completion checks belong in the scheduler tick before normal prompt execution.
File locking to prevent overlap: highly relevant. Writebacks to $HOME/.openab/cronjob.toml should not race with another scheduler tick or process.
Atomic writes for persisted state: highly relevant. Direct TOML mutation without atomic write semantics is fragile.
Fresh session per scheduled run: not central, except that a completed job should avoid creating a fresh run.
Self-contained prompts for scheduled tasks: partly relevant. The completion command should not replace the actual scheduled prompt; it should only gate whether the prompt still needs to run.
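The atomic-write principle usually means writing to a sibling temp file and renaming it over the target. This is a minimal std-only sketch; write_atomic is a hypothetical name, and the .toml.tmp suffix follows the approach described later in this thread. It gives crash safety only; concurrent writers still need separate serialization.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `path` with `contents` via a temp file and rename. On POSIX,
/// a rename within one filesystem is atomic: readers see the old file or
/// the new one, never a half-written file.
pub fn write_atomic(path: &Path, contents: &str) -> io::Result<()> {
    // cronjob.toml -> cronjob.toml.tmp, in the same directory as the target.
    let tmp = path.with_extension("toml.tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents.as_bytes())?;
        // Flush file data to disk before the rename makes it visible.
        f.sync_all()?;
    }
    fs::rename(&tmp, path)
}
```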

Implementation Options

Conservative option: merge only the config fields, completion check, and skip behavior, but do not write back enabled = false yet. Log that the goal completed and require the operator to disable the job manually. This is the fastest option with the lowest persistence risk, but it has weaker user impact.

Balanced option: keep the PR’s core behavior, but harden writeback. Require stable id, use file locking plus atomic write, preserve unrelated TOML content as much as the chosen parser allows, and add targeted tests for writeback safety. This is the best fit if the scheduler already owns usercron state.

Ambitious option: introduce a durable scheduler state layer separate from the user-authored TOML. Store job completion, thread routing, run history, retry state, and disable reasons in a scheduler-owned state file or database. This aligns better with OpenClaw/Hermes long-term patterns but is larger than this PR.

Comparison Table

| Option | Speed to ship | Complexity | Reliability | Maintainability | User impact | Fit for OpenAB right now |
|---|---|---|---|---|---|---|
| Conservative: check and log only | High | Low | Medium | High | Medium | Good if writeback risk is too high |
| Balanced: auto-disable with locked atomic writeback | Medium | Medium | High | High | High | Best |
| Ambitious: separate durable scheduler state | Low | High | Very high | Medium-high | High | Good future direction, too broad for this PR |

Recommendation

Advance the balanced path.

The feature solves a concrete lifecycle problem and matches the direction of gateway-owned scheduling, but merge discussion should focus on making persistence boring: stable id required, atomic writeback, file locking, clear logs, and focused failure tests.

If the current PR does not already guarantee safe writeback semantics, treat that hardening as required work that lands before merge, not as optional polish.

chaodu-agent

Blocking note before merge: the timeout path around disable_on_success does not actually guarantee the spawned command is terminated. Tokio process handles continue running after drop unless kill_on_drop is enabled or the child is explicitly killed/reaped. In check_disable_on_success, timeout(child.output()) drops the output future on timeout and returns NotAchieved, but a long-running command may keep executing in the background. That violates the documented runaway-command mitigation and can leave repeated goal checks piling up. Please switch to explicit spawn + timeout around wait/output with kill/reap on timeout, or set kill_on_drop(true) before output and add a regression test using a long sleep.
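The requested pattern (keep the child handle, drain both pipes concurrently, and on timeout kill and then reap) can be illustrated with std::process. This is a std-only sketch, not the PR's async code: run_with_timeout and CheckOutcome are hypothetical names, and polling try_wait stands in for what the Tokio version would do with kill_on_drop(true) or a tokio::select! between child.wait() and tokio::time::sleep.

```rust
use std::io::{self, Read};
use std::process::{Child, Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
pub enum CheckOutcome {
    Finished { status: ExitStatus, stdout: String, stderr: String },
    TimedOut,
}

/// Reads a pipe to EOF on its own thread.
fn drain<R: Read + Send + 'static>(mut pipe: R) -> thread::JoinHandle<String> {
    thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = pipe.read_to_end(&mut buf);
        String::from_utf8_lossy(&buf).into_owned()
    })
}

pub fn run_with_timeout(mut cmd: Command, timeout: Duration) -> io::Result<CheckOutcome> {
    // An error here means the command failed to start.
    let mut child: Child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Drain both pipes in parallel so a chatty command cannot deadlock on a full buffer.
    let out = drain(child.stdout.take().expect("stdout is piped"));
    let err = drain(child.stderr.take().expect("stderr is piped"));

    let deadline = Instant::now() + timeout;
    let status = loop {
        match child.try_wait() {
            Ok(Some(status)) => break Some(status),
            Ok(None) if Instant::now() >= deadline => {
                // Timed out: kill, then wait() to reap so no orphan or zombie remains.
                // kill() signals only the direct child, not processes it spawned.
                let _ = child.kill();
                let _ = child.wait();
                break None;
            }
            Ok(None) => thread::sleep(Duration::from_millis(20)),
            Err(e) => {
                // A wait error (not a spawn error): still kill and reap.
                let _ = child.kill();
                let _ = child.wait();
                return Err(e);
            }
        }
    };
    let stdout = out.join().unwrap_or_default();
    let stderr = err.join().unwrap_or_default();
    Ok(match status {
        Some(status) => CheckOutcome::Finished { status, stdout, stderr },
        None => CheckOutcome::TimedOut,
    })
}
```

A regression test in the spirit of the one requested runs a long sleep with a short timeout and asserts that the call returns promptly with TimedOut.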

- update_usercron_job: write to .toml.tmp, then rename (atomic on POSIX)
- check_disable_on_success: use spawn() + wait_with_output() to retain the child handle; explicitly kill on timeout to prevent orphan processes
- Add disable_on_success_kills_child_on_timeout test (sleep 999 + 1s timeout)

chaodu-agent · 2026-05-17T19:35:46Z

CHANGES REQUESTED 🔴

CI fails because of an unused import. There are also two correctness NITs.

🔴 SUGGESTED CHANGES

1. Unused import breaks CI (clippy -D warnings)

src/cron.rs:14 imports tokio::time::timeout but the code uses tokio::select! + tokio::time::sleep instead. Remove the unused import:

-use tokio::time::timeout;

This is the sole cause of the CI check job failure.

🟡 NIT

2. Inaccurate error reason in check_disable_on_success

When child.wait() returns Err(e), the function returns NotAchieved("command failed to start") — but at this point the child has already been spawned successfully. The error is from wait(), not spawn(). Suggest:

-return DisableOnSuccessResult::NotAchieved("command failed to start");
+return DisableOnSuccessResult::NotAchieved("command wait failed");

3. update_usercron_job concurrent read-modify-write race (普渡法師)

update_usercron_job does read → modify → temp+rename. The atomic rename prevents corruption on a crash, but if two goal jobs succeed on the same tick (two fire_cronjob tasks running in parallel), both read the old file, and the second write silently overwrites the first job's enabled = false. The probability is low (two goals must complete within the same 1-minute window), but it is worth noting.

Suggested fix: wrap file I/O in a tokio::sync::Mutex shared across fire_cronjob calls, or accept the risk with a comment documenting the known limitation.
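The mutex option can be sketched as a single process-wide lock held across the whole read → modify → write sequence. This std-only version uses std::sync::Mutex with threads; update_locked and USERCRON_LOCK are hypothetical names, and in the async scheduler the reviewers suggest tokio::sync::Mutex instead. It only serializes writers inside one process; other processes or manual edits would still need file locking.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// One lock guarding every read-modify-write of the usercron file.
static USERCRON_LOCK: Mutex<()> = Mutex::new(());

/// Reads the file, applies `edit`, and writes the result back while holding
/// the lock, so two jobs finishing on the same tick cannot lose an update.
pub fn update_locked<F>(path: &Path, edit: F) -> io::Result<()>
where
    F: FnOnce(String) -> String,
{
    // A poisoned lock only means another updater panicked; the guarded data is ().
    let _guard = USERCRON_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let current = fs::read_to_string(path)?;
    let updated = edit(current);
    // Temp file + rename keeps the crash safety of the existing approach.
    let tmp = path.with_extension("toml.tmp");
    fs::write(&tmp, updated)?;
    fs::rename(&tmp, path)
}
```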

🟢 INFO — What works well

Baseline check confirms net-new: main has zero disable_on_success code
Design is sound: toml_edit for format-preserving writeback, timeout + kill to prevent orphan processes, explicit marker match to prevent false positives
In-flight clearing removal is correct: writeback changes mtime; clearing would allow same job to overlap
Validation gate: Baseline [[cron.jobs]] correctly rejects disable_on_success
shell_command abstraction (Windows /C vs Unix -c) is clean and future-proof (普渡法師)
stdout_task + stderr_task parallel drain correctly prevents pipe buffer deadlock (普渡法師)
Test coverage: 7 new tests covering validation, writeback, and the async check function
PR #816, docs(adr): goal-driven cronjob (disable_on_success), provides the design context; the implementation matches the ADR spec
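The shell_command abstraction could look roughly like this. The function name and signature are a hypothetical sketch, and the use of cmd /C and sh -c is an assumption based on the "Windows /C vs Unix -c" note; the optional working directory shows where disable_on_success_working_dir would plausibly apply via Command::current_dir.

```rust
use std::path::Path;
use std::process::Command;

/// Builds a Command that runs `script` through the platform shell,
/// optionally in `working_dir`. The shells chosen here are assumptions.
pub fn shell_command(script: &str, working_dir: Option<&Path>) -> Command {
    let mut cmd = if cfg!(windows) {
        let mut c = Command::new("cmd");
        c.args(["/C", script]);
        c
    } else {
        let mut c = Command::new("sh");
        c.args(["-c", script]);
        c
    };
    if let Some(dir) = working_dir {
        cmd.current_dir(dir);
    }
    cmd
}
```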

Baseline Check

main branch src/cron.rs: no disable_on_success references (0 matches)
main branch src/config.rs CronJobConfig: no id, no disable_on_success* fields
main branch docs/adr/: 7 existing ADRs, no goal-driven-cronjob.md
Net-new: all 574 additions are genuine new functionality

Consolidated review: 超渡法師 + 普渡法師

chaodu-agent requested a review from thepagent as a code owner May 13, 2026 23:08

github-actions bot added the pending-screening label (PR awaiting automated screening) on May 13, 2026

feat(cron): auto-disable usercron jobs on success

f8d3d16

chaodu-agent force-pushed the feat/usercron-disable-on-success branch from 99e406d to f8d3d16 on May 13, 2026 23:11

github-actions Bot added the pending-maintainer label May 14, 2026