
[WIP] feat: suspendable jobs + on-demand job runner cmd #756

Open: prathamesh0 wants to merge 18 commits into main from job-suspend-and-run-job



@prathamesh0

Collaborator
  • A compose service labeled laconic.suspend: "true" is skipped by deployment start. It is created only when an operator explicitly runs run-job <name>.
  • run-job is now repeatable: each invocation creates a fresh Job named {app}-job-{name}-{unix-ts}, so back-to-back calls don't collide.
  • New flags: --env KEY=VAL (repeatable) layers per-invocation env vars on top of compose/spec env; --no-wait returns as soon as the Job is accepted; --timeout SECONDS bounds the wait.
  • Default behavior changed: run-job now blocks, streaming the pod's logs to stdout, and exits with the Job's status instead of returning immediately. Documented in docs/cli.md.
  • The Helm and docker-compose paths accept the new flags but raise or warn, since they don't support the model end-to-end.
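
To make the --env layering concrete, here is a minimal sketch of how repeated KEY=VAL arguments could be parsed and layered over the compose/spec environment. The function names `parse_env_pairs` and `layer_env` are hypothetical and not taken from this PR. Rejecting entries without `=` is also an assumption about the intended behavior.

```python
from typing import Dict, Iterable


def parse_env_pairs(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn repeated KEY=VAL strings into a dict; later entries win."""
    env: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            # Assumption: malformed entries are an error rather than ignored.
            raise ValueError(f"expected KEY=VAL, got {pair!r}")
        env[key] = value
    return env


def layer_env(base: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Per-invocation values take precedence over compose/spec values."""
    return {**base, **overrides}
```

With this shape, `--env A=1 --env B=x=y` would yield `{"A": "1", "B": "x=y"}`, and any key already present in the compose/spec env would be replaced by the per-invocation value.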

prathamesh0 and others added 18 commits May 28, 2026 11:25
Adds the laconic.suspend compose label so stack authors can mark
individual jobs as not-auto-run at deployment start. Extends the
existing run-job subcommand with --env, blocking log-stream, and
timestamp-suffixed Job names so it can be invoked repeatedly from
ansible playbooks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
12-task plan: _is_suspended helper, get_jobs name_suffix + extra_env
kwargs, _create_jobs filter, _wait_and_stream helper, K8sDeployer
.run_job rewrite, abstract signature update, docker-compose deployer
parity (warn/raise), --env parsing, Click flags, docs/cli.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a compose service in a job file carries laconic.suspend=true,
get_jobs() now propagates that as laconic.suspend: "true" on the
V1Job's metadata.labels. Labels with false or missing values are not
stamped. Adds _make_cluster_info() helper (module-level, reusable by
later tasks) and TestGetJobsSuspendLabel (3 tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
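
As an illustration of the label propagation described in this commit, the sketch below checks a parsed compose service for the label and stamps it onto a job's labels only when it is true. The names `normalize_labels`, `is_suspended`, and `job_metadata_labels` are hypothetical. Accepting both the mapping and the list form of compose labels, and treating a YAML boolean `true` like the string "true", are assumptions about how the real helper behaves.

```python
from typing import Any, Dict, List, Union

SUSPEND_LABEL = "laconic.suspend"

Labels = Union[Dict[str, Any], List[str], None]


def normalize_labels(labels: Labels) -> Dict[str, str]:
    """Accept compose labels in mapping or "key=value" list form."""
    if not labels:
        return {}
    if isinstance(labels, dict):
        return {str(k): str(v) for k, v in labels.items()}
    result: Dict[str, str] = {}
    for item in labels:
        key, _, value = str(item).partition("=")
        result[key] = value
    return result


def is_suspended(service: Dict[str, Any]) -> bool:
    """True only when the service carries laconic.suspend set to true."""
    labels = normalize_labels(service.get("labels"))
    # Assumption: comparison is case-insensitive, so a YAML boolean also matches.
    return labels.get(SUSPEND_LABEL, "").strip().lower() == "true"


def job_metadata_labels(service: Dict[str, Any], base: Dict[str, str]) -> Dict[str, str]:
    """Copy the base labels and stamp the suspend label only when it is true."""
    labels = dict(base)
    if is_suspended(service):
        labels[SUSPEND_LABEL] = "true"
    return labels
```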
Adds an optional extra_env: Dict[str, str] kwarg to ClusterInfo.get_jobs().
After _build_containers() builds the container list, any entries in extra_env
are applied on top: existing vars with the same name are dropped and the
override value is appended, so each name appears exactly once.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
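
The drop-then-append override described above can be sketched as follows. `EnvVar` is a local stand-in for the Kubernetes client's env var model, and `apply_extra_env` is a hypothetical name; neither is taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EnvVar:
    """Local stand-in for a container env var (name/value pair)."""
    name: str
    value: Optional[str] = None


def apply_extra_env(existing: List[EnvVar], extra_env: Optional[Dict[str, str]]) -> List[EnvVar]:
    """Drop existing vars that are overridden, then append the overrides."""
    if not extra_env:
        return list(existing)
    # Keep only the vars that extra_env does not redefine.
    merged = [var for var in existing if var.name not in extra_env]
    # Append the overrides, so each name appears exactly once.
    merged.extend(EnvVar(name=k, value=v) for k, v in extra_env.items())
    return merged
```

Because overridden entries are removed before the new ones are appended, the result never contains two entries with the same name, which avoids relying on how the container runtime resolves duplicates.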
Jobs whose compose service carries `laconic.suspend=true` are now skipped
during `deploy start`; they remain available for on-demand execution via
`run-job`. Three unit tests cover the skip, all-clear, and all-suspended paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
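
A minimal sketch of the start-time filter, assuming each job object exposes `metadata.labels` the way a V1Job does. The function name `jobs_to_start` is hypothetical, and the real `_create_jobs` may be structured differently.

```python
from typing import Any, Iterable, List


def jobs_to_start(jobs: Iterable[Any]) -> List[Any]:
    """Return the jobs that should be created at deployment start."""
    runnable = []
    for job in jobs:
        labels = job.metadata.labels or {}
        # Suspended jobs are left for an explicit run-job invocation.
        if labels.get("laconic.suspend") == "true":
            continue
        runnable.append(job)
    return runnable
```

The three cases named in the commit map onto this directly: a mixed list loses only its suspended entries, a list with no suspended jobs is returned whole, and a list of only suspended jobs yields an empty result.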
Adds K8sDeployer._wait_and_stream(job_name, timeout_seconds) which polls
for the job's pod to leave Pending, streams its logs to stdout via
urllib3's raw response, then returns 0 on Job success or 1 on failure.
Two unit tests cover the success and failure paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
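
The wait-then-stream flow can be sketched without a cluster by injecting the three cluster interactions as callables. Everything here is illustrative: `get_pod_phase`, `stream_logs`, and `job_succeeded` are hypothetical parameters standing in for Kubernetes API calls, and raising `TimeoutError` when the deadline passes is an assumption, since the commit message does not say what happens on timeout.

```python
import sys
import time
from typing import Callable, Iterable, Optional, TextIO


def wait_and_stream(
    get_pod_phase: Callable[[], Optional[str]],
    stream_logs: Callable[[], Iterable[bytes]],
    job_succeeded: Callable[[], bool],
    timeout_seconds: Optional[float] = None,
    poll_interval: float = 1.0,
    out: TextIO = sys.stdout,
) -> int:
    """Wait for the pod to leave Pending, stream its logs, return an exit code."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds

    # Phase 1: poll until a pod exists and is no longer Pending.
    while get_pod_phase() in (None, "Pending"):
        if deadline is not None and time.monotonic() >= deadline:
            # Assumption: a timeout is surfaced as an exception.
            raise TimeoutError("pod did not leave Pending before the timeout")
        time.sleep(poll_interval)

    # Phase 2: copy log chunks to the output stream as they arrive.
    for chunk in stream_logs():
        out.write(chunk.decode("utf-8", errors="replace"))
        out.flush()

    # Phase 3: map the final Job status to a process exit code.
    return 0 if job_succeeded() else 1
```

Injecting the callables keeps the control flow testable with plain lambdas, which is roughly what the success and failure unit tests mentioned above would need.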
Rewrites run_job to accept extra_env, no_wait, and timeout_seconds kwargs,
generate a timestamp-based name suffix via get_jobs(), warn when a job lacks
laconic.suspend=true, and delegate to _wait_and_stream unless no_wait is set.
Also promotes import sys/time to module level from _wait_and_stream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
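
For reference, the naming scheme from the PR description, {app}-job-{name}-{unix-ts}, can be sketched as below. The helper names `job_name_suffix` and `build_job_name` are hypothetical, and using whole seconds for the timestamp is an assumption.

```python
import time
from typing import Optional


def job_name_suffix(now: Optional[float] = None) -> str:
    """Return a Unix timestamp in whole seconds as the per-invocation suffix."""
    return str(int(time.time() if now is None else now))


def build_job_name(app_name: str, job_name: str, suffix: str) -> str:
    """Compose the Job name as {app}-job-{name}-{unix-ts}."""
    return f"{app_name}-job-{job_name}-{suffix}"
```

If the suffix does have one-second resolution, two invocations of the same job within the same second would produce the same name, so callers that fire jobs in a tight loop may need to account for that.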
The base-name fallback in run_job was dead production code added only to
satisfy test mocks that used unsuffixed job names. Fix the mocks to carry
the correct suffixed names (matching what production builds), then revert
the match condition to target_name-only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…assertion

Helm path now raises DeployerException for --no-wait and --timeout flags,
mirroring the existing guard for --env. Test assertion for
_wait_and_stream pins both the job name (suffix flow-through) and
timeout_seconds so regressions in target_name construction fail loudly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
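
The Helm-path guard described in this commit could look roughly like the following. `DeployerException` is redefined locally as a stand-in for the project's exception of the same name, and `reject_unsupported_flags` is a hypothetical helper; the actual guard may raise per flag rather than collecting them.

```python
from typing import Dict, Optional


class DeployerException(Exception):
    """Local stand-in for the project's exception of the same name."""


def reject_unsupported_flags(
    extra_env: Optional[Dict[str, str]] = None,
    no_wait: bool = False,
    timeout_seconds: Optional[int] = None,
) -> None:
    """Raise if any run-job flag the Helm path cannot honor was supplied."""
    unsupported = []
    if extra_env:
        unsupported.append("--env")
    if no_wait:
        unsupported.append("--no-wait")
    if timeout_seconds is not None:
        unsupported.append("--timeout")
    if unsupported:
        flags = ", ".join(unsupported)
        raise DeployerException(f"{flags} not supported for Helm deployments")
```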