[WIP] feat: suspendable jobs + on-demand job runner cmd#756
Open
prathamesh0 wants to merge 18 commits into
Adds the laconic.suspend compose label so stack authors can mark individual jobs as not-auto-run at deployment start. Extends the existing run-job subcommand with --env, blocking log-stream, and timestamp-suffixed Job names so it can be invoked repeatedly from ansible playbooks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
12-task plan: _is_suspended helper, get_jobs name_suffix + extra_env kwargs, _create_jobs filter, _wait_and_stream helper, K8sDeployer.run_job rewrite, abstract signature update, docker-compose deployer parity (warn/raise), --env parsing, Click flags, docs/cli.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a compose service in a job file carries laconic.suspend=true, get_jobs() now propagates that as laconic.suspend: "true" on the V1Job's metadata.labels. Labels with false or missing values are not stamped. Adds _make_cluster_info() helper (module-level, reusable by later tasks) and TestGetJobsSuspendLabel (3 tests). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
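A minimal sketch of the label check and propagation described in this commit, using plain dicts in place of the real compose and `V1Job` objects. The function names `is_suspended` and `job_labels` are hypothetical (the plan mentions an `_is_suspended` helper, but its signature is not shown here), and the handling of compose's list-form labels is an assumption.

```python
from typing import Any, Dict

SUSPEND_LABEL = "laconic.suspend"


def is_suspended(service: Dict[str, Any]) -> bool:
    """Return True only when the service carries laconic.suspend=true."""
    labels = service.get("labels")
    if not labels:
        return False
    if isinstance(labels, list):
        # Compose also allows the list form: ["key=value", ...]
        parsed = {}
        for item in labels:
            key, _, value = item.partition("=")
            parsed[key] = value
        labels = parsed
    # str() also covers a YAML boolean `true` parsed as Python True
    return str(labels.get(SUSPEND_LABEL, "")).strip().lower() == "true"


def job_labels(service: Dict[str, Any], base: Dict[str, str]) -> Dict[str, str]:
    """Copy the base labels; stamp the suspend label only when it is true."""
    labels = dict(base)
    if is_suspended(service):
        labels[SUSPEND_LABEL] = "true"
    return labels
```

With this shape, a service labeled `laconic.suspend=false` or carrying no label at all yields the base labels unchanged, matching the "not stamped" behaviour in the commit message.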
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an optional extra_env: Dict[str, str] kwarg to ClusterInfo.get_jobs(). After _build_containers() builds the container list, any entries in extra_env are applied on top: existing vars with the same name are dropped and the override value is appended, so each name appears exactly once. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
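The override rule can be sketched as follows. This is an illustration only: it models each env var as a `{"name": ..., "value": ...}` dict rather than the Kubernetes client's env var objects, and `apply_extra_env` is a hypothetical name, not a function from the PR.

```python
from typing import Dict, List, Optional


def apply_extra_env(
    env: List[Dict[str, str]],
    extra_env: Optional[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Layer extra_env on top of env so each name appears exactly once."""
    if not extra_env:
        return list(env)
    # Drop existing vars that are about to be overridden...
    kept = [e for e in env if e["name"] not in extra_env]
    # ...then append the override values.
    overrides = [{"name": k, "value": v} for k, v in extra_env.items()]
    return kept + overrides
```

For example, applying `{"LOG_LEVEL": "debug"}` to a list that already defines `LOG_LEVEL=info` leaves a single `LOG_LEVEL` entry with the value `debug`, positioned at the end of the list.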
Jobs whose compose service carries `laconic.suspend=true` are now skipped during `deploy start`; they remain available for on-demand execution via `run-job`. Three unit tests cover the skip, all-clear, and all-suspended paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
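A rough sketch of the start-time filter, assuming jobs are represented as dicts with `metadata.labels` (the real code works on `V1Job` objects, and `jobs_to_start` is a hypothetical name):

```python
from typing import Any, Dict, List


def jobs_to_start(jobs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only jobs that are not marked laconic.suspend: "true"."""
    return [
        job
        for job in jobs
        if job.get("metadata", {}).get("labels", {}).get("laconic.suspend") != "true"
    ]
```

The three cases named in the commit message map onto this directly: a mixed list loses only its suspended entries, a list with no suspended jobs is returned whole, and a list of only suspended jobs becomes empty.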
Adds K8sDeployer._wait_and_stream(job_name, timeout_seconds) which polls for the job's pod to leave Pending, streams its logs to stdout via urllib3's raw response, then returns 0 on Job success or 1 on failure. Two unit tests cover the success and failure paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
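The poll-then-stream flow might look roughly like the sketch below. It is not the PR's implementation: the Kubernetes calls are replaced by injected callables (`get_pod_phase`, `stream_logs`, `job_succeeded`, all hypothetical), and returning 1 when the timeout expires while the pod is still Pending is an assumption, since the commit message only specifies the success and failure exit codes.

```python
import sys
import time
from typing import Callable, Iterable, Optional


def wait_and_stream(
    get_pod_phase: Callable[[], Optional[str]],
    stream_logs: Callable[[], Iterable[bytes]],
    job_succeeded: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
    write: Callable[[str], object] = sys.stdout.write,
) -> int:
    """Wait for the pod to leave Pending, stream its logs, return 0 or 1."""
    deadline = time.monotonic() + timeout_seconds

    # Phase 1: poll until a pod exists and is no longer Pending.
    while True:
        phase = get_pod_phase()  # None means no pod has been created yet
        if phase is not None and phase != "Pending":
            break
        if time.monotonic() >= deadline:
            return 1  # assumption: treat a timeout as failure
        time.sleep(poll_interval)

    # Phase 2: copy raw log chunks to the output as they arrive.
    for chunk in stream_logs():
        write(chunk.decode("utf-8", errors="replace"))

    # Phase 3: map the final Job status to a process-style exit code.
    return 0 if job_succeeded() else 1
```

Injecting the callables keeps the control flow testable without a cluster, which is similar in spirit to the two mocked unit tests the commit mentions.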
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites run_job to accept extra_env, no_wait, and timeout_seconds kwargs, generate a timestamp-based name suffix via get_jobs(), warn when a job lacks laconic.suspend=true, and delegate to _wait_and_stream unless no_wait is set. Also promotes import sys/time to module level from _wait_and_stream. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
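As an illustration of the timestamp suffix, the sketch below builds a name in the `{app}-job-{name}-{unix-ts}` shape given in the PR description. The helper name `suffixed_job_name` and the injectable `now` parameter are hypothetical, and the whole-second resolution is an assumption.

```python
import time
from typing import Optional


def suffixed_job_name(app: str, job: str, now: Optional[float] = None) -> str:
    """Build a per-invocation Job name ending in a Unix timestamp."""
    # `now` is injectable so the result is deterministic in tests.
    ts = int(time.time() if now is None else now)
    return f"{app}-job-{job}-{ts}"
```

In this sketch, two invocations within the same second would produce the same name, so a caller that needs finer granularity would have to extend the suffix.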
The base-name fallback in run_job was dead production code added only to satisfy test mocks that used unsuffixed job names. Fix the mocks to carry the correct suffixed names (matching what production builds), then revert the match condition to target_name-only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…assertion Helm path now raises DeployerException for --no-wait and --timeout flags, mirroring the existing guard for --env. Test assertion for _wait_and_stream pins both the job name (suffix flow-through) and timeout_seconds so regressions in target_name construction fail loudly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- A job labeled `laconic.suspend: "true"` is skipped by `deployment start`. It's only created when an operator explicitly runs `run-job <name>`
- `run-job` is now repeatable: each invocation creates a fresh Job named `{app}-job-{name}-{unix-ts}`, so back-to-back calls don't collide
- `--env KEY=VAL` (repeatable) layers per-invocation env vars on top of compose/spec env; `--no-wait` returns as soon as the Job is accepted; `--timeout SECONDS` bounds the wait
- `run-job` now blocks, streaming the pod's logs to stdout, and exits with the Job's status instead of returning immediately. Documented in `docs/cli.md`
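One way the repeatable `--env KEY=VAL` values could be turned into a dict is sketched below. The function name `parse_env_pairs`, the last-one-wins rule for repeated keys, and the `ValueError` on malformed input are all assumptions; the PR's actual parsing and error type are not shown here.

```python
from typing import Dict, Iterable


def parse_env_pairs(pairs: Iterable[str]) -> Dict[str, str]:
    """Parse repeated KEY=VAL strings; later values override earlier ones."""
    env: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VAL, got {pair!r}")
        env[key] = value
    return env
```

The resulting dict has the same shape as the `extra_env` kwarg described in the commits, so it could be passed through unchanged.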