feat(workstation): runt workstation connect/run/status and runtimed workstation-agent loop #7
Closed
quillaid wants to merge 2 commits into
…orkstation-agent loop
A Rust replacement for apps/notebook-cloud/scripts/hosted-workstation-agent.mjs,
built on the pairing contract (hosted-credential-transport ADR, Decision 9).
Operator path:
- `runt workstation connect <url> [--code]` redeems a panel-minted pairing
code (404 -> clear "invalid, expired, or already used" + nonzero exit),
stores {cloud_url, token, credential_id, workstation_id, display_name,
connected_at} at workstation.json (config dir; per-worktree in dev; 0600),
and does one registration POST so the panel shows the workstation
immediately.
- `runt workstation run` resolves the credential (RUNT_CLOUD_TOKEN /
RUNT_CLOUD_URL env overrides win), finds the sibling runtimed (bundled
lookup + dev target/ fallback), and execs `runtimed workstation-agent`
with the token in the environment, never argv.
- `runt workstation status [--json]` lists workstations via
GET /api/workstations with the stored bearer.
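A minimal sketch of the credential resolution and launch described for `runt workstation run`. It assumes the env overrides apply field by field and that the agent reads the token from `RUNT_CLOUD_TOKEN` in its own environment (the child-side variable name is not stated above). `StoredCredential`, `resolve_credential`, and `agent_command` are hypothetical names, and the command is only built, not spawned.

```rust
use std::path::Path;
use std::process::Command;

/// Two of the fields `connect` stores in workstation.json (field names from
/// the PR; this struct is a hypothetical stand-in for the real type).
#[derive(Debug, Clone)]
struct StoredCredential {
    cloud_url: String,
    token: String,
}

/// Resolve the effective (url, token). RUNT_CLOUD_URL / RUNT_CLOUD_TOKEN win
/// over the stored file; field-by-field precedence is an assumption.
/// Returns None if either value is missing from both sources.
fn resolve_credential(
    stored: Option<&StoredCredential>,
    env_url: Option<String>,
    env_token: Option<String>,
) -> Option<(String, String)> {
    let url = env_url.or_else(|| stored.map(|c| c.cloud_url.clone()))?;
    let token = env_token.or_else(|| stored.map(|c| c.token.clone()))?;
    Some((url, token))
}

/// Build (but do not spawn) the `runtimed workstation-agent` invocation.
/// The token goes only into the child's environment, so it never appears
/// in argv (and therefore not in `ps` output). The env var names used for
/// the child are assumptions.
fn agent_command(runtimed: &Path, url: &str, token: &str) -> Command {
    let mut cmd = Command::new(runtimed);
    cmd.arg("workstation-agent")
        .env("RUNT_CLOUD_URL", url)
        .env("RUNT_CLOUD_TOKEN", token);
    cmd
}
```

The PR says `run` execs the agent; on Unix that step would plausibly be `std::os::unix::process::CommandExt::exec` on the built `Command`, which replaces the `runt` process rather than spawning a child.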
Service loop (`runtimed workstation-agent`):
- Heartbeats POST /api/workstations with the same payload field set as the
.mjs core (cpu_count via available_parallelism, memory_bytes via
/proc/meminfo on Linux, omitted elsewhere).
- Polls attach jobs; pending -> PATCH accepted -> spawn
`cloud-runtime-agent` via std::env::current_exe with the .mjs argv plan
plus `--auth-kind workstation` -> PATCH running on the
"Infrastructure ready, entering main loop" log line -> PATCH
completed/failed on exit. Rate-limit cooldowns (Retry-After + exponential
backoff + jitter) ported from the .mjs core.
- Job adoption ported: peers write to per-job log files (not pipes, so an
orphaned peer never blocks on a dead reader) plus 0600 pid files; on
restart the agent re-attaches to accepted/running jobs whose pid is alive
and fails the rest. The server stale-job sweep remains the backstop
(and the only path on non-unix, where there is no cheap pid probe).
- Owned-state single loop (no shared mutexes), per the tokio-mutex invariant.
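The job lifecycle and the rate-limit cooldown above can be sketched as two pure functions, which an owned-state single loop can call without any shared locks. This is illustrative only: the event names, the mapping of a successful peer exit to `completed`, and the backoff rule (wait at least Retry-After and at least the capped exponential backoff, plus up to 25% jitter) are assumptions, not the ported `.mjs` logic. `next_status`, `JobEvent`, and `rate_limit_cooldown` are hypothetical names.

```rust
use std::time::Duration;

/// Attach-job statuses named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Pending,
    Accepted,
    Running,
    Completed,
    Failed,
}

/// What the agent observes for one job (hypothetical event names).
#[derive(Debug, Clone, Copy)]
enum JobEvent<'a> {
    /// The agent claimed a pending job it saw while polling.
    Claimed,
    /// A line appeared in the peer's per-job log file.
    PeerLogLine(&'a str),
    /// The cloud-runtime-agent peer exited.
    PeerExited { success: bool },
}

/// Readiness marker quoted in the PR.
const READY_LINE: &str = "Infrastructure ready, entering main loop";

/// Status to PATCH next, or None when the event does not change the status.
fn next_status(current: JobStatus, event: JobEvent<'_>) -> Option<JobStatus> {
    match (current, event) {
        (JobStatus::Pending, JobEvent::Claimed) => Some(JobStatus::Accepted),
        (JobStatus::Accepted, JobEvent::PeerLogLine(line)) if line.contains(READY_LINE) => {
            Some(JobStatus::Running)
        }
        (JobStatus::Accepted | JobStatus::Running, JobEvent::PeerExited { success }) => {
            // Assumed mapping: clean exit -> completed, anything else -> failed.
            Some(if success { JobStatus::Completed } else { JobStatus::Failed })
        }
        _ => None,
    }
}

/// Cooldown after a rate-limited response: exponential backoff from `base`,
/// capped at `max`, never shorter than Retry-After, plus up to 25% jitter.
/// `jitter_unit` is a caller-supplied random value in [0, 1).
fn rate_limit_cooldown(
    attempt: u32,
    retry_after: Option<Duration>,
    base: Duration,
    max: Duration,
    jitter_unit: f64,
) -> Duration {
    // base * 2^attempt, with the exponent clamped so the shift cannot overflow.
    let backoff = base.saturating_mul(1u32 << attempt.min(16)).min(max);
    // Honor the server's Retry-After if it asks for a longer wait.
    let floor = retry_after.map_or(backoff, |ra| ra.max(backoff));
    floor + floor.mul_f64(0.25 * jitter_unit.clamp(0.0, 1.0))
}
```

Taking the jitter as a parameter keeps the function deterministic under test; the loop would feed it a fresh random value on each call.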
Auth plumbing: CloudAuthKind::Workstation (runtimed CLI) maps to a new
CloudAuth::WorkstationCredential in notebook-cloud-transport - plain
`Authorization: Bearer` on the wire, same shape as OidcBearer but honestly
labeled; blob publisher handles the new variant the same way.
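A sketch of how the new variant might flow from the CLI flag to the wire header. `CloudAuthKind::Workstation`, `CloudAuth::WorkstationCredential`, and `OidcBearer` are named above; the `Oidc` kind, its `"oidc"` flag spelling, the bare `String` payloads, and the helper functions are hypothetical.

```rust
/// Value of the `--auth-kind` flag. Only `workstation` is named in the PR;
/// the `Oidc` variant and its "oidc" spelling are hypothetical stand-ins.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloudAuthKind {
    Oidc,
    Workstation,
}

fn parse_auth_kind(s: &str) -> Option<CloudAuthKind> {
    match s {
        "oidc" => Some(CloudAuthKind::Oidc),
        "workstation" => Some(CloudAuthKind::Workstation),
        _ => None,
    }
}

/// Transport-side credential. Variant names follow the PR; carrying a bare
/// token string in each variant is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
enum CloudAuth {
    OidcBearer(String),
    WorkstationCredential(String),
}

impl CloudAuth {
    fn from_kind(kind: CloudAuthKind, token: String) -> Self {
        match kind {
            CloudAuthKind::Oidc => CloudAuth::OidcBearer(token),
            CloudAuthKind::Workstation => CloudAuth::WorkstationCredential(token),
        }
    }

    /// Both variants put the same header on the wire; the separate variant
    /// only makes code and logs say which credential is actually in use.
    fn authorization_header(&self) -> (&'static str, String) {
        match self {
            CloudAuth::OidcBearer(token) | CloudAuth::WorkstationCredential(token) => {
                ("Authorization", format!("Bearer {token}"))
            }
        }
    }
}
```

Because the header is identical for both variants, the server cannot tell them apart; the gain is purely in honest labeling on the client side.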
Deviations from the .mjs source, by spec: --display-name defaults to the
bare hostname (not "<hostname> workstation"), and the default interpreter is
the first python3/python on PATH without the ipykernel import probe
(pass --python-path to choose explicitly).
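A sketch of the interpreter default. It assumes all `python3` candidates on PATH are tried before any `python` (the text above does not pin the order down) and takes the file check as a parameter so it can be tested without touching the filesystem. `default_python` is a hypothetical name; on Windows a real lookup would also need the `.exe` suffix.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Pick the kernel interpreter: an explicit --python-path wins; otherwise
/// the first `python3`, then the first `python`, found on PATH.
/// No ipykernel import probe is run, matching the stated deviation.
fn default_python(
    explicit: Option<PathBuf>,
    path_var: &OsStr,
    is_file: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    if explicit.is_some() {
        return explicit;
    }
    // Assumed order: every PATH entry for python3 before any python.
    for name in ["python3", "python"] {
        for dir in std::env::split_paths(path_var) {
            let candidate = dir.join(name);
            if is_file(candidate.as_path()) {
                return Some(candidate);
            }
        }
    }
    None
}
```

In production the call would look like `default_python(cli_python_path, &std::env::var_os("PATH").unwrap_or_default(), |p| p.is_file())`, where `cli_python_path` is a hypothetical name for the parsed `--python-path` value.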
docs/remote-workstation.md now leads with the pairing flow; the per-notebook
env-var path and the Node connector stay documented as legacy/dev.
5ce5704 to 3c4de45
(Owner, Author) Promoted to nteract#3611 after nteract#3609 merged.
Summary
The operator side of the pairing contract (nteract#3609): a single static-binary path from pairing code to serving compute, replacing the `.mjs` connector's eight env vars.

- `runt workstation connect`: prompts for or accepts the code, redeems it, stores the `nwc_` credential at the `session_state_path()`-style location (0600), and does one registration POST so the workstation appears in the panel immediately. A used or expired code gets a clear message and exit 1.
- `runt workstation run`: resolves the sibling `runtimed` (bundled lookup + dev `target/` fallback) and launches the agent with the token in env, never argv.
- `runtimed workstation-agent`: the long-lived loop. It lives in `runtimed` to match the `runt`-launches-`runtimed` split and handles heartbeat registration, attach-job polling, one `cloud-runtime-agent` runtime peer per job (`--auth-kind workstation` → new honestly-labeled `CloudAuth::WorkstationCredential`, plain bearer on the WS dial), readiness-line detection, rate-limit cooldowns with jitter, and full job adoption across restarts via per-job log + pid files.
- `runt workstation status [--json]`: lists registered workstations through the stored credential.

`docs/remote-workstation.md` now leads with mint → connect → run; the env-var/`.mjs` path stays documented as the dev/legacy alternative.

Verification
`runtimed` lib 969 passed (+ tokio mutex lint), `runt` 18 (8 new), `notebook-cloud-transport` 25 (2 new), `runt-workspace` 26 (3 new). Plus a live smoke test against a mock cloud server: bad code → exit 1; good code → 0600 credential + registration; `run` → register → accept job → spawn real `cloud-runtime-agent` → status PATCHes through `failed` on peer exit.

Stack
Base: `ws-pairing-worker` (nteract#3609). Intra-fork; promote to `nteract:main` after nteract#3609 merges. The panel one-liner in #6 invokes exactly this CLI.