
feat(workstation): runt workstation connect/run/status and runtimed workstation-agent loop #7

Closed
quillaid wants to merge 2 commits into ws-pairing-worker from ws-workstation-cli


@quillaid (Owner)

Summary

The operator side of the pairing contract (nteract#3609): a single static-binary path from pairing code to serving compute, replacing the .mjs connector's eight env vars.

runt workstation connect https://preview.runt.run --code XXXX-XXXX-XXXX
runt workstation run
  • runt workstation connect — prompts for/accepts the code, redeems it, stores the nwc_ credential at the session_state_path()-style location (0600), and does one registration POST so the workstation appears in the panel immediately. A used/expired code gets a clear message and exit 1.
  • runt workstation run — resolves the sibling runtimed (bundled lookup + dev target/ fallback) and launches the agent with the token in env, never argv.
  • runtimed workstation-agent — the long-lived loop, living in runtimed to match the runt-launches-runtimed split: heartbeat registration, attach-job polling, one cloud-runtime-agent runtime peer per job (--auth-kind workstation → new honestly-labeled CloudAuth::WorkstationCredential, plain bearer on the WS dial), readiness-line detection, rate-limit cooldowns with jitter, and full job adoption across restarts via per-job log + pid files.
  • runt workstation status [--json] — registered workstations through the stored credential.
  • docs/remote-workstation.md now leads with mint → connect → run; the env-var/.mjs path stays documented as the dev/legacy alternative.
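The 0600 credential storage mentioned for `runt workstation connect` can be sketched as follows. This is a minimal, Unix-only illustration, not the code in this PR: `write_private_file` is a hypothetical helper name, and the real implementation may create or replace the file differently.

```rust
use std::fs::{OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper: write `contents` to `path` so that only the owner
/// can read or write the file (mode 0600).
pub fn write_private_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only takes effect when the file is newly created
        .open(path)?;
    // A pre-existing file keeps its old mode, so tighten it explicitly
    // before any secret bytes are written.
    file.set_permissions(Permissions::from_mode(0o600))?;
    file.write_all(contents)?;
    file.sync_all()
}
```

Setting the mode at creation time avoids a window in which the file exists with wider permissions; the explicit `set_permissions` call covers the case where an older, more permissive file is being overwritten.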

Verification

Tests: runtimed lib 969 passed (plus the tokio mutex lint), runt 18 passed (8 new), notebook-cloud-transport 25 passed (2 new), runt-workspace 26 passed (3 new). A live smoke test against a mock cloud server also passed: bad code → exit 1; good code → 0600 credential + registration; run → register → accept job → spawn a real cloud-runtime-agent → status PATCHes progress through to failed on peer exit.
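The job lifecycle exercised by the smoke test (pending, accepted, running, then completed or failed) can be modelled as a small state machine. The sketch below is illustrative only: `JobStatus`, `PeerEvent`, and `next_status` are hypothetical names, and mapping a clean peer exit to completed and any other exit to failed is an assumption. Only the readiness log line is taken from the commit message.

```rust
/// Hypothetical job states, named after the statuses the agent PATCHes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Pending,
    Accepted,
    Running,
    Completed,
    Failed,
}

/// Hypothetical events the agent loop observes for one job.
pub enum PeerEvent<'a> {
    /// The agent picked up a pending attach job.
    Claimed,
    /// One line read from the peer's per-job log file.
    LogLine(&'a str),
    /// The peer process exited.
    Exited { success: bool },
}

/// Readiness marker quoted in the commit message.
pub const READY_LINE: &str = "Infrastructure ready, entering main loop";

/// Returns the status to PATCH next, or `None` when the event does not
/// change the job's status.
pub fn next_status(current: JobStatus, event: PeerEvent<'_>) -> Option<JobStatus> {
    use JobStatus::*;
    match (current, event) {
        (Pending, PeerEvent::Claimed) => Some(Accepted),
        (Accepted, PeerEvent::LogLine(line)) if line.contains(READY_LINE) => Some(Running),
        // Assumption: a clean exit means completed, anything else failed.
        (Accepted | Running, PeerEvent::Exited { success: true }) => Some(Completed),
        (Accepted | Running, PeerEvent::Exited { success: false }) => Some(Failed),
        _ => None,
    }
}
```

Returning `Option<JobStatus>` keeps the "should a PATCH be sent?" decision in one pure function, which is easy to unit-test without a mock server.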

Stack

Base: ws-pairing-worker (nteract#3609). Intra-fork; promote to nteract:main after nteract#3609 merges. The panel one-liner in #6 invokes exactly this CLI.

quillaid added 2 commits June 12, 2026 06:46
…orkstation-agent loop

Rust replacement for apps/notebook-cloud/scripts/hosted-workstation-agent.mjs
on top of the pairing contract (hosted-credential-transport ADR, Decision 9).

Operator path:
- `runt workstation connect <url> [--code]` redeems a panel-minted pairing
  code (404 -> clear "invalid, expired, or already used" + nonzero exit),
  stores {cloud_url, token, credential_id, workstation_id, display_name,
  connected_at} at workstation.json (config dir; per-worktree in dev; 0600),
  and does one registration POST so the panel shows the workstation
  immediately.
- `runt workstation run` resolves the credential (RUNT_CLOUD_TOKEN /
  RUNT_CLOUD_URL env overrides win), finds the sibling runtimed (bundled
  lookup + dev target/ fallback), and execs `runtimed workstation-agent`
  with the token in the environment, never argv.
- `runt workstation status [--json]` lists workstations via
  GET /api/workstations with the stored bearer.
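As a rough illustration of the `runt workstation run` path above, the sketch below resolves a credential with environment overrides taking precedence and builds the agent command with the token in the environment rather than argv. `Credential`, `resolve_credential`, and `agent_command` are hypothetical names; that the agent reads the same `RUNT_CLOUD_URL` / `RUNT_CLOUD_TOKEN` variables is an assumption made for the example.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical subset of the stored workstation.json fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Credential {
    pub cloud_url: String,
    pub token: String,
}

/// Environment overrides win field by field; the stored file fills the gaps.
/// Returns `None` when either field is still missing.
pub fn resolve_credential(
    env_url: Option<String>,
    env_token: Option<String>,
    stored: Option<Credential>,
) -> Option<Credential> {
    let (stored_url, stored_token) = match stored {
        Some(c) => (Some(c.cloud_url), Some(c.token)),
        None => (None, None),
    };
    Some(Credential {
        cloud_url: env_url.or(stored_url)?,
        token: env_token.or(stored_token)?,
    })
}

/// Builds the `runtimed workstation-agent` invocation. The token travels in
/// the child's environment, so it does not appear in the argument list.
pub fn agent_command(runtimed: &Path, cred: &Credential) -> Command {
    let mut cmd = Command::new(runtimed);
    cmd.arg("workstation-agent")
        // Assumption: the agent reads these variable names.
        .env("RUNT_CLOUD_URL", &cred.cloud_url)
        .env("RUNT_CLOUD_TOKEN", &cred.token);
    cmd
}
```

A caller would pass `std::env::var("RUNT_CLOUD_URL").ok()` and `std::env::var("RUNT_CLOUD_TOKEN").ok()` as the first two arguments; on Unix the resulting command could be run via `std::os::unix::process::CommandExt::exec` to replace the current process.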

Service loop (`runtimed workstation-agent`):
- Heartbeats POST /api/workstations with the same payload field set as the
  .mjs core (cpu_count via available_parallelism, memory_bytes via
  /proc/meminfo on Linux, omitted elsewhere).
- Polls attach jobs; pending -> PATCH accepted -> spawn
  `cloud-runtime-agent` via std::env::current_exe with the .mjs argv plan
  plus `--auth-kind workstation` -> PATCH running on the
  "Infrastructure ready, entering main loop" log line -> PATCH
  completed/failed on exit. Rate-limit cooldowns (Retry-After + exponential
  backoff + jitter) ported from the .mjs core.
- Job adoption ported: peers write to per-job log files (not pipes, so an
  orphaned peer never blocks on a dead reader) plus 0600 pid files; on
  restart the agent re-attaches to accepted/running jobs whose pid is alive
  and fails the rest. The server stale-job sweep remains the backstop
  (and the only path on non-unix, where there is no cheap pid probe).
- Owned-state single loop (no shared mutexes), per the tokio-mutex invariant.
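The rate-limit cooldown described above (Retry-After plus exponential backoff plus jitter) might look roughly like this. The constants, the way Retry-After is combined with the backoff, and the 25% jitter ceiling are all assumptions chosen for illustration; the values ported from the .mjs core are not shown in this PR text. The caller supplies `jitter_unit` (a random number in `[0, 1]`) so the function stays deterministic and testable.

```rust
use std::time::Duration;

/// Hypothetical cooldown policy: exponential backoff capped at a maximum,
/// never shorter than the server's Retry-After, plus up to 25% jitter.
pub fn cooldown(retry_after: Option<Duration>, attempt: u32, jitter_unit: f64) -> Duration {
    const BASE: Duration = Duration::from_secs(1); // assumed base delay
    const MAX: Duration = Duration::from_secs(60); // assumed backoff cap

    // 1s, 2s, 4s, ... capped at MAX; saturating ops avoid overflow panics.
    let backoff = BASE.saturating_mul(2u32.saturating_pow(attempt)).min(MAX);

    // Honour Retry-After when the server asks for a longer wait.
    let floor = retry_after.map_or(backoff, |ra| ra.max(backoff));

    // Treat a non-finite jitter input as "no jitter" rather than panicking.
    let unit = if jitter_unit.is_finite() {
        jitter_unit.clamp(0.0, 1.0)
    } else {
        0.0
    };
    floor + floor.mul_f64(0.25 * unit)
}
```

Adding jitter on top of the floor, rather than subtracting it, means the agent never retries before the server's Retry-After has elapsed.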

Auth plumbing: CloudAuthKind::Workstation (runtimed CLI) maps to a new
CloudAuth::WorkstationCredential in notebook-cloud-transport - plain
`Authorization: Bearer` on the wire, same shape as OidcBearer but honestly
labeled; blob publisher handles the new variant the same way.
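To make the "same shape, honestly labeled" point concrete, here is a hypothetical reduction of the auth enum. The real `CloudAuth` in notebook-cloud-transport may carry other variants and fields; only the two variant names and the `Authorization: Bearer` wire format come from the commit message, and `authorization_header` and `kind` are illustrative method names.

```rust
/// Hypothetical, simplified stand-in for the transport's auth enum.
#[derive(Clone, PartialEq, Eq)]
pub enum CloudAuth {
    /// Bearer token obtained through an OIDC login.
    OidcBearer(String),
    /// Bearer token minted by redeeming a workstation pairing code.
    WorkstationCredential(String),
}

impl CloudAuth {
    /// Both variants produce the same header value on the wire.
    pub fn authorization_header(&self) -> String {
        match self {
            CloudAuth::OidcBearer(token) | CloudAuth::WorkstationCredential(token) => {
                format!("Bearer {}", token)
            }
        }
    }

    /// The label is what differs, so logs and diagnostics can say which
    /// kind of credential is in use without inspecting the token.
    pub fn kind(&self) -> &'static str {
        match self {
            CloudAuth::OidcBearer(_) => "oidc-bearer",
            CloudAuth::WorkstationCredential(_) => "workstation",
        }
    }
}
```

`Debug` is deliberately not derived in this sketch so the token cannot be printed by accident.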

Deviations from the .mjs source, by spec: --display-name defaults to the
bare hostname (not "<hostname> workstation"), and the default interpreter is
the first python3/python on PATH without the ipykernel import probe
(pass --python-path to choose explicitly).
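The default-interpreter rule can be sketched as a plain PATH scan. `find_default_python` is a hypothetical name, and the search order (every PATH directory for `python3` first, then every directory for `python`) is one reading of "the first python3/python on PATH", not confirmed behavior. The sketch checks only that a regular file exists, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical lookup: returns the first `python3` found on `path_var`,
/// falling back to the first `python`. No import probe is performed.
pub fn find_default_python(path_var: &OsStr) -> Option<PathBuf> {
    ["python3", "python"].iter().find_map(|name| {
        env::split_paths(path_var)
            .map(|dir| dir.join(name))
            .find(|candidate| candidate.is_file())
    })
}
```

A caller would typically pass `std::env::var_os("PATH")` and fall back to an error asking for `--python-path` when the result is `None`.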

docs/remote-workstation.md now leads with the pairing flow; the per-notebook
env-var path and the Node connector stay documented as legacy/dev.
quillaid force-pushed the ws-workstation-cli branch from 5ce5704 to 3c4de45 on June 12, 2026 12:50
@quillaid (Owner, Author)

Promoted to nteract#3611 after nteract#3609 merged.


Labels: daemon, documentation
