
feat(sec-254): vsock CID allocation and end-to-end tests #6

Draft
jasonhernandez wants to merge 14 commits into aljoscha:main from jasonhernandez:feat/sec-254-vsock-cid-allocation


@jasonhernandez (Collaborator) commented Apr 10, 2026

Summary

Stacked on #5.

  • Replace the hardcoded guest_cid=3 with a proper CID allocator (state/vsock.rs) that assigns each VM a unique CID, persists allocations in vsock/cids.json, frees them on delete, and reuses the lowest free CID first
  • Validate the UDS path length against the macOS sun_path limit (104 bytes) and fail with an actionable error before allocating any resources
  • Add emberd in-VM daemon for structured RPC over vsock (replaces ad-hoc SSH commands)
  • vsock-first exec: prefer vsock/emberd over SSH when available
  • vm create --format json with progress to stderr
  • Wait for SSH readiness after vm create
  • Various fixes: vsock bridge data forwarding on macOS, VM cleanup on failed start, emberd cross-compilation

Files changed (key additions over #5)

File                                   What
crates/ember-core/src/state/vsock.rs   CID allocator: allocate/release/list, persisted to vsock/cids.json
crates/ember-core/src/state/store.rs   vsock_allocations_path() method
emberd/                                In-VM daemon (Rust): ping, exec, read/write file, agent status
src/cli/exec.rs                        vsock-first exec, UDS JSON-lines to emberd
src/cli/vm.rs                          CID allocation wired into create/fork/delete, UDS validation, --format json
ember-vz/Sources/EmberVZ/Start.swift   vsock bridge fix for macOS
tests/vsock.rs                         6 integration tests (CID uniqueness/reuse, inspect, UDS connectivity)
images/                                Dockerfiles + systemd unit for emberd

Rebased onto main after the ember-core/ember-linux/ember-macos workspace restructuring.

Test plan

  • 29 unit tests pass (cargo test --workspace), including 9 CID allocator and 3 UDS validation tests
  • cargo build clean, cargo clippy --workspace clean, cargo fmt clean
  • Run cargo test --test vsock -- --ignored on macOS with ember-vz built
  • Run cargo test --test vsock -- --ignored on Linux with Firecracker + KVM

🤖 Generated with Claude Code

jasonhernandez and others added 14 commits April 14, 2026 16:09
Add vsock device support across both Firecracker (Linux) and AVF (macOS)
backends, enabling structured host↔guest communication over a Unix domain
socket instead of SSH polling.

CLI: `ember vm create myvm --image base --vsock`
YAML config: `vsock: true`
UDS created at: `<state_dir>/vms/<name>/vsock.sock`

Linux (Firecracker):
- New `PUT /vsock` API call with guest CID and UDS path
- Firecracker natively creates the UDS and bridges to guest AF_VSOCK
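For reference, Firecracker's documented vsock endpoint takes a JSON body with `guest_cid` and `uds_path`. The following is a minimal, dependency-free sketch that builds the raw HTTP request written to Firecracker's API socket. `vsock_put_request` is a hypothetical helper, not Ember's actual code, which may use an HTTP client crate instead.

```rust
/// Build a raw HTTP/1.1 request for Firecracker's `PUT /vsock` endpoint.
/// Hypothetical helper for illustration; not Ember's implementation.
fn vsock_put_request(guest_cid: u32, uds_path: &str) -> String {
    // Minimal JSON string escaping: backslashes first, then double quotes.
    let path = uds_path.replace('\\', "\\\\").replace('"', "\\\"");
    let body = format!(r#"{{"guest_cid": {guest_cid}, "uds_path": "{path}"}}"#);
    format!(
        "PUT /vsock HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\n\
         Content-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}
```

Per Firecracker's vsock documentation, a host process that connects to `uds_path` first sends `CONNECT <port>\n` and receives `OK <host_port>\n` before data flows. This PR does not say how Ember handles that preamble.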

macOS (AVF):
- VZVirtioSocketDeviceConfiguration added to VM config
- ember-vz implements a UDS bridge: accepts host connections on the UDS
  and proxies them to guest vsock port 1024, and accepts guest-initiated
  connections on port 1024 and bridges them back to the UDS
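ember-vz itself is written in Swift, but the bridging idea does not depend on the transport. The following minimal Rust sketch pumps bytes in both directions between two connected Unix sockets. It uses one blocking `std::io::copy` per direction and a write half-close to propagate EOF. `bridge` is a hypothetical name, and this is not the ember-vz code.

```rust
use std::io;
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::thread;

/// Proxy bytes between `a` and `b` in both directions until both sides reach EOF.
/// Returns (bytes copied a -> b, bytes copied b -> a).
fn bridge(a: UnixStream, b: UnixStream) -> io::Result<(u64, u64)> {
    let mut a_read = a.try_clone()?;
    let mut b_write = b.try_clone()?;
    // a -> b runs on a helper thread.
    let forward = thread::spawn(move || -> io::Result<u64> {
        let n = io::copy(&mut a_read, &mut b_write)?;
        // Half-close so whoever is behind `b` sees EOF.
        let _ = b_write.shutdown(Shutdown::Write);
        Ok(n)
    });
    // b -> a runs on the current thread.
    let (mut b_read, mut a_write) = (b, a);
    let back = io::copy(&mut b_read, &mut a_write)?;
    let _ = a_write.shutdown(Shutdown::Write);
    let fwd = forward.join().expect("forward thread panicked")?;
    Ok((fwd, back))
}
```

Propagating the half-close matters for request/response protocols. Without it, a peer that reads until EOF would hang after the other side finishes writing.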

Both platforms expose the same UDS interface — Thermite's code path is
identical regardless of the underlying hypervisor.

Co-Authored-By: Claude <noreply@anthropic.com>

ember vm stop --all          # stop all running VMs
ember vm stop --all --force  # SIGKILL all running VMs
ember vm delete --all --force # stop + delete every VM

This is useful for cleanup and for stopping every VM, including non-pool
control agent VMs that `pool destroy` doesn't touch.

Co-Authored-By: Claude <noreply@anthropic.com>

Lightweight Rust daemon that runs inside Ember VMs and serves the
JSON-lines protocol expected by Thermite's EmberdClient. Listens on
vsock port 1024 (Linux) or a Unix domain socket (--uds, for testing).

Operations: ping, exec, read_file, write_file, agent_status.
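JSON-lines framing means one JSON object per line in each direction. The following minimal sketch shows a serve loop that is generic over the transport: vsock, a UDS, or an in-memory buffer in tests. `serve_json_lines` is a hypothetical name. The handler receives each raw request line unparsed, because the exact request schema is not described here.

```rust
use std::io::{self, BufRead, Write};

/// Read newline-delimited requests from `reader`, pass each non-empty line to
/// `handle`, and write exactly one response line per request.
/// Returns the number of requests served. Hypothetical; emberd's loop may differ.
fn serve_json_lines<R, W, F>(reader: R, mut writer: W, mut handle: F) -> io::Result<usize>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str) -> String,
{
    let mut served = 0;
    for line in reader.lines() {
        let line = line?;
        let req = line.trim();
        if req.is_empty() {
            continue; // skip blank lines between requests
        }
        let resp = handle(req);
        // A raw newline inside a response would break the line framing.
        debug_assert!(!resp.contains('\n'));
        writer.write_all(resp.as_bytes())?;
        writer.write_all(b"\n")?;
        writer.flush()?; // the client may be blocked waiting for this line
        served += 1;
    }
    Ok(served)
}
```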

- New `emberd/` workspace member with minimal dependencies
- 15 unit + integration tests (all via UDS on any platform)
- Makefile targets: `make emberd`, `make emberd-release`
- Workspace fmt/check/clippy/test updated to include emberd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Protocol reference, build instructions, architecture diagram, and
image integration guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add emberd binary and systemd service to both ubuntu-dev and
ubuntu-slim Dockerfiles. The binary is pre-built on the host with
`make emberd-image` and staged at images/emberd for COPY.

- images/emberd.service: systemd unit (Type=simple, Restart=always)
- Dockerfile.ubuntu-dev: COPY emberd + enable service
- Dockerfile.ubuntu-slim: COPY emberd + enable service
- Makefile: `make emberd-image` target (native on Linux, cross-compile on macOS)
- .gitignore: exclude staged binary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Show build, pull, and list commands instead of only suggesting pull.
Most custom images (ubuntu-dev, ubuntu-slim) need to be built from
a Dockerfile, not pulled from a registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix Backlog type in listen_vsock (nix 0.29 on Linux requires
  Backlog::new() instead of raw integer)
- Makefile emberd-image: use Docker (rust:latest) for Linux builds
  on macOS instead of requiring cross-compilation toolchain
- Dockerfiles: clean up emberd COPY comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If `ember vm create` succeeds but the subsequent start fails (e.g.,
ember-vz crash, missing binary), delete the created VM instead of
leaving orphaned state behind. Previously, the start rollback only
cleaned up network/process but left the VM metadata and disk.
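The rollback amounts to "if start fails, delete what create produced, then surface the start error." The following minimal sketch passes the create, start, and delete steps in as closures. `create_and_start` and the closure parameters are hypothetical, not Ember's API.

```rust
/// Create a VM, then start it. If start fails, delete what create produced so
/// no metadata or disk is orphaned. The start error is returned; a cleanup
/// failure is reported but does not mask it.
fn create_and_start<C, S, D, E>(create: C, start: S, delete: D) -> Result<(), E>
where
    C: FnOnce() -> Result<(), E>,
    S: FnOnce() -> Result<(), E>,
    D: FnOnce() -> Result<(), E>,
    E: std::fmt::Display,
{
    create()?; // nothing to roll back if create itself fails
    if let Err(start_err) = start() {
        if let Err(cleanup_err) = delete() {
            eprintln!("warning: cleanup after failed start also failed: {cleanup_err}");
        }
        return Err(start_err);
    }
    Ok(())
}
```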

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two bugs in the ember-vz vsock UDS bridge:

1. VZVirtioSocketDevice.connect(toPort:) was called from a background
   queue, but AVF requires it on the main queue. The completion handler
   never fired, so host→guest connections silently failed.

2. VZVirtioSocketConnection was not retained during bridgeConnection(),
   so ARC could deallocate it and close the fd mid-transfer.

Fixes:
- Dispatch connect(toPort:) to DispatchQueue.main
- Hold strong ref to VZVirtioSocketConnection via DispatchGroup
- Log ember-vz stderr to <vm_dir>/ember-vz.log for debugging
- Add diagnostic logging throughout the bridge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

`ember vm create` now waits up to 90s (configurable via --wait) for
SSH to become reachable before reporting success, so `ember exec` works
immediately after create with no manual polling.

Also add --wait flag to `ember exec` for configuring the SSH connect
timeout (default: 30s, can be increased for heavy images).

If the wait times out, the VM is left running; SSH is just slow to come up.
A hint is printed suggesting `ember exec --wait`.
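One way to implement the wait is to poll a TCP connect to the guest's SSH port until a deadline. The following minimal sketch uses only `std`. It proves only that something accepts connections on the port, which is weaker than a completed SSH handshake. The function name, the 2 s per-attempt cap, and the poll interval are all assumptions.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Poll `addr` until a TCP connection succeeds or `wait` elapses.
/// Returns true if the port became reachable in time.
fn wait_for_port(addr: SocketAddr, wait: Duration, poll: Duration) -> bool {
    let deadline = Instant::now() + wait;
    loop {
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Cap each attempt so a single connect never overshoots the deadline.
        let attempt = (deadline - now).min(Duration::from_secs(2));
        if TcpStream::connect_timeout(&addr, attempt).is_ok() {
            return true;
        }
        // Sleep between attempts, but not past the deadline.
        thread::sleep(poll.min(deadline.saturating_duration_since(Instant::now())));
    }
}
```

With the defaults described above, a call might look like `wait_for_port("192.168.64.2:22".parse().unwrap(), Duration::from_secs(90), Duration::from_millis(500))`.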

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When `ember exec vm -- "echo hi | tee /tmp/out"` has one argument
after `--`, pass it directly to the SSH channel without quoting.
The remote shell interprets pipes and redirects correctly.

Previously, shell_escape_join would single-quote arguments containing
`|` or `>`, preventing the remote shell from interpreting them.
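The rule can be sketched as follows. A single argument is treated as a complete shell command line and sent verbatim. Multiple arguments are POSIX single-quoted and joined with spaces. `remote_command` and `sh_quote` are hypothetical names, and the real `shell_escape_join` may quote differently (for example, by leaving shell-safe words unquoted).

```rust
/// Single-quote `arg` for a POSIX shell: wrap it in '...' and turn each
/// embedded single quote into '\'' (close quote, escaped quote, reopen).
fn sh_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

/// Build the remote command string for `ember exec vm -- <args...>`.
/// One argument: pass it through so the remote shell sees pipes and redirects.
/// Several arguments: quote each so they arrive as literal argv entries.
fn remote_command(args: &[String]) -> String {
    match args {
        [single] => single.clone(),
        many => many.iter().map(|a| sh_quote(a)).collect::<Vec<_>>().join(" "),
    }
}
```

The trade-off is that in the single-argument form the user owns the quoting, and remote expansion (globs, variables) applies as it would in an interactive shell.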

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ember exec now tries vsock (emberd) first, falling back to SSH:
- Connects to the VM's vsock UDS and sends JSON-lines exec request
- No SSH dependency: works immediately after boot (emberd starts fast)
- Falls back to SSH automatically if vsock fails
- --ssh flag to force SSH path
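The selection logic reduces to three steps: honor `--ssh`, otherwise try vsock, and fall back to SSH on a vsock error. The following minimal sketch passes both transports in as closures. `exec_with_fallback` and `Transport` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Transport {
    Vsock,
    Ssh,
}

/// Run a command via emberd over vsock, falling back to SSH if vsock fails.
/// `force_ssh` mirrors the `--ssh` flag. Returns the transport used and its output.
fn exec_with_fallback<V, S, E>(
    force_ssh: bool,
    via_vsock: V,
    via_ssh: S,
) -> Result<(Transport, String), E>
where
    V: FnOnce() -> Result<String, E>,
    S: FnOnce() -> Result<String, E>,
    E: std::fmt::Display,
{
    if !force_ssh {
        match via_vsock() {
            Ok(out) => return Ok((Transport::Vsock, out)),
            // A missing UDS or a stopped emberd is logged, not fatal.
            Err(e) => eprintln!("vsock exec failed ({e}); falling back to SSH"),
        }
    }
    via_ssh().map(|out| (Transport::Ssh, out))
}
```

One design caveat applies to this sketch. Falling back on *any* vsock error could run a non-idempotent command twice if emberd failed after starting it. A stricter variant would fall back only on connection errors, before the request is sent.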

ember vm list now shows IP address and vsock status:
  NAME         STATUS   IP             VSOCK  CPUS  MEM     DISK
  val-smoke    running  192.168.64.2   ✓      1     16 GiB  8 GiB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- `ember vm create --format json` returns VM metadata as JSON on stdout
- All progress messages (Cloning, Growing, Injecting, Starting, Waiting)
  now go to stderr so stdout is clean for JSON piping
- `ember exec` also reformatted by cargo fmt

This makes ember scriptable: `ember vm create foo --image bar --format json | jq .`
outputs clean JSON while progress is visible on stderr.
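The convention is machine-readable output on stdout and human progress on stderr. The following minimal sketch takes both streams as parameters so it can be tested. The JSON is written by hand to keep the example dependency-free (real code would more likely use serde_json). `report_create` and the JSON field names are hypothetical. The step names come from the commit message above.

```rust
use std::io::{self, Write};

/// Emit progress to `err` and the final result to `out`.
/// With `json` set, `out` receives only a single JSON object, so `| jq .` works.
fn report_create<O: Write, E: Write>(mut out: O, mut err: E, name: &str, json: bool) -> io::Result<()> {
    for step in ["Cloning", "Growing", "Injecting", "Starting", "Waiting"] {
        writeln!(err, "{step}...")?;
    }
    if json {
        // Hypothetical fields; `name` is assumed not to need JSON escaping.
        writeln!(out, r#"{{"name":"{name}","status":"running"}}"#)?;
    } else {
        writeln!(out, "Created VM {name}")?;
    }
    Ok(())
}
```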

201 tests pass (186 ember + 15 emberd), clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tests

Replace hardcoded guest_cid=3 with a proper CID allocator that assigns
unique CIDs (starting at 3) per VM, persisted in vsock/cids.json. CIDs
are freed on VM delete and reused lowest-first, following the same pattern
as IP allocation in network/ip.rs.
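CIDs 0 through 2 are reserved by the vsock address family (2 is the host), which is why guest CIDs start at 3. Lowest-first allocation with reuse can be sketched with an ordered set. This minimal sketch keeps allocations in memory, keyed by VM name. `CidAllocator` and its methods are hypothetical. The vsock/cids.json persistence and the flock-based locking described here are omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// First CID available to guests; 0-2 are reserved (2 is the host).
const FIRST_GUEST_CID: u32 = 3;

/// In-memory CID allocator (hypothetical; persistence and locking omitted).
#[derive(Default)]
struct CidAllocator {
    by_vm: BTreeMap<String, u32>,
    used: BTreeSet<u32>,
}

impl CidAllocator {
    /// Return the VM's existing CID (so repeated calls don't leak CIDs),
    /// or assign the lowest free CID >= 3.
    fn allocate(&mut self, vm: &str) -> Option<u32> {
        if let Some(&cid) = self.by_vm.get(vm) {
            return Some(cid);
        }
        // Walk the sorted set of used CIDs; the first gap is the lowest free one.
        let mut candidate = FIRST_GUEST_CID;
        for &cid in self.used.range(FIRST_GUEST_CID..) {
            if cid != candidate {
                break;
            }
            candidate = candidate.checked_add(1)?;
        }
        // u32::MAX is VMADDR_CID_ANY, so it is never handed out.
        if candidate == u32::MAX {
            return None;
        }
        self.used.insert(candidate);
        self.by_vm.insert(vm.to_string(), candidate);
        Some(candidate)
    }

    /// Free the VM's CID so a later allocate() can reuse it.
    fn release(&mut self, vm: &str) -> Option<u32> {
        let cid = self.by_vm.remove(vm)?;
        self.used.remove(&cid);
        Some(cid)
    }
}
```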

- state/vsock.rs: allocate()/release() with flock-based locking (6 tests)
- cli/vm.rs: create and fork use CID allocator; delete releases CIDs
- cli/vm.rs: validate_uds_path() rejects paths >= 104 bytes (macOS sun_path
  limit) with actionable error message (3 tests)
- error.rs: Error::Vsock variant for CID allocation failures
- state/store.rs: vsock_allocations_path(), vsock/ dir in init()
- tests/vsock.rs: 6 integration tests (CID uniqueness, reuse after delete,
  inspect JSON/table output, vm list checkmark, end-to-end UDS connectivity
  on macOS and Linux)
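On macOS, `sockaddr_un.sun_path` is 104 bytes and must also hold the trailing NUL. A path of 104 bytes or more therefore cannot be bound, which is exactly the `>= 104` rule above. The following is a minimal sketch of the check. `check_uds_path` and its error wording are hypothetical stand-ins, not the actual `validate_uds_path()`.

```rust
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// macOS `sizeof(sockaddr_un.sun_path)`; Linux uses 108.
const SUN_PATH_LEN: usize = 104;

/// Reject socket paths that cannot fit in `sun_path` with their NUL terminator.
/// Hypothetical signature and message; the real validate_uds_path() may differ.
fn check_uds_path(path: &Path) -> Result<(), String> {
    let len = path.as_os_str().as_bytes().len();
    if len >= SUN_PATH_LEN {
        return Err(format!(
            "vsock socket path is {len} bytes, but macOS allows at most {} \
             (sun_path limit): {}. Use a shorter state directory or VM name.",
            SUN_PATH_LEN - 1,
            path.display()
        ));
    }
    Ok(())
}
```

With the layout `<state_dir>/vms/<name>/vsock.sock`, the fixed parts `/vms/` and `/vsock.sock` take 16 bytes. That leaves 103 - 16 = 87 bytes for the state directory and VM name combined.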

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jasonhernandez force-pushed the feat/sec-254-vsock-cid-allocation branch from 276757c to 4afb65f on April 14, 2026 23:16