Releases: parallelworks/ray-cluster

v1.0.0 — Multi-Site Ray Cluster

07 Mar 08:53
ccfe3ff

Multi-Site Ray Cluster v1.0.0

First stable release of the multi-site Ray cluster workflow for ACTIVATE.

Features

  • N-site Ray cluster — Deploy a Ray head node on any resource, connect workers from multiple sites via SSH tunnels
  • 3 workload modes — Fractal rendering, mathematical benchmark, and cluster-only (bring your own workload)
  • Live dashboard — Real-time cluster topology, task placement, throughput charts via WebSocket
  • Cluster-only mode — Deploy the cluster with no demo workload; Connect tab shows copy-paste-ready SSH tunnel commands with real IPs
  • SLURM support — Submit workers via srun with configurable partition, account, QoS, nodes, and walltime
  • SSH worker dispatch — Direct connection for non-scheduler resources
  • Zero-dependency setup — Bootstraps modern Python via uv on old HPC systems (no root, no containers)
  • 1 worker per node — Each node registers 1 Ray task slot; tasks use internal parallelism (OpenMP, MPI, PyTorch)
  • Ray Dashboard proxy — Native Ray dashboard accessible through the session proxy
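The SLURM options listed above map onto standard srun flags. The following is a minimal sketch of how such a command line might be assembled, not the workflow's actual submission code: `SlurmWorkerConfig`, `build_srun_command`, and the example worker command are hypothetical names, and launching one task per node is an assumption drawn from the "1 worker per node" feature.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlurmWorkerConfig:
    """Hypothetical container for the scheduler options named in the release notes."""
    partition: str
    nodes: int = 1
    walltime: str = "01:00:00"      # HH:MM:SS, passed to srun --time
    account: Optional[str] = None   # omitted from the command when unset
    qos: Optional[str] = None       # omitted from the command when unset


def build_srun_command(cfg: SlurmWorkerConfig, worker_cmd: List[str]) -> List[str]:
    """Return an srun argv list that launches worker_cmd once per allocated node."""
    argv = [
        "srun",
        f"--partition={cfg.partition}",
        f"--nodes={cfg.nodes}",
        "--ntasks-per-node=1",       # assumption: one Ray worker process per node
        f"--time={cfg.walltime}",
    ]
    if cfg.account:
        argv.append(f"--account={cfg.account}")
    if cfg.qos:
        argv.append(f"--qos={cfg.qos}")
    return argv + list(worker_cmd)


if __name__ == "__main__":
    cfg = SlurmWorkerConfig(partition="compute", nodes=2, account="proj1")
    # The worker command here is illustrative; the real workflow may differ.
    cmd = build_srun_command(cfg, ["ray", "start", "--address=127.0.0.1:6379", "--block"])
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each option a separate argument, which avoids quoting problems if a partition or account name contains unusual characters.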

Architecture

  • Head node runs the Ray coordinator with --num-cpus=0, so it advertises no CPU slots for tasks, alongside a custom FastAPI dashboard
  • Workers connect via SSH tunnels with unique loopback IPs (127.0.X.Y) for multi-node support
  • Dashboard proxies to Ray's native dashboard on port 8265
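To illustrate the addressing scheme described above, here is a hedged sketch of one way to hand each tunneled worker its own 127.0.X.Y address and to build the matching `ray start` arguments. The allocation rule (skipping 127.0.0.1 and avoiding a final octet of 0 or 255), the function names, the GCS port 6379, and the use of `--num-cpus=1` on workers are all assumptions for illustration; the release notes only state that each worker gets a unique loopback IP and that the head runs with `--num-cpus=0`.

```python
from typing import List

HOSTS_PER_BLOCK = 254          # final octet restricted to 1..254
MAX_WORKERS = 256 * 254 - 1    # every 127.0.X.Y slot except 127.0.0.1


def loopback_ip(worker_index: int) -> str:
    """Map worker index 0, 1, 2, ... to a unique 127.0.X.Y address.

    Index 0 maps to 127.0.0.2 so that 127.0.0.1 stays free for the head node.
    """
    if not 0 <= worker_index < MAX_WORKERS:
        raise ValueError(f"worker_index must be in [0, {MAX_WORKERS})")
    x, r = divmod(worker_index + 1, HOSTS_PER_BLOCK)
    return f"127.0.{x}.{r + 1}"


def head_start_args(gcs_port: int = 6379, dashboard_port: int = 8265) -> List[str]:
    """ray start arguments for a head node that offers no CPU slots."""
    return [
        "ray", "start", "--head",
        "--num-cpus=0",
        f"--port={gcs_port}",
        f"--dashboard-port={dashboard_port}",
    ]


def worker_start_args(worker_index: int, head_address: str) -> List[str]:
    """ray start arguments for a worker reached through an SSH tunnel."""
    return [
        "ray", "start",
        f"--address={head_address}",
        f"--node-ip-address={loopback_ip(worker_index)}",
        "--num-cpus=1",   # assumption: one task slot per node
    ]


if __name__ == "__main__":
    print(" ".join(head_start_args()))
    for i in range(3):
        print(" ".join(worker_start_args(i, "127.0.0.1:6379")))
```

Giving every worker a distinct loopback address lets several tunneled nodes appear as separate Ray nodes even though all of their traffic terminates on the same machine.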

What's Next

See ROADMAP.md for planned improvements, including PBS support, GPU awareness, and custom user script execution.