Releases: parallelworks/ray-cluster
v1.0.0 — Multi-Site Ray Cluster
First stable release of the multi-site Ray cluster workflow for ACTIVATE.
Features
- N-site Ray cluster — Deploy a Ray head node on any resource, connect workers from multiple sites via SSH tunnels
- 3 workload modes — Fractal rendering, mathematical benchmark, and cluster-only (bring your own workload)
- Live dashboard — Real-time cluster topology, task placement, throughput charts via WebSocket
- Cluster-only mode — Deploy the cluster with no demo workload; Connect tab shows copy-paste-ready SSH tunnel commands with real IPs
- SLURM support — Submit workers via srun with configurable partition, account, QoS, nodes, and walltime
- SSH worker dispatch — Direct connection for non-scheduler resources
- Zero-dependency setup — Bootstraps modern Python via uv on old HPC systems (no root, no containers)
- 1 worker per node — Each node registers 1 Ray task slot; tasks use internal parallelism (OpenMP, MPI, PyTorch)
- Ray Dashboard proxy — Native Ray dashboard accessible through the session proxy
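The SLURM options listed above (partition, account, QoS, nodes, walltime) map naturally onto standard srun flags. The following is a minimal sketch of how such a command line might be assembled; it is not the workflow's actual submission code. The function name build_srun_command, the example values, and the choice of `ray start --num-cpus=1` to register one task slot per node are all assumptions made for illustration.

```python
import shlex


def build_srun_command(partition, account, qos, nodes, walltime, head_address):
    """Assemble an srun command that launches one Ray worker per node.

    Hypothetical helper: the release notes do not show the real command.
    """
    # Assumption: one Ray task slot per node is expressed as --num-cpus=1.
    worker_cmd = [
        "ray", "start",
        f"--address={head_address}",
        "--num-cpus=1",
        "--block",  # keep the worker in the foreground for the job's lifetime
    ]
    srun_cmd = [
        "srun",
        f"--partition={partition}",
        f"--account={account}",
        f"--qos={qos}",
        f"--nodes={nodes}",
        "--ntasks-per-node=1",  # exactly one worker process on each node
        f"--time={walltime}",
    ]
    return srun_cmd + worker_cmd


if __name__ == "__main__":
    # Example values are placeholders, not defaults of the workflow.
    cmd = build_srun_command(
        partition="compute",
        account="myproject",
        qos="normal",
        nodes=4,
        walltime="01:00:00",
        head_address="127.0.0.1:6379",
    )
    print(shlex.join(cmd))
```

Returning the command as a list rather than a single string keeps it safe to pass to subprocess.run without shell quoting; shlex.join is used only for display.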
Architecture
- Head node runs Ray coordinator (--num-cpus=0) + custom FastAPI dashboard
- Workers connect via SSH tunnels with unique loopback IPs (127.0.X.Y) for multi-node support
- Dashboard proxies to Ray's native dashboard on port 8265
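To illustrate the unique loopback addressing, here is a small sketch of one way to hand out 127.0.X.Y addresses so that no two tunneled workers share an IP. The release notes do not say how X and Y are chosen; mapping X to a site index and Y to a node index within that site is an assumption, as are the function names loopback_ip and allocate_loopback_ips. The head-node command at the end uses only the values stated above (--num-cpus=0 and dashboard port 8265).

```python
import ipaddress


def loopback_ip(site_index, node_index):
    """Return 127.0.<site_index>.<node_index> as a string.

    Assumed scheme: X identifies the site, Y the node within that site.
    """
    if not 1 <= site_index <= 255:
        raise ValueError("site_index must be in 1..255")
    if not 1 <= node_index <= 254:
        raise ValueError("node_index must be in 1..254")
    addr = ipaddress.IPv4Address(f"127.0.{site_index}.{node_index}")
    # Everything in 127.0.0.0/8 is loopback, so this always holds.
    assert addr.is_loopback
    return str(addr)


def allocate_loopback_ips(nodes_per_site):
    """Map (site name, node number) to a unique loopback address.

    Sites are numbered from 1 so that 127.0.0.1 is never handed out.
    """
    table = {}
    for site_index, (site, count) in enumerate(nodes_per_site.items(), start=1):
        for node in range(count):
            table[(site, node)] = loopback_ip(site_index, node + 1)
    return table


# Head node: zero CPU slots, so tasks are scheduled only on workers.
HEAD_START_CMD = ["ray", "start", "--head", "--num-cpus=0", "--dashboard-port=8265"]

if __name__ == "__main__":
    # Site names and node counts are placeholders.
    for key, ip in allocate_loopback_ips({"siteA": 2, "siteB": 1}).items():
        print(key, ip)
```

Giving each remote node its own loopback address lets several workers be reached through tunnels on the same host without port collisions, which is presumably why the workflow avoids reusing 127.0.0.1.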
What's Next
See ROADMAP.md for planned improvements including PBS support, GPU awareness, and custom user script execution.