[wip][nightly] RAPIDS 26.08* / Ray 3* / Dynamo 1.3* + bump transformers 5 + data-designer 0.6.1 #2065

Draft
praateekmahajan wants to merge 2 commits into NVIDIA-NeMo:main from praateekmahajan:nightly
praateekmahajan commented Jun 11, 2026

Contributor

Tracks the dynamo/ray/RAPIDS nightlies along with transformers and data-designer at their newest releases, so a weekly benchmark surfaces upstream breakage early. Migrates Curator to the new APIs and clears accumulated CVE constraint/override tech-debt.

Dependencies (pyproject.toml, uv.lock):

  • RAPIDS cudf/cuml/cugraph/raft/rmm/rapidsmpf -> 26.08 nightly (a*) from the rapids-nightly index; transitive nightly libs listed explicitly so prerelease="if-necessary-or-explicit" stays scoped (no stray PyPI betas).
  • Add cudf-streaming-cu12 (partition_and_pack/unpack_and_concat moved here out of rapidsmpf in 26.08).
  • transformers>=5,<6 override (defeats nemo-toolkit[asr]'s 4.57 pin), huggingface-hub>=1.5,<2, packaging>=25, pandas>=3 overrides.
  • data-designer 0.5.5 -> 0.6.1.
  • Drop the huggingface-hub<1.0 override and the numpy<=2.2 / protobuf<7 caps.
  • Remove all 12 CVE constraint floors (verified redundant: the nightly stack already resolves at/above every CVE fix).
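The scoping note above depends on how uv's `prerelease = "if-necessary-or-explicit"` mode behaves: per the uv documentation, a prerelease is admitted only when every available version of a package is a prerelease, or when the package's own requirement names a prerelease. The toy model below is a sketch of that rule, not uv's resolver; `Candidate`, `admissible`, and all package versions are hypothetical. It shows why a transitive nightly library that also has a stable release must be listed explicitly, while an unrelated dependency's beta stays out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    version: str
    is_prerelease: bool


def admissible(candidates: list[Candidate], explicit_prerelease: bool) -> list[str]:
    """Toy model of uv's "if-necessary-or-explicit" prerelease policy.

    Prereleases are admissible only if the requirement explicitly names one
    (explicit_prerelease=True) or if no stable candidate exists at all.
    """
    stable = [c.version for c in candidates if not c.is_prerelease]
    if explicit_prerelease or not stable:
        return [c.version for c in candidates]
    return stable


# Hypothetical transitive nightly lib that also has a stable release.
libfoo = [Candidate("26.06.00", False), Candidate("26.08.00a634", True)]
implicit = admissible(libfoo, explicit_prerelease=False)
explicit = admissible(libfoo, explicit_prerelease=True)

# Hypothetical unrelated dependency: its beta stays out unless requested.
other = admissible([Candidate("1.4.0", False), Candidate("1.5.0b1", True)], False)

print(implicit)  # ['26.06.00']
print(explicit)  # ['26.06.00', '26.08.00a634']
print(other)     # ['1.4.0']
```

Listing the nightly library explicitly flips it from the first case to the second without widening the policy for every other package.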

transformers 5:

  • batch_encode_plus -> __call__ (text/models/tokenizer.py, text/embedders/vllm.py, text/io/writer/megatron_tokenizer.py).
  • data_designer: add __deepcopy__ so Xenna pipeline_spec deepcopy survives hf-hub>=1.0 caching an unpickleable DuckDBPyConnection.
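The `__deepcopy__` fix follows a common pattern: deep-copy only the declarative configuration and rebuild any live handle. A minimal sketch, assuming a dataclass-based stage whose `__post_init__` opens the connection; `DesignerStage`, `config`, and `_conn` are hypothetical names, and `sqlite3.Connection` stands in for the unpickleable `DuckDBPyConnection` so the example runs on the standard library alone.

```python
import copy
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DesignerStage:  # hypothetical stand-in for the data_designer stage
    config: dict
    # Live handle built from config; sqlite3 stands in for DuckDBPyConnection.
    _conn: Optional[sqlite3.Connection] = field(
        default=None, init=False, repr=False, compare=False
    )

    def __post_init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")

    def __deepcopy__(self, memo: dict) -> "DesignerStage":
        # Deep-copy only the declarative config; __post_init__ then opens a
        # fresh connection instead of trying to copy the live one.
        clone = DesignerStage(copy.deepcopy(self.config, memo))
        memo[id(self)] = clone
        return clone


stage = DesignerStage({"model": "demo", "columns": ["a", "b"]})
clone = copy.deepcopy(stage)
print(clone.config == stage.config, clone._conn is stage._conn)  # True False
```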

cuml 26.08:

  • semantic dedup KMeans -> cuml.cluster.kmeans_mg.KMeansMG (single-GPU KMeans dropped handle=; private _fit(multigpu=True) removed -> KMeansMG.fit()).

rapidsmpf 26.08 (deduplication/shuffle_utils/rapidsmpf_shuffler.py):

  • imports -> memory.{buffer,buffer_resource,spill} + integrations.ray.RapidsMPFActor.
  • BufferResource(memory_limits={DEVICE:int}, statistics=...); Statistics(enable=) (dropped mr); direct Shuffler(comm,0,nparts,br); insert_finished() once; wait()+local_partitions(); inline cudf<->pylibcudf helpers (utils.cudf removed), re-exported and repointed lsh.py.
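The ordering constraints in the rapidsmpf migration (insert everything, call `insert_finished()` exactly once, then `wait()` and drain `local_partitions()`) can be modeled in a single process. `ToyShuffler` below is hypothetical and is not the rapidsmpf `Shuffler`: it has no communicator or `BufferResource`, routes rows by `hash(key) % nparts`, and assumes round-robin partition ownership across ranks purely for illustration.

```python
from collections import defaultdict


class ToyShuffler:
    """Hypothetical single-process model of the shuffle call order.

    NOT the rapidsmpf API: no communicator, no buffer resource, and rows
    are routed to partitions with hash(key) % nparts.
    """

    def __init__(self, rank: int, nranks: int, nparts: int) -> None:
        self.rank, self.nranks, self.nparts = rank, nranks, nparts
        self._parts: dict[int, list[tuple[int, str]]] = defaultdict(list)
        self._finished = False

    def insert(self, rows: list[tuple[int, str]]) -> None:
        if self._finished:
            raise RuntimeError("insert() after insert_finished()")
        for key, value in rows:
            self._parts[hash(key) % self.nparts].append((key, value))

    def insert_finished(self) -> None:
        # Signal end of input exactly once; a second call is a bug.
        if self._finished:
            raise RuntimeError("insert_finished() called twice")
        self._finished = True

    def wait(self) -> None:
        if not self._finished:
            raise RuntimeError("wait() before insert_finished()")

    def local_partitions(self) -> list[int]:
        # Assumed round-robin ownership: rank owns p where p % nranks == rank.
        return [p for p in range(self.nparts) if p % self.nranks == self.rank]

    def extract(self, pid: int) -> list[tuple[int, str]]:
        return self._parts.pop(pid, [])


sh = ToyShuffler(rank=0, nranks=1, nparts=4)
sh.insert([(1, "a"), (2, "b"), (5, "c")])
sh.insert([(1, "d")])
sh.insert_finished()  # once, after all inserts
sh.wait()
out = {pid: sh.extract(pid) for pid in sh.local_partitions()}
print(out)  # {0: [], 1: [(1, 'a'), (5, 'c'), (1, 'd')], 2: [(2, 'b')], 3: []}
```

Rows sharing a key always land in the same partition, which is the property the dedup stages rely on after the shuffle.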

cugraph 26.08:

  • connected_components: symmetrize=False -> True (cugraph honors the flag literally; the one-directional dedup edge-list must be symmetrized).
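Why the one-directional edge list matters can be seen with a toy min-label propagation that, like any traversal over stored edges, only moves labels from `u` to `v`. This does not reproduce cugraph's implementation; `min_label_components` and `symmetrize` are hypothetical helpers, and the three-document edge list is made up.

```python
Edge = tuple[int, int]


def min_label_components(nodes: list[int], edges: list[Edge]) -> dict[int, int]:
    """Toy label propagation: each stored edge (u, v) pushes u's label to v only."""
    label = {n: n for n in nodes}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for u, v in edges:
            if label[u] < label[v]:
                label[v] = label[u]
                changed = True
    return label


def symmetrize(edges: list[Edge]) -> list[Edge]:
    """Add the reverse of every edge so labels can flow both ways."""
    return sorted(set(edges) | {(v, u) for u, v in edges})


nodes = [0, 1, 2]
dup_pairs = [(1, 0), (1, 2)]  # one-directional: doc 1 matched docs 0 and 2
as_given = min_label_components(nodes, dup_pairs)
symmetric = min_label_components(nodes, symmetrize(dup_pairs))
print(as_given)   # {0: 0, 1: 1, 2: 1}  -> split into two groups
print(symmetric)  # {0: 0, 1: 0, 2: 0}  -> one duplicate group
```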

Tests: test_kmeans _fit->fit mock; test_minhash values_host->to_numpy.

Build/test via main docker/Dockerfile (CURATOR_EXTRA=all --all-groups); full pytest cpu+gpu together.

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

….6.1

Extends the dynamo/ray/vLLM-cu129 nightly baseline to also track RAPIDS,
transformers and data-designer at their newest releases, so a weekly
benchmark surfaces upstream breakage early. Migrates Curator to the new
APIs and clears accumulated CVE constraint/override tech-debt.

Dependencies (pyproject.toml, uv.lock):
- RAPIDS cudf/cuml/cugraph/raft/rmm/rapidsmpf -> 26.08 nightly (a*) from the
  rapids-nightly index; transitive nightly libs listed explicitly so
  prerelease="if-necessary-or-explicit" stays scoped (no stray PyPI betas).
- Add cudf-streaming-cu12 (partition_and_pack/unpack_and_concat moved here out
  of rapidsmpf in 26.08).
- transformers>=5,<6 override (defeats nemo-toolkit[asr]'s 4.57 pin),
  huggingface-hub>=1.5,<2, packaging>=25, pandas>=3 overrides.
- data-designer 0.5.5 -> 0.6.1.
- Drop the huggingface-hub<1.0 override and the numpy<=2.2 / protobuf<7 caps.
- Remove all 12 CVE constraint floors (verified redundant: the nightly stack
  already resolves at/above every CVE fix).

transformers 5:
- batch_encode_plus -> __call__ (text/models/tokenizer.py,
  text/embedders/vllm.py, text/io/writer/megatron_tokenizer.py).
- data_designer: add __deepcopy__ so Xenna pipeline_spec deepcopy survives
  hf-hub>=1.0 caching an unpickleable DuckDBPyConnection.

cuml 26.08:
- semantic dedup KMeans -> cuml.cluster.kmeans_mg.KMeansMG (single-GPU KMeans
  dropped handle=; private _fit(multigpu=True) removed -> KMeansMG.fit()).

rapidsmpf 26.08 (deduplication/shuffle_utils/rapidsmpf_shuffler.py):
- imports -> memory.{buffer,buffer_resource,spill} + integrations.ray.RapidsMPFActor.
- BufferResource(memory_limits={DEVICE:int}, statistics=...); Statistics(enable=)
  (dropped mr); direct Shuffler(comm,0,nparts,br); insert_finished() once;
  wait()+local_partitions(); inline cudf<->pylibcudf helpers (utils.cudf removed),
  re-exported and repointed lsh.py.

cugraph 26.08:
- connected_components: symmetrize=False -> True (cugraph honors the flag
  literally; the one-directional dedup edge-list must be symmetrized).

Tests: test_kmeans _fit->fit mock; test_minhash values_host->to_numpy.

Build/test via main docker/Dockerfile (CURATOR_EXTRA=all --all-groups);
full pytest cpu+gpu together.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Praateek <praateekm@gmail.com>

copy-pr-bot Bot commented Jun 11, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…refresh nightlies

Follow-up to the nightly bump.

docker/Dockerfile:
- Stub ray/dashboard/client/build (its own layer, after uv sync). The ray nightly
  wheel omits the prebuilt dashboard frontend, so the dashboard process died with
  FrontendNotFoundError and its HTTP/API server never registered — breaking every
  ray.util.state call (cosmos-xenna uses it) with "Could not read 'dashboard' from
  GCS". This was blocking ALL xenna pipeline e2e tests (semantic dedup,
  data-designer, nemotron-cc NDD). No-op on stable wheels that ship client/build.
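The stub step is meant to be idempotent: create the missing frontend directory on nightly wheels and do nothing on stable wheels. A minimal Python sketch of that logic, assuming it is pointed at the installed `ray` package directory; `stub_dashboard_build` is hypothetical, and the assumption that an empty `build/` (plus an empty `static/`) satisfies the dashboard's frontend check is ours, not taken from the Dockerfile.

```python
import tempfile
from pathlib import Path


def stub_dashboard_build(ray_pkg_dir: Path) -> bool:
    """Create <ray_pkg_dir>/dashboard/client/build if it is missing.

    Returns True when a stub was created and False when the directory already
    exists (e.g. a stable wheel that ships the prebuilt frontend), so running
    it there is a no-op.
    """
    build = ray_pkg_dir / "dashboard" / "client" / "build"
    if build.is_dir():
        return False
    # Assumption: an empty build/ (plus an empty static/ subdirectory) is
    # enough for the dashboard's frontend check; the real stub may differ.
    (build / "static").mkdir(parents=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    fake_ray = Path(tmp)  # stands in for Path(ray.__file__).parent
    first = stub_dashboard_build(fake_ray)
    second = stub_dashboard_build(fake_ray)
    created = (fake_ray / "dashboard" / "client" / "build").is_dir()

print(first, second, created)  # True False True
```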

data_designer:
- Add __getstate__/__setstate__ (mirror of the __deepcopy__ added in the bump) so
  Ray can pickle the stage to its actors. The live DataDesigner caches an
  unpickleable duckdb.DuckDBPyConnection under hf-hub>=1.0; rebuild it on unpickle
  via __post_init__. Synthetic/NDD suite is green (70/70) with this + the dashboard fix.
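A minimal sketch of the `__getstate__`/`__setstate__` pair, assuming the same dataclass shape as the `__deepcopy__` fix: drop the live handle when pickling and rebuild it through `__post_init__` on unpickle. `DesignerStage` and `_conn` are hypothetical, `sqlite3.Connection` stands in for `duckdb.DuckDBPyConnection`, and the standard `pickle` module stands in for the cloudpickle serialization Ray performs.

```python
import pickle
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DesignerStage:  # hypothetical stand-in for the data_designer stage
    config: dict
    # sqlite3.Connection stands in for the unpickleable DuckDBPyConnection.
    _conn: Optional[sqlite3.Connection] = field(
        default=None, init=False, repr=False, compare=False
    )

    def __post_init__(self) -> None:
        # Stand-in for building the live DataDesigner and its connection.
        self._conn = sqlite3.connect(":memory:")

    def __getstate__(self) -> dict[str, Any]:
        state = self.__dict__.copy()
        state["_conn"] = None  # never ship the live handle
        return state

    def __setstate__(self, state: dict[str, Any]) -> None:
        self.__dict__.update(state)
        self.__post_init__()  # rebuild the handle on the receiving side


stage = DesignerStage({"model": "demo"})
restored = pickle.loads(pickle.dumps(stage))
print(restored.config, restored._conn.execute("SELECT 1").fetchone())
# {'model': 'demo'} (1,)
```

In this sketch `copy.deepcopy(stage)` would also succeed, since the standard `copy` module falls back to `__getstate__`/`__setstate__` when a class defines no `__deepcopy__`.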

pyproject.toml (from all-extras-cu129):
- Route the ray nightly wheel via [tool.uv.sources] per (python, arch) for
  cp311/12/13 instead of an inline URL in dependencies; the dependency stays a clean
  ray[default,data]>=2.55.1 (PyPI fallback for non-x86_64).

uv.lock:
- Re-locked with the ray-source change plus a targeted refresh of the nightly
  packages only (cudf a633->a634, libcudf->a635, rapidsmpf a37->a38, ...). ai-dynamo
  held at dev20260608: its latest nightlies require an exact ai-dynamo-runtime==<same>
  that isn't published, so the refresh upgrades ai-dynamo only and lets uv backtrack
  to the latest consistent pair.
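The ai-dynamo hold can be pictured as a newest-first search that skips any nightly whose exact `ai-dynamo-runtime==<same>` pin has no published wheel. This is a toy greedy model, not uv's PubGrub-based resolver; `latest_consistent`, `dev_date`, the `.devYYYYMMDD` version format, and all version strings are hypothetical, shaped after the dev20260608 date above.

```python
from typing import Optional


def dev_date(version: str) -> int:
    # Assumes nightly versions end in ".devYYYYMMDD" (hypothetical format).
    return int(version.rsplit(".dev", 1)[1])


def latest_consistent(dynamo: list[str], runtimes_published: set[str]) -> Optional[str]:
    """Newest ai-dynamo nightly whose exact ai-dynamo-runtime pin is satisfiable.

    Toy greedy backtracking: try versions newest-first and stop at the first one
    whose pinned ai-dynamo-runtime==<same version> actually exists.
    """
    for version in sorted(dynamo, key=dev_date, reverse=True):
        if version in runtimes_published:
            return version
    return None


# Hypothetical index contents.
dynamo_nightlies = ["1.3.0.dev20260608", "1.3.0.dev20260609", "1.3.0.dev20260610"]
runtime_nightlies = {"1.3.0.dev20260607", "1.3.0.dev20260608"}
chosen = latest_consistent(dynamo_nightlies, runtime_nightlies)
print(chosen)  # 1.3.0.dev20260608
```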

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Praateek <praateekm@gmail.com>
@praateekmahajan

Contributor Author

/ok to test faf4108
