Minor updates to CLIMB tutorial #2072

Merged: sarahyurick merged 5 commits into NVIDIA-NeMo:main from sarahyurick:climb_nvbug on Jun 15, 2026

@sarahyurick

Contributor

Fixes get_deterministic_hash import and edits utils.py to not use top-level Curator imports.

Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
@sarahyurick sarahyurick requested a review from a team as a code owner June 11, 2026 21:08
@sarahyurick sarahyurick requested review from suiyoubi and removed request for a team June 11, 2026 21:08
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Contributor

Greptile Summary

This PR makes three small corrections to the Nemotron-CLIMB tutorial: it fixes the broken get_deterministic_hash import in 3_prune.py, refactors utils.py to defer heavy nemo_curator imports so the module can be loaded without a full Curator install, and corrects the README title capitalisation.

  • 3_prune.py: removes the unreachable nemo_curator.stages.text.io.writer.utils path and adds the canonical from nemo_curator.utils.hash_utils import get_deterministic_hash; the function's signature (list[str], str) is compatible with the call site.
  • utils.py: moves the nemo_curator.core.constants imports inside attach_ray_client_args and the RayClient import inside create_ray_client, guarding the import used only for the type annotation with TYPE_CHECKING; type checkers that honour TYPE_CHECKING (mypy, pyright) will resolve "RayClient" correctly.
  • README.md: title fixed from "Nemotron-Climb" to "Nemotron-CLIMB".
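
For readers unfamiliar with the deferred-import pattern described for utils.py, here is a minimal sketch. It is not the tutorial's code: `fractions.Fraction` stands in for a heavy optional dependency such as RayClient, and `make_ratio` is a hypothetical function name chosen for illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers (mypy, pyright); skipped at runtime.
    # Fraction is a stand-in for a heavy optional dependency.
    from fractions import Fraction


def make_ratio(numerator: int, denominator: int) -> "Fraction":
    # Lazy import: the dependency is loaded on the first call,
    # not when this module itself is imported.
    from fractions import Fraction

    return Fraction(numerator, denominator)


if __name__ == "__main__":
    print(make_ratio(2, 4))
```

The return annotation is written as a string so that it is never evaluated at runtime, which is what allows the module to be imported even when the dependency is missing.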

Confidence Score: 5/5

All three changes are straightforward corrections with no functional risk — the import is now valid and points to the confirmed location of get_deterministic_hash, and the deferred imports in utils.py follow standard Python patterns.

The changes fix a broken import, defer heavy optional imports to avoid eager loading, and correct a title typo. The function being imported exists at the referenced path with a compatible signature, and the TYPE_CHECKING pattern used in utils.py is well-understood by the relevant type checkers.
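
To illustrate what a helper with the (list[str], str) signature might do, here is a hypothetical sketch. It is not the nemo_curator implementation of get_deterministic_hash; the name `sketch_deterministic_hash`, the use of SHA-256, the sorting step, and the 12-character truncation are all assumptions made for illustration.

```python
import hashlib


def sketch_deterministic_hash(inputs: list[str], seed: str = "") -> str:
    # Hypothetical stand-in, NOT the nemo_curator implementation.
    # Sorting makes the result independent of the order of the inputs.
    combined = "|".join(sorted(inputs)) + "|" + seed
    # hashlib digests are stable across processes, unlike the built-in
    # hash() for strings, which is salted per interpreter run by default.
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # Hypothetical file names and task id, mirroring the shape of the
    # call get_deterministic_hash(source_files, task.task_id).
    print(sketch_deterministic_hash(["b.jsonl", "a.jsonl"], "task-0"))
```

A stable digest of this kind means that rerunning the same task over the same source files yields the same identifier.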

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tutorials/text/nemotron-climb-data-curation/3_prune.py | Replaced invalid module-attribute import path for get_deterministic_hash with a correct direct import from nemo_curator.utils.hash_utils; call site updated accordingly. Signature is compatible. |
| tutorials/text/nemotron-climb-data-curation/utils.py | Top-level RayClient and constants imports replaced with TYPE_CHECKING-guarded annotation and lazy function-level imports; the string annotation 'RayClient' correctly resolves for type checkers that honour TYPE_CHECKING blocks. |
| tutorials/text/nemotron-climb-data-curation/README.md | Title capitalisation corrected from 'Nemotron-Climb' to 'Nemotron-CLIMB' to match the paper acronym. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[3_prune.py] -->|"from nemo_curator.utils.hash_utils import get_deterministic_hash"| B[hash_utils.py]
    A -->|imports| C[utils.py]
    C -->|TYPE_CHECKING only| D["nemo_curator.core.client.RayClient (type annotation)"]
    C -->|lazy import inside attach_ray_client_args| E[nemo_curator.core.constants]
    C -->|lazy import inside create_ray_client| F[nemo_curator.core.client.RayClient]
    A -->|uses| G["get_deterministic_hash(source_files, task.task_id)"]
    B --> G

Reviews (4): Last reviewed commit: "Merge branch 'main' into climb_nvbug"

Comment thread tutorials/text/nemotron-climb-data-curation/3_prune.py Outdated
Comment thread tutorials/text/nemotron-climb-data-curation/utils.py Outdated
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>

@VibhuJawa VibhuJawa left a comment

Contributor

LGTM

@sarahyurick sarahyurick enabled auto-merge (squash) June 15, 2026 02:38
@sarahyurick sarahyurick merged commit 2c20d63 into NVIDIA-NeMo:main Jun 15, 2026
23 checks passed

2 participants