Collaborator

@polinabinder1 polinabinder1 commented Jan 23, 2026

Remote profiling for SCDL with remote dataloading.
Adds a new test file for chunked dataset functionality (loading, iteration, chunked header validation). It compares iterating over a dataset locally and remotely, and prints metrics relevant to the remote dataloader, including wait time, cache hits, and network throughput.
Usage:
python sub-packages/bionemo-scspeedtest/examples/chunked_scdl_benchmark.py \
    --scdl-path /path/to/regular_scdl \
    --chunked-path /path/to/chunked_scdl \
    --remote-path s3://bucket/chunked_scdl \
    --batch-size 64 \
    --max-batches 1000
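
For readers unfamiliar with the "wait time" metric, here is a minimal sketch of how per-batch wait time could be measured for any iterable loader. This is not the code in `chunked_scdl_benchmark.py`; `IterationStats` and `measure_iteration` are hypothetical names used only for illustration, and cache hits and network throughput are left out because they depend on the remote dataloader's internals.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class IterationStats:
    """Hypothetical result container; not a type from this PR."""

    batches: int
    total_wait_s: float

    @property
    def mean_wait_s(self) -> float:
        # Average time spent blocked waiting for one batch.
        return self.total_wait_s / self.batches if self.batches else 0.0


def measure_iteration(loader: Iterable, max_batches: Optional[int] = None) -> IterationStats:
    """Time how long the consumer waits on each next() call of `loader`."""
    batches = 0
    total_wait = 0.0
    it = iter(loader)
    while max_batches is None or batches < max_batches:
        start = time.perf_counter()
        try:
            next(it)  # blocks until the loader yields a batch
        except StopIteration:
            break
        total_wait += time.perf_counter() - start
        batches += 1
    return IterationStats(batches=batches, total_wait_s=total_wait)


if __name__ == "__main__":
    # Stand-in for a real dataloader: a plain list of 64-element batches.
    fake_loader = [[0] * 64 for _ in range(10)]
    stats = measure_iteration(fake_loader, max_batches=5)
    print(f"batches={stats.batches} mean_wait_s={stats.mean_wait_s:.6f}")
```

Running the same function over a local loader and a remote one, with the same `max_batches`, gives directly comparable wait-time figures.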


copy-pr-bot bot commented Jan 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Jan 23, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

