Nemo-run Processor for TopIPL training#121

Merged
karpnv merged 40 commits into main from sdp_ipl
Jul 7, 2025

@nune-tadevosyan nune-tadevosyan commented May 28, 2025

Collaborator

Added three processors for training and inference command generation, and one NeMo-Run processor specifically for IPL training.
A config file with the following parameters must be provided for the NeMo-Run processor.

# The script to be run.
script: # Script path to run, relative to directory
script_config: # Training config file for the script; ipl_epoch_stopper_callback must be provided in this config
inference_config: # Inference config file for the unlabeled data, used by transcribe_speech_parallel

exp_name: null  # populated by exp_manager.name if not provided
results_dir: # Where to store the results of the run

nemo_directory: # NeMo directory path
do_average: # Boolean indicating whether to average checkpoints for pseudo-label generation
p_cache: # Probability with which to update the pseudo-labeled set
num_ipl_epochs: # Number of epochs to run pseudo-labeling

# Optional arguments
num_runs: 
num_gpus: 
num_tasks_per_node: 
max_runtime: # Specify for clusters

########################################################################################################################

executor: slurm # or local

USER: ntadevosyan

# Fields for cluster run
ssh_tunnel:
  host: 
  # ------------------------------- Fill this up! -------------------------------
  user: "${USER}"  # your username; "${USER}" or null are both resolved from the ${USER} environment variable
  job_dir: ""
  identity: ""
  # -----------------------------------------------------------------------------

account: 
partition:
job_name_prefix: 

containers:
  asr: # Container image


env_vars:
  - 'TOKENIZERS_PARALLELISM='
  - 'AIS_ENDPOINT='
  - 'LHOTSE_AUDIO_DURATION_MISMATCH_TOLERANCE='
  - 'TORCH_CUDNN_V8_API_ENABLED='
  - 'PYTORCH_CUDA_ALLOC_CONF='
  - 'HYDRA_FULL_ERROR=1'

required_env_vars:
  - 'HF_TOKEN='
  - 'WANDB_KEY=' 

mounts:
  # Replace with your own paths in your cluster config
  - /path/to/mount:/where/to/mount/

timeouts:
  partition_name: # Specify time
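
The `env_vars` and `required_env_vars` lists above are written as `KEY=VALUE` strings, with an empty value left for the user to fill in. How the processor consumes these lists is not shown in this thread, so the following is only a minimal sketch under that assumption: it splits each entry into a key and a value, and fills required variables from the environment when the config leaves them empty. The helper names `parse_env_entries` and `resolve_required` are hypothetical and are not part of SDP or NeMo-Run.

```python
import os


def parse_env_entries(entries):
    """Split 'KEY=VALUE' strings into a dict; an empty value means 'not set in the config'."""
    parsed = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Malformed env var entry: {entry!r}")
        parsed[key.strip()] = value.strip()
    return parsed


def resolve_required(required_entries, environ=None):
    """Fill required variables from the environment (assumed behavior); fail if any is missing."""
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for key, value in parse_env_entries(required_entries).items():
        value = value or environ.get(key, "")
        if value:
            resolved[key] = value
        else:
            missing.append(key)
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))
    return resolved


# Example values are placeholders, not real tokens.
env = parse_env_entries(["HYDRA_FULL_ERROR=1", "TOKENIZERS_PARALLELISM="])
required = resolve_required(
    ["HF_TOKEN=", "WANDB_KEY="],
    environ={"HF_TOKEN": "abc", "WANDB_KEY": "xyz"},
)
```

Failing early on a missing required variable avoids submitting a cluster job that would only fail later, once it tries to reach Hugging Face or Weights & Biases.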

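To make the `p_cache` parameter more concrete, here is a minimal sketch of a probabilistic pseudo-label update. It assumes the decision is made independently per utterance and that utterances without a label always receive one; the actual TopIPL implementation in NeMo may differ. The function name `update_pseudo_labels` and the dictionary layout are hypothetical.

```python
import random


def update_pseudo_labels(cache, new_hypotheses, p_cache, rng=None):
    """Return a copy of `cache` where each entry is replaced with probability `p_cache`.

    cache:          dict mapping utterance id -> current pseudo-label
    new_hypotheses: dict mapping utterance id -> freshly generated transcript
    p_cache:        probability of overwriting an existing pseudo-label (assumed semantics)
    """
    if not 0.0 <= p_cache <= 1.0:
        raise ValueError("p_cache must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    updated = dict(cache)
    for utt_id, text in new_hypotheses.items():
        # Utterances that have no label yet are always added;
        # existing labels are refreshed only with probability p_cache.
        if utt_id not in updated or rng.random() < p_cache:
            updated[utt_id] = text
    return updated


old = {"utt1": "hello word", "utt2": "good morning"}
new = {"utt1": "hello world", "utt2": "good morning", "utt3": "thank you"}

always = update_pseudo_labels(old, new, p_cache=1.0)
never = update_pseudo_labels(old, new, p_cache=0.0)
```

With `p_cache=1.0` every label is refreshed on each update, while with `p_cache=0.0` existing labels are kept and only unseen utterances are added.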
@nune-tadevosyan nune-tadevosyan requested a review from karpnv May 28, 2025 18:17
Comment thread sdp/processors/__init__.py Outdated

@karpnv karpnv left a comment

Collaborator

Need to add yaml config example and readme

karpnv commented May 29, 2025

Collaborator

Add a requirements file with all needed dependencies, including NeMo version

Comment thread sdp/processors/IPL/conf/config.yaml Outdated
Comment thread sdp/processors/IPL/conf/nemo_run_config.yaml Outdated
Comment thread requirements/ipl.txt
@nune-tadevosyan nune-tadevosyan force-pushed the sdp_ipl branch 4 times, most recently from 7960435 to 1d1a351 Compare June 7, 2025 11:24
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
@karpnv karpnv requested a review from lilithgrigoryan June 9, 2025 08:52

@lilithgrigoryan lilithgrigoryan left a comment

Collaborator

Overall, LGTM. Thank you.
Please add copyright files where needed and pull changes from main.

And please fix the DCO test.

Comment thread dataset_configs/ipl/nemo_run_config.yaml Outdated
Comment thread sdp/processors/ipl/__init__.py
Comment thread sdp/processors/__init__.py
Comment thread sdp/processors/ipl/ipl_processors.py
Comment thread sdp/processors/ipl/nemo_run_processor.py Outdated
Comment thread sdp/utils/nemo_run_utils.py Outdated
Signed-off-by: Nune <ntadevosyan@nvidia.com>
@nune-tadevosyan nune-tadevosyan force-pushed the sdp_ipl branch 2 times, most recently from 271f1b0 to f36dd60 Compare June 16, 2025 12:32
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Comment thread dataset_configs/ipl/config.yaml Outdated
@nune-tadevosyan nune-tadevosyan requested a review from karpnv June 17, 2025 12:52
@Jorjeous

Collaborator

Good morning Nune!
Please run
`git checkout origin/main -- docker/Dockerfile`
to fix the Docker tests.

karpnv commented Jun 18, 2025

Collaborator

`git pull origin main` should fix the tests.

@karpnv karpnv left a comment

Collaborator

Pull the latest version of main to pass the tests.

nune-tadevosyan and others added 4 commits June 24, 2025 15:10
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

@Jorjeous Jorjeous left a comment

Collaborator

config.rst
Please fix the doctest warnings related to the file mentioned above
(line interval, etc.)

Jorjeous and others added 6 commits June 26, 2025 15:35
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Jorjeous commented Jul 2, 2025

Collaborator

I see that we now have some Ubuntu-related problems.

nemo_directory (str): Base directory for NeMo framework
new_manifest_files (str, Optional): New manifest files to add to the training configuration
new_tarred_audio_filepaths (str, Optional): New tarred audio filepaths to add to the training configuration
**kwargs: Additional arguments passed to the parent BaseProcessor class

Collaborator

Please add example section

num_gpus (int): Number of GPUs to use
is_tarred (bool): Whether the audio is tarred
first_run (bool): Whether this is the first run of pseudo-labeling
**kwargs: Additional arguments passed to the parent BaseProcessor class

Collaborator

config_path (str): Path to the YAML configuration file containing IPL settings
output_manifest_file (str): Path where the output manifest file will be written
input_manifest_file (str, Optional): Path to the input manifest file
"""

Collaborator

Please add example section

@karpnv karpnv left a comment

Collaborator

LGTM

@karpnv karpnv merged commit 89e596d into main Jul 7, 2025
9 of 10 checks passed
Jorjeous commented Jul 7, 2025

Collaborator

We forgot the example section...
