Strip user-set task_id from tutorials & getting-started script #2058
PR #2036 made Task.task_id init=False (framework-owned, assigned by the executor adapter), but the tutorials/ examples and the getting-started verify script still passed task_id= to Task constructors (FileGroupTask, DocumentBatch, AudioTask, SampleTask), so they crash with:

TypeError: __init__() got an unexpected keyword argument 'task_id'

(reported for tutorials/math/1_cc_index_lookup.py).

Remove the task_id= kwarg at every construction site; the framework assigns the id. Where a loop index existed only to build the removed task_id, drop it (for _ / for batch in ...). Read-only uses of task_id (logging, a hash seed, the audio checkpoint payload dict) are left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Abhinav Garg <abhgarg@nvidia.com>
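The crash described above can be reproduced with a plain dataclass. The sketch below is illustrative only: the `Task` class here is a hypothetical stand-in, not the framework's real class, and its `dataset_name` and `data` fields are assumed for the example. It shows that a field declared with `init=False` is left out of the generated `__init__`, so passing it as a keyword raises `TypeError`.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a framework task; not the real class."""

    dataset_name: str
    data: list
    # init=False removes task_id from the generated __init__ signature.
    task_id: str = field(default="", init=False)


# Old-style call site: passing task_id= is now rejected.
try:
    Task(task_id="batch_0", dataset_name="demo", data=[1, 2, 3])
except TypeError as exc:
    message = str(exc)
else:
    message = ""

# Fixed call site: omit task_id and let the owner of the field set it later.
task = Task(dataset_name="demo", data=[1, 2, 3])
task.task_id = "assigned-later"  # stands in for whatever the executor adapter does
```

The exact wording of the error varies slightly across Python versions (newer releases prefix the class name), but the `unexpected keyword argument 'task_id'` part is the same.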
Greptile Summary
Mechanical fix removing user-supplied
Confidence Score: 5/5

Safe to merge: the changes are a purely mechanical removal of constructor arguments that the framework no longer accepts, with no logic alterations.

All 13 files receive identical treatment: task_id= kwargs are dropped from Task constructors, and loop variables that existed only to build those strings are eliminated. The audio-tutorial checkpoint resume path remains stable because the hash is stored in _metadata[CKPT_HASH_KEY] and propagated through save/load, so the task.task_id fallback inside _task_hash() is never reached for reloaded tasks. The comprehension-local j in the high-quality SDG pipeline's df[id] assignment is unaffected by removing the outer enumerate variable. No behavioral change beyond task IDs now being assigned by the framework.

No files require special attention. The audio checkpoint helpers (callhome_diar/run.py and single_speaker_filter/run.py) were worth verifying but are correct.
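The checkpoint argument above can be made concrete with a rough sketch. Everything here is an assumption for illustration: the `AudioTask` stand-in, the value of `CKPT_HASH_KEY`, and the body of `_task_hash()` are guesses at the general shape, not the tutorial's actual code. The point is only that once a hash is stored in the metadata and carried through save/load, a changed `task_id` no longer affects it.

```python
import hashlib
from dataclasses import dataclass, field

CKPT_HASH_KEY = "ckpt_hash"  # assumed key name, for illustration only


@dataclass
class AudioTask:
    """Hypothetical stand-in; the real class has a different definition."""

    data: dict
    _metadata: dict = field(default_factory=dict)
    task_id: str = field(default="", init=False)


def _task_hash(task: AudioTask) -> str:
    """Return the stored hash if present, else derive one from task_id."""
    stored = task._metadata.get(CKPT_HASH_KEY)
    if stored is not None:
        return stored
    # Fallback path: only reached when no hash has been stored yet.
    digest = hashlib.sha256(task.task_id.encode("utf-8")).hexdigest()
    task._metadata[CKPT_HASH_KEY] = digest
    return digest


# First run: the hash is derived once and written into the metadata.
first = AudioTask(data={"path": "a.wav"})
first.task_id = "id-from-first-run"
h1 = _task_hash(first)

# Resume: metadata is restored, task_id differs, the stored hash still wins.
reloaded = AudioTask(data=first.data, _metadata=dict(first._metadata))
reloaded.task_id = "id-from-second-run"
h2 = _task_hash(reloaded)
```

Under these assumptions `h1` and `h2` are equal, which is the property the summary relies on when it says the fallback is never reached for reloaded tasks.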
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Task Constructor Call] --> B{task_id= kwarg present?}
B -->|Before PR #2058| C[TypeError: unexpected keyword argument]
B -->|After PR #2058| D[Constructor succeeds]
D --> E[Framework executor adapter assigns task_id]
E --> F{Is task_id used later?}
F -->|Read-only: logging / checkpoint payload / hash seed| G[task.task_id read — unchanged]
F -->|Loop index only used for task_id string| H[Loop variable dropped: for i → for _ ]
G --> I[Pipeline continues normally]
H --> I
Reviews (1): Last reviewed commit: "Strip user-set task_id from tutorials & ..."

/ok to test e83798c
What

PR #2036 made Task.task_id init=False (framework-owned, assigned by the executor adapter at each stage boundary). The tutorials/ examples and the getting-started verify script were not swept and still pass task_id= to Task constructors, so they now crash:

TypeError: __init__() got an unexpected keyword argument 'task_id'

This was reported for the math cc_index_lookup pipeline; the same break exists across audio / text / synthetic / slurm / quickstart tutorials and the getting-started CPU verify script.

Change
Remove the task_id= kwarg at every construction site (FileGroupTask, DocumentBatch, AudioTask, SampleTask); the framework assigns the id. Where a loop index existed only to build the removed task_id, drop the index (for _ ... / for batch in ...). Read-only uses of task_id (logging, a hash seed, the audio checkpoint payload dict) are left unchanged.

13 files, 8 insertions / 29 deletions. No behavior change beyond ids now being framework-assigned.
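As a minimal sketch of the call-site change, assuming a hypothetical `DocumentBatch` stand-in with made-up `dataset_name` and `data` fields (the real constructor takes whatever the framework defines), the loop index disappears together with the kwarg it used to feed:

```python
from dataclasses import dataclass, field


@dataclass
class DocumentBatch:
    """Hypothetical stand-in for the framework class."""

    dataset_name: str
    data: list
    task_id: str = field(default="", init=False)  # framework-owned


chunks = [["a", "b"], ["c"], ["d", "e"]]

# Before (now fails with TypeError):
#   for i, chunk in enumerate(chunks):
#       DocumentBatch(task_id=f"batch_{i}", dataset_name="demo", data=chunk)

# After: the index existed only to build task_id, so enumerate() goes too.
tasks = []
for chunk in chunks:
    tasks.append(DocumentBatch(dataset_name="demo", data=chunk))

# Reading task_id afterwards (for logging, say) is still fine.
ids = [t.task_id for t in tasks]
```

In this stand-in the ids stay at the empty default because nothing plays the role of the executor adapter; in the real pipeline the framework would have populated them by the time a stage reads them.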
Labeled docs-only (tutorials/examples) so it skips CI, per @praateekmahajan.

🤖 Generated with Claude Code