fix: auto-detect Ray fanout stages#2056
Conversation
Signed-off-by: nightcityblade <nightcityblade@gmail.com>
Signed-off-by: nightcityblade <nightcityblade@gmail.com>
Signed-off-by: nightcityblade <nightcityblade@gmail.com>
Signed-off-by: nightcityblade <nightcityblade@gmail.com>
Signed-off-by: nightcityblade <nightcityblade@gmail.com>
Greptile SummaryThis PR eliminates per-stage
Confidence Score: 4/5Safe to merge — the auto-detection logic is well-tested and all removed overrides were verified to have matching list[...] return annotations. The implementation is sound for all current stages. The one gap is that get_type_hints can raise AttributeError on dotted-attribute annotations, which would surface as an unexpected exception from ray_stage_spec() rather than silently falling back — a narrow but real failure mode on the changed path. nemo_curator/stages/base.py — the _process_returns_list except clause Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["ray_stage_spec() called"] --> B["_process_returns_list(cls)"]
B --> C{"get_type_hints(cls.process) succeeds?"}
C -- "Yes" --> D["return_annotation = resolved type"]
C -- "NameError / TypeError / AttributeError" --> E["return_annotation = raw __annotations__ string"]
D --> F["_annotation_includes_list(annotation)"]
E --> F
F --> G{"isinstance(annotation, str)?"}
G -- "Yes" --> H{"startswith list[ / List[ / typing.List[?"}
G -- "No" --> I{"get_origin(annotation) is list?"}
H -- "True" --> J["return is_fanout_stage: True"]
H -- "False" --> K["return {}"]
I -- "True" --> J
I -- "False" --> K
Reviews (1): Last reviewed commit: "fix: return clip fanout tasks consistent..." | Re-trigger Greptile |
| try: | ||
| return_annotation = get_type_hints(cls.process).get("return") | ||
| except (NameError, TypeError): | ||
| return_annotation = cls.process.__annotations__.get("return") |
There was a problem hiding this comment.
Adding
AttributeError to the except tuple ensures the fallback path is reached when get_type_hints encounters a dotted-attribute annotation referencing a missing symbol (e.g. some_module.MissingType). Without it, the exception propagates out of ray_stage_spec().
| try: | |
| return_annotation = get_type_hints(cls.process).get("return") | |
| except (NameError, TypeError): | |
| return_annotation = cls.process.__annotations__.get("return") | |
| try: | |
| return_annotation = get_type_hints(cls.process).get("return") | |
| except (NameError, TypeError, AttributeError): | |
| return_annotation = cls.process.__annotations__.get("return") |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Description
Closes #1613.
Automatically marks Ray Data stages as fanout stages when their
processreturn annotation is a concretelist[...]. This lets stages likeURLGenerationStagerely on the baseProcessingStagedefault instead of manually settingis_fanout_stage.Supersedes #2025 with the same branch after refreshing the DCO sign-offs.
Usage
Checklist
Tests:
python3 -m compileall nemo_curator/stages/base.py nemo_curator/stages/video/clipping/clip_extraction_stages.py tests/stages/video/clipping/test_clip_transcoding_stage.py