feat: Add strict config validation to Registry.build#43
feat: Add strict config validation to Registry.build#43luojiyin1987 wants to merge 2 commits intoOpenDCAI:mainfrom
Conversation
- Detect unknown YAML config parameters with helpful error messages - Suggest corrections for typos using difflib.get_close_matches - Add skip_strict_validation class attribute for opting out - Warn about unknown runtime parameters instead of raising Fixes silent parameter dropping which could hide configuration errors.
- Test valid config acceptance - Test unknown parameter rejection with error messages - Test typo suggestion hints - Test skip_strict_validation opt-out - Test runtime parameter warning
|
Thanks for putting this together — catching silent parameter drops is a real problem and I appreciate the effort to address it. That said, I ran into a few issues when reviewing this against the actual codebase:
The current components.yaml passes save_interval, reward_model_backend, and reward_backend_params to the nice selector, but none of these exist in NICESelector.init. Today they are silently filtered out; with strict validation they would immediately raise a ValueError. So merging this PR as-is would crash existing configs.
Registry.build is intentionally called with a superset of runtime dependencies (e.g. eval_dataset, data_collator) so that each component can pick what it needs. Most selectors — RandomSelector, LossSelector, DeltaLossSelector, TsdsSelector, NearSelector, CustomSelector — don't accept eval_dataset, so _warn_unknown_runtime would fire on nearly every build() call. That's by design, not a bug, and the warnings would just be noise.
The test cases use synthetic classes defined inline, which is why the issues above weren't caught. It would be good to validate against the actual selectors/mixers/weighters and components.yaml. I think the direction of validating user-provided YAML cfg is worth exploring, but before that we'd need to first clean up the config inconsistencies (like the unused params in the nice config), and the runtime side should probably stay as-is since silent filtering there is intentional. Happy to discuss further if you'd like to iterate on this. |
|
Thanks @haolpku for the detailed review! I agree with all your points — breaking existing configs is a blocker, and the runtime warnings conflict with the framework's design. Before I iterate on this, I have a question: Should I tackle the nice config cleanup as a separate PR, or is that something the team already has planned? I'm happy to help with removing unused params from Let me know the best approach! |
Summary
Closes #42
_validate_cfg()method to detect unknown YAML config parameters_warn_unknown_runtime()to warn about unknown runtime paramsdifflib.get_close_matches()for typo suggestionsskip_strict_validationclass attribute for opt-outTest plan
tests/test_registry.pyRun tests:
python -m pytest tests/test_registry.py -v