Skip to content

feat: Add strict config validation to Registry.build#43

Open
luojiyin1987 wants to merge 2 commits intoOpenDCAI:mainfrom
luojiyin1987:feat/registry-config-validation
Open

feat: Add strict config validation to Registry.build#43
luojiyin1987 wants to merge 2 commits intoOpenDCAI:mainfrom
luojiyin1987:feat/registry-config-validation

Conversation

@luojiyin1987
Copy link
Copy Markdown
Contributor

Summary

Closes #42

  • Add _validate_cfg() method to detect unknown YAML config parameters
  • Add _warn_unknown_runtime() to warn about unknown runtime params
  • Use difflib.get_close_matches() for typo suggestions
  • Add skip_strict_validation class attribute for opt-out

Test plan

  • Unit tests added in tests/test_registry.py
  • Tests valid config acceptance
  • Tests unknown parameter rejection
  • Tests typo suggestion hints
  • Tests skip_strict_validation opt-out
  • Tests runtime parameter warning

Run tests: python -m pytest tests/test_registry.py -v

- Detect unknown YAML config parameters with helpful error messages
- Suggest corrections for typos using difflib.get_close_matches
- Add skip_strict_validation class attribute for opting out
- Warn about unknown runtime parameters instead of raising

Fixes silent parameter dropping which could hide configuration errors.
- Test valid config acceptance
- Test unknown parameter rejection with error messages
- Test typo suggestion hints
- Test skip_strict_validation opt-out
- Test runtime parameter warning
@haolpku
Copy link
Copy Markdown
Contributor

haolpku commented Apr 4, 2026

Thanks for putting this together — catching silent parameter drops is a real problem and I appreciate the effort to address it.

That said, I ran into a few issues when reviewing this against the actual codebase:

  1. It would break the existing nice config out of the box.

The current components.yaml passes save_interval, reward_model_backend, and reward_backend_params to the nice selector, but none of these exist in NICESelector.init. Today they are silently filtered out; with strict validation they would immediately raise a ValueError. So merging this PR as-is would crash existing configs.

  1. The runtime warnings conflict with the framework's design.

Registry.build is intentionally called with a superset of runtime dependencies (e.g. eval_dataset, data_collator) so that each component can pick what it needs. Most selectors — RandomSelector, LossSelector, DeltaLossSelector, TsdsSelector, NearSelector, CustomSelector — don't accept eval_dataset, so _warn_unknown_runtime would fire on nearly every build() call. That's by design, not a bug, and the warnings would just be noise.

  1. Tests don't cover real components or configs.

The test cases use synthetic classes defined inline, which is why the issues above weren't caught. It would be good to validate against the actual selectors/mixers/weighters and components.yaml.

I think the direction of validating user-provided YAML cfg is worth exploring, but before that we'd need to first clean up the config inconsistencies (like the unused params in the nice config), and the runtime side should probably stay as-is since silent filtering there is intentional.

Happy to discuss further if you'd like to iterate on this.

@luojiyin1987
Copy link
Copy Markdown
Contributor Author

Thanks @haolpku for the detailed review!

I agree with all your points — breaking existing configs is a blocker, and the runtime warnings conflict with the framework's design.

Before I iterate on this, I have a question:

Should I tackle the nice config cleanup as a separate PR, or is that something the team already has planned?

I'm happy to help with removing unused params from components.yaml, but I want to avoid duplicating work or stepping on planned changes.

Let me know the best approach!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add strict config validation to Registry.build

2 participants