Skip to content

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625

Open
nabinchha wants to merge 2 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider
Open

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625
nabinchha wants to merge 2 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

@nabinchha nabinchha commented May 9, 2026

Summary

Final cleanup after the #589 deprecation cycle. Providers are now a named list referenced explicitly by each ModelConfig: no hidden registry default, no YAML default: flowing into DataDesigner.__init__, and no ambiguity about which provider wins.

The original release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes #594, so users have had a release containing the deprecation warnings before this removal lands.

Related Issue

Closes #590. Depends on #588 (merged as #591) and #589 (merged as #594, released in v0.6.0).

Changes

Removed

  • Removed ModelProviderRegistry.default, default-provider validators, and get_default_provider_name() from the engine registry.
  • Removed the config-layer get_default_provider_name() helper.
  • Removed the CLI default-provider field, set_default / get_default service methods, default-tracking in update/delete, the "Change default provider" menu option, and the "Default" column from data-designer config list.
  • Removed the top-level default_provider and per-item configured_provider / effective_provider fields from agent introspection output in favor of a single explicit per-item provider field.

Changed

  • Made ModelConfig.provider required.
  • Changed ModelProviderRegistry.get_provider(name) to require a concrete provider name.
  • Changed resolve_model_provider_registry(providers) to drop default_provider_name.
  • Simplified DataDesigner.__init__ so it resolves providers without consulting YAML defaults.
  • Split the CLI model-form error for "no providers configured" versus "multiple providers configured but none selected."
  • Updated legacy docs and Fern docs to remove the deprecated default-provider workflow, mark provider as required, and call out the upgrade impact for stale model_configs.yaml entries and agent context consumers.

Compatibility

  • Existing model_providers.yaml files with a legacy default: key still load cleanly because both ModelProviderRegistry classes inherit directly from pydantic.BaseModel, preserving pydantic v2's default extra="ignore" behavior. This invariant is now documented on the classes and covered by tests.
  • Existing model_configs.yaml entries that omit provider or set it to null must be updated with an explicit provider name.

Attention Areas

Reviewers should pay special attention to:

Testing

  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer/tests/cli/forms/test_model_builder.py packages/data-designer-engine/tests/engine/test_model_provider.py packages/data-designer/tests/cli/repositories/test_provider_repository.py packages/data-designer/tests/interface/test_data_designer.py packages/data-designer-config/tests/config/test_models.py packages/data-designer-config/tests/config/test_default_model_settings.py -q (162 passed before the final docs-wording amend)
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff check ...
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff format --check ...
  • Full GitHub Actions on rebased branch: restarted after the final docs-wording amend
  • fern check: not run locally because the fern CLI is not installed in this environment

Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Unit tests added/updated
  • Docs updated

@nabinchha nabinchha requested a review from a team as a code owner May 9, 2026 00:19
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

MkDocs preview: https://c5716626.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-625.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Review: PR #625refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

Summary

Final cleanup of the #589 deprecation cycle: drops the implicit "default provider" concept end-to-end. ModelConfig.provider goes from str | None = None to required str; ModelProviderRegistry.default (and both its validators + get_default_provider_name()) is removed; the CLI's set_default/get_default/"Change default provider" workflow is gone; agent introspection drops default_provider/effective_provider. Net −722 lines across 31 files, test suite updated in lockstep (deprecation-warning tests → strict-validation tests). Migration for on-disk YAMLs carrying a legacy default: relies on pydantic v2's extra="ignore" default, locked in by two new tests.

The PR is well-scoped, well-tested, and architecturally clean. The dependency direction (interface → engine → config) is preserved and the two ModelProviderRegistry classes (engine + CLI repo) are updated consistently. CI is green on the Python 3.10–3.12 matrix; 3.13 runs are still in progress at review time.

The primary risk is a hard public-API break, gated on the #594 deprecation cycle having shipped in a release — which the PR author has correctly flagged as a blocker and the release history confirms has not yet happened (last release v0.5.9, 2026-04-28; #594 merged 2026-05-05).

Findings

⛔ Blocker (already called out by author)

⚠️ Breaking changes worth an explicit user-facing note

  • ModelConfig.provider becomes required. Anyone constructing ModelConfig(alias=..., model=...) without provider= now gets ValidationError. Likewise any on-disk ~/.data-designer/model_configs.yaml entry that omitted provider: or serialised it as null will fail to load after upgrade. The PR body notes "Migration is one-line per call site" for code, but on-disk configs from pre-feat(models): deprecate implicit default provider routing #594 users have no migration helper — they will hit ValidationError on load. Consider either (a) adding a CHANGELOG/release-notes entry calling this out, or (b) catching this specific ValidationError in ProviderRepository.load / ModelRepository.load with a targeted error message pointing at the fix. I lean toward (a) since users on stale model configs are a shrinking set and a clear CHANGELOG entry is cheaper than a bespoke migration path.

  • Agent introspection shape change. get_model_aliases_state() drops default_provider from the top level and collapses configured_provider + effective_provider into a single provider field (see packages/data-designer/src/data_designer/cli/utils/agent_introspection.py). This is observable output that agents consuming data-designer agent context parse. The PR description lists this under "Removed" but doesn't flag it as a schema change. If any downstream agent scaffolding keys off those three field names, it will silently produce missing-field errors or empty strings. Worth an explicit callout in release notes alongside the ModelConfig change.

✅ Migration path for legacy YAML default: — correct and well-tested

  • Both ModelProviderRegistry classes (engine model_provider.py and CLI provider_repository.py) inherit from BaseModel directly, not ConfigBase (which overrides extra="forbid"). The PR correctly locks this in:
    • test_registry_ignores_legacy_default_key (engine)
    • test_load_silently_ignores_legacy_default_key (CLI repo)
  • Subtle-but-important invariant: if someone later rebases these classes onto ConfigBase, the migration silently breaks (old YAMLs would start raising ValidationError on the stale default: key). A short comment on the class itself — "inherits BaseModel directly so extra='ignore' lets us drop legacy default: keys" — would protect against this. The docstring on ProviderRepository.load already says this; the class itself doesn't.

🟡 Minor / nits

  • provider_controller.py menu order change. The pre-refactor code appended "exit" after the conditional "change_default" so it always rendered last. The new code inlines "exit" in the initial dict literal, which keeps it last only because there is now nothing else to append. Behaviour is identical today; flagging only because a future "add menu option" patch could accidentally push exit out of its conventional last-position slot. Not worth changing.

  • Unused import cleanup is thoroughwarn_at_caller removed from models.py, default_model_settings.py, model_provider.py, provider_controller.py, provider_repository.py. Good. No orphaned warnings import.

  • scripts/benchmarks/benchmark_engine_v2.py correctly updated to drop default_provider_name=. Nice catch — benchmark scripts are easy to miss.

  • Docs parity: Four docs/concepts/models/*.md files had their deprecation warnings stripped. Consistent. Didn't spot any orphaned #589 references in the docs tree, but a quick grep -r "589" docs/ before merging wouldn't hurt.

  • CLI README.md drops the "default: nvidia" line from the example YAML. Good — keeps user-visible examples aligned with the new schema.

Test coverage observations

  • Previously-added warning-emission tests (test_model_config_provider_none_emits_deprecation_warning, test_explicit_default_emits_deprecation_warning, test_handle_change_default_emits_deprecation_warning, etc.) have been correctly replaced by their post-cycle counterparts (ValidationError-based). No test appears to have been silently dropped.
  • test_init_no_user_providers_loads_from_yaml is the right shape for the remaining DataDesigner.__init__ YAML-fallback path, replacing the older three-test cluster (test_init_user_supplied_providers_ignore_unrelated_yaml_default, test_init_user_supplied_providers_preserve_first_wins_over_yaml_default, test_init_yaml_default_emits_single_deprecation_warning). The first two were bug: default model provider leaks from YAML into DataDesigner constructor #588 regressions — those bugs can't reoccur once the default field is gone, so removal is correct.
  • No new e2e tests, but the PR description's "N/A (no runtime behavior change beyond stricter validation)" is accurate: this is a pure API-surface tightening.

Security / performance

  • No security implications. No credentials handling changed.
  • No performance implications. One fewer model_validator on ModelConfig and three fewer on ModelProviderRegistry — negligible but nominally a micro-win.

Verdict

Approve pending release gate. The code change is clean, the migration strategy for existing YAMLs is sound and tested, and the CI signal is strong. The two items worth addressing before merge are non-blocking:

  1. (Recommended) Add a CHANGELOG / release-notes entry explicitly calling out (a) ModelConfig.provider becoming required, and (b) the agent-introspection schema change. Users upgrading from v0.5.9 → next release with stale on-disk configs or downstream agent tooling will otherwise hit an undocumented break.
  2. (Nit) Drop a one-line comment on both ModelProviderRegistry classes pinning the BaseModel-not-ConfigBase invariant that makes the legacy-default: migration work.

Both are worth doing but neither should block merge once the release gate lifts.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR completes the deprecation cycle started in #589/#594 by permanently removing ModelProviderRegistry.default, the get_default_provider_name() helpers, and all related YAML-default routing logic. ModelConfig.provider is now a required field.

  • Engine/config: ModelProviderRegistry drops the default field and the three associated model-validators; ModelConfig.provider changes from str | None = None to the required str; resolve_model_provider_registry drops the default_provider_name parameter; pydantic v2's extra=\"ignore\" silently discards legacy default: keys from existing YAMLs.
  • CLI: ProviderService.set_default/get_default removed; ProviderController's "Change default provider" menu option removed; agent_introspection output collapses configured_provider/effective_provider to a single provider field; data-designer config list drops the "Default" column from the providers table.
  • Tests: 162 tests updated throughout to supply explicit provider= and to assert the new error/exception contracts.

Confidence Score: 5/5

Safe to merge — all changes are consistent removals of deprecated default-provider machinery with no new logic introduced.

Every changed file follows the same pattern: remove the deprecated default field, remove the validators and helpers that depended on it, and replace str | None with str for ModelConfig.provider. Backward compatibility for old YAMLs is preserved via pydantic v2's extra="ignore", which is pinned by a dedicated regression test. The 162-test suite covers all previously-warned paths and the new error contracts. No concurrent-access concerns, no new external calls, and no previously-flagged issue left unaddressed.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/model_provider.py Removes default field, three model-validators, and get_default_provider_name(). Adds validate_providers_not_empty field-validator. get_provider now requires a concrete str. resolve_model_provider_registry drops default_provider_name param. Clean and correct.
packages/data-designer-config/src/data_designer/config/models.py Makes ModelConfig.provider required (str, no default). Removes the _warn_on_implicit_provider model-validator. Breaking API change, intentional and documented.
packages/data-designer/src/data_designer/interface/data_designer.py Removes YAML default: consultation, get_default_provider_name() call, and the warnings.catch_warnings suppression block. Construction simplified to a single resolve_model_provider_registry call.
packages/data-designer/src/data_designer/cli/forms/model_builder.py Splits the previously conflated 'no providers' error into two precise branches: one for zero providers and one for multiple-providers-but-none-selected. Fixes the misleading error message flagged in prior review.
packages/data-designer/src/data_designer/cli/utils/agent_introspection.py Drops default_provider from state dict and collapses configured_provider/effective_provider to a single provider field per item. Direct mc.provider lookup replaces the effective = mc.provider or default_provider fallback.
packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py Removes default_provider: header line from model-aliases text output and the column_labels parameter from _format_table. Now renders the provider column directly without remapping.
packages/data-designer/src/data_designer/cli/services/provider_service.py Removes set_default/get_default methods and the registry.default tracking in update and delete. Straightforward deletions with no correctness concerns.
packages/data-designer/src/data_designer/cli/repositories/provider_repository.py Removes default field from CLI-layer ModelProviderRegistry. Deprecation warning for legacy default: YAML key removed; pydantic extra="ignore" silently drops it. Adds doc comment explaining the invariant.
packages/data-designer/src/data_designer/cli/controllers/provider_controller.py Removes _handle_change_default and the conditional 'Change default provider' menu insertion. 'exit' option is now unconditionally present in the options dict.
packages/data-designer/tests/interface/test_data_designer.py Removes ~115 lines of deprecation-warning and first-wins behavioral tests that no longer apply. Retains a single test pinning the YAML-fallback path, which now just checks provider names rather than get_default_provider_name().
packages/data-designer-engine/tests/engine/test_model_provider.py Removes ~80 lines of deprecation-warning tests. Adds test_registry_ignores_legacy_default_key to pin pydantic's extra="ignore" behavior for old YAML payloads. All remaining assertions are correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DataDesigner.__init__] --> B{model_providers\nsupplied?}
    B -- yes --> C[_resolve_model_providers\nuser list]
    B -- no --> D[_resolve_model_providers\nload from YAML]
    C --> E[resolve_model_provider_registry\nproviders only]
    D --> E
    E --> F[ModelProviderRegistry\nvalidate_providers_not_empty\nvalidate_unique_names]
    F --> G[_model_provider_registry]
    H[ModelConfig.provider\nrequired str] --> I[registry.get_provider\nname: str]
    G --> I
    I -- found --> J[ModelProvider]
    I -- not found --> K[UnknownProviderError]
Loading

Reviews (4): Last reviewed commit: "Merge branch 'main' into nmulepati/fix-5..." | Re-trigger Greptile

Comment on lines 294 to +299
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in form_data. When len(self.available_providers) > 1 and no "provider" key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

Suggested change
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
else:
if self.available_providers:
raise ValueError(
"Cannot build ModelConfig: multiple providers are configured but none "
"was specified in form data. Specify provider= explicitly on each ModelConfig."
)
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/forms/model_builder.py
Line: 294-299

Comment:
The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in `form_data`. When `len(self.available_providers) > 1` and no `"provider"` key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

```suggestion
        else:
            if self.available_providers:
                raise ValueError(
                    "Cannot build ModelConfig: multiple providers are configured but none "
                    "was specified in form data. Specify provider= explicitly on each ModelConfig."
                )
            raise ValueError(
                "Cannot build ModelConfig: no provider specified in form data and no "
                "available providers configured. Add a provider first via "
                "`data-designer config providers`."
            )
```

How can I resolve this? If you propose a fix, please make it concise.

model: str
inference_parameters: InferenceParamsT = Field(default_factory=ChatCompletionInferenceParams)
provider: str | None = None
provider: str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude and Codex both caught a migration edge here: once provider is required, an existing model_configs.yaml with one provider-less entry makes ModelRepository.load() return None, so CLI and agent flows can treat the whole model file as missing. Could we add a migration/error path in ModelRepository.load() so users see "add provider to alias X" instead of silently losing the registry?

@andreatgretel
Copy link
Copy Markdown
Contributor

docs/concepts/models/model-configs.md:18

provider | str | No | Reference to the name of the Provider to use ... If not specified, one set as the default provider ...

model-configs.md still says provider is not required and falls back to a default. Since this PR removes that path, this doc should say Required: Yes and drop the default-provider fallback text.

@andreatgretel
Copy link
Copy Markdown
Contributor

Looks like the release gate in the PR body is still active: latest GitHub release is v0.5.9, published before #594 merged, so users have not yet had a released version with the deprecation warning. Assuming that gate still applies, this should wait for the next release before merging.

Copy link
Copy Markdown
Contributor

@andreatgretel andreatgretel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on the migration issue and stale docs noted above. The release gate in the PR body also still looks active.

@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch 2 times, most recently from f97431c to eab4026 Compare May 20, 2026 15:52
…g.provider (#590)

Final cleanup after the #589 deprecation cycle. Providers are now a
named list referenced by name end-to-end — no hidden registry default,
no YAML default leaking into DataDesigner construction (#588), and no
ambiguity about which provider wins.

ModelConfig.provider becomes a required str. ModelProviderRegistry.default
and the associated validators, get_default_provider_name() helper, CLI
"Change default provider" workflow, and the "Default" column in
`data-designer config list` are removed. resolve_model_provider_registry()
drops its default_provider_name kwarg, and DataDesigner.__init__ no
longer threads a YAML default through registry construction.

Pydantic v2's extra="ignore" default lets existing on-disk YAMLs with
default: keys load cleanly, so users on the deprecation cycle migrate
without breakage.
@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch from eab4026 to 09eeae7 Compare May 20, 2026 15:54
@nabinchha
Copy link
Copy Markdown
Contributor Author

Thanks for the careful review. I pushed an updated branch and believe the comments are now addressed:

  • docs/concepts/models/model-configs.md now marks provider as required and removes the default-provider fallback text. I also made the same update in the Fern docs page.
  • The release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes feat(models): deprecate implicit default provider routing #594, so users have had a release containing the deprecation warnings before this removal lands. I updated the PR body accordingly.
  • The breaking-change notes are now explicit in both the PR body and docs: stale model_configs.yaml entries that omit provider or set it to null need an explicit provider name.
  • The agent-context schema change is called out more precisely: consumers should read each model alias item's provider; the top-level default_provider and per-item configured_provider / effective_provider fields are no longer emitted.
  • I added class docstrings on both ModelProviderRegistry classes documenting the BaseModel / extra="ignore" invariant for legacy default: YAML keys.
  • I addressed Greptile's CLI diagnostic note by splitting the ModelFormBuilder.build_config error messages for "no providers configured" versus "multiple providers configured but none selected", with a focused test.

The branch is rebased onto current main. GitHub Actions, docs preview, DCO, and Greptile are all green on the latest commit. Could you re-review when you have a chance?

@nabinchha nabinchha requested a review from andreatgretel May 20, 2026 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

2 participants