skills(model-serving): merge dev-side training/agent flows from a-d-k experimental #84
jamesbroadhead wants to merge 13 commits
Phase 1 of #73's TODO #1b. Adds references/fm-api-endpoints.md with the curated Foundation Model API endpoint table (chat/instruct + embedding models) from databricks-solutions/ai-dev-kit's model-serving skill, plus common defaults and query examples (CLI + SDK).

Stripped: the cloud/language prefix on the docs link, and the leftover MCP-tool references in the source. The endpoint table itself is static catalog data, with no MCP coupling.

SKILL.md updates:
- bump version to 0.2.0
- point Endpoint Types table at the new reference
- point the Foundation Model discovery bullet at the new reference

Subsequent phases (separate PRs / commits) port the remaining dev-side content: classical-ml autolog patterns, Custom PyFunc signatures, ResponsesAgent with the create_text_output_item gotcha, UCFunctionToolkit + VectorSearchRetrieverTool resource passthrough.

Co-authored-by: Isaac
Aligns the verbatim a-d-k port with the live docs.databricks.com
supported-models page (validated via WebFetch on 2026-05-26):
ADDED (missing from a-d-k snapshot):
- databricks-claude-opus-4-7 (now most capable Claude)
- databricks-gpt-5-5-pro, 5-5
- databricks-gpt-5-4, 5-4-mini, 5-4-nano
- databricks-gpt-5-3-codex, 5-2-codex
- databricks-gemini-3-1-flash-lite, 3-5-flash
- databricks-qwen35-122b-a10b (Preview)
REMOVED (retired, no longer in docs):
- databricks-claude-3-7-sonnet
- databricks-meta-llama-3-1-405b-instruct
UPDATED notes:
- claude-opus-4-6 no longer "Most capable"
- gpt-5-2 no longer "Latest"
- gpt-5-1-codex-{max,mini} + gpt-5-2-codex marked retiring 2026-07-16
- gemini-3-pro marked retired 2026-03-26 with redirect through 2026-06-07
- Several Gemini / Codex endpoints annotated with cross-geo requirement
- qwen3-next-80b annotated as Preview
OPENING PARAGRAPH:
- "available in every workspace" -> "available in supported Model Serving
regions"; calls out cross-geo requirement for several endpoints
NOT TOUCHED (out of scope: not docs-validatable from supported-models page):
- served_entities[].entity_name guidance (line 3 second half)
- SKILL.md "system.ai.* catalog" claim on the pay-per-token row
These remain as in the a-d-k snapshot and should be revisited if/when
docs cover them directly.
Test plan: `scripts/skills.py validate` -> "Everything is up to date";
`scripts/skills.py generate` -> only refreshes manifest.json timestamps.
Co-authored-by: Isaac
c9015d8 to d400eff
@jamesbroadhead I suspect this is also coming from main from the content I see? The experimental skill is https://github.com/databricks-solutions/ai-dev-kit/blob/experimental/databricks-skills/databricks-ml-training-serving/SKILL.md
Hi @QuentinAmbard, Claude here, working with James. You're right, and I owe you (and the PR description) a correction. I checked both branches:

Content-wise it's a clean fingerprint match for [...]

The bigger issue is philosophical: the experimental [...]

It ships [...]

I'll rework this PR to actually align with the experimental skill: replace the static catalog reference file with the runtime-list snippet + the runtime-resolved defaults, and fix the PR description. The doc-validated catalog work isn't wasted; it just shouldn't be how the stable skill steers callers.

On "Should I also open another merge PR for suggestion?": happy to coordinate. If you mean a PR upstream into a-d-k's [...]
…ot static catalog

Quentin pointed out (PR #84) that the prior two commits actually ported from `main:databricks-skills/databricks-model-serving/`, not `experimental:databricks-skills/databricks-ml-training-serving/` as the PR description claimed. The two skills take opposite approaches:
- `main` ships a static catalog table of FM API endpoint names.
- `experimental` deliberately rejects that ("a static skill list goes stale fast; always list at runtime instead of hard-coding names") and ships a `databricks serving-endpoints list | jq ...` one-liner plus runtime-resolved defaults (highest-numbered Claude Sonnet for agents, highest-numbered `-codex-max` for code).

Re-port to match the experimental philosophy:
- `references/fm-api-endpoints.md`: replace the static catalog with the runtime-list snippet (filtered by `databricks-` name prefix AND `system.ai.*` served entity, to exclude non-FM endpoints sharing the prefix), runtime-resolved family defaults, and CLI + SDK query examples that use a placeholder endpoint name rather than a hard-coded model.
- `SKILL.md`: update the Endpoint Types row + the Foundation-Model discovery bullet to reframe the reference as "discover at runtime" rather than "curated table".

Version stays at 0.2.0 (frontmatter unchanged → manifest unchanged).

The 2026-05-26 catalog refresh in the previous commit is dropped here: the experimental skill's point is that no static table is the right shape, so curating one against docs.databricks.com isn't useful for the stable skill either.

Co-authored-by: Isaac
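The runtime-discovery rule described in this commit can be sketched in Python. This is only an illustration, not the skill's actual snippet (which is a `jq` one-liner): it assumes the endpoint list has already been parsed from JSON, that each entry carries `name` and `config.served_entities[].entity_name`, and that "highest-numbered" means comparing the integers found in the endpoint name. The helper names and the sample endpoint names in the usage note are hypothetical.

```python
import re


def fm_api_endpoints(endpoints):
    """Return names of endpoints that look like Foundation Model API endpoints.

    Assumed rule: name starts with 'databricks-' AND at least one served
    entity lives under system.ai.* (excludes custom endpoints that merely
    share the prefix).
    """
    names = []
    for ep in endpoints:
        entities = (ep.get("config") or {}).get("served_entities") or []
        serves_system_ai = any(
            str(entity.get("entity_name", "")).startswith("system.ai.")
            for entity in entities
        )
        if ep.get("name", "").startswith("databricks-") and serves_system_ai:
            names.append(ep["name"])
    return names


def highest_numbered(names, family):
    """Pick the endpoint in `family` whose name carries the largest version.

    The version is the tuple of all integers in the name, so
    '...-4-5' sorts above '...-4'. Returns None if the family is absent.
    """
    def version(name):
        return tuple(int(part) for part in re.findall(r"\d+", name))

    candidates = [name for name in names if family in name]
    return max(candidates, key=version) if candidates else None
```

For example, given a parsed list containing hypothetical endpoints `databricks-claude-sonnet-4` and `databricks-claude-sonnet-4-5`, `highest_numbered(fm_api_endpoints(parsed), "claude-sonnet")` would resolve the agent default without hard-coding either name.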
…ental port

Previous commit (c148500) restated the experimental section in my own words and added a "Querying" section + provisioned-throughput aside + docs-link gloss that aren't in the upstream skill. The PR's stated goal is to port from experimental, so do an actual port, not a paraphrase.

`references/fm-api-endpoints.md` now mirrors the `## Foundation Model API endpoints` section of `experimental:databricks-ml-training-serving/SKILL.md` verbatim (heading promoted from `##` to `#` since this is a standalone file): intro paragraph + the `databricks serving-endpoints list | jq ...` one-liner + the family-based default-picking rule. Nothing else.

Also trim the SKILL.md discovery bullet back toward its original shape: link to the reference file for the runtime-list snippet, then the same `system.ai` / `serving-endpoints list` / `get-open-api` alternatives that were already there.

Co-authored-by: Isaac
…ntal
Expands the port from the FM-endpoints-only scope to cover every
section of `experimental:databricks-ml-training-serving/`. Mirrors
the experimental skill's 3-file structure 1:1 into stable's
`references/` directory; the standalone fm-api-endpoints.md added in
earlier commits goes away (its content lives inline in
training-and-serving.md exactly as it does in experimental's SKILL.md).
Added (all verbatim ports, mechanical adjustments only):
references/training-and-serving.md
Ports experimental SKILL.md content. Mechanical changes only:
frontmatter stripped (destination is a reference file, not a
SKILL.md); `1-custom-pyfunc.md` → `custom-pyfunc.md`,
`2-genai-agents.md` → `genai-agents.md` (filename renames);
`../<skill>/SKILL.md` → `../../<skill>/SKILL.md` (one more level
of nesting since this file is in references/ rather than at the
skill root). Content covers: canonical train/register/serve flow,
`mlflow.{sklearn,xgboost,…}.autolog()` patterns, UC alias-based
promotion, batch scoring via `spark_udf`, real-time endpoint
create + zero-downtime version swap, `state.ready` vs
`state.config_update` poll-both gotcha, `jobs submit --no-wait`
serverless deploy pattern, Foundation Model API endpoints
runtime-list, and the full gotchas trap-table.
references/custom-pyfunc.md
Ports experimental 1-custom-pyfunc.md verbatim.
Mechanical change: `[SKILL.md]` → `[training-and-serving.md]`
where the original cross-referenced its parent SKILL.md.
Content: file-based PyFunc ("Models from Code"),
`infer_signature`, `code_paths`, pre-deploy validation via
`mlflow.models.predict(env_manager="uv")`.
references/genai-agents.md
Ports experimental 2-genai-agents.md verbatim.
Mechanical changes: cross-skill paths bumped one level deeper;
`[SKILL.md]` → `[training-and-serving.md]`. Content covers:
`ResponsesAgent` interface, LangGraph agent with
`UCFunctionToolkit` + `VectorSearchRetrieverTool`, the
`create_text_output_item` raw-dict-silently-fails gotcha, the
`resources=[...]` passthrough-auth list (DatabricksServingEndpoint,
DatabricksFunction, DatabricksVectorSearchIndex, DatabricksLakebase),
async deploy via `agents.deploy()` from a serverless job, query
via CLI and OpenAI-compatible client.
Removed:
references/fm-api-endpoints.md
Standalone file from earlier commits; its content lives inline
in training-and-serving.md exactly as it does in experimental's
SKILL.md, so the deliberate split is no longer needed.
Stable SKILL.md updates (minimal, ops-focus preserved):
- FM-endpoint link targets updated from `references/fm-api-endpoints.md`
to `references/training-and-serving.md#foundation-model-api-endpoints`
in the Endpoint Types table row and the FM-discovery bullet.
- New `### Develop & deploy new models` subsection under "What's Next"
with a 3-row table pointing at the new dev-side references, framed
as "this skill is ops-focused; for the dev-side flow, see below".
Manifest regenerated.
Co-authored-by: Isaac
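The `state.ready` vs `state.config_update` poll-both gotcha listed in this commit can be illustrated with a minimal sketch. It assumes the endpoint JSON exposes a `state` object with `ready` and `config_update` fields, and that the settled values are `READY` and `NOT_UPDATING` (with `UPDATE_FAILED` as a terminal failure); those value names and the helper functions are assumptions, not taken from the ported reference. The idea is that during a version swap the old version may keep `ready` at `READY` while the new config is still rolling out, so checking `ready` alone can report success too early.

```python
import time


def endpoint_is_serving(endpoint):
    """True only when the endpoint is ready AND no config update is pending."""
    state = endpoint.get("state") or {}
    return (
        state.get("ready") == "READY"
        and state.get("config_update") == "NOT_UPDATING"
    )


def wait_until_serving(fetch, timeout_s=1200, interval_s=15, sleep=time.sleep):
    """Poll `fetch()` (a hypothetical callable returning endpoint JSON as a dict)
    until both state fields settle, the update fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = fetch()
        state = endpoint.get("state") or {}
        if state.get("config_update") == "UPDATE_FAILED":
            raise RuntimeError("endpoint config update failed")
        if endpoint_is_serving(endpoint):
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not settle before the timeout")
        sleep(interval_s)
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of how the endpoint is fetched (CLI, SDK, or REST) and makes it easy to exercise without a workspace.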
- The mechanical `../` → `../../` rewrite in the verbatim port assumed every peer skill is stable, but 4 of them live in `experimental/`. `../../<skill>/SKILL.md` resolved to `skills/<skill>/SKILL.md` which does not exist for `databricks-agent-bricks`, `databricks-mlflow-evaluation`, `databricks-vector-search`, `databricks-unity-catalog`. Repointed to `../../../experimental/<skill>/SKILL.md`. `databricks-jobs` link unchanged (it's stable).
- SKILL.md frontmatter `description` only described the ops surface, so agents wouldn't route dev-side asks (train, register, PyFunc, ResponsesAgent) to this skill. Broadened to cover both ops and the new dev surface.
- Version bumped 0.2.0 → 0.3.0 + manifest regenerated.

Co-authored-by: Isaac
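The link-resolution reasoning in the first bullet can be checked mechanically. The sketch below assumes the reference file sits at `skills/databricks-model-serving/references/training-and-serving.md` (a layout inferred from the `skills/<skill>/SKILL.md` resolution mentioned above, not confirmed by the repo); `resolve_link` is a hypothetical helper.

```python
import posixpath


def resolve_link(source_file, relative_link):
    """Resolve a Markdown relative link against the file that contains it."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, relative_link))


# Assumed location of the ported reference file.
SOURCE = "skills/databricks-model-serving/references/training-and-serving.md"

# Two levels up lands in skills/, which only works for stable peers.
stable_target = resolve_link(SOURCE, "../../databricks-jobs/SKILL.md")

# Three levels up leaves skills/ and can reach experimental/ peers.
experimental_target = resolve_link(
    SOURCE, "../../../experimental/databricks-vector-search/SKILL.md"
)
```

Running a check like this over every link in the ported files, and testing each result for existence on disk, would have caught the four broken peers before review.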
QuentinAmbard
left a comment
nice let's merge this one, I'll send a followup PR on top!
## Deploy (async job, ~15 min)

`databricks.agents.deploy()` blocks for ~15 minutes, so don't run it inline from the CLI. Submit as a serverless job so the chat session doesn't hold the connection.
This is great. Should we add something about how agents can check whether a serverless job for the deploy has already been submitted?
(Claude here.)
Good call — added in 8c8a1b3. Two cheap checks just before the submit:
- `databricks jobs list-runs --active-only` filtered on `run_name == "deploy_<model>"` to catch an already-in-flight deploy.
- `databricks serving-endpoints get <endpoint_name>` to skip the redeploy if the endpoint already exists on the right version.

If either hits, the recipe now says to follow the existing run with `jobs get-run` instead of submitting a new one.
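A minimal sketch of the first check, assuming the active-run list has already been parsed from JSON and that each run carries `run_name` and `run_id` fields. `find_active_deploy` is a hypothetical helper; it defensively accepts both a bare array and an object wrapping the runs under a `runs` key, since the output shape is easy to get wrong.

```python
def find_active_deploy(runs_payload, model):
    """Return the in-flight run named 'deploy_<model>', or None if absent.

    `runs_payload` is parsed JSON: either a bare list of runs or a dict
    with the list under 'runs'.
    """
    if isinstance(runs_payload, dict):
        runs = runs_payload.get("runs") or []
    else:
        runs = runs_payload or []

    wanted = f"deploy_{model}"
    for run in runs:
        if run.get("run_name") == wanted:
            return run
    return None
```

If the helper returns a run, the caller would follow that run by its `run_id` rather than submitting a second deploy job for the same endpoint.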
…-phase1 # Conflicts: # manifest.json
Per @simonfaltum review: before resubmitting a deploy serverless job, agents should check whether a run is already in flight (active job runs filtered on run_name) or whether the target endpoint already exists in the right state. Avoids wasting ~15 min of serverless and racing for the same endpoint name. Co-authored-by: Isaac
…apx Related Skills entry `databricks-app-apx` was the FastAPI+React stack referenced from ai-dev-kit's `databricks-apps-python` skill. It has been removed upstream (a-d-k is deprecated; the apx-on-CLI flow merged into the stable `databricks-apps` skill via #84/#73). The "Related Skills" bullet is the last dangling reference inside this repo. This PR was prepared by Claude.
Stacked a follow-up on this in #110, which adds a separate [...]

#110 includes the commits from this PR at its base; please merge this one first, then #110 will rebase cleanly onto the new main.
- Drop ../../../experimental/... cross-skill links that 404 when installed
(skills install flat under ~/.claude/skills/, not under stable/ vs
experimental/). Use plain skill-name references instead.
- Replace ai-dev-kit-specific tag examples ("aidevkit_project") with a
neutral "project": "demo" so a d-a-s skill doesn't bleed a-d-k convention.
- Tighten SKILL.md description from ~870 chars to ~290 chars, matching the
convention being established in PR #107.
Co-authored-by: Isaac
…apx Related Skills entry (#106)

## Summary

Removes the last dangling `databricks-app-apx` reference in this repo: one line in `experimental/databricks-apps-python/SKILL.md` ("Related Skills" bullet).

## Why

`databricks-app-apx` was the FastAPI+React stack referenced from ai-dev-kit's `databricks-apps-python`. It has been removed upstream (a-d-k is deprecated; the apx-on-CLI flow merged into the stable `databricks-apps` skill via #84/#73). I grepped the entire repo and this bullet is the only remaining mention; README, install scripts, and stable skills no longer reference it.

## Test plan

- [x] `python3 scripts/skills.py validate` passes (`Everything is up to date.`)
- [x] `grep -rn databricks-app-apx .` returns no remaining hits.
- [ ] CI green.

This pull request and its description were written by Claude.
@databricks/eng-apps-devex can this one be reviewed? We are waiting on it to merge to continue improving this skill. Skill changes seem ok; I'm not sure about the manifest version bump from 0.1.0 to 0.3.0. cc: @simonfaltum
…loy idempotency

Correct high-severity findings in the dev-side references ported from a-d-k:
- mlflow==2.22.0 -> mlflow>=3.0 (genai-agents, custom-pyfunc, training-and-serving): ResponsesAgent imports, log_model(name=), and create_* helpers are MLflow 3.0+ and fail to load under the pinned 2.x.
- training-and-serving: register the BEST HPO trial, not the last. autolog no longer sets registered_model_name (it registered one UC version per trial); refit on study.best_params and register once so @prod is the best-AUC model.
- genai-agents: deploy-idempotency jq .runs[]? -> .[]? (jobs list-runs returns a bare array, so the guard silently matched nothing and re-deployed).
- genai-agents: deploy_agent.py reads dbutils.widgets, not sys.argv (notebook tasks don't receive argv); submit prose passes notebook_task.base_parameters.

Co-authored-by: Isaac
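The "register the best trial, not the last" fix can be sketched without any ML library. The shape below is an assumption for illustration only: `trials` stands in for the finished HPO trials (each with its `params` and validation `auc`), while `fit` and `register` are hypothetical callables standing in for the real training and Unity Catalog registration steps.

```python
def register_best(trials, fit, register):
    """Refit on the best-scoring trial's params and register exactly once.

    trials:   list of dicts like {"params": {...}, "auc": 0.91}
    fit:      callable(params) -> model (hypothetical training step)
    register: callable(model) -> registered version (hypothetical)
    """
    if not trials:
        raise ValueError("no finished trials to choose from")

    # Pick by metric, not by position: the last trial is rarely the best.
    best = max(trials, key=lambda trial: trial["auc"])

    # Refit once on the winning params, then register a single version.
    model = fit(best["params"])
    version = register(model)
    return version, best
```

Keeping registration outside the per-trial loop is what prevents one registered version per trial, so the alias that gets promoted points at the best-AUC model.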
## Summary

Merges the dev-side surface of `experimental:databricks-ml-training-serving/` into stable's `databricks-model-serving`. Closes #73's TODO #1b.

The two skills had near-zero content overlap: stable was ops-focused (manage existing endpoints via CLI); experimental was dev-focused (train, register, log a PyFunc or `ResponsesAgent`, deploy). Combining them avoids forcing users to invoke two skills for what is functionally one workflow.

Shape:
- `references/` files carry the dev-side flow verbatim from experimental.
- `description` broadened so agent routing fires on dev-side asks too (train, register, PyFunc, ResponsesAgent); see the `description` field for the full trigger phrase list. NOT for: no-code agents (use `databricks-agent-bricks`); MLflow scorers (use `databricks-mlflow-evaluation`).

## Changes

- `references/training-and-serving.md` (from `databricks-ml-training-serving/SKILL.md`): `mlflow.{sklearn,xgboost,…}.autolog()` patterns, UC alias-based promotion (`@prod` / `@challenger`), batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, `state.ready` vs `state.config_update` poll-both gotcha, `jobs submit --no-wait` serverless deploy pattern, Foundation Model API endpoints runtime-list (replaces the earlier static catalog draft per @QuentinAmbard's review), and the gotchas trap-table.
- `references/custom-pyfunc.md` (from `databricks-ml-training-serving/1-custom-pyfunc.md`): file-based PyFunc (`python_model="model.py"`), `infer_signature`, `code_paths`, pre-deploy validation via `mlflow.models.predict(env_manager="uv")`.
- `references/genai-agents.md` (from `databricks-ml-training-serving/2-genai-agents.md`): `ResponsesAgent` interface, LangGraph agent with `UCFunctionToolkit` + `VectorSearchRetrieverTool`, the `create_text_output_item` raw-dict-silently-fails gotcha, the `resources=[...]` passthrough-auth list, async deploy via `agents.deploy()` from a serverless job, query via CLI and OpenAI-compatible client.

All 3 ports are verbatim, with only mechanical adjustments:
- Filename renames: `1-custom-pyfunc.md` → `custom-pyfunc.md`, `2-genai-agents.md` → `genai-agents.md`.
- Cross-skill paths adjusted for the `references/` location. Stable peers (`databricks-jobs`) use `../../`; experimental-only peers (`databricks-agent-bricks`, `databricks-mlflow-evaluation`, `databricks-vector-search`, `databricks-unity-catalog`) use `../../../experimental/`.

SKILL.md updates (kept tight, ops focus preserved):
- FM-endpoint links point at `references/training-and-serving.md#foundation-model-api-endpoints`.
- New `### Develop & deploy new models` subsection under "What's Next" with a 3-row table linking the new references.
- `description` expanded to cover the dev surface (see above).

Manifest: regenerated via `python3 scripts/skills.py generate`.

## Reviewer history

Earlier commits on this branch made two mistakes that have since been corrected:
- Ported from `main:databricks-model-serving/` rather than `experimental:databricks-ml-training-serving/`; caught by @QuentinAmbard, reworked.
- Shipped a static catalog table where experimental ships `databricks serving-endpoints list | jq ...` plus runtime-resolved defaults. Replaced with the runtime-list snippet.

## Coverage vs. #73 TODO #1b

- Classical-ml autolog patterns: `training-and-serving.md` § Train and register
- Custom PyFunc signatures: `custom-pyfunc.md` (whole file)
- `ResponsesAgent` + `create_text_output_item` gotcha: `genai-agents.md` § CRITICAL: output items must use helper methods
- `UCFunctionToolkit` + `VectorSearchRetrieverTool` resource passthrough: `genai-agents.md` § Log + register + § Resources that need passthrough auth
- Foundation Model API endpoints: `training-and-serving.md` § Foundation Model API endpoints

Plus content the original TODO didn't enumerate: batch scoring via `spark_udf`, real-time endpoint create + version swap, the `state.ready` vs `state.config_update` poll-both gotcha, serverless `jobs submit --no-wait` deploy pattern, the consolidated gotchas trap-table.

## Known follow-ups (out of scope)

- `references/training-and-serving.md` has an anchor link `#one-time-runs-jobs-submit--async-pattern-for-notebooks` into `databricks-jobs/SKILL.md`. The section exists in a-d-k's `databricks-jobs` but not yet in d-a-s `databricks-jobs/SKILL.md`. Link falls back to the file top.
- `references/off-platform-streaming.md` (pre-existing from #76) is in the manifest but unwired from SKILL.md. Untouched by this PR.
- Once `databricks-vector-search` is promoted to stable, the `../../../experimental/databricks-vector-search/SKILL.md` link in `training-and-serving.md` should be flipped to the stable path. The link-sweep in #87 (skills: promote databricks-vector-search to stable) should handle it.

## Test plan

- `python3 scripts/skills.py generate` clean.
- `python3 scripts/skills.py validate` passes (`Everything is up to date.`).
- [...] (`@databricks/eng-apps-devex` per CODEOWNERS).

This pull request and its description were written by Claude.