experimental: add databricks-metric-view-advisor skill #112
dipankarkush-db wants to merge 8 commits into
4e2f55e to dd217a0
Adds a guided, multi-source advisor for creating Unity Catalog metric views. Unlike a single-input helper, it synthesizes schemas, AI/BI dashboards, SQL query files, Genie spaces, and KPI files into richer, deduplicated suggestions, checks for overlap with existing metric views, and walks deployment end to end via an interactive 7-step workflow.

Ported from a Claude Code plugin and genericized to the open Agent Skills standard:
- stable frontmatter (name, description, compatibility, metadata.version), parent: databricks-core
- all agent/MCP-specific tool calls replaced with databricks CLI + SQL Statements API (mechanics in references/cli-operations.md)
- auth/profile/warehouse handling deferred to the parent databricks-core
- least-privilege grants and obfuscated placeholders throughout

References: cli-operations, input-handlers, patterns, yaml-reference.

Registered in scripts/skills.py SKILL_METADATA and .claude-plugin keywords; manifest regenerated via scripts/skills.py. validate passes.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
dd217a0 to 352b5ef
…rimental

Per maintainer guidance (Simon Faltum) to land faster while the stable tier is in flux, relocate the skill from skills/ to experimental/.
- git mv skills/databricks-metric-view-advisor -> experimental/
- Drop `parent: databricks-core` and make the skill self-contained: experimental skills install standalone (`aitools install <name> --experimental` does not pull in databricks-core), so the parent reference would dangle. Re-inlined the profile/auth prerequisite in SKILL.md Step 1a and cli-operations.md; all CLI/SQL commands already live in references/cli-operations.md.
- De-register from stable plumbing: removed the SKILL_METADATA entry in scripts/skills.py and the "metric-view-advisor" keyword in .claude-plugin/plugin.json (the Claude marketplace plugin ships stable skills only).
- READMEs: removed the stable "Available Skills" bullet; added an entry to experimental/README.md (Analytics & Dashboards).
- Regenerated manifest.json (skill now under repo_dir: experimental).

scripts/skills.py validate passes.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
…symlink

The run-output folder used a `latest` symlink (`ln -sfn run_<ts> latest`). Symlinks work on a local POSIX filesystem but do NOT resolve in the Databricks Workspace filesystem (where Genie Code runs): the link object is created but cannot be navigated or read through. Replace it with a portable `latest.txt` pointer file (a single line naming the most recent run folder), which works in every environment. Added a fallback note to pick the lexicographically-largest run_* folder.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
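The pointer-file approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `record_latest_run` and `resolve_latest_run` are hypothetical, and the fallback only picks the newest run if the `run_<ts>` timestamps are fixed-width so that they sort lexicographically.

```python
from pathlib import Path
from typing import Optional

POINTER_NAME = "latest.txt"  # pointer file name taken from the commit message


def record_latest_run(base_dir: Path, run_name: str) -> None:
    """Write the most recent run folder name as a single line (hypothetical helper)."""
    (base_dir / POINTER_NAME).write_text(run_name + "\n", encoding="utf-8")


def resolve_latest_run(base_dir: Path) -> Optional[Path]:
    """Return the latest run folder, preferring the pointer file when it is usable."""
    pointer = base_dir / POINTER_NAME
    if pointer.is_file():
        name = pointer.read_text(encoding="utf-8").strip()
        candidate = base_dir / name
        if name and candidate.is_dir():
            return candidate
    # Fallback: lexicographically largest run_* folder.
    # Assumes fixed-width timestamps in the folder names.
    runs = sorted(
        (p for p in base_dir.glob("run_*") if p.is_dir()),
        key=lambda p: p.name,
    )
    return runs[-1] if runs else None
```

Because the pointer is an ordinary text file, the same lookup works on a local POSIX filesystem and in environments where symlinks cannot be followed.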
…ploy-example SQL

- input-handlers.md: Input 5 step 2 referenced "Input 1, step 1" (catalog/schema DESCRIBE only); mapping KPIs to columns needs the table schema, so point to Input 1 step 2 (list tables + discover-schema), matching Input 3.
- patterns.md: the SQL Statements API deploy example used DATE_TRUNC(MONTH, ...) with the quotes dropped (invalid SQL). Use a quote-free EXTRACT(YEAR FROM ...) expr and add a note on escaping single quotes inside a single-quoted --json argument (or use --json @file).

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
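The quoting problem in the second bullet can be illustrated with a short sketch. It assumes the SQL Statements API request body carries `warehouse_id` and `statement` fields; the function names, the warehouse id, and the sample SQL are hypothetical and not taken from patterns.md.

```python
import json
import shlex
from pathlib import Path


def build_statement_payload(warehouse_id: str, statement: str) -> str:
    """Serialize the request body; json.dumps handles JSON escaping of the SQL text."""
    return json.dumps({"warehouse_id": warehouse_id, "statement": statement})


def inline_json_argument(warehouse_id: str, statement: str) -> str:
    """Return the payload quoted for a POSIX shell, single quotes included."""
    return shlex.quote(build_statement_payload(warehouse_id, statement))


def write_payload_file(path: Path, warehouse_id: str, statement: str) -> str:
    """Write the payload to a file and return the '@file' form, avoiding shell quoting."""
    path.write_text(build_statement_payload(warehouse_id, statement), encoding="utf-8")
    return f"@{path}"


if __name__ == "__main__":
    # Hypothetical statement containing single quotes, the case that broke the example.
    sql = "SELECT DATE_TRUNC('MONTH', order_date) AS order_month FROM orders"
    print("--json", inline_json_argument("<WAREHOUSE_ID>", sql))
    print("--json", write_payload_file(Path("payload.json"), "<WAREHOUSE_ID>", sql))
```

The file form sidesteps shell quoting entirely, which is presumably why the commit mentions `--json @file` as the alternative.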
…ot prerequisite

Reorder the opening so the first paragraph states what the skill does and when to use it, with the CLI prerequisite moving just below. This is generic skill-authoring hygiene: it helps any agent/indexer that reads the title + first paragraph, and mirrors the frontmatter description. No workflow or behavior change.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
…; fix re-auth; bump CLI floor

Addresses PR review feedback:
- Switch long metric-view DDL from raw 'api post /api/2.0/sql/statements' to 'aitools tools statement submit --file' -> 'statement get'. The file-based path removes the $$/JSON-escaping fragility that motivated the raw API.
- Re-authentication now uses 'databricks auth login --profile <PROFILE>'; --host is only for creating a new profile (avoids host-mismatch errors).
- Bump CLI compatibility floor v0.292.0 -> v0.299.1 (statement subcommand).
- Align Step 4 (saves <name>.sql) with Step 6 (submits that saved file).
- Replace the inline #-comment file-content deploy example with a clean pointer.

Co-authored-by: Isaac
…scenario matrix)

- evals/check_examples.py: static consistency eval (no workspace). Validates every metric-view YAML, MEASURE() quoting, DATEDIFF rule, Python snippets, fixtures, CLI subcommands (--live probes the installed CLI), regression guards for the statement/auth fixes, and relative links. 11 checks / 16 with --live.
- evals/SCENARIOS.md: behavioral scenario matrix, one per input type plus merge/overlap/snowflake, each with a deploy-and-query gating assertion.
- evals/README.md: methodology and how to run.
- Regenerate manifest.json to track the new files.

Co-authored-by: Isaac
dustinvannoy-db left a comment
Some specific comments are there for you to address for the parts I could review manually.
Higher level -- my concern is overlap with databricks-metric-views which can easily be referenced instead for parts of this. Here is the writeup on it:
Substantial overlap with experimental/databricks-metric-views/
The two skills differ in purpose: the existing one is a concise single-input reference; the advisor is a multi-source interactive workflow. That distinction is legitimate and the PR description acknowledges it. But the reference files are heavily duplicated:
| File | Existing | Advisor | Overlap |
|---|---|---|---|
| yaml-reference.md | 338 lines | 583 lines | Near-identical structure: Top-Level Fields, Dimensions, Measures, Window Measures, Joins (Star/Snowflake/USING), Materialization, Complete Example. Advisor adds Composability, Semantic Metadata, LOD, Gotchas. |
| patterns.md | 659 lines | 458 lines | Same patterns: single-table, ratios, CASE dimensions, star/snowflake schema, materialization, window measures, SQL-source fallback. |
So ~900 lines of YAML-spec/pattern documentation now exist in two near-parallel copies. When the metric-view YAML spec changes, both must be updated in lockstep. That is exactly the doc drift the advisor's own check_examples.py eval is designed to catch within a skill, but it can't catch drift between the two skills.
Options to raise with the author:
- Make the advisor reference the existing skill as parent (or via the core hierarchy). CLAUDE.md prescribes databricks-core → product → niche; the advisor declares no parent and inlines everything. Pointing its YAML/pattern references at databricks-metric-views would eliminate the duplicate spec docs while keeping the advisor's unique value (the interactive workflow, input-handlers, overlap-detection). The "self-contained, no parent" choice is the root cause of the duplication.
- Or consolidate: fold the advisor's richer yaml-reference.md/patterns.md improvements (Composability, Semantic Metadata, LOD, Gotchas) back into the existing skill and have the advisor link to them.
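To make the cross-skill drift concern concrete, here is a rough sketch of how shared section headings between two reference files could be listed. It is purely illustrative and not part of this PR or of check_examples.py; the function names are hypothetical, and matching headings is only a crude proxy for duplicated content.

```python
import re
from typing import List, Set

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this block simple
HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def headings(markdown: str) -> Set[str]:
    """Collect lower-cased ATX headings, skipping lines inside fenced code blocks."""
    found: Set[str] = set()
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(stripped)
        if match:
            found.add(match.group(1).strip().lower())
    return found


def shared_headings(doc_a: str, doc_b: str) -> List[str]:
    """Return the sorted headings that appear in both documents."""
    return sorted(headings(doc_a) & headings(doc_b))
```

Running this over the two yaml-reference.md files would surface sections such as Joins or Materialization that appear in both, which is where a spec change would have to be applied twice.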
Remove, we haven't been storing evals this way in this repo.
    "references/yaml-reference.md"
    ],
    "repo_dir": "experimental",
    "version": "1.0.0"
    ### 📊 Analytics & Dashboards
    - **databricks-aibi-dashboards** - Databricks AI/BI dashboards (with SQL validation workflow)
    - **databricks-metric-views** - Metric Views for governed metrics
    - **databricks-metric-view-advisor** - Guided, multi-source workflow to create Unity Catalog metric views from schemas, AI/BI dashboards, SQL queries, Genie spaces, or KPI files
Move this up to be before databricks-metric-views
    databricks experimental aitools tools get-default-warehouse --profile <PROFILE>
    ```

    Store the warehouse id for all SQL execution this session. The `query` / `discover-schema` tools auto-pick the default warehouse, so an explicit id is only needed for the `statement submit` path (pass `--warehouse <ID>` or set `DATABRICKS_WAREHOUSE_ID`). Do NOT ask the user about the warehouse — pick the default automatically.
Will this still allow a user to ask for a specific warehouse? Not a blocker, but seems like this could allow them to override default/best warehouse.
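One way to reconcile automatic selection with a user override is a simple precedence rule, sketched below. The ordering (explicit user choice, then `DATABRICKS_WAREHOUSE_ID`, then the auto-detected default) is an assumption made for illustration, and `choose_warehouse` is a hypothetical helper rather than anything in the skill.

```python
import os
from typing import Mapping, Optional


def choose_warehouse(
    default_id: str,
    user_override: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a warehouse id: user override first, then the env var, then the default."""
    env = os.environ if env is None else env
    for candidate in (user_override, env.get("DATABRICKS_WAREHOUSE_ID")):
        # Treat empty or whitespace-only values as "not provided".
        if candidate and candidate.strip():
            return candidate.strip()
    return default_id
```

With this shape the workflow never has to prompt for a warehouse, yet a user who names one still gets it.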
    **STOP — wait for the user to acknowledge the analysis before proceeding to suggestions.**

    ### Step 3: Suggest Metric Views
Move this whole Step 3 section to its own reference file to shorten SKILL.md, simply give it a pointer to that reference and guidance on when to use it.
    - **"Proceed" / "updated" / "3"** → re-read `suggestions.yaml` from the run folder, then proceed to Step 4
    - **User provides a file path** → read that file, parse it as the suggestions YAML, then proceed to Step 4

    ### Step 4: Create Metric View Definitions
Move this whole Step 4 section to its own reference file to shorten SKILL.md, simply give it a pointer to that reference and guidance on when to use it.
    Create Unity Catalog metric views from your existing Databricks assets — gold/fact schemas, AI/BI dashboards, SQL queries, Genie spaces, or KPI files. This advisor guides an interactive workflow that analyzes those sources, synthesizes them into richer, deduplicated suggestions, checks for overlap with views that already exist, and walks deployment end to end. Unlike a single-input "create a metric view" helper, it combines **multiple input sources** into one coherent set of definitions.

    **Prerequisite:** a working Databricks CLI (>= v0.299.1) authenticated to a workspace profile. All CLI/SQL commands this skill needs are documented in **[references/cli-operations.md](references/cli-operations.md)** — read that file before running any command in the steps below.
databricks-metric-views skill requires v1.0.0. Let's change to use that as the floor so it's consistent.
    @@ -0,0 +1,167 @@
    # CLI & API Operations

    All operations in this skill run through the **Databricks CLI** (>= v0.299.1), authenticated to a workspace profile. To create a **new** profile, run `databricks auth login --host <workspace-url> --profile <PROFILE>`; to re-authenticate an **existing** profile, just run `databricks auth login --profile <PROFILE>` (the host is already stored — passing `--host` again is unnecessary and can error on a mismatch). This file documents the specific commands the workflow relies on.
use version 1.0.0 as floor
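A version floor like this is easy to check incorrectly with plain string comparison, so a small sketch may help. It assumes the installed version is available as text containing a `vMAJOR.MINOR.PATCH` token; the sample strings and the function names are illustrative, not taken from the skill.

```python
import re
from typing import Tuple

VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def meets_floor(installed: str, floor: str = "v1.0.0") -> bool:
    """Compare numerically, so that 0.299.1 correctly sorts above 0.99.0."""
    return parse_version(installed) >= parse_version(floor)
```

Under the requested floor, a CLI at v0.299.1 would no longer qualify, which is the practical effect of aligning with databricks-metric-views.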
Reviewer feedback from dustinvannoy-db:
- Reference parent databricks-metric-views (Option A): add `parent:` frontmatter + a mandatory-dependency notice (SKILL.md REQUIRED callout and Prerequisites, README entry, reference-file headers). Dedupe the shared spec/patterns: yaml-reference.md keeps only advisor-unique additions (gotchas, expanded source, composability, extra measure/join rules, semantic metadata, LOD, extra materialization detail, correct dot-chain example) and points to the parent for the baseline; patterns.md keeps the metadata-rich templates, correctly-quoted star/snowflake joins, window measures, and the SQL-source fallback, pointing to the parent for ratio/filtered/TPC-H/materialized/detailed patterns. No content lost: everything is either inline or in the parent.
- Remove evals/ (not how this repo stores evals).
- Version 1.0.0 -> 0.1.0; CLI floor v0.299.1 -> v1.0.0 (SKILL.md, cli-operations.md) to match databricks-metric-views.
- Auth: use `databricks auth describe` instead of minting a token.
- Allow the user to override the auto-selected SQL warehouse.
- Slim SKILL.md: extract Step 3 and Step 4 into dedicated reference files with pointers + STOP gating retained.
- README: order advisor before databricks-metric-views.
- Regenerate manifest.json (scripts/skills.py validate passes).

Co-authored-by: Isaac
Thanks for the thorough review, @dustinvannoy-db! Addressed everything in High-level: overlap with
Summary

Adds `databricks-metric-view-advisor` under `experimental/`, a self-contained skill that guides users through creating Unity Catalog metric views via an interactive, multi-step workflow.

Unlike a single-input "create a metric view" helper, this advisor synthesizes multiple input sources (gold/fact schemas, AI/BI dashboards, SQL query files, Genie spaces, and KPI spreadsheets) into richer, deduplicated suggestions. It also checks for semantic overlap with metric views that already exist in the target schema (offering extend / replace / create-alongside / skip), generates the YAML definitions, and walks deployment, verification, and sample queries end to end.

Per maintainer guidance, this lands in `experimental/` to begin with (faster to merge while the stable tier is in flux).

Layout (standard skill anatomy):

`parent` skill. All operations use the `databricks` CLI (`experimental aitools tools query/discover-schema/get-default-warehouse`) and the SQL Statements API for long DDL; the profile/auth prerequisite is inlined. No agent- or MCP-specific tooling.

`manifest.json` regenerated via `scripts/skills.py generate`; `scripts/skills.py validate` passes. Does not touch `skills/`, `scripts/skills.py` `SKILL_METADATA`, or `.claude-plugin/`.

Documentation safety checklist

ALL PRIVILEGES, admin tokens, or broad scopes)

This pull request and its description were written by Isaac.