experimental: add databricks-metric-view-advisor skill #112

Open
dipankarkush-db wants to merge 8 commits into databricks:main from dipankarkush-db:add-databricks-metric-view-advisor

dipankarkush-db commented May 29, 2026

Summary

Adds databricks-metric-view-advisor under experimental/, a self-contained skill that guides users through creating Unity Catalog metric views via an interactive, multi-step workflow.

Unlike a single-input "create a metric view" helper, this advisor synthesizes multiple input sources — gold/fact schemas, AI/BI dashboards, SQL query files, Genie spaces, and KPI spreadsheets — into richer, deduplicated suggestions. It also checks for semantic overlap with metric views that already exist in the target schema (offering extend / replace / create-alongside / skip), generates the YAML definitions, and walks deployment, verification, and sample queries end to end.

Per maintainer guidance, this lands in experimental/ to begin with (faster to merge while the stable tier is in flux).

Layout (standard skill anatomy):

experimental/databricks-metric-view-advisor/
├── SKILL.md                      # interactive 7-step workflow
├── references/
│   ├── cli-operations.md         # SQL exec, dashboard/Genie fetch, deploy/query mechanics
│   ├── input-handlers.md         # per-input-source analysis
│   ├── patterns.md               # YAML pattern templates
│   └── yaml-reference.md         # full YAML spec
├── examples/                     # sample KPI/query fixtures
├── agents/openai.yaml            # Codex metadata (generated, lightly curated)
└── assets/databricks.{svg,png}   # shared icons
  • Self-contained — no parent skill. All operations use the databricks CLI (experimental aitools tools query/discover-schema/get-default-warehouse) and the SQL Statements API for long DDL; the profile/auth prerequisite is inlined. No agent- or MCP-specific tooling.
  • manifest.json regenerated via scripts/skills.py generate; scripts/skills.py validate passes. Does not touch skills/, scripts/skills.py SKILL_METADATA, or .claude-plugin/.

Note: this is a different artifact from the existing experimental/databricks-metric-views reference skill — that one is a concise single-input reference imported from ai-dev-kit; this is a richer multi-source, interactive advisor. Left untouched.

Documentation safety checklist

  • Examples use least-privilege permissions (no unnecessary ALL PRIVILEGES, admin tokens, or broad scopes)
  • Elevated permissions are explicitly called out where required
  • Sensitive values are obfuscated (placeholder workspace IDs, URLs, no real tokens)
  • No insecure patterns introduced (e.g. disabled TLS verification, hardcoded credentials)

This pull request and its description were written by Isaac.

dipankarkush-db force-pushed the add-databricks-metric-view-advisor branch from 4e2f55e to dd217a0 on May 29, 2026 22:36
Adds a guided, multi-source advisor for creating Unity Catalog metric
views. Unlike a single-input helper, it synthesizes schemas, AI/BI
dashboards, SQL query files, Genie spaces, and KPI files into richer,
deduplicated suggestions, checks for overlap with existing metric views,
and walks deployment end to end via an interactive 7-step workflow.

Ported from a Claude Code plugin and genericized to the open Agent
Skills standard:
- stable frontmatter (name, description, compatibility, metadata.version),
  parent: databricks-core
- all agent/MCP-specific tool calls replaced with databricks CLI + SQL
  Statements API (mechanics in references/cli-operations.md)
- auth/profile/warehouse handling deferred to the parent databricks-core
- least-privilege grants and obfuscated placeholders throughout

References: cli-operations, input-handlers, patterns, yaml-reference.
Registered in scripts/skills.py SKILL_METADATA and .claude-plugin
keywords; manifest regenerated via scripts/skills.py. validate passes.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
dipankarkush-db force-pushed the add-databricks-metric-view-advisor branch from dd217a0 to 352b5ef on May 29, 2026 22:48
…rimental

Per maintainer guidance (Simon Faltum) to land faster while the stable
tier is in flux, relocate the skill from skills/ to experimental/.

- git mv skills/databricks-metric-view-advisor -> experimental/
- Drop `parent: databricks-core` and make the skill self-contained:
  experimental skills install standalone (`aitools install <name>
  --experimental` does not pull in databricks-core), so the parent
  reference would dangle. Re-inlined the profile/auth prerequisite in
  SKILL.md Step 1a and cli-operations.md; all CLI/SQL commands already
  live in references/cli-operations.md.
- De-register from stable plumbing: removed the SKILL_METADATA entry in
  scripts/skills.py and the "metric-view-advisor" keyword in
  .claude-plugin/plugin.json (the Claude marketplace plugin ships stable
  skills only).
- READMEs: removed the stable "Available Skills" bullet; added an entry
  to experimental/README.md (Analytics & Dashboards).
- Regenerated manifest.json (skill now under repo_dir: experimental).
  scripts/skills.py validate passes.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
dipankarkush-db changed the title from "skills: add databricks-metric-view-advisor stable skill" to "experimental: add databricks-metric-view-advisor skill" on May 30, 2026
…symlink

The run-output folder used a `latest` symlink (`ln -sfn run_<ts> latest`).
Symlinks work on a local POSIX filesystem but do NOT resolve in the
Databricks Workspace filesystem (where Genie Code runs) — the link object
is created but cannot be navigated or read through. Replace it with a
portable `latest.txt` pointer file (a single line naming the most recent
run folder), which works in every environment. Added a fallback note to
pick the lexicographically-largest run_* folder.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
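
The pointer-file approach in the commit above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and the fallback only picks the newest run if the `run_*` folder names sort chronologically (for example, fixed-width timestamps).

```python
from pathlib import Path
from typing import Optional


def record_latest_run(output_dir: Path, run_name: str) -> None:
    """Write latest.txt: a single line naming the most recent run folder."""
    (output_dir / "latest.txt").write_text(run_name + "\n", encoding="utf-8")


def resolve_latest_run(output_dir: Path) -> Optional[Path]:
    """Return the latest run folder, preferring latest.txt over a directory scan."""
    pointer = output_dir / "latest.txt"
    if pointer.is_file():
        name = pointer.read_text(encoding="utf-8").strip()
        candidate = output_dir / name
        if name and candidate.is_dir():
            return candidate
    # Fallback: the lexicographically largest run_* folder (assumes sortable names).
    runs = sorted(
        (p for p in output_dir.glob("run_*") if p.is_dir()),
        key=lambda p: p.name,
    )
    return runs[-1] if runs else None
```

Because the pointer is an ordinary text file, it reads the same way on a local POSIX filesystem and in environments where symlinks do not resolve; a stale or missing pointer simply falls through to the directory scan.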
…ploy-example SQL

- input-handlers.md: Input 5 step 2 referenced "Input 1, step 1" (catalog/
  schema DESCRIBE only); mapping KPIs to columns needs the table schema, so
  point to Input 1 step 2 (list tables + discover-schema), matching Input 3.
- patterns.md: the SQL Statements API deploy example used
  DATE_TRUNC(MONTH, ...) with the quotes dropped (invalid SQL). Use a
  quote-free EXTRACT(YEAR FROM ...) expr and add a note on escaping single
  quotes inside a single-quoted --json argument (or use --json @file).

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
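
The quoting problem described in the commit above (single quotes in SQL nested inside a single-quoted `--json` argument) can be illustrated with a short Python sketch. It assumes a POSIX shell; the table and column names are made up for illustration, and `<WAREHOUSE_ID>` is a placeholder. The `statement` and `warehouse_id` fields are the request-body fields of the SQL Statements API.

```python
import json
import shlex


def build_statement_payload(sql: str, warehouse_id: str) -> str:
    """Serialize the request body; json.dumps handles JSON-level escaping."""
    return json.dumps({"statement": sql, "warehouse_id": warehouse_id})


def shell_quote_json_arg(payload: str) -> str:
    """Quote the payload for a POSIX shell, escaping embedded single quotes."""
    return shlex.quote(payload)


# Hypothetical SQL containing single quotes, which would otherwise end the
# shell's single-quoted string early.
sql = "SELECT DATE_TRUNC('MONTH', order_date) AS order_month FROM orders"
payload = build_statement_payload(sql, "<WAREHOUSE_ID>")
print(
    "databricks api post /api/2.0/sql/statements --json "
    + shell_quote_json_arg(payload)
)
```

Writing the payload to a file and passing `--json @file`, as the commit notes, avoids the shell-quoting layer altogether.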
…ot prerequisite

Reorder the opening so the first paragraph states what the skill does and
when to use it, with the CLI prerequisite moving just below. Generic
skill-authoring hygiene — helps any agent/indexer that reads the title +
first paragraph, and mirrors the frontmatter description. No workflow or
behavior change.

Co-authored-by: Isaac
Signed-off-by: Dipankar Kushari <dipankar.kushari@databricks.com>
…; fix re-auth; bump CLI floor

Addresses PR review feedback:
- Switch long metric-view DDL from raw 'api post /api/2.0/sql/statements' to
  'aitools tools statement submit --file' -> 'statement get'. The file-based
  path removes the $$/JSON-escaping fragility that motivated the raw API.
- Re-authentication now uses 'databricks auth login --profile <PROFILE>';
  --host is only for creating a new profile (avoids host-mismatch errors).
- Bump CLI compatibility floor v0.292.0 -> v0.299.1 (statement subcommand).
- Align Step 4 (saves <name>.sql) with Step 6 (submits that saved file).
- Replace the inline #-comment file-content deploy example with a clean pointer.

Co-authored-by: Isaac
…scenario matrix)

- evals/check_examples.py: static consistency eval (no workspace). Validates
  every metric-view YAML, MEASURE() quoting, DATEDIFF rule, Python snippets,
  fixtures, CLI subcommands (--live probes the installed CLI), regression
  guards for the statement/auth fixes, and relative links. 11 checks / 16 with --live.
- evals/SCENARIOS.md: behavioral scenario matrix, one per input type plus
  merge/overlap/snowflake, each with a deploy-and-query gating assertion.
- evals/README.md: methodology and how to run.
- Regenerate manifest.json to track the new files.

Co-authored-by: Isaac
dipankarkush-db requested a review from a team as a code owner on June 2, 2026 15:26
dustinvannoy-db (Collaborator) left a comment

Some specific comments are there for you to address for the parts I could review manually.

Higher level -- my concern is overlap with databricks-metric-views which can easily be referenced instead for parts of this. Here is the writeup on it:
Substantial overlap with experimental/databricks-metric-views/ ⚠️

The two skills are different in purpose — the existing one is a concise single-input reference; the advisor is a multi-source interactive workflow. That distinction is legitimate and the PR description acknowledges it. But the reference files are heavily duplicated:

| File | Existing | Advisor | Overlap |
|---|---|---|---|
| yaml-reference.md | 338 lines | 583 lines | Near-identical structure: Top-Level Fields, Dimensions, Measures, Window Measures, Joins (Star/Snowflake/USING), Materialization, Complete Example. Advisor adds Composability, Semantic Metadata, LOD, Gotchas. |
| patterns.md | 659 lines | 458 lines | Same patterns: single-table, ratios, CASE dimensions, star/snowflake schema, materialization, window measures, SQL-source fallback. |

So ~900 lines of YAML-spec/pattern documentation now exist in two near-parallel copies. When the metric-view YAML spec changes, both must be updated in lockstep — exactly the doc-drift the advisor's own check_examples.py eval is designed to catch within a skill, but it can't catch drift between the two skills.
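
One rough way to surface this kind of cross-skill duplication is to compare the section headings of the two reference files. The sketch below is an illustration only, not part of either skill: the function names are hypothetical, it operates on the Markdown text of each file, and a shared heading only hints at duplicated content rather than proving it.

```python
import re
from typing import List, Set

HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def headings(markdown_text: str) -> Set[str]:
    """Collect lower-cased heading titles, skipping fenced code blocks."""
    found: Set[str] = set()
    in_fence = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # '#' lines inside fences are comments
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(stripped)
        if match:
            found.add(match.group(1).lower())
    return found


def shared_headings(existing_text: str, advisor_text: str) -> List[str]:
    """Return headings that appear in both documents, sorted for stable output."""
    return sorted(headings(existing_text) & headings(advisor_text))
```

Run against the text of the two `yaml-reference.md` files, a long list of shared headings would be a signal that a section should live in one place and be linked from the other.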

Options to raise with the author:

  • Make the advisor reference the existing skill as parent (or via the core hierarchy). CLAUDE.md prescribes databricks-core → product → niche; the advisor declares no parent and inlines everything. Pointing its YAML/pattern references at databricks-metric-views would eliminate the duplicate spec docs while keeping the advisor's unique value (the interactive workflow, input-handlers, overlap-detection). The "self-contained, no parent" choice is the root cause of the duplication.
  • Or consolidate: fold the advisor's richer yaml-reference.md/patterns.md improvements (Composability, Semantic Metadata, LOD, Gotchas) back into the existing skill and have the advisor link to them.

Collaborator

Remove, we haven't been storing evals this way in this repo.

Collaborator

Remove, we haven't been storing evals this way in this repo.

Collaborator

Remove, we haven't been storing evals this way in this repo.

Comment thread manifest.json Outdated
"references/yaml-reference.md"
],
"repo_dir": "experimental",
"version": "1.0.0"
Collaborator

change to 0.1.0

Comment thread experimental/README.md Outdated
### 📊 Analytics & Dashboards
- **databricks-aibi-dashboards** - Databricks AI/BI dashboards (with SQL validation workflow)
- **databricks-metric-views** - Metric Views for governed metrics
- **databricks-metric-view-advisor** - Guided, multi-source workflow to create Unity Catalog metric views from schemas, AI/BI dashboards, SQL queries, Genie spaces, or KPI files
Collaborator

Move this up to be before databricks-metric-views

```
databricks experimental aitools tools get-default-warehouse --profile <PROFILE>
```

Store the warehouse id for all SQL execution this session. The `query` / `discover-schema` tools auto-pick the default warehouse, so an explicit id is only needed for the `statement submit` path (pass `--warehouse <ID>` or set `DATABRICKS_WAREHOUSE_ID`). Do NOT ask the user about the warehouse — pick the default automatically.
Collaborator

Will this still allow a user to ask for a specific warehouse? Not a blocker, but seems like this could allow them to override default/best warehouse.
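
A minimal sketch of how such an override could be resolved, assuming the following precedence: a warehouse the user explicitly names wins, then the `DATABRICKS_WAREHOUSE_ID` environment variable mentioned in the snippet above, then the auto-selected default. The function name and the precedence order are assumptions for illustration, not the skill's documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_warehouse_id(
    user_choice: Optional[str],
    default_id: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a warehouse id: explicit user choice, then env var, then default."""
    env = os.environ if env is None else env
    if user_choice and user_choice.strip():
        return user_choice.strip()
    from_env = env.get("DATABRICKS_WAREHOUSE_ID", "").strip()
    if from_env:
        return from_env
    # Nothing specified: fall back to the id from get-default-warehouse.
    return default_id
```

With this shape the workflow never has to ask about the warehouse, yet a user who names one is not silently overridden.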


**STOP — wait for the user to acknowledge the analysis before proceeding to suggestions.**

### Step 3: Suggest Metric Views
Collaborator

Move this whole Step 3 section to its own reference file to shorten SKILL.md; simply give it a pointer to that reference and guidance on when to use it.

- **"Proceed" / "updated" / "3"** → re-read `suggestions.yaml` from the run folder, then proceed to Step 4
- **User provides a file path** → read that file, parse it as the suggestions YAML, then proceed to Step 4

### Step 4: Create Metric View Definitions
Collaborator

Move this whole Step 4 section to its own reference file to shorten SKILL.md; simply give it a pointer to that reference and guidance on when to use it.


Create Unity Catalog metric views from your existing Databricks assets — gold/fact schemas, AI/BI dashboards, SQL queries, Genie spaces, or KPI files. This advisor guides an interactive workflow that analyzes those sources, synthesizes them into richer, deduplicated suggestions, checks for overlap with views that already exist, and walks deployment end to end. Unlike a single-input "create a metric view" helper, it combines **multiple input sources** into one coherent set of definitions.

**Prerequisite:** a working Databricks CLI (>= v0.299.1) authenticated to a workspace profile. All CLI/SQL commands this skill needs are documented in **[references/cli-operations.md](references/cli-operations.md)** — read that file before running any command in the steps below.
Collaborator

databricks-metric-views skill requires v1.0.0. Let's change to use that as the floor so it's consistent.

@@ -0,0 +1,167 @@
# CLI & API Operations

All operations in this skill run through the **Databricks CLI** (>= v0.299.1), authenticated to a workspace profile. To create a **new** profile, run `databricks auth login --host <workspace-url> --profile <PROFILE>`; to re-authenticate an **existing** profile, just run `databricks auth login --profile <PROFILE>` (the host is already stored — passing `--host` again is unnecessary and can error on a mismatch). This file documents the specific commands the workflow relies on.
Collaborator

use version 1.0.0 as floor
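
A version floor like this is easy to get wrong with plain string comparison (for example, "0.99.0" sorts after "0.299.1" as text). The sketch below compares numeric tuples instead. It assumes the CLI's version output contains a `MAJOR.MINOR.PATCH` token, optionally prefixed with `v`; the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH token as a tuple of ints."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_floor(installed: str, floor: str = "v1.0.0") -> bool:
    """True if the installed version is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)
```

Keeping the floor as a single default argument also gives one place to change if the two skills later agree on a different minimum.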

Reviewer feedback from dustinvannoy-db:

- Reference parent databricks-metric-views (Option A): add `parent:`
  frontmatter + a mandatory-dependency notice (SKILL.md REQUIRED callout
  and Prerequisites, README entry, reference-file headers). Dedupe the
  shared spec/patterns: yaml-reference.md keeps only advisor-unique
  additions (gotchas, expanded source, composability, extra measure/join
  rules, semantic metadata, LOD, extra materialization detail, correct
  dot-chain example) and points to the parent for the baseline;
  patterns.md keeps the metadata-rich templates, correctly-quoted
  star/snowflake joins, window measures, and the SQL-source fallback,
  pointing to the parent for ratio/filtered/TPC-H/materialized/detailed
  patterns. No content lost — everything is either inline or in the parent.
- Remove evals/ (not how this repo stores evals).
- Version 1.0.0 -> 0.1.0; CLI floor v0.299.1 -> v1.0.0 (SKILL.md,
  cli-operations.md) to match databricks-metric-views.
- Auth: use `databricks auth describe` instead of minting a token.
- Allow the user to override the auto-selected SQL warehouse.
- Slim SKILL.md: extract Step 3 and Step 4 into dedicated reference files
  with pointers + STOP gating retained.
- README: order advisor before databricks-metric-views.
- Regenerate manifest.json (scripts/skills.py validate passes).

Co-authored-by: Isaac
dipankarkush-db (Author) commented:

Thanks for the thorough review, @dustinvannoy-db! Addressed everything in f6c4c0a. Mapping each point to the change:

High-level: overlap with experimental/databricks-metric-views/

Went with Option A — the advisor now declares parent: databricks-metric-views and references it for the shared spec instead of duplicating it:

  • yaml-reference.md (583 → ~300 lines): keeps only advisor-unique additions (gotchas table, expanded source options, composability, extra measure/join rules, semantic metadata, LOD, extra materialization detail, and a correct dot-chain example); points to the parent for the baseline spec.
  • patterns.md: keeps the metadata-rich templates, correctly-quoted star/snowflake joins, window measures, and the SQL-source fallback; points to the parent for the ratio/filtered/TPC-H/materialized/detailed-window patterns.
  • No content lost — everything is either inline in the advisor or in the parent. Net −1,104 / +385 lines.
  • The dependency is made mandatory and explicit: a ⚠️ REQUIRED callout + Prerequisites in SKILL.md, a note on the README entry, and headers in both reference files (so a user can't pick up the advisor without knowing they need the parent).

Inline comments

  • Remove evals/ (check_examples.py, README.md, SCENARIOS.md) — deleted.
  • manifest.json → 0.1.0 — regenerated via scripts/skills.py; validate passes.
  • SKILL.md version → 0.1.0 — done.
  • CLI floor → v1.0.0 (consistency with databricks-metric-views) — updated in SKILL.md (frontmatter + prerequisite) and cli-operations.md.
  • Auth without minting a token — now uses databricks auth describe, falling back to databricks auth login --profile at session start.
  • Allow a specific warehouse — auto-picks the default, but now honors a user-specified warehouse if they name one. (was flagged as non-blocking)
  • Move Step 3 to a reference file — extracted to references/step-3-suggest-metric-views.md; SKILL.md keeps a pointer + STOP gating.
  • Move Step 4 to a reference file — extracted to references/step-4-create-definitions.md; same treatment.
  • README ordering: databricks-metric-view-advisor now listed before databricks-metric-views.

SKILL.md is down from ~567 to ~285 lines. Let me know if you'd prefer the Materialized pattern restored inline too (it's currently delegated to the parent, where the example is identical).
