A complete, runnable Databricks Asset Bundles (DABs) teaching demo. One bundle file describes a UC schema + volume, a 4-task daily ETL job, and a Lakeview dashboard — then deploys them to dev, staging, or prod with environment-specific configuration. A single command deploys everything. A single command tears it all down.
Works on any Databricks workspace with Unity Catalog and a Serverless SQL Warehouse. If you don't have one, sign up for Databricks Free Edition — no credit card, single-user workspace with serverless pre-provisioned.
A Databricks Asset Bundle is declarative infrastructure-as-code for a Databricks workspace. You describe your jobs, pipelines, schemas, dashboards, and permissions in YAML. The CLI renders those descriptions into workspace resources at deploy time.
Key idea: the same bundle YAML deploys to dev with a prefixed, isolated name and to prod with a service principal, a different catalog, and an unpaused schedule. No copy-paste, no drift.
dabs_simple_demo/
├── databricks.yml                      ← bundle root: identity, variables, targets (environments)
├── resources/
│   ├── schema.yml                      ← UC schema + managed volume
│   ├── job.yml                         ← daily_etl workflow (4 tasks)
│   └── dashboard.yml                   ← Lakeview dashboard
├── src/
│   ├── notebooks/ingest.py             ← generates synthetic orders, COPY INTO bronze
│   ├── sql/
│   │   ├── 01_ddl.sql                  ← idempotent table DDL (parameterized)
│   │   ├── 02_silver.sql               ← bronze → silver (typed, deduped)
│   │   └── 03_gold.sql                 ← silver → gold_daily_revenue aggregate
│   └── dashboard/demo.lvdash.json      ← exported dashboard definition
├── scripts/
│   ├── demo_deploy.sh                  ← validate + deploy + run in one command
│   └── teardown.sh                     ← wipe the deployment for a clean re-run
├── tests/
│   ├── bundle_validate.sh              ← validates all targets (no deploy)
│   └── unit/test_ingest_helpers.py     ← pytest for the Python helper
└── azure-pipelines.yml                 ← CI/CD talk-track (not run live; see §8)
| Requirement | Notes |
|---|---|
| Databricks CLI ≥ 0.240 | `brew install databricks` or `curl -fsSL … \| sh` |
| Terraform ≥ 1.5 | brew install hashicorp/tap/terraform |
| Databricks workspace | Any workspace with Unity Catalog + Serverless SQL Warehouse (Free Edition works) |
| `databricks auth login` | Configure a profile for your workspace |
Free Edition / single-catalog workspaces: catalog creation via the CLI may require an explicit storage location. Use the `workspace` catalog (pre-created on Free Edition) and differentiate environments by schema name: set `catalog: workspace` in each target's variables. The isolation story is identical, just at the schema level (see §6).
Step 1 — Authenticate and create a named profile:
databricks auth login \
--host https://<your-workspace>.cloud.databricks.com \
  --profile my-workspace

This saves credentials to ~/.databrickscfg under the profile name my-workspace. Use any name you like; you'll pass it to every bundle command.
Step 2 — Set env vars once (add to your shell profile or a gitignored .env file):
# Your CLI profile name from Step 1
export DATABRICKS_CONFIG_PROFILE="my-workspace"
# UC catalog to deploy into (Free Edition default is "workspace")
export BUNDLE_VAR_catalog="workspace"
# System Terraform — required to avoid a CLI PGP key expiry bug
export DATABRICKS_TF_EXEC_PATH="$(which terraform)"
export DATABRICKS_TF_VERSION="1.15.5"

With DATABRICKS_CONFIG_PROFILE set, all databricks bundle commands pick up the right workspace automatically, so no --profile flag is needed on every command.
Why a profile and not just `DATABRICKS_HOST`? `DATABRICKS_HOST` only sets the URL; it doesn't carry credentials. The CLI needs both a host and a token/OAuth flow. `databricks auth login` stores both in the profile; `DATABRICKS_CONFIG_PROFILE` tells the CLI which profile to use.

`BUNDLE_VAR_catalog` uses the `BUNDLE_VAR_` prefix convention: any bundle variable `x` can be set this way without touching the YAML. You can also pass it inline: `--var catalog=workspace`.
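The `BUNDLE_VAR_` naming convention can be illustrated with a few lines of Python. This is only a sketch of how the names map onto bundle variables, not the CLI's actual implementation; the helper name `bundle_vars_from_env` is hypothetical.

```python
PREFIX = "BUNDLE_VAR_"


def bundle_vars_from_env(environ):
    """Map BUNDLE_VAR_<name> entries to {name: value}; ignore everything else."""
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


# A plain dict stands in for os.environ so the example is reproducible.
example_env = {
    "DATABRICKS_CONFIG_PROFILE": "my-workspace",
    "BUNDLE_VAR_catalog": "workspace",
}
print(bundle_vars_from_env(example_env))  # {'catalog': 'workspace'}
```

Passing `os.environ` instead of the sample dict shows which overrides your current shell would hand to the bundle.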
Open databricks.yml. The file has four sections:
Bundle identity
bundle:
  name: dabs_simple_demo

Include: pulls in the resource files so the root stays readable:
include:
  - resources/*.yml

Variables: declared once, overridden per target. The `lookup:` form resolves a warehouse name to its ID at deploy time so you never hardcode IDs:
variables:
  catalog:
    description: UC catalog this target writes into
    default: workspace                              # override via BUNDLE_VAR_catalog or --var
  warehouse_id:
    lookup:
      warehouse: Serverless Starter Warehouse       # resolved at deploy time
  notifications_email:
    default: ${workspace.current_user.userName}     # falls back to current user

Targets: one block per environment. `workspace.host` is intentionally absent from every target; the bundle resolves it from `DATABRICKS_HOST` (env var), `DATABRICKS_CONFIG_PROFILE`, or the `--profile` CLI flag. This keeps all workspace-specific values out of source control:
targets:
  dev:
    mode: development          # auto-prefix names, auto-pause schedules
    default: true
    variables:
      schema_name: dabs_demo_dev
    run_as:
      user_name: ${workspace.current_user.userName}
  staging:
    mode: development
    variables:
      schema_name: dabs_demo_staging
    presets:
      name_prefix: "[staging-${workspace.current_user.short_name}] "
  prod:
    mode: production           # unpauses schedule, removes name prefix
    workspace:
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    variables:
      schema_name: dabs_demo_prod
    run_as:
      service_principal_name: ${var.prod_sp}   # SP, not a human
    permissions:
      - level: CAN_MANAGE
        group_name: data-platform-admins

`catalog` has no hardcoded value in any target: it inherits the variable default (`workspace`) and is overridden per environment via `BUNDLE_VAR_catalog` or `--var catalog=<name>`.
The three things targets do:
- Override `variables`: what schema/config to use (`catalog` comes from the environment)
- Override behaviour via `mode:` + `run_as:` + `permissions:`
- Override `workspace.root_path` for prod: moves bundle state to a shared folder
How workspace auth resolution works:
| Priority | Mechanism | Notes |
|---|---|---|
| 1 (highest) | `--profile my-workspace` CLI flag | Explicit per-command override |
| 2 | `DATABRICKS_CONFIG_PROFILE=my-workspace` env var | Recommended: set once in your shell |
| 3 | `DATABRICKS_HOST` + `DATABRICKS_TOKEN` env vars | Host alone is not enough; a token is required too |
| 4 | Default profile in `~/.databrickscfg` | Used if nothing else is set |
For multi-environment promotion in CI/CD, set DATABRICKS_CONFIG_PROFILE (or DATABRICKS_HOST + DATABRICKS_CLIENT_ID + DATABRICKS_CLIENT_SECRET for OAuth M2M) as pipeline secret variables scoped per environment.
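The priority order in the table can be written out as a small decision function. This is a sketch that mirrors the table only; it is not how the CLI is implemented, and the function name `resolve_auth`, its parameters, and the `"DEFAULT"` label used for the fallback profile are assumptions for illustration.

```python
def resolve_auth(cli_profile=None, env=None, default_profile_exists=True):
    """Return (source, value) for the first mechanism in the table that applies."""
    env = env or {}
    if cli_profile:                                   # 1: --profile flag
        return ("cli-flag", cli_profile)
    if env.get("DATABRICKS_CONFIG_PROFILE"):          # 2: profile env var
        return ("env-profile", env["DATABRICKS_CONFIG_PROFILE"])
    if env.get("DATABRICKS_HOST") and env.get("DATABRICKS_TOKEN"):
        return ("env-host-token", env["DATABRICKS_HOST"])   # 3: host + token
    if default_profile_exists:                        # 4: default profile
        return ("default-profile", "DEFAULT")
    raise LookupError("no workspace credentials found")


# Host without a token is skipped, so resolution falls through to the default profile.
print(resolve_auth(env={"DATABRICKS_HOST": "https://example.cloud.databricks.com"}))
# The CLI flag wins over the environment variable.
print(resolve_auth(cli_profile="prod", env={"DATABRICKS_CONFIG_PROFILE": "dev"}))
```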
resources:
  schemas:
    demo:
      catalog_name: ${var.catalog}
      name: ${var.schema_name}
  volumes:
    demo_raw:
      catalog_name: ${var.catalog}
      schema_name: ${resources.schemas.demo.name}   # cross-resource reference
      name: raw
      volume_type: MANAGED

`${resources.schemas.demo.name}` resolves to the actual deployed schema name, including any `mode: development` prefix. This is how you chain resources without hardcoding names.
environments:
  - environment_key: serverless_env
    spec:
      client: "1"   # serverless notebook compute
tasks:
  - task_key: ddl
    sql_task:
      warehouse_id: ${var.warehouse_id}
      file: { path: ../src/sql/01_ddl.sql }
      parameters:
        catalog: ${var.catalog}
        schema: ${resources.schemas.demo.name}   # ← resolved name, not the variable
  - task_key: ingest
    depends_on: [{ task_key: ddl }]
    environment_key: serverless_env
    notebook_task:
      notebook_path: ../src/notebooks/ingest.py
      base_parameters:
        catalog: ${var.catalog}
        schema: ${resources.schemas.demo.name}
        volume: ${resources.volumes.demo_raw.name}
  - task_key: silver
    depends_on: [{ task_key: ingest }]
    sql_task: { … file: 02_silver.sql … }
  - task_key: gold
    depends_on: [{ task_key: silver }]
    sql_task: { … file: 03_gold.sql … }
schedule:
  quartz_cron_expression: "0 0 6 * * ?"
  pause_status: PAUSED   # mode: development auto-pauses; prod unpauses

Why `${resources.schemas.demo.name}` and not `${var.schema_name}`? In `mode: development` the bundle prefixes the deployed schema name with the target and user (for example `dev_<username>_dabs_demo_dev`). If you pass the bare variable value as a SQL parameter, the task writes to a different (unmanaged) schema. Always use the resource reference when passing the schema to tasks.
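The difference between the two references is easy to see with a toy interpolator. The sketch below is not the bundle engine; `lookup`, `interpolate`, the `resolved` dictionary, and the user name `someuser` are all hypothetical stand-ins for the state a dev deployment would end up with.

```python
import re

REF = re.compile(r"\$\{([^}]+)\}")


def lookup(tree, dotted):
    """Walk a nested dict along a dotted path such as 'resources.schemas.demo.name'."""
    node = tree
    for part in dotted.split("."):
        node = node[part]
    return node


def interpolate(text, tree):
    """Replace every ${a.b.c} reference in text with its value from tree."""
    return REF.sub(lambda match: str(lookup(tree, match.group(1))), text)


# Assumed post-deploy state for the dev target; the prefixed name is illustrative.
resolved = {
    "var": {"catalog": "workspace", "schema_name": "dabs_demo_dev"},
    "resources": {"schemas": {"demo": {"name": "dev_someuser_dabs_demo_dev"}}},
}

print(interpolate("${var.catalog}.${var.schema_name}", resolved))
# workspace.dabs_demo_dev  (the bare variable: not the schema the bundle created)
print(interpolate("${var.catalog}.${resources.schemas.demo.name}", resolved))
# workspace.dev_someuser_dabs_demo_dev  (the deployed schema)
```

A task parameterized with the first string would read and write outside the schema that `bundle destroy` later cleans up.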
resources:
  dashboards:
    demo_dashboard:
      display_name: "[${bundle.target}] DABs Demo Dashboard"
      warehouse_id: ${var.warehouse_id}
      parent_path: /Workspace/Users/${workspace.current_user.userName}
      file_path: ../src/dashboard/demo.lvdash.json

The `.lvdash.json` is a dashboard export. By default, dataset queries inside it contain fully-qualified table names (`catalog.schema.table`) hardcoded to the environment they were exported from. There are three ways to handle multi-environment promotion:
Option 1 — dataset_catalog + dataset_schema (recommended):
Add these two fields to dashboard.yml and write your .lvdash.json SQL with unqualified table names only (just table_name or schema.table_name, no catalog prefix). DABs injects the catalog and schema at deploy time:
resources:
  dashboards:
    demo_dashboard:
      display_name: "[${bundle.target}] DABs Demo Dashboard"
      warehouse_id: ${var.warehouse_id}
      parent_path: /Workspace/Users/${workspace.current_user.userName}
      file_path: ../src/dashboard/demo.lvdash.json
      dataset_catalog: ${var.catalog}                   # ← injected per target
      dataset_schema: ${resources.schemas.demo.name}    # ← injected per target

Inside the `.lvdash.json` SQL, reference only the table name:
SELECT order_date, region, orders, revenue
FROM gold_daily_revenue -- no catalog.schema prefix
ORDER BY order_date

DABs applies `dataset_catalog`/`dataset_schema` as the default for any dataset query that doesn't specify them explicitly. See Bundle examples (Dashboard parameterization) and the Dashboard resource reference.
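Because Option 1 only works when the exported SQL carries no catalog prefix, a quick check before committing the file can help. The following is a rough sketch: it assumes the export has a top-level `datasets` list whose entries hold SQL in either a `query` string or a `queryLines` list, which may not match every export version, and the regex only catches three-part names directly after `FROM` or `JOIN`. The function names are hypothetical.

```python
import json
import re

# Three dot-separated identifiers after FROM/JOIN, e.g. workspace.sales.orders
QUALIFIED = re.compile(
    r"\b(?:FROM|JOIN)\s+(`?\w+`?\.`?\w+`?\.`?\w+`?)", re.IGNORECASE
)


def qualified_tables(dashboard):
    """Return every catalog.schema.table reference found in dataset SQL."""
    hits = []
    for dataset in dashboard.get("datasets", []):
        # Assumed layout: "query" (string) or "queryLines" (list of strings).
        sql = dataset.get("query") or "".join(dataset.get("queryLines", []))
        hits.extend(QUALIFIED.findall(sql))
    return hits


def check_file(path):
    """Load an exported .lvdash.json and list its fully-qualified references."""
    with open(path, encoding="utf-8") as handle:
        return qualified_tables(json.load(handle))


portable = {"datasets": [{"name": "rev", "query": "SELECT order_date, revenue FROM gold_daily_revenue"}]}
pinned = {"datasets": [{"name": "rev", "queryLines": ["SELECT * ", "FROM workspace.dabs_demo_dev.gold_daily_revenue"]}]}

print(qualified_tables(portable))  # []
print(qualified_tables(pinned))    # ['workspace.dabs_demo_dev.gold_daily_revenue']
```

An empty result suggests the file is safe to promote with `dataset_catalog` and `dataset_schema`; any hit points at a query still pinned to one environment.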
Option 2 — Re-export after first deploy per environment: Deploy to the target, open the deployed dashboard in the workspace UI, make any layout adjustments, then export the file:
# 1. Deploy the bundle (creates the dashboard in the target workspace)
databricks bundle deploy -t staging
# 2. Capture the dashboard ID from bundle summary
DASHBOARD_ID=$(databricks bundle summary -t staging -o json | python3 -c \
  "import sys,json; d=json.load(sys.stdin); print(d['resources']['dashboards']['demo_dashboard']['id'])")
# 3. Export the dashboard definition by ID (the Lakeview get response
#    carries the definition in its serialized_dashboard field)
databricks lakeview get "$DASHBOARD_ID" -o json | python3 -c \
  "import sys,json; print(json.load(sys.stdin)['serialized_dashboard'])" \
  > src/dashboard/demo_staging.lvdash.json
# 4. Commit the updated file, point dashboard.yml file_path at it for the staging target

Option 3: sed rewrite in CI before deploy:
If your SQL uses fully-qualified `catalog.schema.table` names, a pre-deploy substitution is the simplest mechanical fix. In `mode: development` the deployed schema name carries a target-and-user prefix, so the pattern to replace looks like `dev_<username>_dabs_demo_dev`. In CI, use the known environment values:
# In azure-pipelines.yml, before `bundle deploy -t prod`:
sed -i "s/${DEV_SCHEMA}/${PROD_SCHEMA}/g" src/dashboard/demo.lvdash.json

This is fragile (any schema rename breaks it) but works for simple demos where the only difference is the schema name. Option 1 (`dataset_catalog` + `dataset_schema`) is strongly preferred.
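One way to soften the fragility is to make the substitution fail loudly when the old schema name is not found, instead of silently deploying an unchanged file. The sketch below is a Python stand-in for the `sed` call, not part of the repo; `rewrite_schema` and the sample schema names are hypothetical.

```python
def rewrite_schema(text, old_schema, new_schema):
    """Replace old_schema with new_schema; raise if there is nothing to replace."""
    count = text.count(old_schema)
    if count == 0:
        raise ValueError(f"{old_schema!r} not found; nothing to rewrite")
    return text.replace(old_schema, new_schema), count


sample = "SELECT * FROM workspace.dev_someuser_dabs_demo_dev.gold_daily_revenue"
rewritten, replaced = rewrite_schema(sample, "dev_someuser_dabs_demo_dev", "dabs_demo_prod")
print(replaced)   # 1
print(rewritten)  # SELECT * FROM workspace.dabs_demo_prod.gold_daily_revenue
```

In a pipeline, the raised error would stop the stage before `bundle deploy` runs with a dashboard that still points at the dev schema.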
| Setting | dev | staging | prod |
|---|---|---|---|
| Workspace host | `DATABRICKS_HOST` env var | `DATABRICKS_HOST` env var | `DATABRICKS_HOST` env var (prod workspace) |
| Catalog | `BUNDLE_VAR_catalog` (default: `workspace`) | same | `BUNDLE_VAR_catalog` (prod catalog) |
| Schema (deployed name) | `[dev-<user>-]dabs_demo_dev` | `[staging-<short>-]dabs_demo_staging` | `dabs_demo_prod` |
| Schedule | Auto-paused (`mode: development`) | Paused (same mode) | 06:00 ET daily, unpaused |
| Runs as | Current user | Current user | Service principal |
| Permissions block | none | none | `CAN_MANAGE` for admins group |
| Root path | per-user `.bundle/` folder | per-user `.bundle/` folder | `/Workspace/Shared/.bundle/` |
Catalogs vs schemas for isolation: The ideal pattern is one catalog per environment (`dev_orders`, `staging_orders`, `prod_orders`). On workspaces where catalog creation is restricted (including Free Edition, which gives you a single `workspace` catalog), use one catalog with per-target schema names. The isolation story is identical, just one level down.
With DATABRICKS_CONFIG_PROFILE set in your environment (see §3), run:
# 1. Check the YAML is valid — fast, no workspace changes
databricks bundle validate -t dev
# 2. Deploy all resources to the workspace
databricks bundle deploy -t dev
# 3. Run the job and wait for completion
databricks bundle run daily_etl -t dev
# 4. Wipe everything — idempotent
databricks bundle destroy -t dev

Or use the helper scripts (they pick up all env vars automatically):
./scripts/demo_deploy.sh dev # validate + deploy + run in one shot
./scripts/teardown.sh dev      # destroy + belt-and-suspenders schema drop

If you haven't set `DATABRICKS_CONFIG_PROFILE`, add `--profile <your-profile>` to every command.
This file ships in the repo as a readable artifact. It illustrates the promotion model — it is not executed in the live demo.
The pipeline has four stages:
PR opened → Validate (bundle validate × 3 targets + pytest)
Merge to main → Deploy_Dev (automatic)
Push to release/* → Deploy_Staging (optional light approval)
Push a v* tag → Deploy_Prod ← PAUSES for human approval
The human gate lives in the Azure DevOps Environments UI for databricks-prod — not in the YAML. A reviewer clicks Approve; the pipeline then runs databricks bundle deploy -t prod using service-principal credentials scoped to that environment.
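The trigger-to-stage mapping can be summarized as a small routing function. This is a sketch of the promotion rules listed above, not an excerpt from `azure-pipelines.yml`; the event names (`pull_request`, `push`) and the function `stage_for` are assumptions, while the ref prefixes follow standard Git naming.

```python
def stage_for(event, ref):
    """Return the pipeline stage a trigger maps to, or None if nothing runs."""
    if event == "pull_request":
        return "Validate"
    if event == "push":
        if ref == "refs/heads/main":
            return "Deploy_Dev"
        if ref.startswith("refs/heads/release/"):
            return "Deploy_Staging"
        if ref.startswith("refs/tags/v"):
            return "Deploy_Prod"   # still waits for approval in the ADO environment
    return None


for trigger in [
    ("pull_request", "refs/heads/feature/x"),
    ("push", "refs/heads/main"),
    ("push", "refs/heads/release/1.2"),
    ("push", "refs/tags/v1.2.0"),
]:
    print(trigger, "->", stage_for(*trigger))
```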
Auth per environment:
| Environment | Where creds live |
|---|---|
| `databricks-dev`, `databricks-staging` | Pipeline secret variables: `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET` |
| `databricks-prod` | Same variable names, different values, scoped to the `databricks-prod` ADO environment |
Same commands, different credentials, different workspace, different target. That is the whole promotion model.
Exact sequence for the live demo (single terminal, ~6 minutes).
Before you start — set env vars in your shell (one-time setup, survives restarts if added to your profile):
export DATABRICKS_CONFIG_PROFILE="my-workspace" # your profile from `databricks auth login`
export BUNDLE_VAR_catalog="workspace" # your UC catalog
export DATABRICKS_TF_EXEC_PATH="$(which terraform)"
export DATABRICKS_TF_VERSION="1.15.5"

Pre-flight check (confirm the workspace is clean):
./scripts/teardown.sh dev   # no-op if already clean; safe to run every time

Demo sequence:
cd ~/dabs_simple_demo
# Step 1 — validate (show the clean output, ~5 seconds)
databricks bundle validate -t dev
# Step 2 — deploy (~20 seconds; narrate: job / schema / volume / dashboard)
databricks bundle deploy -t dev
# Step 3 — open workspace UI and show:
# → Workflows: "[dev <user>] daily_etl" — 4 tasks, schedule paused
# → Catalog Explorer: schema [dev <user>_]dabs_demo_dev, volume raw, 3 empty tables
# → Dashboards: "[dev] DABs Demo Dashboard" (placeholder widget)
# Step 4 — run the pipeline (~90 seconds)
databricks bundle run daily_etl -t dev
# Step 5 — verify data (optional — open Catalog Explorer, check table row counts)
# bronze_orders: 5,000 rows silver_orders: 5,000 rows gold_daily_revenue: ~120 rows
# Step 6 — teardown (show the workspace going clean, ~20 seconds)
./scripts/teardown.sh dev
# Step 7 — one-command redeploy (reinforces the point)
./scripts/demo_deploy.sh dev

Timing budget: validate 5s · deploy 20s · run 90s · UI tour 2 min · teardown 20s · redeploy 2 min.
These are the steps you can generate live on stage with Databricks Assistant. The pre-shipped files are the expected output — useful if generation goes sideways, or to skip ahead.
| 🪄 | Where | Prompt |
|---|---|---|
| `01_ddl.sql` | Workspace SQL editor | "Create idempotent DDL for bronze_orders (raw strings), silver_orders (typed + deduped on order_id), and gold_daily_revenue (order_date, region, orders, revenue). Use :catalog and :schema SQL parameters." |
| `ingest.py` | New notebook | "Generate 5,000 synthetic order rows with order_id, customer_id, product_id (P001–P020), region (N/S/E/W), order_date (last 30 days), amount (5–500). Write CSV to /Volumes/{catalog}/{schema}/raw/orders.csv then COPY INTO bronze_orders. Read catalog, schema, volume from dbutils widgets." |
| `02_silver.sql` | SQL editor | "INSERT OVERWRITE silver_orders from bronze_orders: cast order_date to DATE, amount to DECIMAL(10,2), dedupe on order_id keeping the latest _ingested_at." |
| `03_gold.sql` | SQL editor | "Aggregate silver_orders to gold_daily_revenue: group by order_date and region, sum(amount) as revenue, `count(*)` as orders." |
| Dashboard widgets | Lakeview editor → Add with AI | "Line chart of revenue over time" and "Top 10 products by revenue" |
| Add staging target | VS Code + Databricks extension | "Add a target called staging that uses catalog workspace, schema_name dabs_demo_staging, and mode: development." |
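For orientation, the data the `ingest.py` prompt asks for can be produced with the standard library alone. The sketch below is not the shipped notebook: it skips the volume write, `COPY INTO`, and `dbutils` widgets, and the ID formats (`O000001`, `C00042`), the customer range, and the helper names are assumptions.

```python
import csv
import io
import random
from datetime import date, timedelta


def make_orders(n=5000, today=None, seed=42):
    """Generate n synthetic order rows as dicts of strings (raw, bronze-style)."""
    rng = random.Random(seed)
    today = today or date.today()
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "order_id": f"O{i:06d}",
            "customer_id": f"C{rng.randint(1, 1000):05d}",
            "product_id": f"P{rng.randint(1, 20):03d}",        # P001..P020
            "region": rng.choice(["N", "S", "E", "W"]),
            "order_date": (today - timedelta(days=rng.randint(0, 29))).isoformat(),
            "amount": f"{rng.uniform(5, 500):.2f}",
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_orders(today=date(2024, 1, 31))
print(len(rows))  # 5000
# Distinct (order_date, region) pairs bound the gold table at 30 days x 4 regions.
print(len({(r["order_date"], r["region"]) for r in rows}))
```

The second figure is why `gold_daily_revenue` lands at roughly 120 rows in the demo: it can never exceed 30 × 4.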
To swap in your own pipeline:
- Replace `src/sql/01_ddl.sql` with your DDL; keep the `:catalog` / `:schema` parameters.
- Replace `src/notebooks/ingest.py` with your ingestion logic; keep `dbutils.widgets` for `catalog` / `schema` / `volume`.
- Update the `resources/job.yml` task list to match your steps; keep `${resources.schemas.demo.name}` for the schema parameter.
- Export your own `.lvdash.json` from the workspace after the first successful run.
- Update `databricks.yml` targets with your workspace hosts, catalogs, and service principal.
The variable and resource-reference patterns stay the same regardless of domain.
./scripts/teardown.sh dev
./scripts/teardown.sh staging

Or directly:
databricks bundle destroy -t dev --auto-approve

`bundle destroy` removes: the job, the dashboard, the UC schema (cascade: tables + volume). The workspace is left clean.