refactor(jobs): run maintenance jobs on serverless compute by taran-dbx · Pull Request #30 · databricks-solutions/lakets

taran-dbx · 2026-06-01T14:59:59Z

Summary

The five LakeTS maintenance jobs use only psycopg and the Databricks SDK; none of them creates a SparkSession (cold_rollup_refresh queries a SQL warehouse through the SDK, not Spark). This PR moves them off a provisioned cluster and onto serverless compute.

Changes

Bundle (databricks/bundles/databricks.yml)

Removed existing_cluster_id from every task and the now-unused cluster_id variable (and its dev/prod overrides).
Dependencies (psycopg[binary], databricks-sdk) moved from task libraries (which don't apply on serverless) into a per-job environments block, reused across all five jobs via a YAML anchor and referenced from each task via environment_key: lakets_env.
run_as service principal (prod) and sync.paths are unchanged.

Source

partition_manager.py: fixed a stale docstring that referenced spark.conf.get(...); the instance name comes from the job parameter (sys.argv[1]) or LAKETS_INSTANCE.
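
The lookup described in the corrected docstring can be sketched as follows. This is a minimal illustration, not the code in partition_manager.py: the function name resolve_instance_name, the precedence (job parameter first, then LAKETS_INSTANCE), and the error raised when neither is set are all assumptions.

```python
import os
import sys
from typing import Mapping, Optional, Sequence

ENV_VAR = "LAKETS_INSTANCE"


def resolve_instance_name(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the instance name from the job parameter, else from the environment.

    Hypothetical helper: the precedence and error handling are assumptions.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # argv[0] is the script path; the first job parameter, if any, is argv[1].
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()

    # Fall back to the environment variable when no job parameter was passed.
    from_env = environ.get(ENV_VAR, "").strip()
    if from_env:
        return from_env

    raise ValueError(
        f"pass the instance name as the first job parameter or set {ENV_VAR}"
    )
```

Taking argv and environ as arguments keeps the function testable without touching process state; neither source depends on a Spark session, which is what makes the old spark.conf.get(...) reference stale.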

Docs / CHANGELOG

Updated to note that jobs run on serverless compute, with dependencies declared in each job's environments block.

Test plan

All workflow files byte-compile; tests/test_python_patterns.py passes (11/11)
Bundle parses: every job has an environments spec, every task has environment_key: lakets_env, no existing_cluster_id, no cluster_id var, no task libraries
databricks bundle deploy -t prod --var="service_principal_name=<sp>" provisions the jobs on serverless and a manual run of each completes against a Lakebase instance
Confirm the serverless environment client version ("3") is valid in the target workspace; bump if needed
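
The "bundle parses" check above could be automated along these lines. This is a sketch under stated assumptions: it operates on the bundle after it has been loaded into a plain dict (the YAML loading step is left out), the function name check_serverless_bundle is hypothetical, and the job and task names in the example are placeholders rather than the real entries in databricks.yml.

```python
from typing import Any, Dict, List

# Task-level keys that should no longer appear once jobs run on serverless.
FORBIDDEN_TASK_KEYS = ("existing_cluster_id", "libraries")


def check_serverless_bundle(bundle: Dict[str, Any], env_key: str = "lakets_env") -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []

    if "cluster_id" in (bundle.get("variables") or {}):
        problems.append("variables: cluster_id is still declared")

    jobs = (bundle.get("resources") or {}).get("jobs") or {}
    for job_name, job in jobs.items():
        declared = {env.get("environment_key") for env in job.get("environments") or []}
        if env_key not in declared:
            problems.append(f"{job_name}: no environments entry keyed {env_key!r}")

        for task in job.get("tasks") or []:
            task_key = task.get("task_key", "<unnamed>")
            if task.get("environment_key") != env_key:
                problems.append(f"{job_name}.{task_key}: environment_key is not {env_key!r}")
            for key in FORBIDDEN_TASK_KEYS:
                if key in task:
                    problems.append(f"{job_name}.{task_key}: {key} is still set")

    return problems


# Placeholder bundle shaped like the post-change configuration (names are illustrative).
example_bundle = {
    "variables": {"service_principal_name": {"description": "prod run_as principal"}},
    "resources": {
        "jobs": {
            "example_maintenance_job": {
                "environments": [
                    {
                        "environment_key": "lakets_env",
                        "spec": {
                            "client": "3",
                            "dependencies": ["psycopg[binary]", "databricks-sdk"],
                        },
                    }
                ],
                "tasks": [{"task_key": "run", "environment_key": "lakets_env"}],
            }
        }
    },
}

problems = check_serverless_bundle(example_bundle)
```

Returning a list of messages rather than raising on the first failure makes it easy to report every leftover existing_cluster_id or libraries entry in one pass. The check does not validate the environment client version, which still has to be confirmed against the target workspace.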

The five maintenance jobs are pure psycopg + SDK (no SparkSession), so run them on serverless instead of requiring a provisioned cluster.

- bundle: drop existing_cluster_id and the cluster_id variable; declare dependencies (psycopg[binary], databricks-sdk) per job in an `environments` block (reused via a YAML anchor) and reference it from each task via environment_key; task `libraries` no longer apply on serverless
- partition_manager: fix stale docstring (job parameter / LAKETS_INSTANCE, not spark.conf.get)
- docs/CHANGELOG: note jobs run on serverless compute

taran-dbx merged commit 8017aab into main Jun 1, 2026
9 checks passed

taran-dbx deleted the refactor/jobs-serverless-compute branch June 1, 2026 15:00

github-actions bot added the documentation and databricks-workflows labels Jun 1, 2026

taran-dbx merged 1 commit into main from refactor/jobs-serverless-compute
