refactor(jobs): run maintenance jobs on serverless compute#30

Merged
taran-dbx merged 1 commit into main from refactor/jobs-serverless-compute on Jun 1, 2026

@taran-dbx
Collaborator

Summary

The five LakeTS maintenance jobs are pure psycopg + Databricks SDK and never create a SparkSession (cold_rollup_refresh queries a SQL warehouse via the SDK, not Spark). This PR moves them off a provisioned cluster and onto serverless compute.

Changes

Bundle (databricks/bundles/databricks.yml)

  • Removed existing_cluster_id from every task and the now-unused cluster_id variable (and its dev/prod overrides).
  • Dependencies (psycopg[binary], databricks-sdk) moved from task libraries (which don't apply on serverless) into a per-job environments block, reused across all five jobs via a YAML anchor and referenced from each task via environment_key: lakets_env.
  • run_as service principal (prod) and sync.paths are unchanged.

Source

  • partition_manager.py: fixed a stale docstring that referenced spark.conf.get(...); the instance name comes from the job parameter (sys.argv[1]) or LAKETS_INSTANCE.
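
For reviewers, the lookup described in the corrected docstring can be sketched roughly as below. This is a minimal illustration, not the code in partition_manager.py: the function name `resolve_instance_name` is hypothetical, and the precedence (job parameter first, then LAKETS_INSTANCE) is an assumption read from the order in the bullet above.

```python
import os
import sys


def resolve_instance_name(argv=None, environ=None):
    """Return the instance name from the first job parameter, else LAKETS_INSTANCE.

    Hypothetical helper: argv and environ are injectable so the lookup
    can be exercised without touching the real process state.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # Assumed precedence: an explicit job parameter wins over the environment.
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()

    name = environ.get("LAKETS_INSTANCE", "").strip()
    if name:
        return name

    # No Spark conf fallback: serverless tasks here do not use a SparkSession.
    raise SystemExit(
        "instance name required: pass it as the first job parameter "
        "or set LAKETS_INSTANCE"
    )
```

Neither source depends on `spark.conf`, which is why the old docstring was misleading once the jobs stopped needing a cluster.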

Docs / CHANGELOG

  • Note that jobs run on serverless compute with dependencies declared in environments.

Test plan

  • All workflow files byte-compile; tests/test_python_patterns.py passes (11/11)
  • Bundle parses: every job has an environments spec, every task has environment_key: lakets_env, no existing_cluster_id, no cluster_id var, no task libraries
  • databricks bundle deploy -t prod --var="service_principal_name=<sp>" provisions the jobs on serverless and a manual run of each completes against a Lakebase instance
  • Confirm the serverless environment client version ("3") is valid in the target workspace; bump if needed
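
The "bundle parses" item above could be automated with a structural check along these lines. This is a sketch under assumptions, not the check that was actually run: `check_serverless_bundle` is a hypothetical helper, it expects the bundle already loaded into a plain dict (for example with a YAML loader that resolves anchors), and the task key `run` in the example is made up for illustration.

```python
def check_serverless_bundle(bundle, env_key="lakets_env"):
    """Return a list of problems; an empty list means the bundle looks serverless-ready."""
    problems = []

    # The cluster_id variable should be gone along with existing_cluster_id.
    if "cluster_id" in (bundle.get("variables") or {}):
        problems.append("variables: cluster_id is still declared")

    jobs = (bundle.get("resources") or {}).get("jobs") or {}
    for job_name, job in jobs.items():
        # Every job needs an environments entry carrying the shared key.
        declared = {env.get("environment_key") for env in job.get("environments") or []}
        if env_key not in declared:
            problems.append(f"{job_name}: no environments entry named {env_key}")

        for task in job.get("tasks") or []:
            label = f"{job_name}.{task.get('task_key', '?')}"
            if task.get("environment_key") != env_key:
                problems.append(f"{label}: environment_key is not {env_key}")
            # Cluster pinning and task-level libraries do not apply on serverless.
            for stale in ("existing_cluster_id", "libraries"):
                if stale in task:
                    problems.append(f"{label}: {stale} is still set")

    return problems


# Illustrative bundle shape; the task key "run" is a placeholder.
example = {
    "variables": {"service_principal_name": {}},
    "resources": {
        "jobs": {
            "partition_manager": {
                "environments": [
                    {
                        "environment_key": "lakets_env",
                        "spec": {
                            "client": "3",
                            "dependencies": ["psycopg[binary]", "databricks-sdk"],
                        },
                    }
                ],
                "tasks": [{"task_key": "run", "environment_key": "lakets_env"}],
            }
        }
    },
}

print(check_serverless_bundle(example))
```

The check does not inspect per-target overrides or validate the environment client version; the last test plan item still has to be confirmed against the target workspace.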

The five maintenance jobs are pure psycopg + SDK (no SparkSession), so
run them on serverless instead of requiring a provisioned cluster.

- bundle: drop existing_cluster_id and the cluster_id variable; declare
  dependencies (psycopg[binary], databricks-sdk) per job in an
  `environments` block (reused via a YAML anchor) and reference it from
  each task via environment_key; task `libraries` no longer apply on
  serverless
- partition_manager: fix stale docstring (job parameter / LAKETS_INSTANCE,
  not spark.conf.get)
- docs/CHANGELOG: note jobs run on serverless compute
@taran-dbx taran-dbx merged commit 8017aab into main Jun 1, 2026
9 checks passed
@taran-dbx taran-dbx deleted the refactor/jobs-serverless-compute branch June 1, 2026 15:00
@github-actions github-actions Bot added the documentation and databricks-workflows labels Jun 1, 2026