fix(jobs): __file__-safe serverless bootstrap + document cold-tier refresh by taran-dbx · Pull Request #31 · databricks-solutions/lakets

taran-dbx · 2026-06-01T16:45:00Z

Summary

Two things, both validated by actually deploying to serverless and running a job on fe-vm-shared-interop:

Bug fix (caught by the live run): the spark_python_task sibling-import bootstrap used os.path.abspath(__file__). On serverless the file runs via exec() in an IPython kernel where __file__ is undefined, so every job failed immediately with NameError: name '__file__' is not defined. Fixed by falling back to os.getcwd() (Databricks sets cwd to the file's workspace folder) when __file__ isn't defined — applied to all five entry files. Re-running Partition Manager after the fix imports cleanly and runs through to the Lakebase API call.
Documentation (the main ask): properly document when/how the cold-tier RollUp re-aggregation is used.

Changes

databricks/workflows/*.py — __file__-safe bootstrap (try __file__, except NameError → os.getcwd()).

databricks/bundles/databricks.yml — service_principal_name now defaults to "" so the dev target validates/deploys without it (a prior bug: defaultless variable blocked even dev). Prod still overrides via --var.

Docs

rollups.md → expanded Hot-tier vs cold-tier refresh: tier auto-detection, when you need cold refresh (late-arriving data, ETL corrections/restatements, manual backfill), when you don't (append-only / frozen-once-tiered), and a how-to.
workflow-jobs.md → When to run Cold RollUp Refresh + cross-link.

Test plan

All five workflow files byte-compile; __file__ only used inside the NameError guard
tests/test_python_patterns.py passes (11/11)
Docusaurus build passes (no broken internal links)
databricks bundle validate -t dev → OK; deploy -t dev provisions all five serverless jobs
Serverless run of Partition Manager gets past the import (bootstrap fix confirmed)
Full green run blocked on locating the Lakebase instance (lakets-tiering-test not found in this workspace) — separate, config-side
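
The first test-plan item (files compile, and `__file__` appears only inside the `NameError` guard) could be checked with something like the following. This is a hypothetical helper, not a script from the repository. The function names are made up for illustration, and it only recognises a plain `except NameError:` handler, not a tuple of exception types.

```python
import ast
from pathlib import Path


def unguarded_file_refs(source: str) -> list[int]:
    """Return line numbers where __file__ is read outside a try/except NameError."""
    tree = ast.parse(source)
    guarded: set[int] = set()
    for node in ast.walk(tree):
        # Only try-blocks with a plain `except NameError:` handler count as guards.
        if isinstance(node, ast.Try) and any(
            isinstance(h.type, ast.Name) and h.type.id == "NameError"
            for h in node.handlers
        ):
            for stmt in node.body:
                for inner in ast.walk(stmt):
                    if isinstance(inner, ast.Name) and inner.id == "__file__":
                        guarded.add(id(inner))
    return sorted(
        n.lineno
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and n.id == "__file__" and id(n) not in guarded
    )


def check_workflow_file(path: str) -> list[int]:
    """Compile the file (raises SyntaxError on failure), then list unguarded uses."""
    source = Path(path).read_text()
    compile(source, path, "exec")  # in-memory compile; writes no .pyc
    return unguarded_file_refs(source)
```

An empty list from `check_workflow_file` for each entry file would correspond to the checked item above.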

…cument cold-tier refresh

Verified by deploying to serverless and running Partition Manager: the spark_python_task runs via exec() in an IPython kernel where __file__ is undefined, so the previous `os.path.abspath(__file__)` bootstrap raised NameError on every job.

- workflows: guard the sys.path bootstrap: use __file__ when defined, else fall back to os.getcwd() (Databricks sets cwd to the file's workspace folder). Applied to all five entry files.
- bundle: give service_principal_name a default of "" so the dev target validates/deploys without it (prod still overrides via --var)
- docs(rollups): expand "Hot-tier vs cold-tier refresh" with how the tier is auto-detected, when cold_rollup_refresh IS needed (late-arriving data, ETL corrections/restatements, manual backfill), when it is not (append-only / frozen-once-tiered), and a how-to
- docs(workflow-jobs): add "When to run Cold RollUp Refresh" + cross-link

taran-dbx merged commit fc3a037 into main Jun 1, 2026
9 checks passed

taran-dbx deleted the fix/serverless-file-bootstrap-and-cold-rollup-docs branch June 1, 2026 16:45

github-actions bot added the documentation, databricks-workflows, and area: rollup labels Jun 1, 2026

taran-dbx merged 1 commit into main from fix/serverless-file-bootstrap-and-cold-rollup-docs