fix(jobs): __file__-safe serverless bootstrap + document cold-tier refresh #31
Merged
taran-dbx merged 1 commit on Jun 1, 2026
…cument cold-tier refresh

Verified by deploying to serverless and running Partition Manager: the spark_python_task runs via exec() in an IPython kernel where __file__ is undefined, so the previous `os.path.abspath(__file__)` bootstrap raised NameError on every job.

- workflows: guard the sys.path bootstrap: use __file__ when defined, else fall back to os.getcwd() (Databricks sets cwd to the file's workspace folder). Applied to all five entry files.
- bundle: give service_principal_name a default of "" so the dev target validates/deploys without it (prod still overrides via --var)
- docs(rollups): expand "Hot-tier vs cold-tier refresh" with how the tier is auto-detected, when cold_rollup_refresh IS needed (late-arriving data, ETL corrections/restatements, manual backfill), when it is not (append-only / frozen-once-tiered), and a how-to
- docs(workflow-jobs): add "When to run Cold RollUp Refresh" + cross-link
Summary
Two things, both validated by actually deploying to serverless and running a job on fe-vm-shared-interop:

- Bug fix (caught by the live run): the `spark_python_task` sibling-import bootstrap used `os.path.abspath(__file__)`. On serverless the file runs via `exec()` in an IPython kernel where `__file__` is undefined, so every job failed immediately with `NameError: name '__file__' is not defined`. Fixed by falling back to `os.getcwd()` (Databricks sets cwd to the file's workspace folder) when `__file__` isn't defined, applied to all five entry files. Re-running Partition Manager after the fix imports cleanly and runs through to the Lakebase API call.
- Documentation (the main ask): properly document when/how the cold-tier RollUp re-aggregation is used.
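A minimal sketch of the guarded bootstrap described above, for readers who want to see its shape. The helper names `resolve_bootstrap_dir` and `ensure_on_sys_path` are hypothetical, and using the file's own directory (via `os.path.dirname`) as the import root is an assumption; the actual entry files may inline this logic or add a different directory.

```python
import os
import sys


def resolve_bootstrap_dir() -> str:
    """Return the directory to put on sys.path for sibling imports (hypothetical helper)."""
    try:
        # Regular script execution: __file__ is defined.
        return os.path.dirname(os.path.abspath(__file__))
    except NameError:
        # exec()-style execution (as reported for serverless): __file__ is
        # undefined, so fall back to the current working directory.
        return os.getcwd()


def ensure_on_sys_path(path: str) -> None:
    """Prepend path to sys.path once, so repeated runs do not add duplicates."""
    if path not in sys.path:
        sys.path.insert(0, path)


ensure_on_sys_path(resolve_bootstrap_dir())
```

Catching `NameError` rather than checking `"__file__" in globals()` keeps the normal path unchanged and only takes the fallback when the lookup actually fails.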
Changes
- `databricks/workflows/*.py`: `__file__`-safe bootstrap (try `__file__`, except `NameError` → `os.getcwd()`).
- `databricks/bundles/databricks.yml`: `service_principal_name` now defaults to `""` so the dev target validates/deploys without it (a prior bug: the defaultless variable blocked even dev). Prod still overrides via `--var`.

Docs
- `rollups.md` → expanded Hot-tier vs cold-tier refresh: tier auto-detection, when you need cold refresh (late-arriving data, ETL corrections/restatements, manual backfill), when you don't (append-only / frozen-once-tiered), and a how-to.
- `workflow-jobs.md` → When to run Cold RollUp Refresh + cross-link.

Test plan
- `__file__` only used inside the `NameError` guard
- `tests/test_python_patterns.py` passes (11/11)
- `databricks bundle validate -t dev` → OK; `deploy -t dev` provisions all five serverless jobs
- … `lakets-tiering-test` not found in this workspace); separate, config-side
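One way the first test-plan item could be checked mechanically is with a small AST scan. This is a hypothetical sketch, not the contents of `tests/test_python_patterns.py`; the function names are invented for illustration, and it only treats `__file__` as guarded when it appears in the body of a `try` that has an `except NameError` (or bare `except`) handler.

```python
import ast
from typing import List


def _catches_name_error(handler: ast.ExceptHandler) -> bool:
    """True if this handler would catch NameError by name (or is a bare except)."""
    if handler.type is None:
        return True
    if isinstance(handler.type, ast.Tuple):
        candidates = handler.type.elts
    else:
        candidates = [handler.type]
    return any(isinstance(c, ast.Name) and c.id == "NameError" for c in candidates)


def file_refs_outside_guard(source: str) -> List[int]:
    """Return line numbers where __file__ is referenced outside a NameError-guarded try body."""
    tree = ast.parse(source)
    guarded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Try) and any(_catches_name_error(h) for h in node.handlers):
            # Only the try body is protected; handlers and else/finally are not.
            for stmt in node.body:
                for child in ast.walk(stmt):
                    if isinstance(child, ast.Name) and child.id == "__file__":
                        guarded.add(id(child))
    return sorted(
        n.lineno
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and n.id == "__file__" and id(n) not in guarded
    )
```

An empty result for every entry file would indicate that each `__file__` reference sits inside the guard.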