Add databricks-serverless-storage-check skill #82

Open
GabbysCode wants to merge 2 commits into databricks:main from GabbysCode:add-serverless-storage-check

GabbysCode commented May 20, 2026

Summary

Adds databricks-serverless-storage-check — a skill that ships an executable preflight scanner detecting the antipattern where serverless tasks share state through /local_disk0, /tmp, or trustedTemp paths. This is the failure mode behind INTERNAL_ERROR: [Errno 13] Permission denied: '/local_disk0/.../trustedTemp.../...', where a parent task writes to local disk and a child task on a different node cannot read it.

This skill complements the existing databricks-serverless-migration skill, which covers single-notebook migration and correctly recommends /local_disk0/tmp for intra-task scratch. The new skill covers the cross-task case.
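
For reviewers who want a concrete picture of the cross-task fix, here is a minimal sketch of a shared-storage handoff. It is not taken from the skill's remediation guide: `write_handoff`, `read_handoff`, and the `handoff/<run_id>/` layout are hypothetical names, and it assumes the shared root is a Unity Catalog volume directory that accepts ordinary file I/O.

```python
from pathlib import Path


def write_handoff(shared_root: str, run_id: str, name: str, payload: bytes) -> str:
    """Parent side: write payload under shared storage and return its path.

    shared_root is assumed to be a Unity Catalog volume directory such as
    /Volumes/<catalog>/<schema>/<volume>, never /local_disk0 or /tmp.
    """
    target = Path(shared_root) / "handoff" / run_id / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return str(target)


def read_handoff(path: str) -> bytes:
    """Child side: read the payload back from the shared path."""
    return Path(path).read_bytes()


# In a job, the parent would publish the returned path and the child would
# fetch it, for example with dbutils.jobs.taskValues.set(key=..., value=path)
# and dbutils.jobs.taskValues.get(taskKey=..., key=...). Those calls need a
# Databricks runtime, so they are left out of this runnable sketch.
```

The point of the design is that only a path on shared storage crosses the task boundary, so it does not matter which node the child lands on.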

Contents

| Path | Purpose |
| --- | --- |
| skills/databricks-serverless-storage-check/SKILL.md | When-to-use, quick start, output interpretation, the core rule, related skills |
| skills/databricks-serverless-storage-check/agents/openai.yaml | Codex marketplace metadata (hand-authored) |
| skills/databricks-serverless-storage-check/scripts/preflight.py | Stdlib-only AST + regex scanner with 5 input modes (--notebook / --dir / --job-yaml / --job-id / --run-id), --json flag, exit codes 0/1/2 |
| skills/databricks-serverless-storage-check/scripts/test_preflight.py | 7 self-tests, runs with python3 and no third-party deps |
| skills/databricks-serverless-storage-check/references/pattern-catalog.md | Full table for FANOUT001–006 + ENV001 with examples, fixes, and the AST/regex rules used |
| skills/databricks-serverless-storage-check/references/remediation-guide.md | Before/after code for Volumes / Workspace / taskValues / pipeline-downstream handoffs, plus what-not-to-do anti-examples |
| skills/databricks-serverless-storage-check/eval/ground_truth.yaml | 4 SkillForge test cases (3 positive triggers + 1 boundary); stf lint clean (0 errors, 0 warnings) |
| skills/databricks-serverless-storage-check/eval/thinking_instructions.md | L4 reasoning criteria (Efficiency, Clarity, Recovery, Completeness, Hierarchy Awareness, Scanner Output Hygiene) |
| skills/databricks-serverless-storage-check/eval/output_instructions.md | L5 output criteria with expected artifacts, mandatory facts, negative signals, and per-case acceptance bar |
| skills/databricks-serverless-storage-check/eval/manifest.yaml | SkillForge eval config |
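
To make the scanner's command-line surface easier to picture, here is a rough sketch of how the five input modes and the --json flag could be declared with argparse. This is not the code in preflight.py: `build_parser` is a hypothetical name, and treating the modes as mutually exclusive is an assumption. The PR does not say what exit codes 0/1/2 mean, so the sketch does not assign them; the only exit status it relies on is argparse's own status 2 for usage errors.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the five input modes and the --json flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="preflight.py")
    # Assumption: exactly one input mode is chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--notebook", metavar="PATH", help="scan one notebook")
    mode.add_argument("--dir", metavar="PATH", help="scan a directory tree")
    mode.add_argument("--job-yaml", metavar="PATH", help="scan a job definition")
    mode.add_argument("--job-id", metavar="ID", help="scan a deployed job")
    mode.add_argument("--run-id", metavar="ID", help="inspect a job run")
    parser.add_argument("--json", action="store_true", help="emit JSON output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this layout, passing two modes at once (for example --notebook together with --dir) is rejected by argparse before any scanning starts.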

Detection rules

| ID | Severity | Detects |
| --- | --- | --- |
| FANOUT001 | Blocker | Local-disk path passed to dbutils.notebook.run, taskValues.set, or job-task parameter (resolved through variable assignments and dict/list/tuple literals) |
| FANOUT002 | Blocker | Child notebook (one that uses widgets.get / taskValues.get) reads from a /local_disk0 or /tmp path |
| FANOUT003 | Warning | DAB job with multiple sibling tasks referencing the same local-disk path |
| FANOUT004 | Warning | pipeline_task immediately downstream of a notebook_task that wrote to local temp |
| FANOUT005 | Info | dbutils.fs.cp local-to-local inside a notebook invoked by a multi-task job (heuristic) |
| FANOUT006 | Blocker | Hardcoded BSI signature /local_disk0/spark-*/trustedTemp/... anywhere in source |
| ENV001 | Info | --run-id mode only: routes ENVIRONMENT_SETUP_ERROR.PYTHON_NOTEBOOK_ENVIRONMENT to support escalation (not a fixable pattern) |
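
As an illustration of the kind of check FANOUT001 describes, the sketch below uses the stdlib `ast` module to resolve simple variable assignments and dict/list/tuple literals, then flags local-disk strings that reach `dbutils.notebook.run` or `dbutils.jobs.taskValues.set`. It is a simplified stand-in, not the rule shipped in preflight.py: the regex, the helper names, and the flow-insensitive resolution (every string ever assigned to a name counts) are all assumptions.

```python
import ast
import re

# Assumed definition of a node-local path; the real rule set may differ.
LOCAL_DISK_RE = re.compile(r"^(?:file:)?/(?:local_disk0|tmp)(?:/|$)")
HANDOFF_CALLS = {"dbutils.notebook.run", "dbutils.jobs.taskValues.set"}


def is_local_disk(path):
    return bool(LOCAL_DISK_RE.match(path)) or "trustedTemp" in path


def strings_in(node, env):
    """Yield string values reachable from an expression node."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        yield node.value
    elif isinstance(node, ast.Name):
        yield from env.get(node.id, ())
    elif isinstance(node, (ast.List, ast.Tuple, ast.Set)):
        for elt in node.elts:
            yield from strings_in(elt, env)
    elif isinstance(node, ast.Dict):
        for value in node.values:
            yield from strings_in(value, env)


def dotted_name(node):
    """Return 'a.b.c' for an attribute chain rooted at a name, else ''."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_local_disk_handoffs(source):
    """Return (line, callee, path) for each local-disk path handed to a child."""
    nodes = list(ast.walk(ast.parse(source)))
    env = {}
    # Pass 1: record strings assigned to plain names, in source order.
    assigns = sorted((n for n in nodes if isinstance(n, ast.Assign)),
                     key=lambda n: (n.lineno, n.col_offset))
    for assign in assigns:
        values = list(strings_in(assign.value, env))
        for target in assign.targets:
            if isinstance(target, ast.Name):
                env.setdefault(target.id, []).extend(values)
    # Pass 2: inspect the arguments of the handoff calls.
    findings = []
    for call in (n for n in nodes if isinstance(n, ast.Call)):
        callee = dotted_name(call.func)
        if callee not in HANDOFF_CALLS:
            continue
        for arg in list(call.args) + [kw.value for kw in call.keywords]:
            for text in strings_in(arg, env):
                if is_local_disk(text):
                    findings.append((call.lineno, callee, text))
    return sorted(findings)
```

A path that travels through a variable and a dict literal before reaching `dbutils.notebook.run` is still reported, while a /Volumes path passed to `taskValues.set` is not.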

Sibling cross-reference

Adds one line to skills/databricks-serverless-migration/SKILL.md (Category B: Data Access table) clarifying that /local_disk0/tmp is per-task scratch only and pointing to this skill for cross-task concerns. Flagged here because it touches a sibling skill.

Validation

  • python3 scripts/skills.py validate: Everything is up to date.
  • python3 skills/databricks-serverless-storage-check/scripts/test_preflight.py: 7/7 passing
  • stf lint skills/databricks-serverless-storage-check: 0 errors, 0 warnings

Checklist

  • python3 scripts/skills.py validate passes
  • SKILL_METADATA entry added in scripts/skills.py
  • agents/openai.yaml hand-authored
  • Self-tests pass (7/7)
  • SKILL.md body under 250 lines (149 lines)
  • Frontmatter description includes trigger phrases: trustedTemp, local_disk0, permission denied, fan-out, cross-task
  • stf lint passes (0 errors, 0 warnings)
  • SkillForge eval scaffolded: ground_truth.yaml (4 cases), thinking_instructions.md, output_instructions.md, manifest.yaml
  • Signed off per DCO

…disk handoffs

Adds a new skill `databricks-serverless-storage-check` that ships an
executable preflight scanner for the antipattern where parent/child
tasks share state through /local_disk0, /tmp, or trustedTemp paths --
the failure seen in serverless jobs that fail with
`INTERNAL_ERROR: [Errno 13] Permission denied` on local-disk paths.

The scanner (scripts/preflight.py, stdlib-only, AST + regex) supports
five input modes (--notebook, --dir, --job-yaml, --job-id, --run-id)
and 7 detection rules (FANOUT001-006 plus ENV001 which routes
env-sync errors to support escalation). All 7 self-tests pass.

Complementary to databricks-serverless-migration (single-notebook
migration). Added a one-line cross-reference from that skill's
data-access table pointing here for multi-task fan-out concerns.

Includes the required agents/openai.yaml (hand-authored) and
SKILL_METADATA entry in scripts/skills.py; manifest regenerated and
`python3 scripts/skills.py validate` passes.

Signed-off-by: GABRIELLE DOMPREH <Gabby.dompreh@databricks.com>
@GabbysCode GabbysCode requested review from a team, lennartkats-db and simonfaltum as code owners May 20, 2026 08:24
@dustinvannoy-db dustinvannoy-db self-requested a review May 27, 2026 17:22
@dustinvannoy-db
Collaborator

We will reach out directly about possible consolidation of this PR, and one of our Field Eng maintainers will add a review once we feel it's ready.

…ruth.yaml

Adds generation_session_id and sources to all four test cases so
stf lint passes cleanly (0 errors, 0 warnings).

Co-authored-by: Isaac
@GabbysCode GabbysCode requested a review from a team as a code owner June 5, 2026 12:34